Algorithmes de bandits pour l'exploration d'arbres, avec applications aux jeux, à l'optimisation et au contrôle

old_uid5330
titleAlgorithmes de bandits pour l'exploration d'arbres, avec applications aux jeux, à l'optimisation et au contrôle
start_date2008/10/06
schedule13h30
onlineno
summaryBandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g., in the game of go. Their efficient exploration policy of the tree enables to return rapidly a good value and improve precision if more time is provided. The UCT algorithm, a tree search method based on Upper Confidence Bounds (UCB) is believed to adapt locally to the effective smoothness of the tree. However, it is known that UCT may be over-optimistic for some problems, leading to high regret. I will describe alternative bandit-based approaches and analyze a Bandit Algorithm for Smooth Trees (BAST) which takes into account actual smoothness of the rewards for performing efficient cuts of sub-optimal branches with high confidence. I will then illustrate several fields of applications, such as minimax search in games, global optimization of Lipschitz functions, and discounted optimal control.
responsiblesBiau, Stoltz, Massart