2025
Conference papers
Preprints, Working Papers, ...
2024
Journal articles
Conference papers
We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a multimodal structure. The multimodal structure naturally extends the unimodal structure and, in quite interesting ways, underlies popular structures such as linear and Lipschitz bandits. We introduce IMED-MB, an algorithm that optimally exploits the multimodal structure by adapting the popular Indexed Minimum Empirical Divergence (IMED) algorithm to this setting. We provide an instance-dependent regret analysis of this strategy. Numerical experiments show that IMED-MB performs well in practice under unimodal, polynomial, or Lipschitz mean functions.
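For orientation, the sketch below shows the index rule of the base IMED algorithm that IMED-MB adapts; it is not IMED-MB itself, which additionally exploits the multimodal structure. It assumes Bernoulli rewards (one particular one-dimensional exponential family), where the empirical divergence reduces to the Bernoulli KL divergence. Each arm a gets the index N_a · kl(μ̂_a, μ̂*) + log N_a, and the arm with the smallest index is pulled. The function names are illustrative only.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def imed_indices(counts, means):
    """IMED index N_a * kl(mu_a, mu_max) + log N_a for every arm.

    Assumes every arm has been pulled at least once (counts[a] >= 1).
    """
    best = max(means)
    return [n * bernoulli_kl(m, best) + math.log(n) for n, m in zip(counts, means)]


def imed_choose(counts, means):
    """Return the arm with the smallest IMED index (plain IMED, no structure)."""
    idx = imed_indices(counts, means)
    return min(range(len(idx)), key=idx.__getitem__)


# Arm 0 looks best but has been pulled often; arm 1 is slightly worse and
# under-explored, so its index (about 1.71) beats arm 0's log(10) (about 2.30).
print(imed_choose([10, 5], [0.6, 0.5]))
```

For the empirically best arm the divergence term vanishes, so its index is just log N_a; a suboptimal arm is pulled again only while its divergence from the leader, scaled by its pull count, remains small.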
Monte-Carlo Tree Search (MCTS) is a widely used strategy for online planning that combines Monte-Carlo sampling with forward tree search. Its success relies on the Upper Confidence bound for Trees (UCT) algorithm, an extension of the UCB method for multi-armed bandits. However, the theoretical foundation of UCT is incomplete due to an error in the logarithmic bonus term for action selection, which led to the development of Fixed-Depth-MCTS with a polynomial exploration bonus to balance exploration and exploitation [Shah et al., 2022]. Both UCT and Fixed-Depth-MCTS suffer from biased value estimation: the weighted sum underestimates the optimal value, while the maximum overestimates it [Coulom, 2006]. The power mean estimator offers a balanced solution, lying between the average and the maximum. Power-UCT [Dam et al., 2019] incorporates this estimator for more accurate value estimates, but its theoretical analysis remains incomplete. This paper introduces Stochastic-Power-UCT, an MCTS algorithm that uses the power mean estimator and is tailored to stochastic MDPs. We analyze its polynomial convergence in estimating root node values and show that it shares the convergence rate of O(n^{-1/2}), where n is the number of visited trajectories, with Fixed-Depth-MCTS, the latter being a special case of the former. Our theoretical results are validated with empirical tests across various stochastic MDP environments.
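To illustrate why the power mean sits between the average and the maximum, here is a minimal sketch of a weighted power mean (Σ w_i x_i^p / Σ w_i)^(1/p). It assumes non-negative values (e.g. rewards normalized to [0, 1]) and p ≥ 1; using child visit counts as the weights follows the spirit of Power-UCT but is an assumption of this sketch, not the paper's exact estimator.

```python
def power_mean(values, weights, p):
    """Weighted power mean of non-negative values for exponent p >= 1.

    p = 1 gives the weighted average; as p grows the result approaches max(values).
    """
    total = sum(weights)
    s = sum(w * x ** p for x, w in zip(values, weights)) / total
    return s ** (1.0 / p)


# Hypothetical child value estimates and visit counts at one tree node.
child_values = [1.0, 3.0]
visit_counts = [1, 1]
for p in (1, 2, 100):
    # p = 1 yields the average 2.0; larger p moves the estimate toward the maximum 3.0.
    print(p, power_mean(child_values, visit_counts, p))
```

The exponent p acts as the knob between the two biased extremes described above: p = 1 recovers the (underestimating) weighted average, and p → ∞ recovers the (overestimating) maximum.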
Poster communications
2023
Conference papers
2022
Journal articles
Conference papers
Poster communications
Reports
Preprints, Working Papers, ...
2021
Conference papers
2020
Conference papers
Preprints, Working Papers, ...
2019
Conference papers
Habilitation à diriger des recherches
Preprints, Working Papers, ...
2018
Journal articles
Conference papers
Poster communications
Preprints, Working Papers, ...
2017
Journal articles
Conference papers
Lectures
2016
Conference papers
Preprints, Working Papers, ...
2015
Journal articles
Preprints, Working Papers, ...
2014
Journal articles
Conference papers
Other publications
2013
Journal articles
Conference papers
Other publications
2012
Preprints, Working Papers, ...
2011
Conference papers
Reports
Theses
2010
Conference papers
Reports
2009
Conference papers