Finite-time analysis of the multiarmed bandit problem

old_uid7413
titleFinite-time analysis of the multiarmed bandit problem
start_date2009/10/08
schedule15h
onlineno
location_infoV106
summaryWe consider an online learning setting where at each time step the decision maker has to choose how to distribute the future loss between k alternatives, and then observes the loss of each alternative. Motivated by load balancing and job scheduling, we consider a global cost function (over the losses incurred by each alternative), rather than a summation of the instantaneous losses as done traditionally in online learning. Such global cost functions include the makespan (the maximum over the
responsiblesBiau, Stoltz, Massart