|
Finite-time analysis of the multiarmed bandit problem| old_uid | 7413 |
|---|
| title | Finite-time analysis of the multiarmed bandit problem |
|---|
| start_date | 2009/10/08 |
|---|
| schedule | 15h |
|---|
| online | no |
|---|
| location_info | V106 |
|---|
| summary | We consider an online learning setting where at each time step the decision maker has to choose how to distribute the future loss between k alternatives, and then observes the loss of each alternative. Motivated by load balancing and job scheduling, we consider a global cost function (over the losses incurred by each alternative), rather than a summation of the instantaneous losses as done traditionally in online learning. Such global cost functions include the makespan (the maximum over the |
|---|
| responsibles | Biau, Stoltz, Massart |
|---|
| |
|