| old_uid | 9028 |
|---|
| title | From UAV mission planning to Reinforcement Learning in large datasets. Some lessons learnt and perspectives |
|---|
| start_date | 2010/07/05 |
|---|
| schedule | 10h |
|---|
| online | no |
|---|
| summary | Consider the question of autonomous planning for an unmanned air vehicle (UAV). Tackling such a problem requires modeling the possibly non-deterministic and time-dependent evolution of the environment. It is also desirable that its resolution rapidly provide close-to-optimal plans, and that the agent be able to adapt an existing plan when new (possibly inconsistent) data appears. I will open this presentation with the example of a helicopter UAV patrol problem. My goal is to draw lessons from the experience of solving this problem and transfer them to a more general framework. As this problem admits an intuitive representation as a Time-dependent Markov Decision Process (TiMDP; see the sketches after this record), we shall study the TiMDPpoly algorithm designed to solve it and analyze how well it meets the above requirements. We will then focus on a current attempt at transposing the asynchronous Dynamic Programming scheme of TiMDPpoly into the model-free context of Reinforcement Learning, in order to handle very large experience databases under limited memory constraints and without the burden of building a model. We will discuss the benefits of this extension in terms of genericity (the class of problems we can address), efficiency (the size and complexity of these problems), and adaptability (the UAV's capacity to react quickly to a new environment). Our conclusion will bring us back to the original problem of mission planning for UAVs. |
|---|
| responsibles | <not specified> |
|---|
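
To make the summary's reference to Time-dependent MDPs concrete: in the standard TiMDP formulation (Boyan and Littman), the value of a state depends on continuous time, and each action triggers one of several outcomes μ leading to a result state s'_μ after a stochastic duration. Roughly, and glossing over the distinction between absolute and relative duration distributions, the backup reads:

$$
V(s, t) = \max_{a} Q(s, a, t), \qquad
Q(s, a, t) = \sum_{\mu} L(\mu \mid s, t, a) \int_{t'} P_{\mu}(t') \big[ R_{\mu}(t, t') + V(s'_{\mu}, t') \big] \, dt'
$$

Here $L(\mu \mid s, t, a)$ is the likelihood of outcome μ and $P_{\mu}$ its duration distribution; TiMDPpoly exploits piecewise polynomial representations of these time-dependent functions to compute the backups exactly.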
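
As a purely illustrative sketch of the model-free transposition mentioned in the summary (hypothetical code, not the speaker's implementation), asynchronous backups over a large experience database can be pictured as tabular Q-learning streamed over chunks of logged transitions: each chunk is loaded, swept in arbitrary order, and discarded, so memory stays bounded and no transition model is ever estimated. The function name `async_q_updates` and the `(state, action, reward, next_state)` tuple format are assumptions for the example.

```python
import random
from collections import defaultdict

def async_q_updates(transition_batches, alpha=0.1, gamma=0.95):
    """Tabular Q-learning over batches of (s, a, r, s_next) transitions.

    Batches are consumed one at a time, so only one chunk of the
    experience database lives in memory at once, and no transition
    model is ever built (model-free).
    """
    Q = defaultdict(float)           # Q[(s, a)] -> current value estimate
    seen_actions = defaultdict(set)  # actions observed from each state
    for batch in transition_batches:
        random.shuffle(batch)        # asynchronous: arbitrary update order
        for s, a, r, s_next in batch:
            seen_actions[s].add(a)
            best_next = max(
                (Q[(s_next, a2)] for a2 in seen_actions[s_next]),
                default=0.0,
            )
            # One-step backup using only the sampled transition.
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

# Toy usage: two chunks of logged experience for a small patrol task.
batches = [
    [("base", "takeoff", 0.0, "patrol"), ("patrol", "scan", 1.0, "patrol")],
    [("patrol", "return", 0.5, "base"), ("base", "takeoff", 0.0, "patrol")],
]
Q = async_q_updates(batches)
print(max(Q, key=Q.get))  # state-action pair with the highest estimate
```

The key design point the talk alludes to is that, unlike synchronous value iteration, nothing in this scheme requires visiting states in a fixed order or holding the whole dataset in memory; convergence of asynchronous backups tolerates arbitrary update orderings as long as every state-action pair keeps being updated.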