Vision and language pre-training for robot navigation and manipulation

| title | Vision and language pre-training for robot navigation and manipulation |
|---|---|
| start_date | 2024/06/04 |
| schedule | 14h-15h |
| online | no |
| location_info | Salle 314 |
| summary | Pre-training on large-scale datasets has significantly accelerated progress in many domains. However, collecting real robot data for pre-training remains expensive and hard to scale. In this talk, I will demonstrate how we can leverage large-scale Internet data to enhance robot learning. Specifically, I will first present pre-training for vision-and-language navigation, where we take advantage of in-domain web image-caption pairs and unlabeled 3D houses to improve models’ generalization to unseen environments. Next, I will delve into pre-training approaches for the more complex task of robot manipulation, which requires fine-grained visual perception and precise control. I will introduce a versatile pre-training framework based on web 3D objects to improve visual perception for robots. |
| responsibles | Vacher, Blusseau |
Workflow history

| from state | to state | comment | date |
|---|---|---|---|
| submitted | published | | 2024/05/30 12:53 UTC |