Vision and language pre-training for robot navigation and manipulation

title	Vision and language pre-training for robot navigation and manipulation
start_date	2024/06/04
schedule	14h-15h
online	no
location_info	Salle 314
summary	Pre-training on large-scale datasets has significantly accelerated progress in various domains. However, collecting real robot data for pre-training remains expensive and does not scale. In this talk, I will demonstrate how large-scale Internet data can be leveraged to enhance robot learning. First, I will present pre-training for vision-and-language navigation, where we take advantage of in-domain web image-caption pairs and unlabeled 3D houses to improve models’ generalization to unseen environments. Next, I will turn to pre-training approaches for more complex robot manipulation, which requires fine-grained visual perception and precise control. I will introduce a versatile pre-training framework based on web 3D objects to improve visual perception for robots.
responsibles	Vacher, Blusseau

Workflow history

from state	to state	comment	date
submitted	published		2024/05/30 12:53 UTC

hosted_by

Institut Henri Poincaré

speakers

event_of

Imaging in Paris (séminaire Parisien sur les Mathématiques de l’imagerie, Institut Henri Poincaré (IHP), UAR 839 Sorbonne Université / CNRS) (2023)

Event #1335948 - created on 2024/05/30