Multimodal perception and reasoning

title: Multimodal perception and reasoning
start_date: 2025/02/21
schedule: 11h
online: no
location_info: videoconference via Big Blue Button
summary: Building on the strong textual processing capabilities of large language models (LLMs), large vision-language models (VLMs) extend LLMs to handle visual inputs. They have brought significant improvements to multimodal tasks such as visual question answering and image captioning, and in particular have paved the way for tasks involving complex visual reasoning. However, the transfer of LLMs' internal knowledge and reasoning abilities to multimodal tasks remains limited. In this talk, I will present two of my recent works on evaluating and improving VLMs' perception and reasoning capabilities.
responsibles: Bawden