Multimodal perception and reasoning

| title | Multimodal perception and reasoning |
|---|---|
| start_date | 2025/02/21 |
| schedule | 11h |
| online | no |
| location_info | videoconference (Big Blue Button) |
| summary | Building on the strong textual processing capabilities of large language models (LLMs), large vision-language models (VLMs) extend LLMs to handle visual inputs. They have brought significant improvements to multimodal tasks such as visual question answering and image captioning. In particular, they have paved the way for tasks involving complex visual reasoning. However, the transfer of LLMs' internal knowledge and reasoning abilities to multimodal tasks remains limited. In this talk, I will present two of my recent works on evaluating and improving VLMs' perception and reasoning capabilities. |
| responsibles | Bawden |

Workflow history

| from state | to state | comment | date |
|---|---|---|---|
| submitted | published | | 2025/02/19 12:49 UTC |