| title | On visual grounding at the age of deep learning |
| start_date | 2024/11/18 |
| schedule | 17h-18h |
| online | no |
| location_info | on Zoom |
| summary | Language models, large and small, have traditionally been criticized for their lack of grounding, i.e., a link to extralinguistic reality, including perceptual reality. By now, similar models can process both text and input in other modalities, such as vision. The talk will discuss some empirical findings on (visual) grounding in neural models. Does visual grounding produce measurably different types of representations? How can the differences be characterized semantically? Which linguistic aspects are multimodal models good at, and which are still lacking? |
| responsibles | Bernard |

Workflow history

| from state | to state | comment | date |
|---|---|---|---|
| submitted | published | | 2024/11/06 14:22 UTC |