On visual grounding in the age of deep learning

title: On visual grounding in the age of deep learning
start_date: 2024/11/18
schedule: 17h-18h
online: no
location_infos: on Zoom
summary: Language models, large and small, have traditionally been criticized for their lack of grounding, i.e. a link to extralinguistic reality, including perceptual reality. By now, similar models can process both text and input in other modalities, such as vision. The talk will discuss some empirical findings on (visual) grounding in neural models. Does visual grounding produce measurably different types of representations? How can the differences be characterized semantically? Which linguistic aspects are multimodal models good at, and what is still lacking?
responsibles: Bernard