Multimodal machine learning: the case of vision-language transformers

title: Multimodal machine learning: the case of vision-language transformers
start_date: 2023/11/03
schedule: 14h-15h
online: no
location_info: Doyen 22 & via Teams
summary: Vision-language transformer models combine information from the textual and visual modalities to extract multimodal representations. These models serve as a basis for many multimodal vision-language tasks. Large pre-trained models based on the transformer architecture, inspired by recent advances in Natural Language Processing, have enabled substantial improvements on these tasks. In this presentation, I will give an overview of vision-language transformer models. I will introduce the different types of models in terms of architecture and pre-training methods, and present the strengths and weaknesses of each approach. Finally, I will discuss current challenges and emerging research trends in vision-language machine learning.
responsibles: Rolin