title | Multimodal machine learning: the case of vision-language transformers |
---|
start_date | 2023/11/03 |
---|
schedule | 14h-15h |
---|
online | no |
---|
location_info | Doyen 22 & via Teams |
---|
summary | Vision-language transformer models combine information from the textual and visual modalities to extract multimodal representations. These models serve as a basis for many multimodal vision-language tasks. Large pre-trained models based on the transformer architecture, inspired by recent advances in Natural Language Processing, have enabled substantial improvements on these tasks.
In this presentation, I will give an overview of vision-language transformer models. I will introduce the different types of models in terms of architecture and pre-training methods, and present the strengths and weaknesses of these methods. Finally, I will discuss current challenges and emerging research trends in vision-language machine learning. |
---|
responsibles | Rolin |
---|
Workflow history
from state | to state | comment | date |
submitted | published | | 2023/10/09 13:25 UTC |