state: published

Typologically fair NLP

title	Typologically fair NLP
start_date	2022/12/02
schedule	14h-15h
online	yes
location_info	in person and by videoconference via the Teams link
summary	NLP research has historically been strongly biased towards English as the language of investigation. The situation is changing, and multilingual NLP is booming. This talk starts with a description of the state of multilingual NLP, highlighting both its successes and its limitations. In particular, large multilingual pretrained language models (PLMs) such as mBERT or XLM-R have shown surprising cross-lingual capabilities, but they cover only a small fraction of the world's languages, with large inequalities in performance. These inequalities stem from at least two sources: 1) NLP datasets are highly imbalanced with regard to typological diversity, and 2) NLP models tend to be developed for English first and then adapted to other languages, which builds biases into the model assumptions. I describe attempts at overcoming both of these limitations. To overcome data imbalance, I describe a method from algorithmic fairness that samples data from different sources in a way that is more robust to underrepresented languages than alternative sampling methods. To overcome model assumption biases, I describe a PLM that uses pixel-based representations of language instead of the commonly used subword representations. I conclude with some directions for working towards typologically fairer NLP.
responsibles	Rolin

Workflow history

from state (1)	to state	comment	date
submitted	published		2022/11/29 16:06 UTC

hosted_by

Université Catholique de Louvain

speakers

event_of

Traitement automatique du langage (seminar of the Centre de- (CENTAL), Institut Langage et Communication, UCLouvain, Louvain-La-Neuve, Belgium) (2022)

Event #393157 - latest update on 2022/11/29, created on 2022/11/29