How and Why to Deal with Human Label Variation in NLP

title: How and Why to Deal with Human Label Variation in NLP
start_date: 2023/12/15
schedule: 14h-15h
online: no
location_info: More 56 (GPLO-DROIT) & via Teams
summary: Human variation in labeling is typically treated as noise. Annotation projects in computer vision and natural language processing usually aim to minimize human label variation in order to maximize data quality and, in turn, machine learning metrics. However, variation in human labeling is ubiquitous, and the standard approach of aggregating labels discards it. There is increasing evidence that human label variation is signal rather than noise. In this talk, I will first illustrate the problem and then discuss approaches to tackle this fundamental issue at the interplay of language resources, data quality, machine learning modeling, and evaluation. Overall, I will argue that taking human label variation into account is critical for devising more human-facing, trustworthy language technology.
responsibles: Rolin