Controlling Linguistic Variability in Large Language Models

title: Controlling Linguistic Variability in Large Language Models
start_date: 2026/04/10
schedule: 11:00
online: no
location_info: Big Blue Button videoconference
summary: Large language models (LLMs) have a considerable impact on Natural Language Processing and, more broadly, on society. Each of their limitations can therefore have major consequences. I propose to study their generalization across several phenomena of linguistic variability: how LLMs learn to model linguistic variability, and how that variability can be controlled. The stakes of this question are multiple: theoretical, on the one hand, since LLM training can be contrasted with language acquisition in humans; social, on the other, since language varies according to multiple sociolinguistic factors. Among the different linguistic levels of variability, I will begin by studying three: 1. morphological, when several affixes are in competition; 2. intralinguistic, for variability between dialects of the same language; 3. interlinguistic, for code-switching between multiple languages. For each level, I will analyze the probability that an LLM assigns to each variant, which depends on the model's calibration. Variability can then be controlled by modifying the model's input, the model itself via different learning methods, or the decoding method.
responsibles: Bawden