Language statistics that reveal features of the organization of language

title: Language statistics that reveal features of the organization of language
start_date: 2025/12/05
schedule: 14h
online: no
location_info: room A4-32
summary: Zipf’s law is likely the most famous statistical law of language (Piantadosi 2014). It states that the frequency distribution of linguistic units obeys a power law: a few highly frequent items dominate the distribution, while a large number of items are rare (Evert 2004). There is more to it, however: Zipf’s law (or its refined variant, the Zipf-Mandelbrot law, henceforth ZM) has also been shown to apply at the scale of individual structures, namely open structures that admit a spectrum of collocates or “types” (Ellis 2012). For instance, the sequence (also called construction) that’s a N is associated with a frequency distribution over the different nouns that can fill the N slot; not only can this distribution be captured by a ZM model, but its associated ranking of nouns is also specific to that sequence and cannot be predicted from the overall ranking of nouns. Despite its success in capturing key structuring principles, the ZM model fails to account for a number of features of language organization. First, it systematically overpredicts the number of free associations a sequence can form, and this overprediction is not properly accounted for even by variants that assume a finite pool of collocates to draw from. Second, “speciations” may occur over the historical course of a construction’s use: individual types deviate from the ZM distribution and reach a higher frequency on their own. Why these speciation events happen and how the separate types keep interacting with the construction remain open questions in the field. Third, more complex structures featuring two interacting schemas of variation (of the form it is ADJ to V; cf. Desagulier 2021) combine two Zipfian patterns, yet their interaction is markedly different from what a free association between these patterns would predict.
These deviations reveal that there is less freedom than could be expected in language production, which extensively relies on formulaic, ready-made elements (Erman & Warren 2000).
responsibles: Vignes, Berestycki, Nadal