Acquisition of syntactic categories: a distributional approach


Master's thesis (Master 2)
Author(s): CHEMLA, Emmanuel
Supervisor(s): Anne Christophe
Defense date: 2005
Degree programme: Master's in Cognitive Science (Paris)
Shelf mark: Master 941
Abstract: The question I wanted to address was the following: what information is available in the language input infants are exposed to while learning their mother tongue that could allow them to recover syntactic categories? More precisely, I investigated Toben Mintz's proposal, which relies on what he called Frequent Frames. He had already shown that they yield highly accurate word groupings in English (around 95% accuracy). I extended this result to French, with one difference: in the French corpus I tested, Frequent Frames covered a much smaller portion of the corpus than in English (about half of the word types in the English corpora studied were categorized by 45 frames, against less than 10% for 30 French frames). I then applied the same mechanism to Cantonese and Spanish corpora; results were not as good as in English and French, especially for Cantonese. Nevertheless, accuracy still reached 70%. I argued that this result does not mean that Frequent Frames are useless: they constitute a source of information that generally needs to be supplemented by other mechanisms, and I illustrated how such mechanisms could rely on other sources of information. At any rate, Frequent Frames appear to be the best algorithm based on distributional information proposed to date. Moreover, comparison with other attempts leads to the conclusion that two properties of Frequent Frames are crucial to their efficiency: their discontinuity and their item-specificity.
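The Frequent Frames mechanism can be illustrated with a minimal sketch. Following Mintz's definition, a frame is a pair of specific words (A _ B) with exactly one word in between, which is what makes it both discontinuous and item-specific; the words that fill the most frequent frames are grouped together as candidate categories. The sketch below assumes whitespace-tokenized, lowercased utterances, counts frames only within utterances, and scores a grouping with a type-based pairwise accuracy (hits / (hits + false alarms) over pairs of word types sharing a group). The function names `frequent_frames` and `pairwise_accuracy`, the `n_frames` parameter (defaulting to the 45 frames mentioned for English), and this exact scoring choice are illustrative assumptions, not the code used in the thesis.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_frames(utterances, n_frames=45):
    """Map each of the n_frames most frequent frames (a, b) to the set of
    word types x observed in the sequence 'a x b' (hypothetical helper)."""
    frame_counts = Counter()
    fillers = defaultdict(set)
    for utt in utterances:
        words = utt.lower().split()
        # Slide a window of three words: the outer two form the frame,
        # the middle word is the item the frame categorizes.
        for a, x, b in zip(words, words[1:], words[2:]):
            frame_counts[(a, b)] += 1
            fillers[(a, b)].add(x)
    top = [frame for frame, _ in frame_counts.most_common(n_frames)]
    return {frame: fillers[frame] for frame in top}


def pairwise_accuracy(groups, gold):
    """Hits / (hits + false alarms) over pairs of word types in the same
    group; pairs involving a word without a gold label are skipped."""
    hits = false_alarms = 0
    for members in groups.values():
        for w1, w2 in combinations(sorted(members), 2):
            if w1 not in gold or w2 not in gold:
                continue
            if gold[w1] == gold[w2]:
                hits += 1
            else:
                false_alarms += 1
    total = hits + false_alarms
    return hits / total if total else 0.0


if __name__ == "__main__":
    corpus = ["you want the ball", "you see the dog",
              "you want a cookie", "i see the cat"]
    groups = frequent_frames(corpus, n_frames=1)
    print(groups)  # the single most frequent frame: ('you', 'the')
    gold = {"want": "V", "see": "V", "the": "D", "a": "D"}
    print(pairwise_accuracy(groups, gold))
```

On this toy corpus the most frequent frame is "you _ the", which groups "want" and "see" together. Coverage, the other measure discussed above, would correspond to the proportion of word types that appear in at least one of the selected groups.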