Statistical learning reappraised

old_uid3258
titleStatistical learning reappraised
start_date2007/10/12
schedule11h-12h
onlineno
summaryA seminal article in Science (Saffran et al., 1996) provided strong evidence for infants' powerful abilities to extract linguistic representations from simple statistical relations among phonemes or syllables in a toy language, abilities that could potentially bootstrap infants into natural language. This work had a tremendous impact on the field of language acquisition, to the point that the term "statistical learning" is often perceived as circumscribed to computing the specific type of statistics proposed by Saffran and colleagues. Recently, however, an article in Trends in Cognitive Sciences (Yang, 2004) appeared to seriously challenge the usefulness of the whole enterprise of statistical learning. Yang showed that, without prior (innate) linguistic constraints, a statistical computational model failed to scale up to a large sample of natural language. In our paper, we show that a computational model that exploits simple transitional probabilities between phonemes at utterance boundaries (which very often come acoustically presegmented by clear pauses) can successfully discover a large number of words in a corpus of unsegmented child-directed English. Our model is based on statistical sensitivities attested in infancy (Mattys, Jusczyk, Luce, and Morgan, 1999), and it scales up to natural language. Thus, our model represents a considerable advance that recasts the discussion on the role of statistical learning in language acquisition, and goes beyond the seemingly irreconcilable results produced by Saffran et al. (1996) and Yang (2004). Our results have at least two important theoretical consequences for the field of language acquisition. First, we argue that the statistical learning endeavor needs to specify both what statistics infants can compute and whether these statistics can reliably signal linguistic structure in natural language. 
This can be done by a concerted effort that combines, on the one hand, statistical analyses of large computerized samples of natural language (now extensively available) and, on the other hand, experimental data from infants and young children. Second, we show that statistical learning should not be solely associated with computing transitional probabilities of one type. In fact, it encompasses both a class of statistics computed over a variety of perceptual and linguistic stimuli and a class of potential learning procedures/algorithms. We further discuss how the innate constraints proposed by Yang might themselves be consequences of statistical learning, or might emerge from constraints on speech perception. References: Mattys, S.L., Jusczyk, P.W., Luce, P.A., & Morgan, J.L. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38, 465-494. Saffran, J.R., Aslin, R.N., & Newport, E.L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928. Yang, C. (2004). Universal grammar, statistics, or both? Trends in Cognitive Sciences, 8, 451-456.
oncancelroom to be confirmed
responsiblesPélissier