|
old_uid | 3258 |
---|
title | Statistical learning reappraised |
---|
start_date | 2007/10/12 |
---|
schedule | 11h-12h |
---|
online | no |
---|
summary | A seminal article in Science (Saffran et al., 1996) provided strong evidence for infants'
powerful abilities to extract linguistic representations from simple statistical relations
among phonemes or syllables in a toy language, and thus to potentially bootstrap infants
into natural language.
This work had a tremendous impact on the field of language acquisition, to the point that the term "statistical learning" is often perceived as restricted to computing the specific type of statistics proposed by Saffran and colleagues. Recently, however, an article in TICS (Yang, 2004) appeared to seriously challenge the usefulness of the whole statistical learning enterprise: Yang showed that, without prior (innate) linguistic constraints, a statistical computational model failed to scale up to a large sample of natural language.
In our paper, we show that a computational model that exploits simple transitional
probabilities between phonemes at the edges of utterance boundaries (which very often
come acoustically presegmented by clear pauses) can successfully discover a large number
of words in a corpus of unsegmented child-directed English.
Our model is based on statistical sensitivities attested in infancy (Mattys, Jusczyk,
Luce, & Morgan, 1999) and scales up to natural language. It thus represents
a considerable advance that recasts the discussion of the role of statistical learning in
language acquisition, and goes beyond the seemingly irreconcilable results produced by
Saffran et al. (1996) and Yang (2004).
Our results have at least two important theoretical consequences for the field of language
acquisition. First, we argue that the statistical learning endeavor needs to specify both
which statistics infants can compute and whether these statistics reliably signal
linguistic structure in natural language. This can be done through a concerted effort that
combines, on the one hand, statistical analyses of large computerized samples of natural
language (now extensively available) and, on the other, experimental data from infants
and young children. Second, we show that statistical learning should not be associated
solely with computing transitional probabilities of one type. In fact, it encompasses
both a class of statistics performed on a variety of perceptual and linguistic stimuli and a
class of potential learning procedures/algorithms. We further discuss how the innate
constraints proposed by Yang might themselves be consequences of statistical learning, or
might emerge from constraints on speech perception.
References:
Mattys, S.L., Jusczyk, P.W., Luce, P.A., & Morgan, J.L. (1999). Phonotactic and
prosodic effects on word segmentation in infants. Cognitive Psychology, 38, 465-494.
Saffran, J.R., Aslin, R.N., & Newport, E.L. (1996). Statistical learning by 8-month-old
infants. Science, 274, 1926-1928.
Yang, C. (2004). Universal grammar, statistics, or both? Trends in Cognitive Sciences, 8,
451-456. |
---|
oncancel | room to be confirmed |
---|
responsibles | Pélissier |
---|
| |
|