Numerical simulation of voice quality

old_uid12197
titleNumerical simulation of voice quality
start_date2016/11/10
schedule11h-12h30
onlineno
summaryThe presentation is devoted to the investigation of the production, tuning and perception of vocal jitter, which is a salient feature of the voice quality of a majority of speakers (mild voice disorders included). Vocal jitter here designates the instantaneous vocal frequency jitter of the glottal airflow rate as well as the peak amplitude jitter (a.k.a. shimmer) of the speech cycles. The latter is generated via modulation distortion in the vocal tract with a frequency-to-amplitude jitter ratio of approximately 1:10. Source-internal amplitude shimmer is indeed damped out owing to the self-sustained oscillations of the vocal folds as well as left-right coupling. Production and tuning of vocal jitter must be discussed in the framework of numerical simulations because it cannot be observed directly in living subjects and it does not exist in analogue models or in vitro vibrations of the vocal folds. Similarly, the investigation of the perception of vocal jitter per se, and of the influence of segmental and suprasegmental features on that perception, must also rely on synthetic stimuli because human speakers are unable to set the size of jitter freely. The presentation includes a short introduction to voice quality and its main acoustic features, a description of the synthesizer, and a brief examination of the detrimental influence that numerical instabilities of glottal models with a small number of degrees of freedom have on synthetic voice quality. Possible distal and proximal causes of vocal jitter are discussed. Of these, only the thyroarytenoid (TA) muscle twitch model is retained for simulation. It explains vocal frequency jitter in terms of muscle tension jitter, which is the outcome of the concurrent activity of several motor units. The control parameters of the jittered muscle tension are the dead time and firing rate of the motor neurones, the number of active motor units, and the duration of the muscle fibre twitch response.
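As a rough illustration of the muscle twitch idea described above, the following Python sketch sums the twitch responses of several independently firing motor units and measures how much the resulting tension fluctuates. It is not the model used in the presentation. The twitch shape, the firing statistics (a fixed dead time followed by an exponentially distributed wait) and all function names are assumptions made for the example. Only the four control parameters (dead time, firing rate, number of active units, twitch duration) come from the abstract.

```python
import math
import random


def twitch(t, duration):
    """Assumed twitch shape: (t/T) * exp(1 - t/T), which peaks at 1 when t = T."""
    if t < 0:
        return 0.0
    x = t / duration
    return x * math.exp(1.0 - x)


def firing_times(rate_hz, dead_time_s, total_s, rng):
    """Spike times of one unit: dead time plus an exponential wait (assumption)."""
    mean_wait = 1.0 / rate_hz - dead_time_s
    if mean_wait <= 0:
        raise ValueError("dead time must be shorter than the mean firing period")
    times = []
    t = rng.uniform(0.0, 1.0 / rate_hz)  # random phase for each unit
    while t < total_s:
        times.append(t)
        t += dead_time_s + rng.expovariate(1.0 / mean_wait)
    return times


def tension(n_units, rate_hz, dead_time_s, twitch_s, total_s, fs, seed=0):
    """Sampled total tension: the sum of all twitches of all active units."""
    rng = random.Random(seed)
    n = int(total_s * fs)
    out = [0.0] * n
    span = int(8 * twitch_s * fs)  # the twitch tail is negligible beyond 8 T
    for _ in range(n_units):
        for t0 in firing_times(rate_hz, dead_time_s, total_s, rng):
            start = int(math.ceil(t0 * fs))
            for k in range(start, min(n, start + span)):
                out[k] += twitch(k / fs - t0, twitch_s)
    return out


def relative_fluctuation(samples, skip):
    """Standard deviation divided by mean, ignoring the first `skip` samples."""
    steady = samples[skip:]
    mean = sum(steady) / len(steady)
    var = sum((s - mean) ** 2 for s in steady) / len(steady)
    return math.sqrt(var) / mean


if __name__ == "__main__":
    for n_units in (5, 20, 80):
        trace = tension(n_units, 20.0, 0.02, 0.03, 2.0, 1000, seed=1)
        print(n_units, relative_fluctuation(trace, 300))
```

Under these assumptions the relative tension fluctuation shrinks as more motor units are active, because independent twitches average out. How tension jitter maps to vocal frequency jitter is left out of the sketch.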
The presentation also includes a discussion of the fine-tuning of vocal jitter that occurs fold-internally and that is unrelated to motor unit activity per se. Fold-internal tuning is expected to occur because vocal frequency jitter is, for instance, observed to decrease with increasing vocal frequency and in some lax voices, whereas it is anticipated to increase in disorders and speaking conditions that involve a rise in the viscosity of the cover of the vocal folds. A straightforward explanation of internal jitter tuning is the asymmetric weighting of the body-cover coupling, the weight being the ratio of the amplitudes of vibration of the body and the cover. Finally, part of the presentation is devoted to reporting on the perception of hoarseness in the presence of pitch drift, which is of practical relevance because downward (or upward) pitch drift is the rule rather than the exception in sustained vowels as well as in connected speech. Perceptual experiments with three families of vocal frequency contours and two levels of vocal frequency jitter show that the contour has a large influence on the perception of hoarseness in sustained vowels, whatever the perceptual focus of the listeners. The presentation ends with a discussion of the relative contributions of frequency jitter and amplitude shimmer to the perception of roughness and hoarseness in sustained speech sounds.
responsiblesHueber