End-to-end speech recognition from the raw waveform

old_uid	15963
title	End-to-end speech recognition from the raw waveform
start_date	2018/05/30
schedule	14h-15h30
online	no
location_info	salle C005
details	TALC seminar
summary	State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. We study end-to-end systems trained directly from the raw waveform, introducing a trainable replacement for mel-filterbanks that uses a convolutional architecture based on the scattering transform. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We then improve this model as well as a previously proposed frontend based on gammatones. We perform open-vocabulary experiments on Wall Street Journal and show a consistent and significant improvement in Word Error Rate of our trainable frontends over mel-filterbanks, even with random initialization.
responsibles	Dutech

hosted_by

Laboratoire lorrain de recherche en informatique et ses applications - LORIA

speakers

event_of

Image, Perception, Action et Cognition (séminaire iPAC du LORIA, UMR 7503 CNRS, Université de Lorraine et Inria, Nancy) (2017)

Event #169434 - latest update on 2022/05/17, created on 2018/05/23