End-to-end speech recognition from the raw waveform

| old_uid | 15963 |
|---|
| title | End-to-end speech recognition from the raw waveform |
|---|
| start_date | 2018/05/30 |
|---|
| schedule | 14h-15h30 |
|---|
| online | no |
|---|
| location_info | Room C005 |
|---|
| details | TALC seminar |
|---|
| summary | State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. We study end-to-end systems trained directly from the raw waveform and introduce a trainable replacement for mel-filterbanks that uses a convolutional architecture based on the scattering transform. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks and then fine-tuned jointly with the rest of the convolutional architecture. We perform phone recognition experiments on TIMIT and show that models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We then improve this model, as well as a previously proposed frontend based on gammatones. We perform open-vocabulary experiments on the Wall Street Journal corpus and show a consistent and significant improvement in Word Error Rate for our trainable frontends over mel-filterbanks, even with random initialization. |
|---|
| responsibles | Dutech |
|---|