| old_uid | 17194 |
|---|
| title | Tackling Machine Translation of Noisy Text |
|---|
| start_date | 2019/01/11 |
|---|
| schedule | 11h |
|---|
| online | no |
|---|
| location_info | Meeting room C334 |
|---|
| summary | Despite their recent success, neural machine translation systems have proven brittle when faced with non-standard inputs far from their training domain. This is particularly salient for the noisy, user-generated content that is ubiquitous on social media and the internet in general.
In this talk I will present MTNT, our first step toward remedying this situation: a testbed for Machine Translation of Noisy Text. MTNT consists of parallel Reddit comments in three languages (English, French, Japanese) exhibiting a large number of typos, grammatical errors, instances of code-switching, and more. I will discuss the challenges of the collection process, preliminary MT experiments, and the outlook for future work (with a sneak peek at ongoing follow-up research). |
|---|
| responsibles | Seddah |
|---|