Tackling Machine Translation of Noisy Text

old_uid: 17194
title: Tackling Machine Translation of Noisy Text
start_date: 2019/01/11
schedule: 11h
online: no
location_info: Meeting room C334
summary: Despite their recent success, neural machine translation systems have proven brittle when faced with non-standard inputs far from their training domain. This is particularly salient for the noisy, user-generated content that is ubiquitous on social media and on the internet in general. In this talk I will present MTNT, our first step toward remedying this situation: a testbed for Machine Translation of Noisy Text. MTNT consists of parallel Reddit comments in three languages (English, French, Japanese) that exhibit many typos, grammatical errors, instances of code switching, and more. I will discuss the challenges of the collection process, preliminary MT experiments, and an outlook on future work (with a sneak peek at ongoing follow-up research).
responsibles: Seddah