Data quality for low-resource MT

old_uid: 19293
title: Data quality for low-resource MT
start_date: 2021/07/02
schedule: 16h
online: no
summary: In this talk I will present the findings of a collaborative audit of multilingual corpora, with special attention to low-resource languages. We will discuss the challenges of building such corpora and the risks of using them without inspection. With a case study on a subset of African languages, I will illustrate the implications of building machine translation systems on low-quality parallel data.
responsibles: Seddah