German Text Simplification: Scarce Data and Other Challenges

start_date: 2023/12/01
schedule: 14h
online: no
location_info: Doyen 22 & via Teams
summary: Text simplification is an intra-lingual translation task in which the documents or sentences of a complex source text are simplified for a specific target audience. Many new text simplification models have been proposed in recent years and months, but their quality is often hard to judge. In most cases, we know too little about the training data and about what kind of simplification to expect from the models. In addition, we rely too often on controversial automatic evaluations, especially for languages other than English. In our view, the success of automatic text simplification systems depends at least as much on the quality of the parallel data used for training and evaluation as on the text simplification models themselves. This talk will look at each stage of the text simplification pipeline, particularly the data and annotation aspects, and discuss how it could be improved. Topics include i) facilitating the construction of new high-quality text simplification corpora, ii) improving existing corpora through new annotations, including annotations of a) simplification operations, b) quality assessment, and c) error operations, and iii) rethinking the current evaluation process. We will illustrate the problematic areas using German texts as examples.
responsibles: Rolin