|
| title | German Text Simplification: Scarce Data and Other Challenges |
|---|
| start_date | 2023/12/01 |
|---|
| schedule | 14h |
|---|
| online | no |
|---|
| location_info | Doyen 22 & via Teams |
|---|
| summary | Text simplification is an intra-lingual translation task in which the documents or sentences of a complex source text are simplified for a specific target audience. Many new text simplification models have been proposed in recent years and months, but unfortunately we often cannot be sure of their quality. In most cases, we know too little about the training data and about what kind of simplification to expect from the models. In addition, we too often rely on controversial automatic evaluations, especially for languages other than English. In our view, the success of automatic text simplification systems depends as much on the quality of the parallel data used for training and evaluation as on the text simplification models themselves, if not more.
This talk will look at each stage of the text simplification pipeline, particularly the data and annotation aspects, and discuss how each could be improved. The points covered include i) facilitating the construction of new high-quality text simplification corpora, ii) improving existing corpora through new annotations, including annotations of a) simplification operations, b) quality assessment, and c) error operations, and iii) rethinking the current evaluation process. We will illustrate the problematic areas using German texts as an example. |
|---|
| responsibles | Rolin |
|---|
Workflow history| from state (1) | to state | comment | date |
| submitted | published | | 2023/11/29 13:14 UTC |
| |
|