| title | NLP Models for Field Linguistic Annotations in Computational Language Documentation |
|---|
| start_date | 2026/04/16 |
|---|
| schedule | 16h30-17h30 |
|---|
| online | no |
|---|
| location_info | Online |
|---|
| summary | More than half of the languages spoken today are considered endangered and may disappear by the end of the century. In this context, language documentation is the field of linguistics dedicated to recording, annotating, and archiving language data. Because these annotations are primarily manual and demand expert knowledge and time, computational language documentation aims to develop tools that assist linguists at several documentation steps using Natural Language Processing approaches. This presentation will focus on two of these tasks: (i) word segmentation, which identifies word boundaries in an unsegmented transcription of a recorded sentence, and (ii) automatic interlinear glossing, which predicts linguistic annotations (glosses) for each word. For the first task, we improve the performance of the Bayesian non-parametric models used until now through weak supervision, leveraging resources that are realistically available during documentation. We tackle the second task using a statistical sequence-labelling method that proved to be competitive with neural models. |
|---|
| responsibles | Bernard |
|---|
Workflow history
| from state (1) | to state | comment | date |
|---|---|---|---|
| submitted | published | | 2026/04/09 09:30 UTC |