state: published

NLP Models for Field Linguistic Annotations in Computational Language Documentation

start_date	2026/04/16
schedule	16h30-17h30
online	no
location_info	Online
summary	More than half of the languages spoken today are considered endangered and may disappear by the end of the century. In this context, language documentation is a field of linguistics dedicated to the recording, annotation, and archival of language data. Since such annotations are primarily manual and require expert knowledge and time, computational language documentation aims to develop tools that assist linguists in several documentation steps using Natural Language Processing approaches. This presentation will focus on two of these tasks: (i) word segmentation, which identifies word boundaries in an unsegmented transcription of a recorded sentence, and (ii) automatic interlinear glossing, which predicts linguistic annotations (glosses) for each word. For the first task, we use weak supervision to improve the performance of the Bayesian non-parametric models used until now, leveraging resources that are realistically available during documentation. We tackle the second task using a statistical sequence-labelling method that proved to be competitive with neural models.
responsibles	Bernard

Workflow history

from state (1)	to state	comment	date
submitted	published		2026/04/09 09:30 UTC

event_of

Interactions entre linguistiques formelles et computationnelles (séminaire ILFC du GDR LIFT - Linguistique Informatique, Formelle et de Terrain) (2025)

Event #5261189 - created on 2026/03/10