Palenquero 2.0 — Open NLP Strategies for Corpus Building, Parsing, and Text Processing for Palenquero Creole (Colombia)

start_date: 2026/01/19
schedule: 14h-16h30
online: no
location_info: Room 124 and on Zoom
summary: Daniel Jimenez-Casas (U. Pompeu Fabra) & Cristina de la Hoz Márquez (Kribí)
Palenquero is an endangered Spanish-based creole language from Colombia with very limited resources for natural language processing (NLP). The project presented here seeks to promote the digital use of Palenquero and increase its digital vitality by ensuring that digital language resources are available and by providing the technical support needed to collect and curate them. Its core activities are surveying the digital vitality of Palenquero, collecting a corpus, evaluating pipelines for text normalisation, and testing methods for automated part-of-speech tagging and parsing.
As part of the corpus collection and a visiting fellowship project at the Leibniz Centre for General Linguistics (ZAS) in Berlin, I am currently using NLP techniques to investigate predicate negation in Palenquero. Palenquero features three types of negation: preverbal, pre- and postverbal, and strictly postverbal. The last of these is the most common and is typologically rare among the world’s languages and creoles (Dieck, 2000; Schwegler, 2013). Schwegler (1991) suggested that the variation had pragmatic causes, whereas Dieck (2000, 2002) argued that it was driven by semantic and morphosyntactic features. The discussion is still open, and the authors do not seem to have reached an agreement (Schwegler, 2018). To contribute to the understanding of predicate negation in Palenquero, I have proposed a corpus-based study that uses NLP techniques to examine how changes in register may influence the choice of negation pattern, and that contributes to the computational documentation of this endangered creole language.
KRIBÍ (www.kribi.com.co) is an online initiative featuring language tools designed to promote the use of Palenquero creole on the internet. Launched in 2018, it includes a dictionary, a document collection, games, a news board, and social media channels. The project forms part of a broader set of efforts to strengthen both the Palenquero language and its culture, which also includes the International Film Festival Evaristo Márquez and a series of print publications: A ten mbila (It’s Alive) and Chitieno Luenga Suto (Speaking Palenquero).
responsibles: Cabredo Hofherr