Computational morphology needs better lexical data

title: Computational morphology needs better lexical data
start_date: 2025/04/16
schedule: 16h30-17h30
online: no
location_info: Online
summary: Rich resources are key to supporting comparative studies in computational linguistics. Yet existing datasets face a number of issues, including: coverage (current resources document only a small proportion of the world's languages); commensurability (it is rarely straightforward to compare resources); consistent presentation (many datasets fall short of machine readability due to small variations in coding); durability (project funding is temporary, and data maintenance beyond the funding period is rarely ensured); and technical skills (good data management requires technical skills that are rarely taught to linguists). I illustrate this potential and these issues with the case of inflectional resources for quantitative morphology. I outline a path to improvement through standardisation (specifically the Paralex standard: http://www.paralex-standard.org) and large-scale international coordination.
responsibles: Bernard