Perspectives for the structuring and automatic processing of textual information from scientific and technical databases

old_uid: 10506
title: Perspectives for the structuring and automatic processing of textual information from scientific and technical databases
start_date: 2011/12/05
schedule: 10:30
online: no
location_info: seminar room 4B08R
summary: How efficiently can we model and automatically process multi-source scientific and technical information mediated by a large set of documents? Scientific and technical text analysis has been receiving growing attention within the social sciences, driven by the increasing amount of text available in electronic format and the explosion of digital databases and libraries. This textual data comes mainly from articles and patents, but also from specialized sources such as financial and scientific project databases, economic news, surveys, bibliographic websites, and the blogosphere. To allow efficient access to and use of this information, several challenges must be overcome. At the organizational level, it is necessary to set up work teams, policies, and agreements, and to facilitate access to the information collected and produced. At the technical level, the question of how to process heterogeneous textual data must be addressed, along with other aspects such as handling large-scale corpora, reducing noise, dealing with possible duplication and multilingualism, and supporting further processing tasks for both machines and users. But where should we start? The automatic processing of multi-source scientific and technical information involves several computer science disciplines: data and knowledge engineering, text mining, natural language processing, information retrieval and visualization, and software ergonomics. To begin with, it is necessary to propose a kind of 'meeting point': a framework that brings these disciplines together in a focused way. Unfortunately, the heterogeneous and dynamic nature of the information does not make that task any easier. In this talk, we will present an approach to gathering, modeling, and preserving large-scale textual information by linking pieces of information, normalizing them, and enriching the resulting data. We will discuss an open-source modular framework (in pre-alpha development) called Scilmarin, designed to allow the automatic processing of large-scale multi-source textual information derived from scientific and technical databases. We will also present a draft XML specification for modeling scientific and technical data. Finally, we will explore the possibility of having Scilmarin take on natural language processing tasks by relying on other software tools such as Unitex.
responsibles: Sigogne, Rakho
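
To give a concrete flavor of the kind of processing the abstract describes (linking records from several sources, normalizing them, and serializing them to an XML model), here is a minimal sketch in Python. It is not Scilmarin and does not follow the draft XML specification mentioned in the talk; the field names, the dedup key, and the functions normalize_title, deduplicate, and to_xml are all illustrative assumptions.

    # Illustrative sketch only: normalize heterogeneous bibliographic records,
    # merge duplicates on a normalized title key, and emit a simple XML model.
    # All field names and the XML layout are assumptions, not Scilmarin's.
    import re
    import unicodedata
    import xml.etree.ElementTree as ET

    def normalize_title(title: str) -> str:
        """Lowercase, strip accents and punctuation to build a dedup key."""
        text = unicodedata.normalize("NFKD", title)
        text = "".join(c for c in text if not unicodedata.combining(c))
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
        return re.sub(r"\s+", " ", text).strip()

    def deduplicate(records):
        """Merge records sharing the same normalized title, keeping all sources."""
        merged = {}
        for rec in records:
            key = normalize_title(rec["title"])
            if key in merged:
                merged[key]["sources"].update(rec["sources"])
            else:
                merged[key] = {"title": rec["title"], "year": rec["year"],
                               "sources": set(rec["sources"])}
        return list(merged.values())

    def to_xml(records) -> str:
        """Serialize merged records into a flat, illustrative XML structure."""
        root = ET.Element("corpus")
        for rec in records:
            doc = ET.SubElement(root, "document", year=str(rec["year"]))
            ET.SubElement(doc, "title").text = rec["title"]
            for src in sorted(rec["sources"]):
                ET.SubElement(doc, "source").text = src
        return ET.tostring(root, encoding="unicode")

    if __name__ == "__main__":
        raw = [
            {"title": "Analyse de données textuelles", "year": 2011, "sources": ["articles"]},
            {"title": "Analyse de donnees textuelles!", "year": 2011, "sources": ["patents"]},
        ]
        # The two records differ only in accents/punctuation, so they merge
        # into one <document> element carrying both sources.
        print(to_xml(deduplicate(raw)))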