state: published

Statistical Topic Modeling of Large Text Corpora

old_uid	12373
title	Statistical Topic Modeling of Large Text Corpora
start_date	2013/04/19
schedule	14h30
online	no
location_info	salle Dussane
summary	Statistical topic models (also known as latent Dirichlet allocation models) provide a flexible framework for extracting interpretable descriptions of large corpora of text documents. This talk will begin by reviewing the basic principles of topic models and discussing how these models relate to other approaches such as latent semantic analysis, matrix factorization techniques, and document clustering. We will illustrate how topic models can be used to address problems such as generating high-level summaries of document collections and automatically uncovering thematic trends in a corpus over time. The talk will also cover recent extensions of topic modeling techniques, such as the use of topic models for document classification and scalable algorithms for large corpora. Time permitting, we will also discuss how these types of models can be applied to data with relational information, such as social network data involving text content. A number of different text data sets will be used as illustrative examples during the talk, including news articles, historical newspaper records, scientific publications, and collections of email data.
responsibles	<not specified>

hosted_by

Ecole normale supérieure - ENS

speakers

event_of

Conference (2012)

Event #165844 - latest update on 2022/05/17, created on 2013/04/18