|
Statistical Topic Modeling of Large Text Corpora
old_uid | 12373 |
---|
title | Statistical Topic Modeling of Large Text Corpora |
---|
start_date | 2013/04/19 |
---|
schedule | 14h30 |
---|
online | no |
---|
location_info | salle Dussane |
---|
summary | Statistical topic models (of which latent Dirichlet
allocation is the best-known example) provide a flexible framework for extracting
interpretable descriptions of large corpora of text documents. This talk
will begin by reviewing the basic principles of topic models and discussing
how these models are related to other approaches such as latent semantic
analysis, matrix factorization techniques, and document clustering. We
will illustrate how topic models can be used to address problems such as
generating high-level summaries of document collections and
automatically uncovering thematic trends in a corpus over time. The
talk will also discuss recent extensions of topic modeling techniques
such as using topic models for document classification and scalable
algorithms for large corpora. Time permitting, we will also discuss how
these types of models can be applied to data with relational
information, such as social network data involving text content. A
number of different text data sets will be used during the talk as
illustrative examples, including news articles, historical newspaper
records, scientific publications, and collections of email data. |
---|
responsibles | <not specified> |
---|
| |
|