Statistical Topic Modeling of Large Text Corpora

old_uid12373
titleStatistical Topic Modeling of Large Text Corpora
start_date2013/04/19
schedule14h30
onlineno
location_infosalle Dussane
summaryStatistical topic models (the best known of which is latent Dirichlet allocation) provide a flexible framework for extracting interpretable descriptions of large corpora of text documents. This talk will begin by reviewing the basic principles of topic models and discussing how these models relate to other approaches such as latent semantic analysis, matrix factorization techniques, and document clustering. We will illustrate how topic models can be used to address problems such as generating high-level summaries of document collections and automatically uncovering thematic trends in a corpus over time. The talk will also cover recent extensions of topic modeling techniques, such as topic models for document classification and scalable algorithms for large corpora. Time permitting, we will also discuss how these types of models can be applied to data with relational information, such as social network data involving text content. A number of different text data sets will be used during the talk as illustrative examples, including news articles, historical newspaper records, scientific publications, and collections of email data.
responsibles<not specified>