|
Distributional Paraphrasing with Distributional and Hybrid Semantic Distance Measures| old_uid | 15846 |
|---|
| title | Distributional Paraphrasing with Distributional and Hybrid Semantic Distance Measures |
|---|
| start_date | 2015/06/19 |
|---|
| schedule | 11h |
|---|
| online | no |
|---|
| location_info | salle 165 |
|---|
| summary | Semantic distance measures estimate how close in meaning two words or
phrases (or larger text units) are. These measures are useful in
paraphrase generation, which in turn is useful in NLP tasks such as
statistical machine translation (SMT), information retrieval (IR),
syntactic parsing, summarization, and language generation. I will
start by presenting semantic measures. Lexicon-based semantic
measures rely on a dictionary, thesaurus, or taxonomy (e.g., WordNet),
while distributional measures rely instead only on word distributions
in a large corpus of non-annotated text (word2vec being a recent
example). Lexicon-based measures tend to have higher correlation with
human judgments, but lower coverage than distributional measures,
especially for multi-word terms, specialized domains, resource-poor
("low density") languages, or non-classical semantic relations.
Therefore, we are motivated to explore hybrid lexicon/corpus-based
models that combine the strengths of both. Previous work used shallow
thesaurus-based “concepts” (lists of related words) for defining a
coarse-grained aggregated distributional representation. I will show
that finer-grained hybrid models can benefit from concept
information while retaining a high-coverage word-based distributional
representation. Next, I will present a largely language-independent
distributional paraphrase generation method, employing some of these
semantic measures. Time permitting, I will conclude by describing
the integration and evaluation of paraphrasing in state-of-the-art SMT
and in the IR task of event discovery and annotation. |
|---|
| responsibles | Candito |
|---|
| |
|