|
Parse Correction with Specialized Models for Difficult Attachment Types
| old_uid | 10031 |
|---|
| title | Parse Correction with Specialized Models for Difficult Attachment Types |
|---|
| start_date | 2011/05/27 |
|---|
| schedule | 11h-13h |
|---|
| online | no |
|---|
| summary | Statistical syntactic parsing offers a technique for disambiguating
and recovering the syntactic structure of a sentence, but it is subject
to coverage problems for various linguistic phenomena in the treebank
used for training. Our goal is to improve parsing performance for
syntactic structures that are difficult to recover accurately, in
particular coordination and prepositional phrase (PP) attachment. In this
presentation I will focus on parse correction, which tries to make the
most of the available training data by performing a second pass
after parsing that reconsiders individual attachments using richer
contextual information. I will also discuss initial work on a method
meant to address lexical coverage problems in the training treebank:
the injection of lexical association scores, calculated
automatically over a large text corpus, into a parse correction model
for PP-attachment.
In our approach to syntactic dependency parse correction, attachments
in an input parse tree are revised by choosing, for a given dependent,
the best governor from within a small set of candidates. Assuming that
a dependency parser's predicted parse tree for a sentence is mostly
accurate, parse correction can revise attachments by using the parse
tree's syntactic structure to restrict the set of candidate governors
and extract a rich set of features over the syntactic context to help
choose among the candidates. We consider a general corrective model
that can be applied to all dependents in the output trees of a
dependency parser, and we additionally explore specialized corrective
models specific to coordination and PP-attachment. These two phenomena
are often investigated as isolated problems, but here we treat them in
the more realistic context of syntactic parsing. Our specialized
corrective models are separately trained, and include expanded feature
sets specific to the type of attachment to be corrected. For
PP-attachment, in particular, the expanded feature set includes
lexical association scores between the PP and each candidate
governor. These lexical association scores are acquired automatically
through distributional methods over a large corpus, using classic
collocation extraction measures such as mutual information, the t-test,
and the likelihood ratio.
In initial experiments, we obtain improvements in unlabeled attachment
score over two state-of-the-art statistical syntactic dependency
parsers for French (MaltParser and MSTParser). In addition to
presenting the results of these experiments, I will discuss
possibilities for improving both the dependency parse correction
algorithm and the methods used for acquiring and injecting lexical
association scores. |
|---|
| responsibles | Crabbé |
|---|
| |
|
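
The correction step described in the summary can be illustrated with a minimal Python sketch. It is not the speaker's implementation. The candidate neighborhood (the predicted governor, its own governor, and its other dependents), the function names, and the toy counts are all assumptions made for illustration. A single pointwise mutual information score stands in for the trained corrective model and its richer feature set.

```python
import math
from typing import Callable, List

ROOT = 0  # index of the artificial root; heads[0] is an unused placeholder


def pmi(pair: int, gov: int, dep: int, total: int) -> float:
    """Pointwise mutual information, log2(P(gov, dep) / (P(gov) * P(dep)))."""
    if pair == 0:
        return 0.0  # assumption: unseen pairs get a neutral score
    return math.log2(pair * total / (gov * dep))


def in_subtree(heads: List[int], top: int, node: int) -> bool:
    """True if `node` is `top` or one of its descendants."""
    while node != ROOT:
        if node == top:
            return True
        node = heads[node]
    return False


def candidate_governors(heads: List[int], dep: int) -> List[int]:
    """Hypothetical neighborhood: predicted governor, grandparent, siblings."""
    gov = heads[dep]
    cands = [gov]
    if gov != ROOT:
        cands.append(heads[gov])
    cands += [i for i in range(1, len(heads)) if heads[i] == gov and i != dep]
    out: List[int] = []
    for c in cands:
        # Skip duplicates and anything that would create a cycle.
        if c not in out and not in_subtree(heads, dep, c):
            out.append(c)
    return out


def correct_attachment(heads: List[int], dep: int,
                       score: Callable[[int, int], float]) -> List[int]:
    """Reattach `dep` to its best-scoring candidate (ties keep the parser's choice)."""
    best = max(candidate_governors(heads, dep), key=lambda g: score(g, dep))
    revised = list(heads)
    revised[dep] = best
    return revised


def uas(pred: List[int], gold: List[int]) -> float:
    """Unlabeled attachment score over tokens 1..n."""
    return sum(p == g for p, g in zip(pred[1:], gold[1:])) / (len(pred) - 1)


# Toy sentence "eat pizza with fork"; the parser wrongly attaches "with" to "pizza".
words = ["<root>", "eat", "pizza", "with", "fork"]
predicted = [0, 0, 1, 2, 3]
gold = [0, 0, 1, 1, 3]

# Invented co-occurrence counts, for illustration only.
pair_counts = {("eat", "with"): 30, ("pizza", "with"): 5}
word_counts = {"eat": 100, "pizza": 100, "with": 200}
TOTAL = 10_000


def score(gov: int, dep: int) -> float:
    g, d = words[gov], words[dep]
    return pmi(pair_counts.get((g, d), 0), word_counts[g], word_counts[d], TOTAL)


revised = correct_attachment(predicted, 3, score)
```

With these invented counts, the preposition is moved from the noun to the verb, and the toy sentence's unlabeled attachment score rises from 0.75 to 1.0. In a realistic setting the association score would be only one feature among many in a learned model.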