Parse Correction with Specialized Models for Difficult Attachment Types

old_uid: 10031
title: Parse Correction with Specialized Models for Difficult Attachment Types
start_date: 2011/05/27
schedule: 11h-13h
online: no
summary: Statistical syntactic parsing offers a technique for disambiguating a sentence and recovering its syntactic structure, but its accuracy depends on how well the training treebank covers different linguistic phenomena. Our goal is to improve parsing performance on syntactic structures that are difficult to recover accurately, in particular coordination and prepositional phrase attachment (pp-attachment). In this presentation I will focus on parse correction, which tries to make the most of the available training data by performing a second pass after parsing that reconsiders individual attachments using richer contextual information. I will also discuss initial work on a method for addressing lexical coverage problems in the training treebank: injecting lexical association scores, computed automatically over a large text corpus, into a parse correction model for pp-attachment. In our approach to dependency parse correction, attachments in an input parse tree are revised by choosing, for a given dependent, the best governor from a small set of candidates. Assuming that the tree predicted by a dependency parser is mostly accurate, parse correction can use its syntactic structure both to restrict the set of candidate governors and to extract a rich set of features over the syntactic context that help choose among them. We consider a general corrective model that can be applied to all dependents in the output trees of a dependency parser, and we additionally explore specialized corrective models for coordination and pp-attachment. These two phenomena are often studied as isolated problems, but here we treat them in the more realistic context of syntactic parsing.
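The correction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the presented implementation: the tree is a hypothetical dict mapping each token index to its governor index (0 for the root); the candidate set is assumed to be every node within `max_dist` tree edges of the predicted governor, minus the dependent's own subtree so that reattachment can never create a cycle; the feature templates and weights are invented for the example; and the scorer is a plain linear model. The `uas` helper computes unlabeled attachment score over all tokens (evaluations often exclude punctuation).

```python
from collections import deque


def descendants(heads, d):
    """Tokens dominated by d in the tree, including d itself."""
    children = {}
    for tok, head in heads.items():
        children.setdefault(head, []).append(tok)
    seen, stack = {d}, [d]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def candidate_governors(heads, d, max_dist=2):
    """Nodes within max_dist tree edges of d's predicted governor, minus d's subtree."""
    adj = {}
    for tok, head in heads.items():
        adj.setdefault(tok, set()).add(head)
        adj.setdefault(head, set()).add(tok)
    dist = {heads[d]: 0}
    queue = deque([heads[d]])
    while queue:  # breadth-first search over the undirected tree
        node = queue.popleft()
        if dist[node] < max_dist:
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
    banned = descendants(heads, d)  # attaching d inside its own subtree would make a cycle
    return sorted(n for n in dist if n not in banned)


def features(c, d, words, pos):
    # Hypothetical feature templates over candidate governor c and dependent d.
    return [f"gov_pos={pos[c]}|dep={words[d]}", f"dist={min(abs(c - d), 5)}"]


def correct(heads, targets, words, pos, weights, max_dist=2):
    """Reattach each target dependent to its highest-scoring candidate governor."""
    heads = dict(heads)  # leave the parser's output untouched
    for d in targets:
        def score(c):
            return sum(weights.get(f, 0.0) for f in features(c, d, words, pos))
        # Ties are broken in favor of the parser's original governor.
        heads[d] = max(candidate_governors(heads, d, max_dist),
                       key=lambda c: (score(c), c == heads[d]))
    return heads


def uas(pred, gold):
    """Unlabeled attachment score: share of tokens with the correct governor."""
    return sum(pred[t] == gold[t] for t in gold) / len(gold)


# "Il mange une pizza avec une fourchette": the parser attaches "avec" to "pizza".
words = {0: "<root>", 1: "Il", 2: "mange", 3: "une", 4: "pizza",
         5: "avec", 6: "une", 7: "fourchette"}
pos = {0: "ROOT", 1: "CLS", 2: "V", 3: "DET", 4: "NC", 5: "P", 6: "DET", 7: "NC"}
pred = {1: 2, 2: 0, 3: 4, 4: 2, 5: 4, 6: 7, 7: 5}
gold = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 7, 7: 5}
weights = {"gov_pos=V|dep=avec": 1.5, "gov_pos=NC|dep=avec": 0.5}  # toy weights

pp_targets = [t for t in pred if pos[t] == "P"]  # a pp-specialized pass revisits prepositions only
fixed = correct(pred, pp_targets, words, pos, weights)
print(uas(pred, gold), uas(fixed, gold))  # 6/7 before correction, 1.0 after
```

Restricting candidates to a tree neighborhood keeps each decision small, and excluding the dependent's subtree guarantees that every reattachment still yields a well-formed tree, so dependents can be revised one after another.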
Our specialized corrective models are trained separately and include expanded feature sets specific to the type of attachment being corrected. For pp-attachment in particular, the expanded feature set includes lexical association scores between the pp and each candidate governor. These scores are acquired automatically with distributional methods over a large corpus, using classic collocation extraction measures such as mutual information, the t-test, and the likelihood ratio. In initial experiments, we obtain improvements in unlabeled attachment score over two state-of-the-art statistical dependency parsers for French (MaltParser and MSTParser). In addition to presenting these results, I will discuss possible improvements to both the dependency parse correction algorithm and the methods used for acquiring and injecting lexical association scores.
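The three collocation measures named above can be computed from corpus counts for a (governor, pp) pair. The sketch below assumes N counted co-occurrence events, with c12 the count of the pair, c1 the count of the governor, and c2 the count of the pp; how those events are extracted from the corpus is not described in the summary and is left out here. Mutual information is taken in the pointwise form usual for collocation extraction, the t-test in its t-score form (observed count as variance estimate), and the likelihood ratio as Dunning's G2 over the 2x2 contingency table. The counts in the usage lines are invented for illustration.

```python
import math


def contingency(c12, c1, c2, n):
    """Observed 2x2 table (o11, o12, o21, o22) for a pair (w1, w2)."""
    return c12, c1 - c12, c2 - c12, n - c1 - c2 + c12


def pmi(c12, c1, c2, n):
    """Pointwise mutual information, in bits (requires c12 > 0)."""
    return math.log2(n * c12 / (c1 * c2))


def t_score(c12, c1, c2, n):
    """t-test statistic, approximating the variance by the observed count (c12 > 0)."""
    return (c12 - c1 * c2 / n) / math.sqrt(c12)


def log_likelihood(c12, c1, c2, n):
    """Dunning's log-likelihood ratio G2 over the 2x2 contingency table."""
    rows = (c1, n - c1)  # w1 present / absent
    cols = (c2, n - c2)  # w2 present / absent
    g2 = 0.0
    for i, observed in enumerate(contingency(c12, c1, c2, n)):
        expected = rows[i // 2] * cols[i % 2] / n
        if observed > 0:  # 0 * log 0 is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts for a governor/pp pair such as (manger, avec+fourchette).
c12, c1, c2, n = 30, 100, 200, 10_000
print(pmi(c12, c1, c2, n), t_score(c12, c1, c2, n), log_likelihood(c12, c1, c2, n))
```

In a pp-attachment corrector, each score could then be added as a real-valued feature for the corresponding candidate governor. The measures rank pairs differently: pointwise mutual information tends to overrate rare pairs, while the t-score and G2 are more conservative at low counts, which is one reason to compare several of them.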
responsibles: Crabbé