Large-Margin Methods for Natural Language Learning
Sequential data is everywhere: obvious examples include text (the web, or digital libraries), speech, and biological sequences. Algorithms that recover the structure underlying this data are becoming increasingly important. This leads to an interesting class of learning problems: how can we learn functions that map strings to other discrete structures, such as trees, segmentations, or underlying state sequences? In this talk I will present new algorithms for these problems, derived from Freund and Schapire's voted perceptron algorithm for classification tasks. Properties of the algorithms depend directly on a modified notion of "margin" on training examples.

I will describe how the algorithm can be used to rerank N-best output from an existing probabilistic model, using essentially arbitrary features of the competing analyses; how the "kernel trick" applied to discrete structures can lead to efficient learning with representations that track an exponential number of "sub-fragments" of a tree or tagged sequence; and how the algorithm can be used for efficient discriminative training of weighted automata.

A first motivation for the new algorithms concerns *representation*: in comparison to hidden Markov models or probabilistic context-free grammars, the methods are considerably more flexible in the features that can be used to discriminate between competing structures. A second theme is *discriminative training*: the parameter estimation methods make relatively weak assumptions about the distribution underlying the examples. During the talk I will present experiments showing the methods' utility on a number of natural language problems.
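The N-best reranking idea can be sketched as a simple perceptron update over candidate analyses. The following is a minimal illustration, not the talk's actual system: the dict-based feature representation, the function names, and the training loop details are all assumptions made for the sketch (and it omits the averaging/voting that the full algorithm uses).

```python
# Illustrative sketch of perceptron-based N-best reranking.
# Each candidate analysis is assumed to be a sparse feature vector,
# represented as a dict mapping feature name -> value.

def dot(w, feats):
    """Score a candidate as the dot product of weights and features."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def perceptron_rerank_train(examples, epochs=5):
    """Train reranking weights.

    examples: list of (candidates, gold_index) pairs, where candidates
    is the N-best list of feature dicts and gold_index marks the
    correct analysis.
    """
    w = {}
    for _ in range(epochs):
        for candidates, gold in examples:
            # Pick the highest-scoring candidate under the current weights.
            pred = max(range(len(candidates)),
                       key=lambda i: dot(w, candidates[i]))
            if pred != gold:
                # Additive update: toward the gold analysis,
                # away from the wrongly preferred one.
                for f, v in candidates[gold].items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in candidates[pred].items():
                    w[f] = w.get(f, 0.0) - v
    return w
```

Because the update only needs feature vectors and a scoring function, the features of each candidate really can be essentially arbitrary, which is the point made above.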
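To give a flavor of the "kernel trick" on discrete structures, here is a small subtree-counting kernel in the style of a convolution kernel over trees: it computes, in polynomial time, the number of tree fragments shared by two trees, even though the fragment space is exponential. The `(label, children)` tuple encoding and all trees used are assumptions of this sketch, not the talk's implementation.

```python
# Illustrative subtree kernel: counts common fragments of two trees
# without ever enumerating the (exponentially large) fragment space.
# A tree is assumed to be a (label, children) tuple, children a tuple
# of trees; a leaf has an empty children tuple.

def tree_kernel(t1, t2):
    """Count tree fragments common to t1 and t2."""
    def nodes(t):
        yield t
        for c in t[1]:
            yield from nodes(c)

    def C(n1, n2):
        # Number of common fragments rooted at the pair (n1, n2).
        if n1[0] != n2[0]:
            return 0
        if not n1[1] and not n2[1]:
            return 1  # matching leaves share exactly one fragment
        # The productions at n1 and n2 must match (same child labels).
        if [c[0] for c in n1[1]] != [c[0] for c in n2[1]]:
            return 0
        # Each child can either stop here or be expanded further,
        # so common fragments multiply across children.
        prod = 1
        for c1, c2 in zip(n1[1], n2[1]):
            prod *= 1 + C(c1, c2)
        return prod

    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))
```

A kernel like this lets a perceptron discriminate using weights on every sub-fragment implicitly, which is what makes the exponential representations mentioned above tractable.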