COMP 150TP: Text Processing: Inductive Techniques and Applications
Prerequisites: COMP 160 (Algorithms) or permission by
Class Times: Monday, Wednesday 5:10-6:30pm
Office: Halligan 230
Office Hours: Tue, Wed 3:30-4:30pm or by appointment
Text and Notes
Foundations of Statistical Natural Language Processing, Christopher Manning
and Hinrich Schütze, 1999, MIT Press.
The book's web page includes an errata list as well as useful resources.
Additional articles will be distributed to supplement the text.
Other Recommended Texts
James Allen, Natural Language Understanding, Addison-Wesley, 1995.
Eugene Charniak, Statistical Language Learning, MIT Press, 1993.
Daniel Jurafsky and James Martin, Speech and Language Processing,
Prentice Hall, 2000.
Tom Mitchell, Machine Learning, McGraw-Hill, 1997.
Additional Articles and Pointers
SENSEVAL project home page
A report/summary
English SENSEVAL: Report and Results, A. Kilgarriff and J. Rosenzweig
Comparative Experiments on Disambiguating Word Senses: An Illustration
of the Role of Bias in Machine Learning
Raymond J. Mooney, Proceedings
of the 1996 Conference on Empirical Methods in Natural Language
Processing, pp. 82-91, Philadelphia, PA, May 1996.
A Winnow-Based Approach to Spelling Correction
A. R. Golding and D. Roth,
Machine Learning, Volume 34, pp. 107-130, 1999.
The Weighted Majority Algorithm,
N. Littlestone and M. Warmuth,
Information and Computation, Vol. 108, No. 2, pp. 212-261, 1994.
Learning Quickly When Irrelevant Attributes Abound,
N. Littlestone, Machine Learning, 2:285-318, 1988.
Part of Speech Tagging Using a Network of Linear Separators
D. Roth & D. Zelenko.
User Guides for
and its feature extractor
Transformation-Based Error-Driven Learning and Natural Language
Processing: A Case Study in Part of Speech Tagging
Computational Linguistics, Dec. '95
An Introduction to the Application of the Theory of Probabilistic
Functions of a Markov Process to Automatic Speech Recognition,
Rabiner, Levinson, and Sondhi, The Bell System Technical Journal, Vol.
62, No. 4, pages 1035-1074, 1983.
Three Generative, Lexicalised Models for Statistical Parsing
Proceedings of the 35th Annual Meeting of the ACL.
Michael Collins, 1997.
Written Homework Assignments
Computer-Based Homework Assignments
Practical Exercise 1
(identify whether a document is written in English, Spanish, or French)
Practical Exercise 2
(word-sense disambiguation using the Naive Bayes algorithm)
Practical Exercise 3
(part-of-speech tagging with SNoW)
Practical Exercise 4
(working with PCFGs)