Spring 2018 Course Descriptions
Natural language processing toolkits such as NLTK, Apache OpenNLP, and Stanford CoreNLP are widely used as opaque boxes that mysteriously transform text into presumably useful data structures. In this course we will study the mathematics and algorithms behind these and other NLP toolkits to better understand how they work. We will meet the Singular Value Decomposition and Conditional Random Fields, among other pieces of mathematics. We will code, from scratch, several topic modeling and tagging algorithms that use this mathematics, applying what we learn in hands-on projects. We will come away with a deeper understanding of how a computer processes text.
Prerequisites: Linear algebra (MATH 0070, MATH 0072, or equivalent), statistics (ES 56 or equivalent), and computer programming; or consent of instructor.