Comp 136: Statistical Pattern Recognition
Department of Computer Science
Tufts University
Spring 2019

Course Web Page: http://www.cs.tufts.edu/comp/136/


Announcement(s):
  • (4/23) A TA-led review session is scheduled for Tuesday (4/30) in 111A (the regular classroom).
  • (4/1) Finalized remainder of schedule. Posted report guidelines under Course Work and Marking. Posted final exam information under Schedule.
  • (3/25) Updated schedule.
  • (3/25) Solution weight vector file irlsw.csv updated.
  • (2/28) Check Piazza for a few announcements.
  • (2/24) Please pick up your graded quiz 1 re-do during TA office hours.
  • (2/22) For all future homeworks, please note the number of hours you spent at the beginning of your submission.
  • (2/22) Updated the expected results for pp2. Check Piazza for detailed clarifications.
  • (2/19) Updated course work and marking policy. Extended deadline of programming project 2 to Tuesday, March 5.
  • (2/16) Updated schedule. Check Piazza for a question regarding the prerequisites review session on Monday, February 18. TA office hours will be held on Monday from 7-8p.
  • (2/14) Updated schedule. Check Piazza for a schedule-related announcement (including information on the "re-do" of the first quiz).
  • (2/10) Readings through March 7 posted.
  • (2/6) A new data set has been added for verifying the perplexity calculation for pp1.
  • (2/5) Xinmeng's office hours are posted.
  • (1/30) Starting next week: Rishit's OH will move to 6-7p on Thursday. Xinmeng's OH will be finalized via Piazza poll (please vote by Sunday evening).
  • (1/23) Policy on lecture notes posted to Piazza.
  • (1/15) Course information posted.

Course Overview:

Machine learning is a headline-making field that has already influenced many areas of our lives, including healthcare, finance, and communication, and has the potential to make transformative changes in others, such as education and criminal justice, to name just a few. Its applicability across a variety of domains speaks to the robustness and maturity of the statistical tools underlying much of modern machine learning. This course provides a comprehensive introduction to such tools, including probabilistic models and their associated algorithms, with a particular focus on Bayesian modeling. The main concepts are developed rigorously, with written assignments used to reinforce ideas and programming projects used to ground them in practical machine learning contexts. By the end of the course, students should feel comfortable defining new models suitable for their own applications and developing the corresponding algorithms. The (tentative) list of topics includes regression and classification problems, model selection, kernel methods, and graphical models. The course requires background in several areas, including calculus, linear algebra, probability, algorithms, optimization, and programming. We provide brief reviews of relevant material during lectures for students who have not covered all of these areas, or who have but feel "rusty".
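
For students wondering what "Bayesian modeling" looks like concretely, the sketch below is a purely illustrative example (not course-provided code and not tied to any assignment): Bayesian parameter estimation for the simplest possible model, a coin flip with a conjugate Beta prior. The function name and prior settings are arbitrary choices made for this example.

    import numpy as np

    # Illustrative Beta-Bernoulli model (example only, not course code):
    # prior theta ~ Beta(a, b); each observation x_i ~ Bernoulli(theta).
    # By conjugacy, the posterior is Beta(a + #heads, b + #tails).

    def beta_bernoulli_posterior(data, a=1.0, b=1.0):
        """Return the posterior Beta parameters after observing 0/1 data."""
        data = np.asarray(data)
        heads = int(data.sum())
        tails = int(data.size - heads)
        return a + heads, b + tails

    if __name__ == "__main__":
        flips = [1, 0, 1, 1, 0, 1]  # six coin flips
        a_post, b_post = beta_bernoulli_posterior(flips)
        print(f"posterior: Beta({a_post:.0f}, {b_post:.0f})")
        print(f"posterior mean of theta: {a_post / (a_post + b_post):.3f}")

The course develops this same idea (prior, likelihood, posterior) for progressively richer models, from linear regression to graphical models, where the posterior is no longer available in closed form.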

Prerequisites:

MATH 42; MATH 70; EE 104 or MATH 162; COMP 40, COMP 105, or a programming course using Matlab or Python. COMP 135 or COMP 131 is recommended but not required. Alternatively, permission of the instructor.

Class Times:

(Q+ Block) TR 7:30-8:45p, Halligan Hall 111A

Course Staff:


Instructor:

Rishit Sheth
Office Hours: Thurs 6-7pm
Location for Office Hours: Halligan 235-B
Email: rishit.sheth@tufts.edu

Teaching Assistant:

Xinmeng Li
Office Hours: Mon 7-8p, Tues 1-2p, Wed 7-8p, and Fri 12-1p
Location for Office Hours: Halligan Extension
Email: xinmeng.li@tufts.edu

Piazza Site:

All communication over Piazza is subject to the collaboration policy.

References:

We will mainly follow [B] (Bishop, Pattern Recognition and Machine Learning, Springer, 2006), supplementing with some extra material. Note that the other reference texts can be useful for providing a different perspective on some of the topics covered in the course.

Course Work and Marking:

Extensions on assignments and opportunities to make up quizzes or the final exam will only be granted in cases of documented medical or family emergencies.

Policies on Collaboration and Academic Integrity:

You may discuss the problems and general ideas about their solutions with other students, and you may similarly consult other textbooks or the web. However, you must work out the details on your own and code or write up the solutions on your own. Every such collaboration (either getting help or giving help) and every use of text or electronic sources must be clearly cited and acknowledged in the submitted homework. Failure to follow these guidelines may result in disciplinary action for all parties involved. For further questions, please review the Tufts Academic Integrity Policy.

Accessibility Policy:

Tufts University values the diversity of our students, staff, and faculty, and recognizes the important contribution each student makes to our unique community. Tufts is committed to providing equal access and support to all qualified students through the provision of reasonable accommodations so that each student may fully participate in the Tufts experience. If you have a disability that requires reasonable accommodations, please contact the Student Accessibility Services office at Accessibility@tufts.edu or 617-627-4539 to make an appointment with an SAS representative to determine appropriate accommodations. Please be aware that accommodations cannot be enacted retroactively, making timeliness a critical aspect for their provision.

Tufts and the teaching staff of COMP 136 strive to create a learning environment that is welcoming to students of all backgrounds. If you feel unwelcome for any reason, please let us know so we can work to make things better. You can let us know by talking to anyone on the teaching staff. If you feel uncomfortable talking to members of the teaching staff, consider reaching out to your academic advisor, the department chair, or your dean.

Schedule:

Unless otherwise specified, reading assignments are from [B].

Date | Topics | Lecture | Due
Thurs, Jan 17 | Probabilistic models | Introduction to course. A simple probabilistic model. Skim read Chapter 1. |
Tues, Jan 22 | Probabilistic models | Probability distributions. Parameter estimation. Written assignment 1 out. Read Sections 1.2.4, 2.1, 2.2. |
Thurs, Jan 24 | Linear regression | Least squares. Read Section 3.1. |
Tues, Jan 29 | Linear regression | Linear algebra review part 1. Programming project 1 (data) out. Skim Appendix C. | Written assignment 1 due.
Thurs, Jan 31 | Linear regression | Linear algebra review part 2. Read Section 2.3. |
Tues, Feb 5 | Linear regression | Linear algebra review part 3. Multivariate normal (Gaussian templates). Bayesian linear regression. Read Section 3.3. |
Thurs, Feb 7 | Model selection | In-class quiz. Written assignment 2 out. Read Sections 1.3, 3.4-3.5. | Programming project 1 due.
Tues, Feb 12 | Review | |
Thurs, Feb 14 | Classification | Discriminants. Generative models part 1. Read Chapter 4 up through Section 4.2. |
Tues, Feb 19 | Classification (and probabilistic models) | In-class quiz ("re-do" of Feb 7 quiz). Generative models part 2 (and the exponential family). Programming project 2 (data) out. (The exponential family is covered in Section 2.4.) | Written assignment 2 due.
Thurs, Feb 21 | No class | |
Tues, Feb 26 | Classification | Discriminative models and logistic regression. Read Section 4.3. |
Thurs, Feb 28 | Classification | In-class quiz. Bayesian logistic regression. Written assignment 3 out. Read Sections 4.4-4.5. |
Tues, Mar 5 | Kernels | Gaussian processes. Read Sections 6.4.1-6.4.6. | Programming project 2 due.
Thurs, Mar 7 | Kernels | Dual representation. Constructing kernels. Read Chapter 6 up through Section 6.2. |
Tues, Mar 12 | Kernels | Support vector machines part 1. Programming project 3 (data) out. Read Chapter 7 up through Section 7.1. | Written assignment 3 due.
Thurs, Mar 14 | Kernels | Support vector machines part 2. |
Tues, Mar 19 | No class | |
Thurs, Mar 21 | No class | |
Tues, Mar 26 | Graphical models | Bayesian networks. Conditional independence. Read Chapter 8 up through Section 8.2. |
Thurs, Mar 28 | Graphical models | Markov random fields. Basic inference. Written assignment 4 out. Read Section 8.3. Skim Section 8.4. | Programming project 3 due.
Tues, Apr 2 | Sampling | In-class quiz. Basic sampling. Markov chain Monte Carlo. Read Chapter 11 up through Section 11.1.4 and Sections 11.2-11.3. | Topic proposal due.
Thurs, Apr 4 | Unsupervised learning | Latent Dirichlet allocation. Read the original paper up to Section 5.2, which sets up the model, and skim the experiments section. The Bayesian approach is applied via variational inference, which we cover later. An alternative sampling-based solution is described in Section 4 of Steyvers & Griffiths (2007), Probabilistic Topic Models. |
Tues, Apr 9 | Model selection | Gaussian mixture models. Expectation maximization for GMMs. Programming project 4 (data) out. Read Chapter 9 up through Section 9.2. | Written assignment 4 due.
Thurs, Apr 11 | Model selection | General EM. Read Sections 9.3-9.4. |
Tues, Apr 16 | Variational inference | Basic concept. "Approximating a Gaussian". Read Chapter 10 up through Section 10.2. |
Thurs, Apr 18 | Variational inference | Variational linear regression. Slides. Read Section 10.3. | Programming project 4 due.
Tues, Apr 23 | Unsupervised learning | In-class quiz. Principal component analysis and probabilistic variants. Read Chapter 12 up through Section 12.2.3. |
Thurs, Apr 25 | Review | | Short report due.
Mon, May 6, 6-9pm | Final Exam | The exam is closed-book (no notes, books, calculators, etc. allowed). It covers all material from lectures, readings, homeworks, and quizzes, but will not cover the following sub-sections of the assigned reading: 2.3.5, 2.3.7, 2.3.8, 2.4.3, 4.3.6, 7.1.4, 7.1.5, 8.3, 8.4, 9.3.2, 9.3.3. |