Comp 136: Statistical Pattern Recognition
Department of Computer Science
Tufts University
Fall 2017

Course Web Page: http://www.cs.tufts.edu/comp/136/ (redirects to this page)

Announcement(s):
  • (10/4) Roni Khardon will not hold office hours on Monday 10/9. Alternate OH for that week will be held on Tuesday 10/10 6-7pm.
  • (9/22) A typo in pp1 said the deadline is Wednesday 10/2. The correct deadline is Monday 10/2 (the date was correct; the day was not). The online version linked below has been corrected.
  • (9/12) Typo on handout: the definition of the gamma function should have an x-1 exponent instead of x. The online version has been corrected. Please mark your paper handouts.
  • (9/7) TA Office Hours posted.
  • (9/5) Course information posted.
  • Previous announcements

What is this course about?

Machine learning has matured and expanded over the last two decades and is now used extensively as a tool in many application areas. Much of modern machine learning relies on probabilistic models or other mathematical formulations of the "data mining problem" and derives algorithms that are based on such formulations.
This course provides a comprehensive introduction to such models and algorithms; we will cover the main existing approaches and algorithms focusing on rigorous mathematical development of the ideas. Assignments are used to reinforce the ideas, with some programming assignments to help ground the ideas in a practical machine learning context. The course aims to provide students with the understanding and ability to define new models (suitable for their problem or application) and develop the corresponding algorithms. It is therefore an excellent gateway to starting research in this field.
The course requires background in various areas including calculus, algebra, probability, algorithms, optimization, and programming. Many students entering the course have not covered all these areas, or have covered them but are "rusty". I provide brief "refresher crash courses" during the lectures to bring students up to speed.

Syllabus:

Statistical foundations and algorithms for machine learning with a focus on Bayesian modeling. Topics include: classification and regression problems, regularization, model selection, kernel methods, support vector machines, Gaussian processes, and graphical models.

Prerequisites:

Prerequisites: MATH 42 (Calc III); MATH 70 (Linear Algebra); EE 104 or MATH 162 (Probability/Statistics); COMP 40 or COMP 105 or a programming course using Matlab. COMP 135 (Machine Learning) or COMP 131 (AI) is recommended but not required. Or permission of instructor.

Class Times:

(G+ Block) MW 1:30-2:45.
Location: Barnum/Dana Hall 104.

Instructor:

Roni Khardon
Email: roni@cs.tufts.edu
Office Hours: Monday 6-7, Halligan 230.

GIFT Fellow:

Rishit Sheth, Rishit.Sheth@tufts.edu
This year we are lucky to have Rishit Sheth working as a GIFT Fellow on improving the course. Rishit will deliver a couple of lectures and will be available for help with assignments and projects via Piazza and email.

Teaching Assistant:

Mohammad Karimzadeh, Mohammad.Karimzadeh@tufts.edu
Office Hours: Tuesday 2-3, Wednesday 3-4, Thursday 2-3, Friday 12-1.
Location for Office Hours: Halligan 121

Textbooks

We will be mainly following [B]. The other texts can provide a different perspective on (or exposition of) similar material and can be helpful. All these are on reserve at the Tisch Library.

Course Work and Marking

The course will include
Written assignments (30%):
that exercise mathematical derivations and help complement the theory given in class.
Quizzes (20%):
we will have 5 quizzes, each 15 minutes in length, on the following dates: Wednesday 9/20, Wednesday 10/11, Wednesday 11/1, Monday 11/20, Monday 12/11.
Programming assignments (30%):
implementing algorithms and testing them in a practical machine learning context.
Final Project (20%):
a large empirical project of the student's choice. A project can be "application focused", "methods focused", or both. More details and ideas for projects will be posted. Important dates are listed with the project information in the schedule below.
Rules for late submissions:
All work must be turned in on the date specified. Unless there is a last-minute emergency, please notify Roni Khardon of special circumstances at least two days in advance. If you haven't finished an assignment by the due date, please turn in the work you have done (even if partial) on time, and it will be evaluated for partial credit.

Policy on collaboration on homework assignments:

You may discuss the problems and general ideas about their solutions with other students, and similarly you may consult other textbooks or the web. However, you must work out the details on your own and code/write out the solution on your own. Every such collaboration (either getting or giving help) and every use of text or electronic sources must be clearly cited and acknowledged in the submitted homework. Failure to follow these guidelines may result in disciplinary action for all parties involved. Any questions? For this and other issues concerning academic integrity, please consult the detailed guidelines and policy available from the Office of the Dean of Student Affairs.

Tentative List of Topics and Reading Assignments

Review of topics from probability theory and algebra. Linear regression and classification. Kernel methods. Gaussian processes. Graphical models, and learning and inference algorithms for them.

Reading, References, and Assignments

Unless otherwise specified, reading assignments are from [B].

Unit Topics/Comments Reading/Dates
Part I Bayesian Models and Inference: In this part we develop the basic concepts of Bayesian modelling and reasoning that are used throughout the course.  
Background Reading This is more of an overview than an introduction, but it is well worth reading; skim through without expecting to get all the details. Chapter 1
Informal Assignment Please familiarize yourself with a system that facilitates programming with text processing and matrix algebra.
Matlab: There are many online tutorials for Matlab, for example Matlab tutorial UBC and Matlab tutorial MIT.
SciPy: NumPy and SciPy provide such functionality using Python. See the SciPy Tutorial.
Write at least one 10-line program, using at least one function, and making at least one plot.
No submission required. This will prepare you for the next programming assignments.
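For concreteness, here is a minimal sketch of such a warm-up program in Python with NumPy and Matplotlib (one of the suggested options; a Matlab version would be analogous). The function and the plotted Gaussian densities are illustrative choices, not part of the assignment.

    # Warm-up sketch: one function, one plot (NumPy + Matplotlib).
    import numpy as np
    import matplotlib.pyplot as plt

    def gaussian_pdf(x, mu, sigma):
        # Density of a univariate Gaussian N(mu, sigma^2) evaluated at x.
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    x = np.linspace(-4, 4, 200)
    plt.plot(x, gaussian_pdf(x, 0.0, 1.0), label="N(0, 1)")
    plt.plot(x, gaussian_pdf(x, 1.0, 0.5), label="N(1, 0.25)")
    plt.xlabel("x")
    plt.ylabel("density")
    plt.legend()
    plt.show()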
Lectures 9/6, 9/11 Maximum likelihood and Bayesian estimates for Beta, Bernoulli, and univariate Gaussian distributions Sections 2.1, 2.2, 1.2.4
see also Inferring the complete form of probability distributions
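To make the contrast between the two estimators concrete, here is a short sketch (not part of the official course materials) using the standard Beta-Bernoulli conjugacy results of Sections 2.1-2.2 of [B]; the prior hyperparameters and simulated data below are illustrative assumptions.

    # Sketch: maximum likelihood vs. Bayesian (posterior mean) estimation
    # of a Bernoulli parameter mu under a Beta(a, b) prior.
    import numpy as np

    def bernoulli_estimates(x, a=2.0, b=2.0):
        # x: array of 0/1 observations; (a, b): Beta prior hyperparameters.
        N, m = len(x), int(np.sum(x))        # m = number of ones observed
        mu_ml = m / N                        # maximum likelihood estimate
        mu_bayes = (a + m) / (a + b + N)     # mean of the Beta(a+m, b+N-m) posterior
        return mu_ml, mu_bayes

    x = np.random.default_rng(0).binomial(1, 0.7, size=20)
    print(bernoulli_estimates(x))  # the two estimates converge as N grows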
Assignment 1 hw1.pdf 9/20
Programming Project 1 pp1.pdf
Data for this programming project is in this zip file
10/2
Part II Bayesian Linear Regression: In this part we develop much of the mathematical machinery of the course while covering the topic of linear regression.  
Lectures 9/13, 9/18 Linear Regression Section 3.1
Lecture 9/18 Linear Algebra Review Any introductory linear algebra text; appendix C
Lectures 9/20, 9/25 Multivariate Normal Distributions Section 2.3
see also Handy formulas for MVN distributions
Lectures 9/25, 9/27 Bayesian Linear Regression with prior for (w,eta) Section 3.3
see also Slide Copies
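As a concrete companion to this unit, the following sketch computes the posterior over the weights for fixed hyperparameters, using the standard formulas of Section 3.3 of [B] (posterior precision alpha*I + beta*Phi^T Phi and posterior mean beta * S_N Phi^T t); the basis functions, data, and hyperparameter values are illustrative assumptions.

    # Sketch: posterior over weights in Bayesian linear regression with
    # fixed prior precision alpha and noise precision beta (Section 3.3).
    import numpy as np

    def blr_posterior(Phi, t, alpha=1.0, beta=25.0):
        # Phi: N x M design matrix; t: N targets. Returns (m_N, S_N).
        M = Phi.shape[1]
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        return m_N, S_N

    # Toy usage: fit a line with basis functions [1, x].
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=30)
    t = 0.5 - 0.3 * x + rng.normal(scale=0.2, size=30)
    Phi = np.column_stack([np.ones_like(x), x])
    m_N, S_N = blr_posterior(Phi, t)
    print(m_N)  # posterior mean weights, roughly [0.5, -0.3]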
Lectures 10/2, 10/4 Model Selection Sections 3.4-5
Assignment 2 hw2.pdf 10/11
Programming Project 2 pp2.pdf
Data for this programming project is in this zip file
10/23
10/9 University Holiday No classes    
Part III Bayesian Models for Classification and Prediction: In this part we extend the framework to handle other prediction tasks - mostly prediction of categories (discrete values).  
Lecture 10/11 Generative models: classification with (mostly) linear models Sections 4.1-2
Lecture 10/16 Logistic Regression Section 4.3
Lecture 10/18 Exponential Family Distributions Section 2.4
Lecture 10/23 Generalized Linear models Section 4.3
Lecture 10/25 Bayesian Logistic Regression Section 4.4-5, Section 1.5
Assignment 3 hw3.pdf 11/1
Programming Project 3 pp3.pdf
Data for this programming project is in this zip file
11/13
Part IV Graphical Models: In this part we abstract the framework to capture random variables and (in)dependencies among them, together with generic inference algorithms (or algorithmic templates) that work across such models.  
Lectures 10/30, 11/1 Overview of Graphical Models and Sampling Methods Chapters 8 and 11
[RN] section 14.4
Lecture 11/6 Topic models (Gibbs Sampling) lecture slides

Steyvers & Griffiths (2007) Probabilistic topic models
Additional Reading The original LDA paper develops variational inference for the model (we cover variational inference in later lectures). Blei, Ng & Jordan (2003) Latent Dirichlet Allocation
Lectures 11/8, 11/13 The EM Algorithm Chapter 9
lecture slides
Lectures 11/15, 11/20 Variational Inference Chapter 10
lecture slides
Additional Reading
The PPC paper discusses both sampling and variational approximation on a model which is partly directed and partly undirected.
Lu & Leen (2007) Penalized Probabilistic Clustering
Information for Project project.pdf
Some ideas for projects
11/8, 11/13, 11/29, and 12/15
Part V Kernel Methods: In this part we use kernels for learning non-linear models. This cuts across regression, classification, and unsupervised learning, as well as probabilistic and non-probabilistic techniques.
Lecture 11/27 Kernel functions and Kernel Methods. Perceptron, nearest neighbors, least squares. Sections 6.1-3
Chapters 2,3 of [CST]
Lecture 11/29 Gaussian processes. Sections 6.4.1-4
(TBA) Slide Copies
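As a concrete companion to this unit, here is a sketch of Gaussian process regression predictions following the standard predictive equations of Section 6.4 of [B] (mean k^T C^{-1} t and variance c - k^T C^{-1} k); the squared-exponential kernel, its length scale, and the noise precision are illustrative assumptions.

    # Sketch: Gaussian process regression with a squared-exponential kernel
    # and noise precision beta, following Section 6.4 of [B].
    import numpy as np

    def sq_exp_kernel(a, b, length=0.3):
        # Squared-exponential (RBF) kernel between 1-d input vectors a and b.
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

    def gp_predict(x_train, t_train, x_test, beta=25.0):
        # Predictive mean k^T C^{-1} t and variance c - k^T C^{-1} k,
        # where C = K + (1/beta) I and c = k(x*, x*) + 1/beta.
        C = sq_exp_kernel(x_train, x_train) + np.eye(len(x_train)) / beta
        k = sq_exp_kernel(x_train, x_test)
        C_inv = np.linalg.inv(C)
        mean = k.T @ C_inv @ t_train
        var = 1.0 + 1.0 / beta - np.sum(k * (C_inv @ k), axis=0)
        return mean, var

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=15)
    t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=15)
    mean, var = gp_predict(x, t, np.linspace(0, 1, 5))
    print(mean, var)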
Assignment 4 hw4.pdf 12/11