Syllabus


CS 136: Statistical Pattern Recognition (SPR)
Department of Computer Science, Tufts University
Class Meetings for Spring 2022: Tues and Thurs 4:30-5:45pm ET
-- Location: In person at Joyce Cummings Center 260
-- If classes are being held remotely, find the Zoom link in the Piazza post. Expect classes to be in person except for 01/20 unless you hear otherwise!
Instructor: Ike Lage, Part Time Lecturer in Computer Science.
  • Office hours (held over Zoom): Wednesday 1:30-2:30, and Thursday 2:30-3:30 (Zoom link in the Piazza post)
  • Contact: isaac.lage(AT)tufts.edu for personal issues only (for almost all questions, use Piazza forums)
Grad TA: Kapil Devkota
Piazza:

Course Overview

This course provides the theoretical and computational foundations for probabilistic machine learning. The focus is on probabilistic models, which are especially useful for any application where observed data could be noisy, sometimes missing, or not available in large quantities. We emphasize representing uncertainty with formal distributions and trying to average over these distributions when making decisions (as done in the Bayesian approach).

We will study the following probabilistic models:

  • unigram models for discrete word count data (e.g. next word prediction)
  • regression and classification
  • mixture models
  • hidden Markov models and other models for sequential data
  • general directed graphical models.

Algorithms studied include: gradient descent (first-order and second-order), expectation maximization, variational inference, and Markov chain Monte Carlo methods.

Objectives

After completing this course, students will be able to:

  • Demonstrate formal mathematical understanding of probabilistic models.
  • Given an applied data analysis task, select a relevant probabilistic model, fit the model on a relevant dataset using an appropriately chosen approximate inference method, and analyze the results.
  • Analyze numerical accuracy and stability of common probabilistic ML algorithms (e.g. avoiding overflow/underflow)
  • Analyze scalability considerations of common probabilistic ML methods (including runtime and memory complexity requirements).

Prerequisites

This course intends to provide students a solid foundation in statistical machine learning methods.

To achieve this objective, we expect students to be familiar with the following before taking the course:

  • Probability theory
    • e.g. you could explain the difference between a probability density function (PDF) and a cumulative distribution function (CDF)
  • Basic linear algebra (comfort with matrix/vector notation, have at least seen inverses/determinants before)
    • e.g. you could write the closed-form solution of least squares linear regression using basic matrix operations (multiply, inverse)
  • First-order gradient-based optimization
    • e.g. you could code up a simple gradient descent procedure in Python to find the minimum of functions like f(x) = x^2
  • Basic supervised machine learning methods
    • e.g. you can describe the mathematical learning objectives for linear regression and logistic regression
  • Coding in Python with modern open-source data science libraries
    • Basic array operations in numpy (computing inner products, inverting matrices, etc.)
    • Making basic plots or grids of plots in matplotlib
    • Training basic classifiers (like LogisticRegression) in scikit-learn
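As a rough self-check of the level assumed above, here is a minimal sketch of two of the listed skills: gradient descent on f(x) = x^2 and closed-form least squares in numpy. The function names, step size, iteration count, and toy data are illustrative assumptions, not course-provided code. The least-squares helper solves the normal equations with `np.linalg.solve` rather than forming an explicit inverse, which is a common numerical choice and not something the list above prescribes.

```python
import numpy as np

def gradient_descent(grad_fn, x_init, step_size=0.1, n_iters=100):
    """Run plain first-order gradient descent starting from x_init."""
    x = float(x_init)
    for _ in range(n_iters):
        x = x - step_size * grad_fn(x)
    return x

# f(x) = x^2 has gradient 2x and its minimum at x = 0
x_min = gradient_descent(lambda x: 2.0 * x, x_init=5.0)

def least_squares(X, y):
    """Closed-form least squares: solve (X^T X) w = X^T y for w."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data generated exactly by y = 1 + 2 * x (first column is a bias feature)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
w = least_squares(X, y)
```

If writing something like this from scratch feels comfortable, you likely meet the optimization and linear algebra expectations.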

Practically, this means having successfully completed at least one of these courses (or their equivalent outside of Tufts):

With instructor permission, diligent students who are lacking in a few of these areas of coursework may be able to catch up on core concepts via self-study and thus still be able to complete the course effectively. Please see the community-sourced Resources Page for a list of potentially useful resources for self-study.

Enrolling and Wait Lists

As of the start of semester, we expect to have about 30 students enrolled in the course. We are currently at capacity, but some students may drop the course and leave openings for others (we may see 5-10 openings in the first week of classes as schedules shift).

Our top priority is to provide each enrolled student with our full support, including the ability to get prompt answers to questions on Piazza and in office hours as well as the ability to get high-quality feedback on submitted homeworks, exams, and projects in a timely manner.

We understand some students are on the wait list (either formally on the wait list in the SIS system, or informally hoping to join the course). It is possible that students currently on the wait list may be added as space opens up in the course.

Prof. Ike Lage will make the final decision about all wait list candidates by Fri 01/28, well before the ADD deadline.

To be considered for enrollment, you must do these two things:

  • Complete and submit HW0 by end of day Sat 01/29.
    • This action shows you have the necessary skills and would take the course seriously
  • Message the instructor by end of day Fri 01/28 via email with subject "CS 136 Wait List Request"
    • Explain your current state within your degree program (e.g. sophomore undergraduate in CS, Ph.D. student in Math)
    • Explain why taking the course this semester would be important to you.

Class Format for Spring 2022

As of this moment, we expect this course to be held fully in person, with the exception of office hours, which may be a mix of in person and over Zoom. However, should we be forced by the ongoing pandemic to transition to a partially or fully virtual format for any amount of time, we will move the relevant in-person meetings (course meetings, office hours, etc.) to Zoom.

After the first week, we will expect students to be signed up on Piazza (accessible to any student either enrolled or on the waitlist). This is how we will communicate any changes to the course meeting locations with you. If you have not heard otherwise, expect course meetings and office hours to be in person in their usual locations.

What will we do in class

Each class session will occur at the scheduled time (Tues and Thurs from 4:30-5:45pm ET) in Joyce Cummings Center 260.

Before each class: You are expected to watch the "Do Before Class" videos posted on that day's Schedule. These consist of 1-3 video lectures (broken into 10-20 minute segments about a coherent topic) that should total at most 30 minutes of viewing time. (These were recorded for the last iteration of this course, taught by Mike Hughes.) They explain important background concepts that will be crucial for your understanding of the lecture. Your understanding of them will be informally evaluated with an in-class activity.

You should also download any relevant in-class exercises or notebooks to prepare (materials will be posted on the Schedule under the "Download before class" sub-heading when there are any).

In class: We will typically have the following structure:

  • First 5 min.: Course Announcements (instructor led)
  • Next 40 min.: Content lecture (instructor led)
  • Next 20 min.: Breakout into small groups to work through exercises and solidify concepts
  • Last 10 min.: Summarize key concepts from the lecture and re-cap breakout exercises

Unit quizzes: Once per unit, we will have a short, in-class quiz instead of the breakout groups. These will consist of short-answer questions that test your understanding of the core unit concepts. We will collect these and provide you with feedback on your performance; however, they will contribute to your final grade only based on completion, as a form of participation. They serve primarily as a signal to you about how well you have understood the concepts in the unit.

Attendance

Participation in class is strongly encouraged, as you will get hands-on practice with material and have a chance to ask questions of the instructor and TA, as well as your peers. Please do your best to actively participate both during lecture and breakout sessions! We also want to foster a supportive learning environment for all students, so please do your best to be respectful of your peers' contributions as well.

We do not require attendance at any class or track attendance. We understand that circumstances (COVID-related or otherwise) may make it difficult to attend every class. We hope you make the effort, but we designed the course so that you can succeed even if you miss a meeting or two. Course meetings will not be recorded, but each day will include links to additional presentations of the material covered in class on the Schedule page under "Reference materials". These will include readings from the textbook (and occasionally other high-quality sources), and short video lectures on the key contents from the lecture. (These were recorded for the last iteration of this course, taught by Mike Hughes.) Additionally, lecture notes will be uploaded within 24 hours of the class.

We do count a small part of a student's grade as participation, which is measured in two ways:

  • We will have a short, in-class quiz for every unit. Rather than grading these based on correctness, we will give participation credit for showing up and completing the quiz. (We will provide you with feedback on your performance but it won't count towards your final grade.) These quiz dates will be listed on the syllabus by the start of every unit. If you are unable to come to class on a day we have a quiz, you can come to my (Ike's) office hours to discuss the material from the unit instead, or if you are unable to do that, email me and we will figure something else out.
  • An additional, small portion of your participation grade will be based on the following two measures:
    • being regularly active in Piazza forum discussions (posting 8 or more times throughout the semester)
    • being regularly active in live class discussions (being a memorable participant in 8 or more meeting breakouts throughout the semester)

What will we do outside of class?

Reference Materials: On the Schedule page, each class will have a list of "Support materials" that you are expected to consult to solidify your understanding of the key concepts. These include readings from the textbook, and video lectures from the last iteration of this course, taught by Mike Hughes, that cover a similar set of topics in a slightly different way. We recommend attending the class lectures (or watching the video lectures if you are unable to attend class) before consulting the textbook reading, as the textbook covers some topics in more detail than you will be expected to understand them. Having the framework of the lectures should help you navigate the textbook to gain a deeper understanding of the course's core topics.

Assignments: The course is organized into 5 topical units, each of which will have the following assigned work outside of class:

  • 1 written homework (HW), to build math skills (derivation and analysis, resulting in a LaTeX typeset report)
  • 1 coding practical (CP), to build implementation skills (auto-graded Python exercises + a short report of figures and analysis)

For a complete list of graded assigned work, see the Assignments page.

Note that there is a special "first homework" (HW0) designed to make sure you have necessary prerequisite knowledge and that you become familiar with LaTeX for math report preparation.

Late work Policy

We want students to develop the skills of planning ahead and delivering work on time. To facilitate learning, we also want to be able to release solutions quickly and discuss recent assignments soon after deadlines. On the other hand, we know that this semester offers particular challenges, and we wish to be flexible and accommodating within reason.

With these goals in mind, we have the following policy:

Each student will have 240 total late hours (= 10 late days) to use throughout the semester across all homeworks, coding practicals and project checkpoints. Note that you cannot use late days for the final project report submission.

For each individual assignment, you can submit beyond the posted deadline at most 96 hours (4 days) and still receive full credit. Thus, for one assignment due at Thurs 11:59pm ET, you could submit by the following Monday at 11:59pm ET.

This late work deadline is key to our classroom goals. It allows us to always release homework solutions on Tuesday mornings a few days before the in-class quiz on that unit, and lets us discuss the assignment in class on Tuesday afternoon without issue.

The timestamp recorded on Gradescope will be official. Late time is rounded up to the nearest hour. For example, if the assignment is due at 3pm and you turn it in at 3:05pm, you have used one whole hour.

Beyond your allowance of 10 late days, zero credit will be awarded except in cases of truly unforeseen exceptional circumstances (e.g. family emergency, medical emergency). Students with exceptional circumstances should contact the instructor to make other arrangements as soon as possible.
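The rounding rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the course's actual bookkeeping: the function name `late_hours_used` and the example dates are hypothetical, and Gradescope's recorded timestamp remains the official one.

```python
import math
from datetime import datetime

MAX_LATE_HOURS_PER_ASSIGNMENT = 96   # 4 days, from the policy above
TOTAL_LATE_HOURS = 240               # 10 days across the whole semester

def late_hours_used(deadline, submitted):
    """Late hours charged for one submission, rounded up to a whole hour."""
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return 0  # on time or early: nothing is charged
    return math.ceil(seconds_late / 3600)

# Example from the policy: due at 3:00pm, submitted at 3:05pm (dates are made up)
deadline = datetime(2022, 2, 3, 15, 0)
submitted = datetime(2022, 2, 3, 15, 5)
used = late_hours_used(deadline, submitted)

# A submission is within the per-assignment window if it uses at most 96 hours
within_window = used <= MAX_LATE_HOURS_PER_ASSIGNMENT
```

Here `used` is one whole hour, matching the 3:05pm example in the policy.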

Midterm Exam

There will be one formal exam: a midterm that will be taken in class. See the Schedule for the specific dates and times.

This exam will test the key concepts covered up to that point in the course, focusing mostly on mathematical analysis skills, with perhaps a computational question or two.

The short unit quizzes should be considered good preparation for the kinds of questions that might appear on an exam.

Final Project

There will be a final project composed of a series of project checkpoints throughout the second half of the semester and a final report due at the end of the semester. The goal of this project is to familiarize yourself with the process of selecting, applying, evaluating, and improving the models we study in this class in the context of a real dataset. You can find more information about the final project on the Project page.

Workload

Each week, you should expect to spend about 8-12 hours on this class.

Here's our recommended break-down of how you'll spend time each week:

  • 0.5 hr / wk preparation before Tues class (lecture videos)
  • 1.25 hr / wk active participation in Tues class
  • 0.5 hr / wk preparation before Thurs class (lecture videos)
  • 1.25 hr / wk active participation in Thurs class
  • 5 hr / wk on HW or CP assignment (one assignment due each week, takes ~5 hr total)
  • 1.5 hr / wk consulting reference materials or attending office hours to solidify your understanding of the week's content

This totals to ~10.00 hr / wk.

Grading

Final grades will be computed based on a numerical score via the following weighted average:

  • 25% homeworks (2% HW0, 23% evenly averaged across HW1-HW5)
  • 25% coding practicals (one per unit, evenly averaged across all 5 units)
  • 20% midterm exam score
  • 25% final project (consists of 3-4 checkpoints and a final report, see project page for more details)
  • 5% participation (see the Attendance section of this page for a complete description)

When assigning grades given a final numerical score (from 0.0 to 1.0), the following scale will be used:

  • 0.94-1.00 : A
  • 0.90-0.94 : A-
  • 0.87-0.90 : B+
  • 0.83-0.87 : B
  • 0.80-0.83 : B-
  • 0.77-0.80 : C+
  • 0.73-0.77 : C
  • 0.70-0.73 : C-
  • 0.67-0.70 : D+
  • 0.63-0.67 : D
  • 0.60-0.63 : D-
  • Below 0.60: F

The highest possible grade of "A+" will be awarded at the instructor's discretion.

We do not round up grades. This means you must earn at least a 0.83 (not 0.825 or 0.8295 or 0.8299) to earn a B.
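To make the weighted average and the no-rounding rule concrete, here is a minimal Python sketch. The dictionary keys and function names are illustrative assumptions, each component score is taken to be a fraction between 0.0 and 1.0, and the band between C- and D is assumed to be D+. Each band is treated as including its lower cutoff, consistent with needing at least 0.83 for a B. The A+ grade is left out because it is awarded at the instructor's discretion.

```python
# Weights from the Grading section; they sum to 1.0
WEIGHTS = {
    "hw0": 0.02,
    "hw1_to_hw5": 0.23,        # average score across HW1-HW5
    "coding_practicals": 0.25,  # average score across the 5 CPs
    "midterm": 0.20,
    "final_project": 0.25,
    "participation": 0.05,
}

# Lower cutoff for each letter grade, highest first (D+ band is an assumption)
CUTOFFS = [
    (0.94, "A"), (0.90, "A-"), (0.87, "B+"), (0.83, "B"), (0.80, "B-"),
    (0.77, "C+"), (0.73, "C"), (0.70, "C-"), (0.67, "D+"), (0.63, "D"),
    (0.60, "D-"),
]

def final_score(scores):
    """Weighted average of component scores, each in [0.0, 1.0]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def letter_grade(score):
    """Map a numerical score to a letter with no rounding up."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"

# Hypothetical student who scores 0.85 on every component
example_scores = {name: 0.85 for name in WEIGHTS}
example_letter = letter_grade(final_score(example_scores))
```

Under these assumptions, a score of 0.8299 maps to B- rather than B, since no rounding is applied.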

Textbook

As a primary textbook, we will use "Pattern Recognition and Machine Learning" by Christopher M. Bishop.

A free PDF is available online from the author:

<https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf>

Other suggested resources can be found on the Resources Page.

Computing Environment

See the Resources page for expectations about your computing environment. We expect you should have the tools to complete this course, as long as you have reliable internet access and access to a modern desktop or laptop computer (not just a Chromebook) with at least 1GB of RAM, a 2.0GHz processor, and the ability to install our course Python environment.

If you have concerns about your computing resources being adequate, please contact the course staff via Piazza ASAP.

Collaboration Policy

Our ultimate goal is for each student to fully understand the course material. With this goal in mind, we have the following policy:

For quizzes and exams, all work should be done individually, with no collaboration with others whatsoever.

For homeworks and coding practicals, we have the following policy for student work:

You must write anything that will be turned in -- all code and all written solutions -- on your own without help from others. You may not share any code or solutions with others, regardless of whether they are enrolled in the class or not.

We do encourage high-level interaction with your classmates. After you have spent at least 10 minutes thinking about the problem on your own, you may verbally discuss assignments with others in the class. You may work out solutions together on whiteboards, laptops, or other media, but you are not allowed to take away any written or electronic information from joint work sessions with others. No notes, no diagrams, and no code. Emails, text messages, and other forms of virtual communication also constitute “notes” and should not be used when preparing solutions.

When preparing your solutions, you may always consult textbooks, materials on the course website, or existing content on the web for general background knowledge. However, you cannot ask for answers through any question answering websites such as (but not limited to) Quora, StackOverflow, etc. If you see any material having the same problem and providing a solution, you cannot check or copy the solution provided. If general-purpose material was helpful to you, please cite it in your solution.

Collaboration Statement

Along with all submitted work, you must include the names of any people you worked with, and in what way you worked with them (discussed ideas, debugged math, team coding). We may occasionally check in with groups to ascertain that everyone in the group was participating in accordance with this policy.

Academic Integrity Policy

This course will strictly follow the Academic Integrity Policy of Tufts University. Students are expected to finish course work independently when instructed, and to acknowledge all collaborators appropriately when group work is allowed. Submitted work should truthfully represent the time and effort applied.

Please refer to the Academic Integrity Policy at the following URL: https://students.tufts.edu/student-affairs/student-life-policies/academic-integrity-policy

Accessibility

Tufts and the instruction team of CS 136 strive to create a learning environment that is welcoming to students of all backgrounds.

If you feel unwelcome for any reason, please talk to your instructor so we can work to make things better. If you feel uncomfortable talking to members of the teaching staff, consider reaching out to your academic advisor, the department chair, or your dean.

Please see the detailed accessibility policy at the following URL: <https://students.tufts.edu/student-accessibility-services>