Syllabus | Introduction to Machine Learning

COMP 135: Introduction to Machine Learning
Department of Computer Science, Tufts University
Course Websites:

Schedule and Materials: https://www.cs.tufts.edu/comp/135/2019s/

Piazza Discussion Forum: https://piazza.com/tufts/spring2019/comp135/home

Starter code: https://github.com/tufts-ml-courses/comp135-19s-assignments

Class Meetings for Spring 2019:

Lecture: Mon and Wed 3:00-4:15pm in Halligan 111A

Recitation Sessions (led by TAs): Mon 7:30 - 8:30 pm in Halligan 111B

Instructor: Mike Hughes, Assistant Professor of Computer Science

Contact: Please use Piazza. For extreme personal issues only: mhughes(AT)cs.tufts.edu

Teaching Assistants (TAs):

Mike Pietras • Rui Chen • Manh (Duc) Nguyen • Minh Nguyen • Yirong (Wayne) Tang

For help, come to our office hours.


Course Overview and Objectives

WHAT: How can a machine learn from data or experience to improve performance at a given task? How can a machine achieve performance that generalizes well to new situations under limited time and memory resources? These are the fundamental questions of machine learning, a growing field of knowledge that combines techniques from computer science, optimization, and statistics.

This class will provide a comprehensive overview of two major areas of machine learning:

Supervised Learning: Given a set of inputs and outputs, how can we make predictions about future outputs?
Unsupervised Learning: What are the major underlying patterns in a given dataset? Can we find clusters that summarize the data well? Can we find lower-dimensional representations of each example that do not lose important information?

We will also provide some brief exposure to reinforcement learning.

HOW: We will explore several aspects of each core idea: intuitive conceptual understanding, rigorous mathematical derivation, in-depth software implementation, and practical deployment using existing libraries. Concepts will first be introduced via assigned readings and course meetings. Weekly recitation sessions will help students put key concepts into practice. Regular homeworks will build both conceptual and practical skills. Finally, open-ended practical projects -- often organized like a contest -- will allow students to demonstrate mastery.

WHY: Our goal is to prepare you to effectively apply machine learning methods to problems that might arise in "the real world" -- in industry, medicine, education, and beyond.

After completing this course, students will be able to:

Identify relevant real-world problems as instances of canonical machine learning problems (e.g. clustering, regression, etc.)
Design and implement an effective solution to a regression, binary classification, or multi-class classification problem.
Design and implement basic clustering, dimensionality reduction, and recommendation system algorithms.
Compare and contrast evaluation methods for various predictive tasks (including receiver operating characteristic (ROC) curves, precision-recall curves, and calibration plots).
Develop and implement effective strategies for preprocessing data representations, partitioning data into training and held-out sets, and tuning hyperparameters.

Prerequisites

Programming: Students should be comfortable with writing non-trivial programs (e.g., COMP 15 or equivalent). We will use Python, a popular language for ML applications that is also beginner friendly.

Please consult our Python Setup Instructions page to set up a Python environment for COMP 135.

By the first homework, students will be expected to do the following without much help:
- Perform vector and matrix operations in numpy (computing inner products, multiplying matrices, inverting matrices, etc.)
- Create line plots in matplotlib
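As a rough self-check for these expectations, the sketch below shows the kind of numpy and matplotlib code students should be able to write without much help. The arrays and the plotted function are arbitrary illustrative choices, not taken from any assignment.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Inner product of two vectors: 1*4 + 2*5 + 3*6 = 32
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
inner = np.dot(x, y)

# Matrix product with the @ operator: result is [[4, 1], [7, 3]]
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0],
              [2.0, 1.0]])
AB = A @ B

# Matrix inverse: A @ A_inv should equal the identity up to rounding error
A_inv = np.linalg.inv(A)

# Solving A w = b directly is usually preferred over multiplying by A_inv
b = np.array([3.0, 5.0])
w = np.linalg.solve(A, b)  # w = [0.8, 1.4]

# A labeled line plot, written to an in-memory PNG instead of a file
t = np.linspace(0.0, 1.0, 50)
fig, ax = plt.subplots()
ax.plot(t, t ** 2, label="t squared")
ax.set_xlabel("t")
ax.set_ylabel("value")
ax.legend()
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

In your own work you would more typically call `fig.savefig("plot.png")` or `plt.show()` rather than writing to an in-memory buffer.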

Mathematics: Basic familiarity with multivariate calculus (integrals, derivatives, vector derivatives) is essential. Prior experience with linear algebra and probability theory will also be useful.

With instructor permission, diligent students who lack background in a few of these areas can hopefully catch up on core concepts via self-study and still complete the course effectively. Please see the community-sourced Prereq. Self-Study Resources Page for a list of potentially useful resources for self-study.

Materials

We will regularly use several textbooks available for free online (either in browser or via downloadable PDFs):

Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Springer, 2013. Corrected 8th printing, 2017. [PDF]
Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2nd Edition, Springer, 2009. Corrected 12th printing, 2017. [PDF]
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. MIT Press, 2016.
Evaluating Machine Learning Models by Alice Zheng. O'Reilly, 2015. [PDF]

Coursework and Deliverables

There are three primary tasks for students throughout the course:

Two exams: a midterm and a final
- Midterm will be during a normally scheduled class period
- Final will be at the appointed final exam hour and location for this class
- Dates will be posted on the schedule: schedule.html
- Makeup exams will not be issued except in cases of serious and unpredictable documented events such as medical illness or family emergency.
8 homework assignments (written and code exercises)
- Assignments & Instructions: assignments.html#homeworks
- Due dates are posted on the schedule: schedule.html
- PDF writeups and Python code will be turned in via Gradescope.
3 projects: open-ended programming challenges
- Instructions: assignments.html#projects.
- Due dates are posted on the schedule: schedule.html
- Code will be submitted via Gradescope and/or Kaggle.
- PDF writeups will be turned in via Gradescope.

Late work policy for homeworks and projects: We want students to develop the skills of planning ahead and delivering work on time. We also want to be able to release solutions quickly and discuss recent work as soon as the next class meeting. With these goals in mind, we have the following policy:

Each student will have 120 total late hours (5 late days) to use throughout the semester across the 8 homeworks and 3 projects.

For each individual assignment (homework or project), you may submit at most 48 hours (2 days) past the posted deadline and still receive full credit.

The time recorded on Gradescope will be official. Late time is rounded up to the nearest hour. For example, if the assignment is due at 3pm and you turn it in at 3:30pm, you have used one whole hour.

Beyond your allowance of late hours, zero credit will be awarded.
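The arithmetic of this policy can be sketched as follows. This is a minimal illustration under one reading of the rules above (late time rounded up to whole hours, at most 48 late hours on any single assignment, at most 120 late hours in total); the function names and dates are hypothetical, and the time recorded on Gradescope remains the official record.

```python
import math
from datetime import datetime, timedelta

TOTAL_LATE_HOURS = 120        # 5 late days for the whole semester
MAX_LATE_HOURS_PER_ITEM = 48  # at most 2 late days on any one assignment


def late_hours_used(due: datetime, submitted: datetime) -> int:
    """Late hours charged for one submission, rounded up to a whole hour."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0  # on time: no late hours charged
    return math.ceil(seconds_late / 3600)


def within_allowance(late_hours: int, hours_used_so_far: int) -> bool:
    """True if this submission respects both limits (assumed reading of the policy)."""
    return (late_hours <= MAX_LATE_HOURS_PER_ITEM
            and hours_used_so_far + late_hours <= TOTAL_LATE_HOURS)


# Hypothetical example: due at 3:00pm, submitted at 3:30pm -> 1 late hour
due = datetime(2019, 2, 6, 15, 0)
submitted = due + timedelta(minutes=30)
hours = late_hours_used(due, submitted)
print(hours, within_allowance(hours, hours_used_so_far=0))  # prints: 1 True
```

Note that even a few minutes past the deadline costs a full hour, so submitting slightly late is relatively expensive under rounding up.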

Grading

Final grades will be computed based on a numerical score via the following weighted average:

30% homeworks
30% projects
18% midterm exam
20% final exam
2% class participation

When assigning grades, the following numerical scale will be used:

0.93-1.00 : A
0.90-0.93 : A-
0.87-0.90 : B+
0.83-0.87 : B
0.80-0.83 : B-
0.77-0.80 : C+
0.73-0.77 : C
0.70-0.73 : C-
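A minimal sketch of how the weighted average and the letter-grade scale above fit together. The component names and example scores are hypothetical. Because the listed ranges share their endpoints (for example, 0.93 appears in both the A and A- rows), the sketch assumes each lower bound is inclusive; the course staff's actual boundary handling may differ.

```python
# Weights from the grading breakdown above (they sum to 1.0)
WEIGHTS = {
    "homeworks": 0.30,
    "projects": 0.30,
    "midterm": 0.18,
    "final": 0.20,
    "participation": 0.02,
}

# (inclusive lower bound, letter), checked from highest to lowest.
# Assumption: a score exactly on a boundary gets the higher grade.
GRADE_SCALE = [
    (0.93, "A"), (0.90, "A-"), (0.87, "B+"), (0.83, "B"),
    (0.80, "B-"), (0.77, "C+"), (0.73, "C"), (0.70, "C-"),
]


def final_score(scores: dict) -> float:
    """Weighted average of per-component scores, each expressed in [0, 1]."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(score: float):
    """Map a numerical score to a letter using the scale above."""
    for lower_bound, letter in GRADE_SCALE:
        if score >= lower_bound:
            return letter
    return None  # the posted scale does not list grades below 0.70


# Hypothetical student
scores = {"homeworks": 0.95, "projects": 0.90, "midterm": 0.85,
          "final": 0.88, "participation": 1.00}
total = final_score(scores)
print(round(total, 3), letter_grade(total))  # prints: 0.904 A-
```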

External Software

Each assignment will provide specific instructions about which open-source machine learning packages (such as scikit-learn, tensorflow, pytorch, shogun, etc.) you are allowed to use.

If you are allowed to use a package, there are two caveats:

Do not use a tool blindly: You are expected to show a deep understanding of any method you apply, as demonstrated by your writeup.

Beware of autograder requirements: If the problem requires you to submit code to an autograder, we will need to run the code using only the prescribed default software environment. Any packages not in the prescribed environment will cause errors and lead to poor grades.

Collaboration Policy

Our ultimate goal is for each student to fully understand the course material. With this goal in mind, we have the following policy:

Homework and Practicals

You must write anything that will be turned in -- all code and all written solutions -- on your own without help from others. You are responsible for everything that you hand in. You may not share any written code or solutions with other students.

However, we do encourage high-level interaction with your classmates. After you have spent at least 10 minutes thinking about the problem on your own, you may verbally discuss homework assignments with other students in the class. You may work out solutions together on whiteboards, laptops, or other media, but you are not allowed to take away any written or electronic information from joint work sessions. No notes, no diagrams, and no code. Emails, text messages, and other forms of virtual communication also constitute “notes” and should not be used when discussing problems.

When preparing your solutions, you may consult textbooks or existing content on the web for general background knowledge. However, you may not ask for answers on any question-answering website, including (but not limited to) Quora and StackOverflow. If you come across material that poses the same problem and provides a solution, you may not look at or copy that solution. If general-purpose material was helpful to you, please cite it in your solution.

Piazza & Collaboration

When using the Piazza forum, keep the policies above in mind when posting questions and providing answers. Questions may be posted as either private (viewable only by yourself and course staff) or public (additionally viewable by all students in the course who are registered on Piazza).

Some issues warrant public questions and responses, such as: misconceptions or clarifications about the instructions, conceptual questions, errors in documentation, etc.

Some issues are better suited to private posts, including debugging questions that include large amounts of code, questions that reveal a portion of your solution, etc.

Please use your best judgment when selecting private vs. public. If in doubt, make it private.

Academic Integrity Policy

This course will strictly follow the Academic Integrity Policy of Tufts University. Students are expected to finish course work independently when instructed, and to acknowledge all collaborators appropriately when group work is allowed. Submitted work should truthfully represent the time and effort applied.

Please refer to the Academic Integrity Policy at the following URL: https://students.tufts.edu/student-affairs/student-life-policies/academic-integrity-policy

Accessibility

Tufts and the instructor of COMP 135 strive to create a learning environment that is welcoming to students of all backgrounds. Please see the detailed accessibility policy at the following URL: https://students.tufts.edu/student-accessibility-services