Syllabus


COMP 135: Introduction to Machine Learning (Intro ML)
Department of Computer Science, Tufts University
Class Meetings for Fall 2020:
Course Websites:
Instructor: Mike Hughes, Assistant Professor of Computer Science
  • Contact: Please use Piazza. For extreme personal issues only: mhughes(AT)cs.tufts.edu
Teaching Assistants (TAs):
  • Rui Chen • Sheng Xu • Victor Arsenescu • Xi Chen • Xiaohui Chen • Lily Zhang • Zhitong Zhang
  • For help, come to our [Office Hours] or post on Piazza

Jump to: [Overview] • [Class-Format] • [Wait-List] • [Prereqs] • [Deliverables] • [Late-Work] • [Collaboration-Policy]

Course Overview and Objectives

WHAT: How can a machine learn from data or experience to improve performance at a given task? How can a machine achieve performance that generalizes well to new situations under limited time and memory resources? These are the fundamental questions of machine learning, a growing field of knowledge that combines techniques from computer science, optimization, and statistics.

This class will provide a comprehensive overview of supervised machine learning:

  • Supervised Learning: Given a collection of inputs and corresponding outputs for a prediction task, how can we make accurate predictions of the outputs that correspond to future inputs?
    • Unit 1: Regression with linear and neighbor methods
    • Unit 2: Classification with linear and neighbor methods
    • Unit 3: Neural networks
    • Unit 4: Trees and ensembles
    • Unit 5: Kernel methods
    • Unit 6: Recommendation Systems

We will also provide some brief exposure to unsupervised learning and reinforcement learning.

  • Unsupervised Learning: What are the underlying patterns in a given dataset? Can we find lower-dimensional representations of each example that do not lose important information?
  • Reinforcement Learning: How can an agent learn from interacting with an environment and receiving feedback about its actions?

HOW: We will explore several aspects of each core idea: intuitive conceptual understanding, mathematical analysis, in-depth software implementation, and practical deployment using existing libraries. Concepts will be first introduced via assigned readings and short video lectures. Weekly in-class live sessions will help students summarize major ideas and put key concepts into practice. Regular homeworks will build both conceptual and practical skills. Finally, open-ended practical projects -- often organized like a contest -- will allow students to demonstrate mastery.

WHY: Our goal is to prepare you to effectively apply machine learning methods to problems that might arise in "the real world" -- in industry, medicine, education, and beyond.

After completing this course, students will be able to:

  • Identify relevant real-world problems as instances of canonical machine learning problems (e.g. clustering, regression, dimensionality reduction, etc.)
  • Design and implement an effective solution to a regression, binary classification, or multi-class classification problem, using available open-source libraries when appropriate and writing from-scratch code when necessary.
  • Compare and contrast appropriate evaluation metrics for supervised learning predictive tasks (such as confusion matrices, receiver operating characteristic (ROC) curves, precision-recall curves).
  • Design and implement effective strategies for preprocessing data representations, partitioning data into training and heldout sets, and selecting hyperparameters.
  • Identify relevant ethical and social considerations when deploying a supervised learning or representation learning method into society, including fairness to different individuals or subgroups.
  • Describe basic dimensionality reduction and recommendation system algorithms.

Enrolling and Wait Lists

As of the start of semester, we expect to have 120 students enrolled in the course. We are currently at capacity, but some students may drop the course and leave openings for others (usually we see 10-20 openings in the first week of classes as schedules shift).

Our top priority is to provide each enrolled student with our full support, including the ability to get prompt answers to questions on Piazza and in office hours as well as the ability to get high-quality feedback on submitted homeworks, exams, and projects in a timely manner.

We understand some students are on the wait list (either formally on the wait list in the SIS system, or informally hoping to join the course). It is possible that students currently on the wait list may be added, but only if there is adequate staff support.

Prof. Mike Hughes will make the final decision about all wait list candidates by end of day on Monday 9/21 (just before the ADD deadline), which is when the first homework will be turned in and fully graded.

To be considered for enrollment, you should do these two things:

  • Complete and submit HW0 by end of day Wed 9/16.
    • This action shows you have the necessary skills and would take the course seriously
  • Message the instructor by end of day Wed 9/16 via email with subject containing "COMP 135 Wait List Request", explaining your current state within the degree program (e.g. sophomore undergraduate in CS, Ph.D. student in Cog. Sci.) and why taking the course this semester would be important to you.

Class Format for Fall 2020

Due to the ongoing pandemic, this course will be in a hybrid format for the Fall 2020 semester. Due to the large class size and the need to keep our whole community safe, most interactions will be virtual, including all in-class sessions and most office hours. Only a one-time meeting with course staff will be in person, with accommodations possible (more info below).

We expect we can accommodate any student who needs to complete the course in a fully remote environment. If you have concerns about your computing resources being adequate (see Resources page for expectations), please contact the course staff via Piazza ASAP.

Attendance

Participation in class is strongly encouraged, as you will get hands-on practice with material and have a chance to ask questions of the instructor and TAs, as well as your peers.

We do not require attendance at any class or track attendance.

Instructional material (readings, notes, and videos) will always be "prerecorded" and released on the Schedule page in advance, under "Do Before Class".

We will record video and audio for the main track of each interactive class session to capture important announcements and highlight key takeaways, and we will release that video to the Piazza resources page within 24 hours. However, the most valuable learning interactions may occur in breakout rooms that cannot be recorded.

We do count a small part of a student's grade as participation, which can be fulfilled either via being active in Piazza forum discussions or in live class discussions.

How to attend class

After the first day, we will expect students to be signed up on Piazza (accessible to any student either enrolled or on the waitlist).

We will post relevant links to virtual class meetings (and office hours) on the "Resources" page of Piazza.

What will we do in class

Each synchronous class session will occur at the scheduled time (Mon and Wed from 4:30-5:45pm ET).

Before each class, you are expected to complete the "Do Before Class" activities posted on the Schedule. These include reading the textbook and watching prerecorded videos (posted to Canvas). You should also download any relevant in-class demo notebooks to prepare.

In class, we will typically have the following structure, all over Zoom:

  • First 5 min.: Course Announcements (instructor led)
  • Next 25 min.: Key concepts for the day (instructor led)
  • Next 35 min.: Breakout into small groups to work through lab and discuss
  • Last 10 min.: Recap of key concepts and lessons learned

We will strive to create an exciting, highly interactive virtual classroom, with lots of opportunities for students to ask questions and get feedback from the professor, TAs, and peers.

Each student is responsible for shaping this environment: please participate actively and respectfully!

In-Person Component

We will have a required one-time small group short meeting with a member of course staff, so we can get to know you and shape the course to your goals and needs. We have found that requiring this interaction is critical to improving student engagement and retention.

This meeting will happen by default in person (but only in a setting where it is safe to do so). We will gladly accommodate students who request a remote meeting, by holding the meeting over Zoom.

See Piazza post on Required Office Hours visit for details about scheduling your appointment and signing the official log to get this counted.

Prerequisites

Programming: Students should be comfortable with writing non-trivial programs (e.g., COMP 15 or equivalent). We will use Python, a popular language for ML applications that is also beginner friendly.

Please consult our Python Setup Instructions page to set up a Python environment for COMP 135.

  • By the first homework (HW0), students will be expected to do the following without much help:
    • Load and transform datasets with numpy
    • Perform vector mathematical operations in numpy (computing inner products, multiplying matrices, inverting matrices, etc.)
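
As a rough self-check for the numpy skills listed above, here is a minimal sketch. The tiny dataset and all variable names are made up for illustration and are not taken from HW0. If each line makes sense to you, you are likely in good shape.

```python
import io
import numpy as np

# Hypothetical toy dataset: 3 examples (rows), 2 features (columns).
# A real assignment would load from a file path instead of a string.
csv_text = "1.0,2.0\n3.0,4.0\n5.0,6.0\n"
X = np.loadtxt(io.StringIO(csv_text), delimiter=",")

# Transform: subtract each column's mean so every feature is centered at zero.
X_centered = X - X.mean(axis=0)

# Inner product of two vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
inner = np.dot(a, b)  # 1*4 + 2*5 + 3*6

# Matrix multiplication: (2 x 3) times (3 x 2) gives a (2 x 2) matrix.
G = X.T @ X

# Matrix inverse: G @ G_inv should be close to the 2 x 2 identity.
G_inv = np.linalg.inv(G)
print(np.allclose(G @ G_inv, np.eye(2)))
```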

Essential Mathematics background: Familiarity with multivariate calculus (esp. derivatives and vector derivatives) is essential.

Useful Mathematics background: Prior experience with linear algebra and probability theory will also be useful.

With instructor permission, diligent students who are lacking in a few of the useful (but not essential) areas will hopefully be able to catch up on core concepts via self-study and thus still be able to complete the course effectively. Please see the community-sourced Self-Study Resources Page for a list of potentially useful resources for self-study.

Textbooks

We will regularly use several textbooks available for free online (either in browser or via downloadable PDFs):

Coursework and Deliverables

There are several primary deliverables for students in the course:

  • 5 homework assignments
    • PDF writeups and auto-graded Python code will be turned in via Gradescope.
    • Code will be evaluated by an autograder on Gradescope
    • Report figures and short answers will be evaluated by TA graders
  • 5 quizzes, one after each of the major units
    • All quizzes will be turned in via Gradescope.
    • Multiple choice questions will be evaluated by autograder on Gradescope
    • Short answer questions will be evaluated by TA graders
    • Makeup quizzes will not be issued except in cases of serious and unpredictable documented events such as medical illness or family emergency.
  • 3 projects: open-ended programming challenges
    • Results and relevant code will be turned in via Gradescope
    • Polished PDF reports will be turned in via Gradescope
  • An in-person meeting with course staff (with accommodations possible)
    • Sign-up information and details will be posted by the end of September to Piazza

Late work Policy

We want students to develop the skills of planning ahead and delivering work on time. To facilitate learning, we also want to be able to release solutions quickly and discuss recent assignments soon after deadlines. On the other hand, we know that fall 2020 offers particular challenges, and we wish to be flexible and accommodating within reason.

With these goals in mind, we have the following policy:

Homeworks and lateness

Each student will have 192 total late hours (= 8 late days) to use throughout the semester across all homeworks.

For each individual assignment, you can submit beyond the posted deadline at most 96 hours (4 days) and still receive full credit. Thus, for one assignment in the course due on Thu 9:00am ET, you could submit by the following Mon at 9:00am ET.

This late work deadline is key to our classroom goals. It allows us to always release homework solutions on Monday mornings a few days before the required quiz on that unit is due, and lets us discuss the assignment in class on Monday afternoon without issue.

The timestamp recorded on Gradescope will be official. Late time is rounded up to the nearest hour. For example, if the assignment is due at 3pm and you turn it in at 3:05pm, you have used one whole hour.

Beyond your allowance of 192 late hours, zero credit will be awarded except in cases of truly unforeseen exceptional circumstances (e.g. family emergency, medical emergency). Students with exceptional circumstances should contact the instructor to make other arrangements.
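
To make the late-hour arithmetic concrete, here is a minimal sketch of one reading of the policy above. The function names and the example dates are hypothetical, and the official accounting is whatever Gradescope and the course staff record. It assumes both timestamps are in the same time zone.

```python
import math
from datetime import datetime

MAX_LATE_HOURS_PER_HW = 96   # at most 4 days late on any one homework
TOTAL_LATE_HOURS = 192       # 8 late days across the whole semester


def late_hours_used(deadline, submitted):
    """Whole late hours charged; any partial hour is rounded up."""
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 3600)


def within_allowance(hours_late, hours_already_used):
    """True if this submission fits both the per-homework and semester limits."""
    return (hours_late <= MAX_LATE_HOURS_PER_HW
            and hours_already_used + hours_late <= TOTAL_LATE_HOURS)


# Hypothetical example: due at 3:00pm, turned in at 3:05pm the same day.
deadline = datetime(2020, 9, 17, 15, 0)
submitted = datetime(2020, 9, 17, 15, 5)
hours = late_hours_used(deadline, submitted)
print(hours, within_allowance(hours, hours_already_used=0))
```

Five minutes late is charged as one full hour, which matches the 3:05pm example above.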

Quizzes and lateness

Quizzes CANNOT be turned in late. After the due date, you will receive zero credit. We will drop the lowest quiz grade (so only 4 of 5 quizzes will count toward the final grade).

This deadline is key to our classroom goals. Quizzes assess what you as an individual understand about the course material. Allowing lateness might encourage intentional or unintentional sharing of answers.

Students with unforeseen and exceptional circumstances may contact the instructor to make other arrangements (likely in the form of a makeup oral exam).

Projects and lateness

Projects are open-ended and involve working with peers on significant code implementation and written reports. Because many "solutions" are possible, we will strive to be flexible, while still incentivizing students to turn in high-quality work on time so we can grade in a timely manner.

Projects require significant work. Please start early (at least 2 weeks before deadline) and make a careful plan with your group.

Projects turned in by the posted due date will be eligible for up to 100% of the points.

Projects turned in up to one week after the posted due date will be eligible for up to 90% of the points.

After 1 week, students with unforeseen and exceptional circumstances may contact the instructor to make other arrangements. With instructor approval, as long as you turn in high-quality work by the end of the semester, you can still earn up to 60% of the points. We intend that students in this situation could still pass the course if needed.

Workload

Each week, you should expect to spend about 10-15 hours on this class.

Here's our recommended break-down of how you'll spend time each week:

  • 1.25 hr / wk preparation before Mon class (reading, lecture videos)
  • 1.25 hr / wk active participation in Mon class
  • 1.25 hr / wk preparation before Wed class (reading, lecture videos)
  • 1.25 hr / wk active participation in Wed class
  • 3.00 hr / wk on homework (due every two weeks, so each hw takes 6 hr total)
  • 4.00 hr / wk on project (due every four weeks, so each proj takes 16 hr total)
  • 1.50 hr / wk preparing for quiz (quizzes happen every 2 weeks, so each quiz is 3 hr total)
  • 0.50 hr / wk taking quiz

This totals to 14.00 hr / wk

Grading

Final grades will be computed based on a numerical score via the following weighted average:

  • 22% average of homework scores (HW0 weighted 2%, HW1-HW5 weighted 5% each after dropping the lowest score)
  • 40% average of quiz scores (Q1-Q5, weighted equally after dropping the lowest score)
  • 36% average of project scores (ProjA, ProjB, and ProjC, weighted equally)
  • 2% participation in the required meeting as well as in class and in Piazza discussions
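
For students who want to estimate their standing, here is a minimal sketch of one reading of the weights above. The function name and the sample scores are hypothetical, all scores are assumed to be fractions between 0 and 1, and the instructor's own calculation is the official one.

```python
def final_score(hw0, hws, quizzes, projects, participation):
    """Weighted average under one reading of the stated weights.

    hw0: score on HW0; hws: scores on HW1-HW5; quizzes: scores on Q1-Q5;
    projects: scores on ProjA, ProjB, ProjC; participation: single score.
    """
    best_hws = sorted(hws)[1:]          # drop the lowest of HW1-HW5
    best_quizzes = sorted(quizzes)[1:]  # drop the lowest of Q1-Q5

    hw_part = 0.02 * hw0 + 0.05 * sum(best_hws)   # 2% + 4 x 5% = 22%
    quiz_part = 0.40 * sum(best_quizzes) / len(best_quizzes)
    proj_part = 0.36 * sum(projects) / len(projects)
    participation_part = 0.02 * participation
    return hw_part + quiz_part + proj_part + participation_part


# Hypothetical student, for illustration only.
score = final_score(
    hw0=1.0,
    hws=[0.9, 0.8, 1.0, 0.5, 0.9],
    quizzes=[0.8, 0.9, 0.7, 0.85, 0.6],
    projects=[0.9, 0.85, 0.95],
    participation=1.0,
)
print(round(score, 3))
```

Note that the homework weight only adds up to 22% because one of HW1-HW5 is dropped, leaving four homeworks at 5% each plus 2% for HW0.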

When assigning grades, the following numerical scale will be used:

  • 0.93-1.00 : A
  • 0.90-0.93 : A-
  • 0.87-0.90 : B+
  • 0.83-0.87 : B
  • 0.80-0.83 : B-
  • 0.77-0.80 : C+
  • 0.73-0.77 : C
  • 0.70-0.73 : C-
  • 0.67-0.70 : D+
  • 0.63-0.67 : D
  • 0.60-0.63 : D-
  • below 0.60 : F

This means you must earn at least an 0.83 (not 0.825 or 0.8295 or 0.8299) to earn a B instead of a B-.

Any rounding up will be at the instructor's discretion, as will the highest possible grade of "A+".
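
The scale can be read as a simple lookup with no rounding. The sketch below is illustrative only: the names are hypothetical, it assumes each range includes its lower bound (so exactly 0.60 is treated as a D-), and it ignores any discretionary rounding or "A+" grades.

```python
# (lower bound, letter) pairs, highest first; lower bounds assumed inclusive.
GRADE_CUTOFFS = [
    (0.93, "A"), (0.90, "A-"),
    (0.87, "B+"), (0.83, "B"), (0.80, "B-"),
    (0.77, "C+"), (0.73, "C"), (0.70, "C-"),
    (0.67, "D+"), (0.63, "D"), (0.60, "D-"),
]


def letter_grade(score):
    """Map a numerical score in [0, 1] to a letter grade without rounding."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


print(letter_grade(0.83), letter_grade(0.8299))
```

A score of 0.8299 maps to a B-, since no rounding is applied before the comparison.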

Collaboration Policy

Our ultimate goal is for each student to fully understand the course material.

For quizzes and exams, all work should be done individually, with no collaboration with others whatsoever.

For homeworks and projects and papers, we have the following policy for student work:

You must write anything that will be turned in -- all code and all written solutions -- on your own without help from others. You may not share any code or solutions with others, regardless of whether they are enrolled in the class or not.

We do encourage high-level interaction with your classmates. After you have spent at least 10 minutes thinking about the problem on your own, you may verbally discuss assignments with others in the class. You may work out solutions together on whiteboards, laptops, or other media, but you are not allowed to take away any written or electronic information from joint work sessions with others. No notes, no diagrams, and no code. Emails, text messages, and other forms of virtual communication also constitute “notes” and should not be used when preparing solutions.

When preparing your solutions, you may always consult textbooks, materials on the course website, or existing content on the web for general background knowledge. However, you cannot ask for answers through any question answering websites such as (but not limited to) Quora, StackOverflow, etc. If you see any material having the same problem and providing a solution, you cannot check or copy the solution provided. If general-purpose material was helpful to you, please cite it in your solution.

For work that is intended to be done individually (homework), we interpret "others" as anyone else, whether in the class or not.

For work that is intended to be done on small teams (projects), we interpret "others" above as anyone not on your team.

Remember, you are responsible for everything that you (or your team) hands in. You should understand it and be able to answer questions about it, if asked.

Required Collaboration Statement

Along with all submitted work, you will fill out a short form declaring the names of any others you got help from, and in what way you worked with them (discussed ideas, debugged math, team coding). Turning in this form will certify your compliance with this policy.

Along with all submitted small team work, you will fill out a short form describing how the team collaborated and divided the work. All team members must contribute significantly to the solution. We may occasionally check in with some teams to ascertain that everyone in the group was participating in accordance with this policy.

Piazza & Collaboration

When using the Piazza forum, you should be aware of the policies previously mentioned while posting questions and providing answers. Questions may be posted as either private (viewable only by yourself and course staff) or public (additionally viewable by all students registered for the course on Piazza).

Some issues warrant public questions and responses, such as: misconceptions or clarifications about the instructions, conceptual questions, errors in documentation, etc.

Some issues are better with private posts, including: debugging questions that include extensive amounts of code, questions that reveal a portion of your solution, etc.

Please use your best judgment when selecting private vs. public. If in doubt, make it private.

External Software

Each assignment will provide specific instructions about which open-source machine learning packages (such as scikit-learn, autograd, tensorflow, pytorch, etc.) you are allowed to use.

If you are allowed to use a package, there are two caveats:

Do not use a tool blindly: You are expected to show a deep understanding of any method you apply, as demonstrated by your writeup.

Beware of autograder requirements: If the problem requires you to submit code to an autograder, we will need to run the code using only the prescribed default software environment. Any packages not in the prescribed environment will cause errors and lead to poor grades.

Academic Integrity Policy

This course will strictly follow the Academic Integrity Policy of Tufts University. Students are expected to finish course work independently when instructed, and to acknowledge all collaborators appropriately when group work is allowed. Submitted work should truthfully represent the time and effort applied.

Please refer to the Academic Integrity Policy at the following URL: https://students.tufts.edu/student-affairs/student-life-policies/academic-integrity-policy

Accessibility

Tufts and the instructor of COMP 135 strive to create a learning environment that is welcoming to students of all backgrounds and abilities. Respect is demanded at all times throughout the course. Participation is not only required; it is expected that everyone in the course is treated with dignity and respect. We realize everyone comes from a different background with different experiences and abilities. Our knowledge will always be used to better everyone in the class.

If you have a disability that requires reasonable accommodations, please contact the Student Accessibility Services office at Accessibility@tufts.edu or 617-627-4539 to make an appointment with an SAS representative to determine appropriate accommodations. Please be aware that accommodations cannot be enacted retroactively, making timeliness a critical aspect for their provision.

Please see the detailed accessibility policy at the following URL: https://students.tufts.edu/student-accessibility-services

If you feel uncomfortable or unwelcome for any reason, please talk to your instructor so we can work to make things better. If you feel uncomfortable talking to members of the teaching staff, consider reaching out to your academic advisor, the department chair, or your dean.