Syllabus


CS 135: Introduction to Machine Learning (Intro ML)
Dept. of Computer Science, Tufts University
Class Meetings for Fall 2023: Tue and Thu 10:30 - 11:45am ET in Braker 001
Instructor: Mike Hughes, Assistant Professor of Computer Science
Grad TAs: Preetish Rath · Si Liu · Sipei Li
Have questions? Need help?
  • Get rapid online help via Piazza discussion forum (enrolled students only)
  • Get in-person help at regular Office Hours
  • For personal issues, email the instructor: mhughes(AT)cs.tufts.edu


Course Overview

WHAT: How can a machine learn from data or experience to improve performance at a given task? How can a machine achieve performance that generalizes well to new situations? These are the fundamental questions of machine learning, a growing field of knowledge that combines techniques from computer science, optimization, linear algebra, and statistics.

This class will provide a comprehensive overview of supervised machine learning:

Supervised Learning: Given a collection of inputs and corresponding outputs for a prediction task, how can we make accurate predictions of the outputs that correspond to future inputs?

  • Unit 1: Regression with linear and neighbor methods
  • Unit 2: Classification with linear and neighbor methods
  • Unit 3: Neural networks
  • Unit 4: Trees and ensembles
  • Unit 5: Kernel methods
  • Unit 6: Recommendation Systems and Dimensionality Reduction

This course provides only a very brief taste of other parts of ML, such as unsupervised learning and reinforcement learning. [Other courses at Tufts](resources.html) cover these in far more depth.

HOW: We will explore several aspects of each core idea: intuitive conceptual understanding, mathematical analysis, in-depth software implementation, and practical deployment using existing libraries.

Week after week, students will do the following:

  • Complete assigned readings to gain a first introduction to key concepts
  • Attend in-class live sessions that summarize ideas and make connections
  • Attend office hours to get questions answered
  • Complete homeworks that build both conceptual and practical skills.
  • Complete open-ended practical projects -- often organized like a contest -- to demonstrate mastery.

WHY: Our goal is to prepare you to effectively apply machine learning methods to problems that might arise in "the real world" -- in industry, medicine, education, and beyond.

Objectives

After completing this course, students will be able to:

  • Identify relevant real-world problems as instances of canonical machine learning problems (e.g. classification, regression, dimensionality reduction, etc.)
  • Design and implement an effective solution to a regression, binary classification, or multi-class classification problem, using available open-source libraries when appropriate and writing from-scratch code when necessary.
  • Compare and contrast appropriate evaluation metrics for supervised learning predictive tasks (such as confusion matrices, receiver operating characteristic (ROC) curves, precision-recall curves).
  • Design and implement effective strategies for preprocessing data representations, partitioning data into training and heldout sets, and selecting hyperparameters.
  • Identify relevant ethical and social considerations when deploying a supervised learning or representation learning method into society, including fairness to different individuals or subgroups.
  • Describe basic dimensionality reduction and recommendation system algorithms.

Enrolling and Wait Lists

As of 2023-08-29 (a week before class starts), we have 125 students enrolled in the course. This represents the capacity of the assigned lecture hall as well as the max capacity of our assigned TA budget, so we cannot add any more students.

Thus, currently, the enrollment list is frozen. No additional students will be automatically enrolled.

That said, some students may drop the course and leave openings for others (usually we see 5-15 openings in the first week of classes as schedules shift).

To be considered for enrollment if a slot opens up, you must do these two things:

  • Email the instructor by end of day Thu 9/7 with the exact subject "CS 135 Enrollment Request: Code T-Rex"
    • Explain your current state within your degree program (e.g. sophomore undergraduate in CS, Ph.D. student in Math)
    • Explain why taking the course this semester would be important to you.
    • Confirm that your HW0 will be completed by Fri 9/8
  • Complete and submit HW0 by end of day Fri 9/8 (earlier than the deadline for already enrolled students)
    • This action shows you have the necessary skills and would take the course seriously

Prof. Mike Hughes will make the final decision about all wait list candidates by noon on Mon 9/11.

Due to limited capacity, it is somewhat likely that no slots will be available for wait-list candidates.

Prerequisites

Programming: Students should be comfortable with writing non-trivial programs (e.g., COMP 15 or equivalent). We will use Python, a popular language for ML applications that is also beginner friendly.

Please consult our Python Setup Instructions page to set up a Python environment for CS 135.

By the first homework (HW0), students will be expected to do the following without much help:

  • Load and transform datasets with numpy
  • Perform vector and matrix operations in numpy (computing inner products, multiplying matrices, inverting matrices, etc.)
  • Create line plots in matplotlib
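As a rough self-check on the numpy skills listed above, here is a minimal sketch (not taken from HW0) that loads a tiny made-up CSV dataset from memory, splits it into features and targets, and computes an inner product, a matrix product, and a matrix inverse. The data and variable names are illustrative only, and matplotlib plotting is omitted to keep the example dependency-light.

```python
import io

import numpy as np

# A tiny made-up CSV "dataset" held in memory; a real homework would load from a file path.
csv_text = "x1,x2,y\n1.0,2.0,5.0\n3.0,4.0,11.0\n5.0,6.0,17.0\n"

# Load: skip the header row and parse comma-separated floats into a 2D array.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Transform: split into a feature matrix X (N x 2) and a target vector y (N,).
X = data[:, :2]
y = data[:, 2]

# Inner product between two vectors.
w = np.array([1.0, 2.0])
inner = np.dot(X[0], w)  # 1*1 + 2*2 = 5.0

# Matrix multiplication: one prediction per row of X, all at once.
y_hat = X @ w

# Matrix inversion: invert the 2x2 matrix X^T X.
gram = X.T @ X
gram_inv = np.linalg.inv(gram)

# Combining the pieces: least-squares weights via the normal equations.
w_fit = gram_inv @ X.T @ y
print(w_fit)
```

In practice `np.linalg.solve` or `np.linalg.lstsq` is usually preferred over an explicit inverse for numerical stability; the inverse is used here only because the list above mentions inverting matrices.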

Essential Mathematics background: Familiarity with multivariate calculus, especially derivatives and vector derivatives, is valuable.

Useful Mathematics background: Prior exposure to linear algebra and probability theory will also be useful.

With instructor permission, diligent students who are lacking in a few of the useful (but not essential) areas will hopefully be able to catch up on core concepts via self-study and thus still be able to complete the course effectively. Please see the community-sourced Self-Study Resources Page for a list of potentially useful resources for self-study.

Textbooks

We will regularly use several textbooks available for free online (either in browser or via downloadable PDFs):

Class Format for Fall 2023

The course is organized into 6 topical units (each about 2 weeks long), which will govern in-class and out-of-class work.

Synchronous course meetings will be in-person throughout the fall.

By the end of the first day of class, we expect all students to be signed up on Piazza. This is how we will communicate any changes to course meeting locations. If you have not heard otherwise, expect course meetings and office hours to be in person in their usual locations.

Attendance

Participation in class is strongly encouraged, as you will get hands-on practice with material and have a chance to ask questions of the instructor and TAs, as well as your peers.

We do not require attendance at any class, nor do we track attendance.

Instructional material (readings, notes, and videos) will be released on the Schedule page in advance, under "Do Before Class".

We will record video and audio for the main track of each interactive class session to capture important announcements and highlight key takeaways, and will release that video on the Piazza resources page within 24 hours.

What will we do in class

Each synchronous class session will occur at the scheduled time.

Before each class, you are expected to complete the "Do Before Class" activities posted on the Schedule. These include textbook readings as well as (sometimes) prerecorded videos. You should also download any relevant in-class demo notebooks to prepare.

In each 75 min. class, we will typically follow this structure:

  • First 5 min.: Course Announcements (instructor led)
  • Next 45 min.: Lecture on key concepts for the day (instructor led)
  • Next 20 min.: Continue lecture or breakout into small group exercises
  • Last 5 min.: Wrap-up and takeaway messages

We will strive to create an exciting, highly interactive classroom, with lots of opportunities for students to ask questions and get feedback from the professor, TAs, and peers.

Each student is responsible for shaping this environment: please participate actively and respectfully!

What will we do outside of class?

Here are the primary deliverables in the course:

6 homework assignments

  • Instructions: assignments.html#homeworks
  • Due dates are posted on the schedule: schedule.html
  • Code will be evaluated by an autograder on Gradescope
  • PDF Reports (figures and short answers) will be evaluated by TA graders

2 projects: open-ended programming challenges

  • Instructions: assignments.html#projects
  • Due dates are posted on the schedule: schedule.html
  • Submissions will be posted to a leaderboard on Gradescope
  • PDF Reports (detailed descriptions of process and conclusions) will be evaluated by TA graders

2 exams: midterm and final

  • Dates are posted on the schedule
  • Each exam will take 60 minutes.
  • Think of these as 'practice interviews' for an ML position in industry

Late work Policy

Homeworks and lateness

Each student will have 192 total late hours (= 8 late days) to use throughout the semester across homeworks HW1-HW5. No late hours are allowed on HW0.

For each individual assignment, you can submit at most 96 hours (4 days) beyond the posted deadline and still receive full credit. Thus, for an assignment due on Thu 11:59pm ET, you could submit as late as the following Mon at 11:59pm ET.

This late work deadline is key to our classroom goals. It allows us to always release homework solutions on Tue mornings and discuss the solution in class.

The timestamp recorded on Gradescope will be official. Late time is rounded up to the nearest hour. For example, if the assignment is due at 3pm and you turn it in at 3:05pm, you have used one whole hour.

Beyond your allowance of 192 late hours, zero credit will be awarded except in cases of truly unforeseen exceptional circumstances (e.g. family emergency, medical emergency). Students with exceptional circumstances should contact the instructor to make other arrangements.
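The homework late-hour rules can be sketched in Python as follows. This is an unofficial illustration that assumes submission and deadline times are available as `datetime` objects; the function names are hypothetical, and neither HW0 (which allows no late hours) nor exceptional-circumstance arrangements are modeled.

```python
import math
from datetime import datetime

TOTAL_LATE_HOURS = 192       # semester-wide budget across HW1-HW5 (= 8 days)
MAX_LATE_HOURS_PER_HW = 96   # per-assignment cap (= 4 days)


def late_hours_used(deadline: datetime, submitted: datetime) -> int:
    """Return late time rounded UP to a whole hour (0 if submitted on time)."""
    if submitted <= deadline:
        return 0
    seconds_late = (submitted - deadline).total_seconds()
    return math.ceil(seconds_late / 3600)


def is_full_credit(hours_late: int, hours_already_used: int) -> bool:
    """True if this submission stays within both the per-HW cap and the total budget."""
    within_cap = hours_late <= MAX_LATE_HOURS_PER_HW
    within_budget = hours_already_used + hours_late <= TOTAL_LATE_HOURS
    return within_cap and within_budget


# Example from the policy: due at 3pm, turned in at 3:05pm, so one whole hour is used.
due = datetime(2023, 9, 14, 15, 0)
print(late_hours_used(due, datetime(2023, 9, 14, 15, 5)))  # 1
```

Rounding with `math.ceil` means even one second past the deadline counts as a full hour, matching the 3:05pm example.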

Projects and lateness

Projects are open-ended and involve working with peers on significant code implementation and written reports. Because many "solutions" are possible, we will strive to be flexible, while still incentivizing students to turn in high-quality work on time so we can grade in a timely manner.

Projects require significant work. Please start early (at least 2 weeks before deadline) and make a careful plan with your group.

Projects turned in by the posted due date will be eligible for up to 100% of the points.

Projects turned in up to four days after the posted due date will be eligible for up to 85% of the points.

Students with unforeseen and exceptional circumstances may contact the instructor to make other arrangements. Without explicit instructor approval for an extension beyond four days, a late project may receive zero points.
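A minimal sketch of the maximum credit available for a project, assuming "four days" means 96 hours of late time (as in the homework policy). The function name is hypothetical; it returns only the ceiling on the fraction of points, not how that ceiling is applied to a raw score, and it treats the "may receive zero" case as zero.

```python
def max_project_credit(hours_late: float) -> float:
    """Upper bound on the fraction of project points available (hypothetical helper)."""
    if hours_late <= 0:
        return 1.00   # on time: eligible for up to 100% of the points
    if hours_late <= 96:
        return 0.85   # up to four days late: eligible for up to 85%
    # Beyond four days, assume zero unless the instructor approved an extension.
    return 0.0


for hours in (0, 30, 96, 120):
    print(hours, max_project_credit(hours))
```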

Exams and lateness

Exams must occur on the assigned date.

Students with unforeseen and exceptional circumstances may contact the instructor to make other arrangements (likely in the form of a makeup oral exam).

Workload

Each week, you should expect to spend about 10-15 hours on this class.

Here's our recommended break-down of how you'll spend time each week:

  • 1.25 hr / wk preparation before Tue class (reading, lecture videos)
  • 1.25 hr / wk active participation in Tue class
  • 1.25 hr / wk preparation before Thu class (reading, lecture videos)
  • 1.25 hr / wk active participation in Thu class
  • 6.00 hr / wk on homework or project, whichever is due next

This totals 11.00 hr / wk.

Typically, by assignment:

  • For each HW you are given 2 weeks from release to due date. We expect about 8 hours of work.
  • For each Project you are given 3+ weeks. We expect about 16 hours of work from each team member.

Grading

Last updated: Nov. 28, 2023

Final grades will be computed based on a numerical score via the following weighted average:

  • 28.06% Homeworks (HW0 weighted 3.06%, HW1-HW5 weighted 5% each)
  • 36.60% Projects (A and B weighted equally)
  • 16.67% midterm exam
  • 16.67% final exam
  • 2% participation in class, office hours, and in Piazza discussions
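For illustration, the current weighting can be written as a small Python function. This sketch assumes each component score is already a fraction in [0, 1] and that the 36.60% project weight is split evenly (18.30% each) between Projects A and B; the dictionary keys are hypothetical labels, not names used by Gradescope.

```python
# Weights in percent, following the Nov. 28, 2023 scheme above.
WEIGHTS = {
    "hw0": 3.06,
    "hw1": 5.0, "hw2": 5.0, "hw3": 5.0, "hw4": 5.0, "hw5": 5.0,
    "projA": 18.30, "projB": 18.30,
    "midterm": 16.67,
    "final": 16.67,
    "participation": 2.0,
}


def final_score(scores: dict) -> float:
    """Weighted average of component scores, each a fraction in [0, 1]."""
    total_weight = sum(WEIGHTS.values())  # 100.00, up to floating-point error
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Example: perfect on everything except 50% on both exams.
example = {name: 1.0 for name in WEIGHTS}
example["midterm"] = 0.5
example["final"] = 0.5
print(round(final_score(example), 4))  # 0.8333
```

Dividing by the summed weights keeps the result on a 0-1 scale even if the percentages do not add to exactly 100 in floating point.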

Below is the original, now-deprecated scheme advertised in August 2023 (which included an extra project):

  • 23% average of homework scores (HW0 weighted 3%, HW1-HW5 weighted 5% each)
  • 45% average of project scores (ProjA, ProjB, and ProjC, weighted equally)
  • 15% midterm exam
  • 15% final exam
  • 2% participation in class, office hours, and in Piazza discussions

When assigning grades, the following numerical scale will be used:

  • 0.93-1.00 : A
  • 0.90-0.93 : A-
  • 0.87-0.90 : B+
  • 0.83-0.87 : B
  • 0.80-0.83 : B-
  • 0.77-0.80 : C+
  • 0.73-0.77 : C
  • 0.70-0.73 : C-
  • 0.67-0.70 : D+
  • 0.63-0.67 : D
  • 0.60-0.63 : D-
  • below 0.60 : F

This means you must earn at least 0.83 (not 0.825 or 0.8295 or 0.8299) to earn a B instead of a B-.

Any rounding up will be at the instructor's discretion, as will the highest possible grade of "A+".
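The scale above, with no rounding up, can be sketched as a lookup over band lower bounds. This is an unofficial illustration: it treats each band's lower bound as inclusive (so exactly 0.83 is a B, as stated above) and does not model instructor-discretion rounding or the A+.

```python
import bisect

# Lower bound of each band on the scale above, in increasing order.
CUTOFFS = [0.60, 0.63, 0.67, 0.70, 0.73, 0.77, 0.80, 0.83, 0.87, 0.90, 0.93]
# LETTERS[i] is the grade for a score with exactly i cutoffs at or below it.
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def letter_grade(score: float) -> str:
    """Map a numerical score in [0, 1] to a letter grade, with no rounding up."""
    # bisect_right counts the cutoffs <= score, so 0.83 lands in the B band
    # while 0.8299 stays in B-.
    return LETTERS[bisect.bisect_right(CUTOFFS, score)]


print(letter_grade(0.83), letter_grade(0.8299))  # B B-
```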

Collaboration Policy

Our ultimate goal is for each student to fully understand the course material.

For exams, all work must be done individually, with no collaboration with others whatsoever.

For homeworks and projects, we have the following policy for student work.

You must write anything that will be turned in -- all code and all written solutions -- on your own without help from others. You may not share any code or solutions with others, regardless of whether they are enrolled in the class.

We do encourage high-level interaction with your classmates. After you have spent at least 10 minutes thinking about the problem on your own, you may verbally discuss assignments with others in the class. You may work out solutions together on whiteboards, laptops, or other media, but you are not allowed to take away any written or electronic information from joint work sessions with others. No notes, no diagrams, and no code. Emails, text messages, and other forms of virtual communication also constitute “notes” and should not be used when preparing solutions.

When preparing your solutions, you may always consult textbooks, materials on the course website, or existing content on the web for general background knowledge. However, you may not ask for answers on any question-answering websites such as (but not limited to) Quora, StackOverflow, etc. If you come across material that poses the same problem and provides a solution, you may not consult or copy that solution. If general-purpose material was helpful to you, please cite it in your solution.

RE: AI assistive technologies such as ChatGPT:

  • For homeworks, you cannot use any AI assistance at all.
  • For open-ended projects, you may use such technologies to "automate the boring stuff" in terms of code development, but the high-level plan and vision for the project should be yours. You are expected to fully understand any code you use. You should write every word of your report yourself (no AI-assisted writing). Your report should disclose all steps that involved AI assistance.

For homeworks, which are intended to be done individually, please interpret "others" as anyone else, whether in the class or not.

For project work intended to be done in small teams, please interpret "others" above as anyone not on your team.

Remember, you are responsible for everything that you (or your team) hand in. You should understand it and be able to answer questions about it, if asked.

Required Collaboration Statement

Along with all submitted work, you will fill out a short form declaring the names of any others you got help from, and in what way you worked with them (discussed ideas, debugged math, team coding). Turning in this form will certify your compliance with this policy.

Along with all submitted small team work, you will fill out a short form describing how the team collaborated and divided the work. All team members must contribute significantly to the solution. We may occasionally check in with some teams to ascertain that everyone in the group was participating in accordance with this policy.

Piazza & Collaboration

When using the Piazza forum, you should keep the policies above in mind while posting questions and providing answers. Questions may be posted as either private (viewable only by yourself and course staff) or public (additionally viewable by all students in the course registered on Piazza).

Some issues warrant public questions and responses, such as: misconceptions or clarifications about the instructions, conceptual questions, errors in documentation, etc.

Some issues are better suited to private posts, including: debugging questions that include extensive amounts of code, questions that reveal a portion of your solution, etc.

Please use your best judgment when selecting private vs. public. If in doubt, make it private.

External Software

Each assignment will provide specific instructions about which open-source machine learning packages (such as scikit-learn, autograd, tensorflow, pytorch, etc.) you are allowed to use.

If you are allowed to use a package, there are two caveats:

Do not use a tool blindly: You are expected to show a deep understanding of any method you apply, as demonstrated by your writeup.

Beware of autograder requirements: If the problem requires you to submit code to an autograder, we will need to run the code using only the prescribed default software environment. Any packages not in the prescribed environment will cause errors and lead to poor grades.

Academic Integrity Policy

This course will strictly follow the Academic Integrity Policy of Tufts University. Students are expected to finish course work independently when instructed, and to acknowledge all collaborators appropriately when group work is allowed. Submitted work should truthfully represent the time and effort applied.

Please refer to the Academic Integrity Policy at the following URL: https://students.tufts.edu/student-affairs/student-life-policies/academic-integrity-policy

Accessibility

Tufts and the instructor of CS 135 strive to create a learning environment that is welcoming to students of all backgrounds and abilities. Respect is demanded at all times throughout the course. Not only is participation expected, but everyone in the course must also be treated with dignity and respect. We realize everyone comes from a different background with different experiences and abilities. Our knowledge will always be used to better everyone in the class.

If you have a disability that requires reasonable accommodations, please contact the Student Accessibility Services office at Accessibility@tufts.edu or 617-627-4539 to make an appointment with an SAS representative to determine appropriate accommodations. Please be aware that accommodations cannot be enacted retroactively, making timeliness a critical aspect for their provision.

Please see the detailed accessibility policy at the following URL: https://students.tufts.edu/student-accessibility-services

If you feel uncomfortable or unwelcome for any reason, please talk to your instructor so we can work to make things better. If you feel uncomfortable talking to members of the teaching staff, consider reaching out to your academic advisor, the department chair, or your dean.