Comp 135: Introduction to Machine Learning
Department of Computer Science
Tufts University
Fall 2014

Course Web Page (this page): http://www.cs.tufts.edu/comp/135/

Announcement(s):
  • (11/19) Programming Project 4 clarification: please use k=5 for k-nearest neighbors.
  • Office hours on Thursday and Friday, 10/30-31, are canceled. Please email Eric Murray or Roni Khardon, or post to Piazza, with any questions.
  • Clarification for project 3: Please see Eric's Piazza post on sensitivity to initialization. The project writeup did not discuss data normalization (see the bottom-right slide on page 3 of the features lecture, L6). If you do not normalize the data, the distance function is not very informative, although for this assignment the clustering results nonetheless appear stable.
    Since we discovered the discrepancy too close to the deadline, you may submit your code and results as is, without normalization. Please make sure that we can run the submitted code in case we want to check how it behaves on normalized data. (A minimal normalization sketch is given after this list.)
  • Temporary change in office hours: office hours on Tuesday 10/14, 4:30-5:30, will be held by Mengfei Cao. Office hours on Thursday 10/16, 1:30-3:00, will be held by Roni Khardon.
  • (10/7) Information for midterm exam posted.
  • (9/29) By popular demand, a Piazza discussion site has been set up. Please sign up at the comp135 Piazza site.
  • Course information, readings, assignments, etc., are updated regularly in the table below.
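For concreteness, here is a minimal sketch of the kind of per-feature normalization referred to in the project 3 clarification above. It is illustrative only, not part of any assignment specification: the function name and the choice of min-max scaling are assumptions here, and z-scoring each feature would serve the same purpose.

    import numpy as np

    def minmax_normalize(X):
        # Rescale each feature (column) of X to the [0, 1] range.
        # Without a step like this, features measured on large scales dominate
        # Euclidean distances, so distance-based methods such as k-means or
        # k-nearest neighbors become much less informative.
        X = np.asarray(X, dtype=float)
        col_min = X.min(axis=0)
        col_range = X.max(axis=0) - col_min
        col_range[col_range == 0] = 1.0  # leave constant features unchanged
        return (X - col_min) / col_range

Whatever normalization is used, compute it from the training data and apply the same transformation to any held-out data before measuring distances.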

What is this course about?

Machine learning is the science of collecting and analyzing data and turning it into predictions, encapsulated knowledge, or actions. There are many ways and scenarios by which data can be obtained, many different models and algorithms for data analysis, and many potential applications. In recent years machine learning has attracted attention due to commercial successes and widespread use.
The course gives a broad introduction to machine learning aimed at upper-level undergraduates and beginning graduate students. Some mathematical aptitude is required, but generally speaking we focus on baseline algorithms, practical aspects, and breadth, leaving detailed analysis to more advanced courses: Statistical Pattern Recognition, Computational Learning Theory, and Learning, Planning and Acting in Complex Environments.

Syllabus:

An overview of methods whereby computers can learn from data or experience and make decisions accordingly. Topics include supervised learning, unsupervised learning, reinforcement learning, and knowledge extraction from large databases with applications to science, engineering, and medicine.

Prerequisites:

Comp 15 and Comp/Math 22 or 61, or consent of instructor. (Comp 160 is highly recommended.)

Class Times:

Monday and Wednesday, 4:30-5:45, Halligan Hall 111A

Instructor:

Roni Khardon
Office Hours: Tuesday 4:30-5:30, and Mon/Wed 5:45-6:15 (after lecture)
Office: Halligan 230
Phone: 1-617-627-5290
Email: roni@cs.tufts.edu

Teaching Assistants:

Course Work and Marking

The course grade will be determined by a combination of:
  • Written homework assignments (20%): these assignments exercise and reinforce class material.
  • Experimental/Programming projects (30%): these assignments exercise and reinforce class material. Projects will include both programming assignments and use of existing machine learning software.
    Rules for late submissions: All work must be turned in on the date specified. Unless there is a last-minute emergency, please notify me of special circumstances at least two days in advance. Otherwise, if you haven't finished an assignment, turn in what you have on the due date and it will be evaluated for partial credit.
  • In-class midterm exam (20%): Wednesday, October 22.
  • Final exam (30%): scheduled according to the K Block schedule on Wednesday, December 17, 3:30-5:30.
Note: If your final exam grade is higher than your midterm grade, the midterm is discounted and the final counts for 50% (i.e., 20% homework, 30% projects, 50% final exam).
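To make the weighting concrete, here is an informal sketch of how the components combine; it is an illustration, not an official grading script, and the function name and the 0-100 grade scale are assumptions.

    def course_grade(homework, projects, midterm, final):
        # All component grades are assumed to be on a common 0-100 scale.
        if final > midterm:
            # The midterm is discounted: its 20% weight moves to the final,
            # so the final counts for 50%.
            return 0.20 * homework + 0.30 * projects + 0.50 * final
        return 0.20 * homework + 0.30 * projects + 0.20 * midterm + 0.30 * final

    # Example: course_grade(90, 85, 70, 88) gives 0.20*90 + 0.30*85 + 0.50*88 = 87.5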

Collaboration:

On homework assignments and projects: You may discuss the problems and general ideas about their solutions with other students, and similarly you may consult other textbooks or the web. However, you must work out the details on your own and code/write out the solution on your own. Every such collaboration (either getting help or giving help) and every use of text or electronic sources must be clearly cited and acknowledged in the submitted homework.
On exams: no collaboration is allowed.
Failure to follow these guidelines may result in disciplinary action for all parties involved. If you have questions about this or other issues concerning academic integrity, please consult the booklet available from the Office of the Dean of Student Affairs.

Tentative List of Topics

[We may not cover all sub-topics]

Textbooks and Material Covered

No single text covers all the material for this course at the right level. We have the following texts on reserve in the library. Other material will be selected from research and survey articles or other notes. Detailed reading assignments and links to material will be posted.

Software

Reading, References, and Assignments

Lecture Topic Reading/Assignments/Notes Due Date
L1 Introduction to Machine Learning Read the introductory chapter of [M], [WF], [F] or [A]
See also lecture slides.
week 1
  Supervised Learning Basics:    
L2 Instance-based learning [M] Chapter 8 is closest to class material; or [RN] 18.8; or [DHS] 4.4-4.6.
Andrew Moore's tutorial on kd-trees
See also lecture slides.
 
L2-3 Decision Trees [M] Chapter 3; or [RN] 18.1-4; or [F] Chapter 5.
See also lecture slides.
 
  Optional Reading (decision tree learning and boosting): T. Dietterich, M. Kearns, and Y. Mansour. Applying the Weak Learning Framework to Understand and Improve C4.5. International Conference on Machine Learning, 1996.  
  Empirical/Programming Assignment 1 Project 1 and corresponding Data 9/17
  Written Assignment 1 Assignment 1 9/22
L4 Naive Bayes Algorithm [M] 6.1-6.2, and 6.9-6.10; [DHS] Section 2.9; [F] 9.2; [WF] 4.2; See also new book chapter from [M]
See also lecture slides.
Lecture also provided a basic introduction to probability and working with random variables.
 
L5 Linear Threshold Units [M] 4.1-4.4; DHS 5.5; See also new book chapter from [M]
See also lecture slides.
 
L6 Features (selection, transformation, discretization) Wrappers for Feature Subset Selection. Ron Kohavi and George H. John. Artificial Intelligence, 1996. (Read up to and including section 3.2.)
Supervised and Unsupervised Discretization of Continuous Features. James Dougherty, Ron Kohavi, and Mehran Sahami. International Conference on Machine Learning, 1995.
See also lecture slides.
 
L6-7 Evaluating Machine Learning Outcomes [M] Ch 5; [F] Ch 12
See also lecture slides.
 
  Optional Reading Foster Provost, Tom Fawcett, and Ron Kohavi. The Case Against Accuracy Estimation for Comparing Induction Algorithms. Proc. 15th International Conference on Machine Learning, 1998.
T. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 10(7), 1998.
Stephen Salzberg. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Mining and Knowledge Discovery, 1997.
 
  Written Assignment 2 Assignment 2 10/1
L8 Clustering [DHS] 10.6-7,10.9; [F] 8.4-5
See also lecture slides.
 
  Empirical/Programming Assignment 2 Project 2 and corresponding Data 10/15
L9-10 Unsupervised and Semi-Supervised Learning with EM [M] Section 6.12; [A] 7.4; [F] 9.4; [DHS] 3.9
Text Classification from Labeled and Unlabeled Documents Using EM. Nigam et al., Machine Learning, Volume 39, pages 103-134, 2000. (The entire paper is relevant; you can skip section 5.3.)
See also lecture slides.
 
  Written Assignment 3 Assignment 3 10/20
L11   Review of math topics from recent lectures.  
L12 Association Rules [F] 6.3; [WF] 4.5
Mining Association Rules between Sets of Items in Large Databases. Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993.
See also lecture slides.
 
  Optional Reading Real World Performance of Association Rule Algorithms. Zheng et al., KDD 2001.
Mining the Most Interesting Rules. Bayardo et al., KDD 1999.
Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997.
Discovering All Most Specific Sentences. Gunopulos et al., TODS, 2003.
 
(L14) Midterm Exam 10/22 Material for the exam includes everything covered up to October 15 (all the material above this point in the table). Everything discussed in class is included for the exam. The reading assignments are supporting materials that should be useful in review and study but I will not hold you responsible for details in the reading that were not discussed in class.
The exam is closed book: no notes or books are allowed, and no calculators or other machines of any sort are allowed.
The exam will aim to test whether you have grasped the main concepts, problems, ideas, and algorithms that we have covered, including the intuition behind these. Generally speaking, the exam will not test your technical wizardry with overly long equations or calculations, but, on the other hand, it is sure to include some shorter ones.
 
  Empirical/Programming Assignment 3 Project 3 and corresponding Data 10/29
L13,L15,L16 Computational learning theory [M] 7.1, 7.2, 7.3, 7.5, [RN] 18.5 and (for perceptron) [DHS] 5.5.2 or [CST] 2.1.1  
  Written Assignment 4 Assignment 4 11/12
L17 Neural Networks [M] Chapter 4, [RN] 18.7, [DHS] 6.1-5.
See also lecture slides.
 
L18-20 Kernels, Dual Perceptron, Support Vector Machines [CST] pages: 9-19 and 26-32, [RN] 18.9, [F] 7.3
A Practical Guide to Support Vector Classification. C.-W. Hsu, C.-C. Chang, and C.-J. Lin. Technical report, Department of Computer Science, National Taiwan University, July 2003.
 
L21 Active Learning Active Learning Literature Survey  
  Optional Reading The Robot Scientist Adam. IEEE Computer, Volume 42, Issue 8, 2009.  
  Empirical/Programming Assignment 4 Project 4 and corresponding Data 12/3
L22-23 Overview of MDPs and Reinforcement Learning [RN] Sections 17.1-3 and Chapter 21; [M] 13.1-3