Comp 135: Introduction to Machine Learning
Department of Computer Science
Tufts University
Fall 2010

Course Web Page (this page): http://www.cs.tufts.edu/comp/135/

Announcement(s):
  • (1/3) Graded exam and projects can be collected in the CS main office.
  • (12/6) Review session will be held on Wednesday 12/8 in the open block (12:00-1:00) in H-108.
  • (12/2) Information for final exam posted.
  • (11/4) Information for final projects posted.

Syllabus:

Description: The course covers the main paradigms in machine learning, including supervised learning, unsupervised learning, and reinforcement learning. The focus is on practical aspects: the ideas underlying various methods, the design of algorithms using these ideas, and their empirical evaluation. We will discuss well-established techniques as well as new developments from recent research.

Relation to Other Courses: The course gives an introduction to machine learning and is aimed at upper-level undergraduates and beginning graduate students. Some mathematical aptitude is required, but the course emphasizes practical aspects and baseline algorithms over mathematical sophistication, in-depth analysis, and open research issues. These are explored in our other, more advanced courses: Information Theory for Machine Learning, Statistical Pattern Recognition, Computational Learning Theory, Learning, Planning and Acting in Complex Environments, Problems in Chemistry and Bioengineering, and Statistical Relational Learning.

Prerequisites: Formal prerequisites are Comp 15 and Math 22 or consent of instructor. Comp 160, Algorithms, is highly recommended.

Class Times:

Tuesday and Thursday, 12:00-1:15, Halligan Hall 106

Instructor:

Roni Khardon
Office Hours: Thursday 4:15-5:15 Halligan 230, or by appointment.
Office: Halligan 230
Phone: 1-617-627-5290
Fax: 1-617-627-3220
Dept.: 1-617-627-3217
Email: roni@cs.tufts.edu

Teaching Assistant:

Yuyang Wang
Email: ywang02@cs.tufts.edu
Office hours: Monday 1:00-2:00, Tuesday 4:00-5:00 in Halligan extension; or by appointment.

Course Work and Marking

The course mark will be determined by a combination of
Homework assignments (30%)
These will include both exercises reviewing the material and experimental machine learning work. The latter will include both programming assignments and use of existing machine learning software.
Rules for late submissions: All work must be turned in on the date specified. Please notify me of special circumstances at least two days in advance. Otherwise, if you haven't finished an assignment, turn in what you have on the due date, and it will be evaluated for partial credit.
Major project (20%)
A larger experimental project, done individually or in groups. The project can apply machine learning methods to real-world data or empirically investigate some core machine learning issue. It will be carried out during the last four weeks of the semester and graded based on the quality of the work/experiments/programming as well as a final project report. Details to be announced.
In-class exam (Tuesday Oct 19, 25%)
In-class exam (Thursday Dec 9, 25%)
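
For illustration, here is a minimal sketch (in Python, with hypothetical component scores that are not from any real student) of how the weights above combine into a final course mark:

    # Weighted combination of the course components listed above.
    # Weights are the ones in the syllabus; scores are hypothetical examples (out of 100).
    weights = {"homework": 0.30, "project": 0.20, "exam1": 0.25, "exam2": 0.25}
    scores = {"homework": 85.0, "project": 90.0, "exam1": 80.0, "exam2": 88.0}

    final_mark = sum(weights[k] * scores[k] for k in weights)
    print(round(final_mark, 1))  # 0.30*85 + 0.20*90 + 0.25*80 + 0.25*88 = 85.5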

Collaboration:

Unless you are doing a group project, all work must be done and written up individually. However, I encourage discussion among students on the topics of the exercises, as this often improves the learning experience. If you discuss the work with other students, please state briefly but clearly, at the top of your writeup, whom you discussed the work with and to what extent. Please see the booklet "Academic Integrity," available from the Dean of Students' Office.

Tentative List of Topics

[We may not cover all sub-topics]

Textbooks and Material Covered

All books listed below should be on reserve for the course in the Tisch Library. No single text covers all the material for this course. The Mitchell text covers a reasonable portion and is an excellent reference to have; we will be using this text as our default textbook. Some of the material is from research articles. Detailed reading assignments and links to material will be posted.

Software

Resources

Reading, References, and Assignments

Note on links to articles: We cannot post local copies, but I have provided links to articles available on the Internet. Some of these require a subscription, which Tufts has, so all articles are accessible from department machines. Department machines also have the appropriate software. If you are working from elsewhere, you may need a PDF viewer, a PostScript viewer, and/or "gunzip" to unpack some of the compressed files.

Topic Reading/Assignments Due Date
Introduction to Machine Learning Read Chapter 1 of [M] week 1
Assignment 1 (due 9/16)
Supervised Learning Basics:    
Decision Trees Read Chapter 3 of [M]. week 2
Supplemental Reading (Decision Tree Learning and Boosting): T. Dietterich, M. Kearns, and Y. Mansour, Applying the Weak Learning Framework to Understand and Improve C4.5, International Conference on Machine Learning, 1996. (Read at least Section 3.) week 2
Assignment 2 (due 10/1, noon)
Overview of MDPs and Reinforcement Learning [RN] Sections 17.1-3; [M] 13.1-3 week 3
Evaluating Machine Learning Outcomes Read Chapter 5 of [M].
For ROC and precision/recall curves read Section 5.7 of [WP].
week 3/4
Additional Reading for Evaluating Machine Learning: Foster Provost, Tom Fawcett, and Ron Kohavi, The Case Against Accuracy Estimation for Comparing Induction Algorithms, Proc. 15th International Conf. on Machine Learning, 1998.
T. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, Neural Computation, 10(7), 1998.
Stephen Salzberg, On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach, Data Mining and Knowledge Discovery, 1997.
This reading is optional (material only partly discussed in class).
Version Spaces [M] Sections 2.1-2.3 and 2.6-2.8 (skim the rest of chapter 2). Week 5
Computational learning theory [M] Sections 7.1-7.3.1 Week 5
Assignment 3 (due 10/14)
Please see the additional explanations in the Q&A for Assignment 3.
Instance based learning [M] Sections 8.1-8.4.  
Feature Selection: Ron Kohavi and George H. John, Wrappers for Feature Subset Selection, Artificial Intelligence, 1996. (Read at least the portion through Section 3.2, inclusive.)  
Exam 1 (10/19): Information for first class exam  
Feature Discretization: James Dougherty, Ron Kohavi, and Mehran Sahami, Supervised and Unsupervised Discretization of Continuous Features, International Conference on Machine Learning, 1995.
Some additional variants and comparisons are given in: Ron Kohavi and Mehran Sahami, Error-Based and Entropy-Based Discretization of Continuous Features, Knowledge Discovery in Databases, 1996.
This reading is optional (material not covered in class).
Learning Relational Rules [M] Sections 10.1-10.5.  
Additional Reading for Rule Learning: William W. Cohen, Fast Effective Rule Induction, Proc. of the 12th International Conference on Machine Learning, 1995. (A detailed study of growing and pruning.) This reading is optional (material only partly discussed in class).
Additional Reading for Rule Learning: I. Bratko and S.H. Muggleton, Applications of Inductive Logic Programming, Communications of the ACM, 38(11):65-70, 1995. This reading is optional (material not discussed in class).
Assignment 4 (due 11/2)
Please see the additional clarification for Assignment 4.
Perceptrons, Neural Networks, Support Vector Machines and Kernels [M] Chapter 4.
[CST] pages: 9-19 and 26-32 (you may skip the proofs on pages 14,17)
 
Additional Reading for Linear Threshold Functions: D.P. Helmbold, J. Kivinen, and M. Warmuth, Relative Loss Bounds for Single Neurons, IEEE Transactions on Neural Networks, 10(6):1291-1304, November 1999. (Optional reading: read the introduction and the experimental section.)
Yoav Freund and Robert E. Schapire, Large Margin Classification Using the Perceptron Algorithm, Machine Learning Journal, 1999. (Optional reading: read Sections 1, 2, and the experiments.)
Roni Khardon and Gabriel Wachman, Noise Tolerant Variants of the Perceptron Algorithm, Journal of Machine Learning Research (JMLR), 8(Feb):227-248, 2007. (Optional reading: skim the per-variant algorithms and results.)
Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer, Online Passive-Aggressive Algorithms, Journal of Machine Learning Research (JMLR), 7(Mar):551-585, 2006. (Optional reading: read Sections 2, 3, and 10.)
C.-W. Hsu, C.-C. Chang, and C.-J. Lin, A Practical Guide to Support Vector Classification, Technical Report, Department of Computer Science, National Taiwan University, July 2003. (Optional reading.)
This reading is optional (material only partly discussed in class)
Assignment 5 (due 11/16)
Please see the additional clarifications for Assignment 5.
Final Projects: Information for projects. Proposal due: 11/10, 11/16; report due: 12/10.
Statistical Models for Estimation and Classification [M] Sections 6.2-3, and 6.6-10.
[DHS] Section 2.9.
 
Additional Reading for Statistical Models [DHS] 3.1-3.4. This reading is optional.
Clustering Read Sections 10.6, 10.7, 10.9 of [DHS].
Douglas Fisher, Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning, 2:139-172, 1987. (Read at least Sections 3.1 and 4.2.)
Optional reading: [WF] Sections 4.8 and 6.6
 
Unsupervised and Semi-Supervised Learning with EM [M] Section 6.12
Nigam et al., Text Classification from Labeled and Unlabeled Documents Using EM, Machine Learning, 39:103-134, 2000. (Read all; you can skip Section 5.3.)
 
Additional Reading for EM [DHS] Section 3.9 This reading is optional.
Exam 2 (12/9): Information for final exam  
Active Learning: Simon Tong and Daphne Koller, Support Vector Machine Active Learning with Applications to Text Classification, JMLR, 2(Nov):45-66, 2001. This reading is optional.
Association Rules: Rakesh Agrawal, Tomasz Imielinski, and Arun Swami, Mining Association Rules between Sets of Items in Large Databases, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993. This reading is optional.
Aggregation Methods: T. Dietterich, An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization, Machine Learning, 40(2):139-158, 2000. This reading is optional.
Collective Classification: Prithviraj Sen, Galileo Mark Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad, Collective Classification in Network Data, AI Magazine, 29(3):93-106, 2008. This reading is optional.