Comp 135: Introduction to Machine Learning
Department of Computer Science
Tufts University
Fall 2008

Course Web Page (this page): http://www.cs.tufts.edu/comp/135/

Announcement(s):
  • (12/12) Projects have been graded and can be collected at the CS main office.
  • (12/11) Exams have been graded and can be collected at the CS main office.
  • (12/2) Review session will be help tomorrow (12/3) 12:00-1:15 in H-106
  • (12/2) Exam info: topics for aggregation methods added to detailed list.
  • (11/26) Information for second class exam posted below.
  • (11/25) Project Presentations scheduled for Monday 12/8 12:00-1:30, in Halligan 106.
  • (10/6) New Course Offered Spring 2009: Comp 150SRL - Statistical Relational Learning .

Syllabus:

Description: The course covers the main paradigms in machine learning including supervised learning, unsupervised learning and reinforcement learning. The focus is on practical aspects: ideas underlying various methods, design of algorithms using these ideas, and their empirical evaluation. We will discuss well established techniques as well as new developments from recent research.

Relation to Other Courses: The course is gives an introduction to machine learning and is aimed at upper level undergraduates and beginning graduate students. Some mathematical aptitude is required, but the course emphasizes practical aspects and baseline algorithms over mathematical sophistication and analysis, or open research issues. These are explored in our other advanced courses: Information Theory for Machine Learning , Statistical Pattern Recognition , Computational Learning Theory , Learning, Planning and Acting in Complex Environments , Problems in Chemistry, and Bioengineering .
New Course Spring 2009: Comp 150SRL - Statistical Relational Learning .

Prerequisites: Formal prerequisites are Comp 15 and Math 22 or consent of instructor. Comp 160, Algorithms, is highly recommended.

Class Times:

Tuesday and Thursday, 12:00-1:15, Halligan Hall 106

Instructor:

Roni Khardon
Office Hours: Tuesday (after class) 1:15-2:15, or by appointment.
Office: Halligan 230
Phone: 1-617-627-5290
Fax: 1-617-627-3220
Dept.: 1-617-627-3217
Email: roni@cs.tufts.edu

Teaching Assistant:

Saket Joshi
Email: sjoshi01@cs.tufts.edu
Office hours: Mondays and Wednesdays 10-11 in Halligan 120.

Course Work and Marking

The course mark will be determined by a combination of
Homework assignments (30%)
These will include both exercises reviewing the material and experimental machine learning work. The latter will include both programming assignments and use of existing machine learning software.
Rules for late submissions: All work must be turned in on the date specified. Please notify me of special circumstances at least two days in advance. Otherwise, If you haven't finished an assignment, turn in what you have on the due date, and it will be evaluated for partial credit.
Final project (30%)
A large individual or group-run experimental project. The project can apply some machine learning methods to real world data, or empirically investigate some core machine learning issue. The project will be graded based on the quality of work/experiments/programming as well as a final project report. Details to be announced.
In-class exam (Thursday Oct 16, 20%)
In-class exam (Thursday Dec 4, 20%)

Collaboration:

Unless you are doing a group project all work must be done individually and written up individually. However, I encourage discussion among students on the topics of exercises as this often improves the learning experience. If you discuss the work with other students, please state briefly but clearly, at the top of your writeup, whom you discussed the work with and to what extent. Please see the booklet "Academic Integrity" available from the Dean of Students' Office.

Tentative List of Topics

[We are likely to skip a few sub-topics]

Textbooks and Material Covered

All books listed below should be on reserve for the course in the Tisch Library. No single text covers all the material for this course. The Mitchell text covers a reasonable portion and is an excellent reference to have; we will be using this text as our default textbook. Some of the material is from research articles. Detailed reading assignments and links to material will be posted.

Software

Resources

Reading, References, and Assignments

Note on links to articles: We cannot post local copies but I have provided links to articles that exist on the Internet. Some of these require subscription which Tufts does have so all articles are accessible when using department machines. Department machines also have the appropriate software. If working from elsewhere you may need a PDF viewer, a Postscript viewer, and/or "gunzip" software to unpack some of the compressed files.

Instance based learning:
Topic Reading/Assignments Due Date
Introduction to Machine Learning Read Chapter 1 of [M]  
Supervised Learning Basics:    
Decision Trees Read Chapter 3 of [M].  
Version Spaces Read Chapter 2 of [M].
Slide copy from class
 
Supplemental Reading T. Dietterich, M. Kearns, and Y. Mansour Decision Tree Learning and Boosting Applying the Weak Learning Framework to Understand and Improve C4.5. International Conference on Machine Learning, 1996. (read at least section 3)
D. Page & S. Ray (2003). Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction International Joint Conference on Artificial Intelligence, 2003.
 
Assignment 1 Assignment 1 9/16
Assignment 2 Assignment 2 9/25
Neural Networks Read Chapter 4 of [M].  
Supplemental Reading D.P. Helmbold and J. Kivinen, and M. Warmuth Relative Loss Bounds for Single Neurons IEEE Transactions on Neural Networks, Vol. 10(6), pp. 1291--1304, November 1999 (read introduction and experimental section)
Yoav Freund, Robert E. Schapire Large Margin Classification Using the Perceptron Algorithm , Machine Learning Journal 1999, (read sections 1, 2, and experiments)
Roni Khardon and Gabriel Wachman Noise Tolerant Variants of the Perceptron Algorithm Journal of Machine Learning Research (JMLR) 8(Feb):227--248, 2007. (optional reading: skim per variant algorithms and results)
Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer Online Passive-Aggressive Algorithms Journal of Machine Learning Research (JMLR) 7(Mar):551--585, 2006. (optional reading: read sections 2,3,10)
 
Background on Probability Read Appendix C.1-4 in Cormen et. al. Algorithms text, or introductory parts of any probability text.  
Evaluating Hypotheses and Algorithms Read Chapter 5 of [M].  
Supplemental Reading (only partly discussed in class) Foster Provost, Tom Fawcett, Ron Kohavi The Case Against Accuracy Estimation for Comparing Induction Algorithms Proc. 15th International Conf. on Machine Learning, 1998.
T. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms Neural Computation 10(7), 1998.
Stephen Salzberg On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach Data Mining and Knowledge Discovery, 1997.
 
Assignment 3 Assignment 3 10/14
Exam1 10/16 Information for first class exam  
Unsupervised Learning:    
Clustering Read Sections 10.6, 10.7, 10.9 of [DHS].
Douglas Fisher Knowledge acquisition via incremental conceptual clustering Machine Learning 1987, Vol 2: 139-172. (Read at least sections 3.1 and 4.2)
Optional reading: [WF] Sections 4.8 and 6.6
 
Final Projects Information for projects 12/8
Statistical Models Maximum likelihood and Bayesian estimation: Section 6.2 of [M]. Optional reading: [DHS] 3.1-3.4.
EM Algorithm: Slide copies from class; Section 6.12 of [M].
 
Statistical Models for Classification [M] Sections 6.2, 6.9, 6.10.  
Supplemental Reading Text Classification Using Labeled and Unlabeled Documents using EM Nigam et. al, Machine Learning Volume 39, pages 103-134, 2000. (Read all; can skip section 5.3)  
Assignment 4 Assignment 4 11/18
Instance based learning [M] Sections 8.1-8.4.  
Feature Selection Wrappers for Feature Subset Selection Ron Kohavi, George H. John Artificial Intelligence, 1996. (Read at least portion till section 3.2 inclusive.)  
Association Rules Mining Association Rules between Sets of Items in Large Databases Rakesh Agrawal, Tomasz Imielinski, Arun Swami Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993.  
Supplemental Reading Real World Performance of Association Rule Algorithms Zijian Zheng, Ron Kohavi, Llew Mason Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001. (optional reading; check out dataset charcteristics and performance analyssis of different algorithms)
Dynamic Itemset Counting and Implication Rules for Market Basket Data Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur. SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, 1997. (optional reading; introduces conviction as measure of interestingness)
Discovering All Most Specific Sentences D. Gunopulos, R. Khardon , H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharma, ACM Transactions on Database Systems , Vol 28, pages 140-174, June 2003. (optional reading; introduces transversals based algorithm)
 
Learning Relational Rules [M] Sections 10.1-10.5.  
Supplemental Reading Fast Effective Rule Induction , William W. Cohen, Proc. of the 12th International Conference on Machine Learning, 1995. (optional reading; detailed study of growing and prunning)
Learning Horn Expressions with LogAn-H , Marta Arias, Roni Khardon and Jerome Maloberti, Journal of Machine Learning Research (JMLR) 8(Mar):549--587, 2007. (optional reading; bottom up method; not covered in class)
 
Kernel Methods and Support Vector Machines Please refer to slides distributed in the lecture
[CST] pages: 9-19 and 26-32 (you may skip the proofs on pages 14,17)
 
Supplemental Reading A practical guide to support vector classification C.-W. Hsu, C.-C. Chang, C.-J. Lin. Technical report, Department of Computer Science, National Taiwan University. July, 2003. (optional reading)
[CST] Cahpter 6. (optional reading)
A tutorial on support vector machines for pattern recognition Christopher J. C. Burges, Data Mining and Knowledge Discovery, 1998. (optional reading)
 
Reinforcement learning [M] chapter 13  
Aggregation Methods (Bagging, Boosting) Please refer to slides distributed in the lecture
An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Dietterich, T. Machine Learning, 40 (2) 139-158, 2000.
 
Supplemental Reading A short introduction to boosting. Yoav Freund and Robert E. Schapire. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999. (optional reading; a good accessible overview)
Boosting the margin: A new explanation for the effectiveness of voting methods. Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. The Annals of Statistics, 26(5):1651-1686, 1998. (optional reading; source of margin graphs in slides; the introduction is informative and accessible)
Improved boosting algorithms using confidence-rated predictions Robert E. Schapire and Yoram Singer Machine Learning, 37, 1999. (optional reading; source of confidence rated version in slides)
 
Computational learning theory [M] sections 7.1-7.3.1  
Exam2 12/4 Information for second class exam