
Relation to Other Courses: The course gives an introduction to machine learning and is aimed at upper-level undergraduates and beginning graduate students. Some mathematical aptitude is required, but the course emphasizes practical aspects and baseline algorithms over mathematical sophistication and analysis or open research issues. Those topics are explored in our other advanced courses: Information Theory for Machine Learning; Statistical Pattern Recognition; Computational Learning Theory; Learning, Planning and Acting in Complex Environments; Problems in Chemistry, and Bioengineering; Statistical Relational Learning.
Prerequisites: Formal prerequisites are Comp 15 and Math 22 or consent of instructor. Comp 160, Algorithms, is highly recommended.
Topic  Reading/Assignments  Due Date 
Introduction to Machine Learning  Read Chapter 1 of [M]  week 1 
Assignment 1  Assignment 1  9/16 
Supervised Learning Basics:  
Decision Trees  Read Chapter 3 of [M].  week 2 
Supplemental Reading  T. Dietterich, M. Kearns, and Y. Mansour, Decision Tree Learning and Boosting Applying the Weak Learning Framework to Understand and Improve C4.5. International Conference on Machine Learning, 1996. (Read at least Section 3.)  week 2 
Assignment 2  Assignment 2  10/1 (noon) 
Overview of MDPs and Reinforcement Learning  [RN] Sections 17.1-3; [M] Sections 13.1-3  week 3 
Evaluating Machine Learning Outcomes  Read Chapter 5 of [M]. For ROC and precision/recall curves read Section 5.7 of [WP].  week 3/4 
Additional Reading for Evaluating Machine Learning 
Foster Provost, Tom Fawcett, Ron Kohavi, The Case Against Accuracy Estimation for Comparing Induction Algorithms, Proc. 15th International Conf. on Machine Learning, 1998.
T. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, Neural Computation 10(7), 1998.
Stephen Salzberg, On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach, Data Mining and Knowledge Discovery, 1997.
This reading is optional (material only partly discussed in class) 
Version Spaces  [M] Sections 2.1-2.3 and 2.6-2.8 (skim the rest of Chapter 2).  Week 5 
Computational learning theory  [M] Sections 7.1-7.3.1  Week 5 
Assignment 3  Assignment 3. Please see additional explanations Q&A for assignment 3  10/14 
Instance-based learning  [M] Sections 8.1-8.4.  
Feature Selection  Wrappers for Feature Subset Selection, Ron Kohavi, George H. John, Artificial Intelligence, 1996. (Read at least up to and including Section 3.2.)  
Exam1 10/19  Information for first class exam  
Feature Discretization 
Supervised and unsupervised discretization of continuous features.
James Dougherty, Ron Kohavi, and Mehran Sahami.
International Conference on Machine Learning, 1995.
Some additional variants and comparisons are given in: Error-Based and Entropy-Based Discretization of Continuous Features. Ron Kohavi and Mehran Sahami, Knowledge Discovery in Databases 1996. 
This reading is optional (material not covered in class) 
Learning Relational Rules  [M] Sections 10.1-10.5.  
Additional Reading for Rule Learning  Fast Effective Rule Induction, William W. Cohen, Proc. of the 12th International Conference on Machine Learning, 1995. (a detailed study of growing and pruning)  This reading is optional (material only partly discussed in class) 
Additional Reading for Rule Learning  Applications of Inductive Logic Programming. I. Bratko and S.H. Muggleton, Communications of the ACM, 38(11):65-70, 1995.  This reading is optional (material not discussed in class) 
Assignment 4  Assignment 4. Please see additional clarification for assignment 4  11/2 
Perceptrons, Neural Networks, Support Vector Machines and Kernels 
[M] Chapter 4.
[CST] pages 9-19 and 26-32 (you may skip the proofs on pages 14 and 17) 

Additional Reading for Linear Threshold Functions 
D.P. Helmbold, J. Kivinen, and M. Warmuth, Relative Loss Bounds for Single Neurons, IEEE Transactions on Neural Networks, Vol. 10(6), pp. 1291-1304, November 1999. (optional reading: read introduction and experimental section)
Yoav Freund, Robert E. Schapire, Large Margin Classification Using the Perceptron Algorithm, Machine Learning Journal, 1999. (optional reading: read sections 1, 2, and experiments)
Roni Khardon and Gabriel Wachman, Noise Tolerant Variants of the Perceptron Algorithm, Journal of Machine Learning Research (JMLR) 8(Feb):227-248, 2007. (optional reading: skim per variant algorithms and results)
Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer, Online Passive-Aggressive Algorithms, Journal of Machine Learning Research (JMLR) 7(Mar):551-585, 2006. (optional reading: read sections 2, 3, 10)
C.W. Hsu, C.C. Chang, C.J. Lin, A practical guide to support vector classification, Technical report, Department of Computer Science, National Taiwan University, July 2003. (optional reading)
This reading is optional (material only partly discussed in class) 
Assignment 5  Assignment 5. Please see additional clarifications for assignment 5  11/16 
Final Projects  Information for projects  proposal due: 11/10, 11/16. report due: 12/10 
Statistical Models for Estimation and Classification 
[M] Sections 6.2-3 and 6.6-10.
[DHS] Section 2.9. 

Additional Reading for Statistical Models  [DHS] Sections 3.1-3.4.  This reading is optional. 
Clustering 
Read Sections 10.6, 10.7, 10.9 of [DHS].
Douglas Fisher, Knowledge acquisition via incremental conceptual clustering, Machine Learning 1987, Vol 2: 139-172. (Read at least sections 3.1 and 4.2)
Optional reading: [WF] Sections 4.8 and 6.6 

Unsupervised and Semi-Supervised Learning with EM 
[M] Section 6.12
Text Classification Using Labeled and Unlabeled Documents using EM, Nigam et al., Machine Learning Volume 39, pages 103-134, 2000. (Read all; you can skip section 5.3) 

Additional Reading for EM  [DHS] Section 3.9  This reading is optional. 
Exam2 12/9  Information for final exam  
Active Learning  Support Vector Machine Active Learning with Applications to Text Classification, Simon Tong, Daphne Koller; JMLR 2(Nov):45-66, 2001.  This reading is optional. 
Association Rules  Mining Association Rules between Sets of Items in Large Databases Rakesh Agrawal, Tomasz Imielinski, Arun Swami Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993.  This reading is optional. 
Aggregation Methods  An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Dietterich, T. Machine Learning, 40(2):139-158, 2000.  This reading is optional. 
Collective Classification  Collective Classification in Network Data. Prithviraj Sen, Galileo Mark Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. AI Magazine, vol. 29, no. 3, pp. 93-106, 2008.  This reading is optional. 