
Lecture  Topic  Reading/Assignments/Notes  Due Date 
L1  Introduction to Machine Learning 
Read the introductory chapter of [M], [WF], [F] or [A]
See also lecture slides. 
week 1 
Supervised Learning Basics:  
L2  Instance based learning 
[M] Chapter 8 is closest to class material;
or [RN] 18.8; or [DHS] 4.4-4.6.
Andrew Moore's tutorial on kd-trees. See also lecture slides. 

L2-3  Decision Trees 
[M] Chapter 3;
or [RN] 18.1-4; or [F] Chapter 5.
See also lecture slides. 

Optional Reading  T. Dietterich, M. Kearns, and Y. Mansour Decision Tree Learning and Boosting Applying the Weak Learning Framework to Understand and Improve C4.5. International Conference on Machine Learning, 1996.  
Empirical/Programming Assignment 1  Project 1 and corresponding Data  9/17  
Written Assignment 1  Assignment 1  9/22  
L4  Naive Bayes Algorithm 
[M] 6.1-6.2, and 6.9-6.10;
[DHS] Section 2.9;
[F] 9.2; [WF] 4.2;
See also
new book chapter from [M]
See also lecture slides. Lecture also provided a basic introduction to probability and working with random variables. 

L5  Linear Threshold Units 
[M] 4.1-4.4; [DHS] 5.5;
See also
new book chapter from [M]
See also lecture slides. 

L6  Features (selection, transformation, discretization) 
Wrappers for Feature Subset Selection
Ron Kohavi, George H. John
Artificial Intelligence, 1996.
(Read through Section 3.2, inclusive.)
Supervised and unsupervised discretization of continuous features. James Dougherty, Ron Kohavi, and Mehran Sahami. International Conference on Machine Learning, 1995. See also lecture slides. 

L6-7  Evaluating Machine Learning Outcomes 
[M] Ch 5; [F] Ch 12
See also lecture slides. 

Optional Reading 
Foster Provost, Tom Fawcett, Ron Kohavi.
The Case Against Accuracy Estimation for Comparing Induction Algorithms.
Proc. 15th International Conf. on Machine Learning, 1998.
T. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 10(7), 1998.
Stephen Salzberg. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Mining and Knowledge Discovery, 1997. 

Written Assignment 2  Assignment 2  10/1  
L8  Clustering 
[DHS] 10.6-7, 10.9; [F] 8.4-5
See also lecture slides. 

Empirical/Programming Assignment 2  Project 2 and corresponding Data  10/15  
L9-10  Unsupervised and Semi-Supervised Learning with EM 
[M] Section 6.12; [A] 7.4; [F] 9.4; [DHS] 3.9
Text Classification Using Labeled and Unlabeled Documents using EM. Nigam et al., Machine Learning, Volume 39, pages 103-134, 2000. (The entire paper is relevant; you can skip section 5.3.) See also lecture slides. 

Written Assignment 3  Assignment 3  10/20  
L11  Review of math topics from recent lectures.  
L12  Association Rules 
[F] 6.3; [WF] 4.5
Mining Association Rules between Sets of Items in Large Databases. Rakesh Agrawal, Tomasz Imielinski, Arun Swami. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993. See also lecture slides. 

Optional Reading 
Real World Performance of Association Rule Algorithms. Zheng et al., KDD 2001.
Mining the Most Interesting Rules. Bayardo et al., KDD 1999.
Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997.
Discovering All Most Specific Sentences. Gunopulos et al., TODS, 2003. 

(L14)  Midterm Exam 10/22 
Material for the exam includes everything covered up to October 15
(all the material above this point in the table).
Everything discussed in class is included for the exam.
The reading assignments are supporting materials that should be useful in review and study, but I will not hold you responsible for details in the reading that were not discussed in class.
The Exam is closed book; no notes or books are allowed; no calculators or other machines of any sort are allowed. The exam will aim to test whether you have grasped the main concepts, problems, ideas, and algorithms that we have covered, including the intuition behind these. Generally speaking, the exam will not test your technical wizardry with overly long equations or calculations, but, on the other hand, it is sure to include some shorter ones. 

Empirical/Programming Assignment 3  Project 3 and corresponding Data  10/29  
L13,L15,L16  Computational learning theory  [M] 7.1, 7.2, 7.3, 7.5, [RN] 18.5 and (for perceptron) [DHS] 5.5.2 or [CST] 2.1.1  
Written Assignment 4  Assignment 4  11/12  
L17  Neural Networks 
[M] Chapter 4, [RN] 18.7, [DHS] 6.1-5.
See also lecture slides. 

L18-20  Kernels, Dual Perceptron, Support Vector Machines 
[CST] pages: 9-19 and 26-32, [RN] 18.9, [F] 7.3
A practical guide to support vector classification. C.W. Hsu, C.C. Chang, C.J. Lin. Technical report, Department of Computer Science, National Taiwan University, July 2003. 

L21  Active Learning  Active Learning Literature Survey  
Optional Reading  The Robot Scientist Adam. IEEE Computer, Volume 42, Issue 8, 2009.  
Empirical/Programming Assignment 4  Project 4 and corresponding Data  12/3  
L22-23  Overview of MDPs and Reinforcement Learning  [RN] Sections 17.1-3 and Chapter 21; [M] 13.1-3  
Written Assignment 5  Assignment 5  12/8  
L24  Aggregation Methods 
[F] Chapter 11, [A] Chapter 17
Explaining AdaBoost. In: Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik, Springer, 2013.
An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Dietterich, T. Machine Learning, 40(2):139-158, 2000. See also lecture slides. 

Optional Reading 
Boosting the margin: A new explanation for the effectiveness of voting methods.
Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee.
The Annals of Statistics, 26(5):1651-1686, 1998.
(source of margin graphs in slides;
the introduction is informative and accessible)
Improved boosting algorithms using confidence-rated predictions. Robert E. Schapire and Yoram Singer. Machine Learning, 37, 1999. (source of the confidence-rated AdaBoost version in slides) 

L25  Learning Relational Rules 
[M] Sections 10.1-10.5.
See also lecture slides. 

Optional Reading 
Fast Effective Rule Induction, William W. Cohen,
Proc. of the 12th International Conference on Machine Learning, 1995.
(a detailed study of growing and pruning)
Applications of Inductive Logic Programming. I. Bratko and S.H. Muggleton, Communications of the ACM, 38(11):65-70, 1995. 

L26  Hidden Markov Models 
[A] Chapter 15, [DHS] 3.10
See also lecture slides. 

Final Exam 
Material for the exam includes everything covered during the semester (i.e., it is cumulative).
Everything discussed in class and homework assignments is included for the exam.
The reading assignments are supporting materials that should be useful in review and study, but I will not hold you responsible for details in the reading that were not discussed in class or assignments.
The Exam is closed book; no notes or books are allowed; no calculators or other machines of any sort are allowed. The exam will aim to test whether you have grasped the main concepts, problems, ideas, and algorithms that we have covered, including the intuition behind these. Generally speaking, the exam will not test your technical wizardry with overly long equations or calculations, but, on the other hand, it is sure to include some shorter ones. 