
Linfeng Liu
Email: Linfeng.Liu@tufts.edu
TA Office Hours (all in Halligan 121):
Monday: 1-2pm (Elkotby),
Tuesday: 3-4pm (Elkotby), 7-8pm (Liu),
Wednesday: 7-8pm (Liu),
Thursday: 4-5pm (Elkotby), 7-8pm (Liu),
Friday: 12-1pm (Elkotby), 3-4pm (Liu).
Note about slides: slides are used as an aid to lecture delivery. While I provide copies of most of the slides, they do not cover everything that is discussed in class, and they are not a replacement for the assigned reading.
Lecture  Topic  Reading/Assignments/Notes  Due Date 
9/6  Introduction to Machine Learning 
Read the introductory chapter of [M].
See also lecture slides. Alternate reading: Read the introductory chapter of one of [WF], [F] or [A]. 

Supervised Learning Basics:  
9/11  Instance-based Learning 
[M] Chapter 8.
See also lecture slides, Andrew Moore's tutorial on kd-trees, and the original paper describing the Relief method. Alternate reading: [RN] 18.8 or [DHS] 4.4-4.6. 

9/13, 9/18  Decision Trees 
[M] Chapter 3.
See also lecture slides. Alternate reading: [RN] 18.1-18.4 or [F] Chapter 5. 

Optional Reading (Decision Tree Learning and Boosting): T. Dietterich, M. Kearns, and Y. Mansour. Applying the Weak Learning Framework to Understand and Improve C4.5. International Conference on Machine Learning, 1996.  
Written Assignment 1  Assignment 1  9/25  
Empirical/Programming Assignment 1  Project 1 and corresponding Data  9/27  
9/20  Probability Basics 
Lecture provides a basic and brief introduction to probability theory
and working with random variables.
Please review relevant material from your discrete math, algorithms, or probability and statistics course. 

9/25, 9/27  Maximum Likelihood Estimation and the Naive Bayes Algorithm 
[M] 6.1-6.2 and 6.9-6.10.
See also the new book chapter from [M] and the lecture slides. Alternative reading: [DHS] Section 2.9; [F] 9.2; [WF] 4.2. 

9/27, 10/2  Evaluating Machine Learning Outcomes 
[M] Ch 5.
See also lecture slides. Alternative reading: [F] Ch 12. 

Additional Optional Reading 
Foster Provost, Tom Fawcett, Ron Kohavi. The Case Against Accuracy Estimation for Comparing Induction Algorithms. Proc. 15th International Conf. on Machine Learning, 1998.
T. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 10(7), 1998.
Stephen Salzberg. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Mining and Knowledge Discovery, 1997. 

10/4  Features (selection, transformation, discretization) 
None of our sources is a perfect match for this lecture.
Relevant reading includes portions of [F] Chapter 10 and [A] Chapter 6. See also lecture slides. 

Additional Optional Reading 
Wrappers for Feature Subset Selection. Ron Kohavi, George H. John. Artificial Intelligence, 1996. (Read through section 3.2 inclusive.)
Supervised and Unsupervised Discretization of Continuous Features. James Dougherty, Ron Kohavi, and Mehran Sahami. International Conference on Machine Learning, 1995. 

Written Assignment 2  Assignment 2  10/16  
Empirical/Programming Assignment 2  Project 2 and corresponding Data  10/16  
10/11; 10/16  Linear Threshold Units 
[M] 4.1-4.4.
See also the new book chapter from [M] and the lecture slides. Alternative reading: [DHS] 5.5. 

Written Assignment 3  Assignment 3  10/23  
Wednesday 10/18  Midterm Exam 
Material for the exam includes everything covered up to 10/11 (that is, clustering is excluded, and from the Linear Threshold Units section only material up to the Perceptron algorithm is covered).
Everything discussed in class and in the homework assignments is included in the exam.
Lecture slides are not comprehensive, and I expect you to read the assigned materials, which should be useful in review and study. However, I will not hold you responsible for technical details in the reading that were not discussed in class or assignments. The exam is closed book: no notes or books are allowed, and no calculators or other machines of any sort are allowed. The exam will aim to test whether you have grasped the main concepts, problems, ideas, and algorithms that we have covered, including the intuition behind them. Generally speaking, the exam will not test your technical wizardry with overly long equations or calculations, but it is sure to include some shorter ones. 

10/23  Clustering 
[F] 8.4-8.5.
See also lecture slides. Alternative reading: [DHS] 10.6-10.7 and 10.9. 

Empirical/Programming Assignment 3  Project 3 and corresponding Data  11/6  
10/25  Unsupervised and Semi-Supervised Learning with EM 
[M] Section 6.12.
Text Classification Using Labeled and Unlabeled Documents Using EM. Nigam et al., Machine Learning, Volume 39, pages 103-134, 2000. (The entire paper is relevant; you can skip section 5.3.) See also lecture slides. Alternative reading: [A] 7.4; [F] 9.4; [DHS] 3.9 

10/30  Association Rules 
[F] 6.3.
Mining Association Rules between Sets of Items in Large Databases. Rakesh Agrawal, Tomasz Imielinski, Arun Swami. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1993. See also lecture slides. Alternative reading: [WF] 4.5 

Optional Reading 
Real World Performance of Association Rule Algorithms. Zheng et al., KDD 2001.
Mining the Most Interesting Rules. Bayardo et al., KDD 1999.
Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997.
Discovering All Most Specific Sentences. Gunopulos et al., TODS, 2003. 

Written Assignment 4  Assignment 4  11/6  
11/1, 11/6  Neural Networks 
[M] Chapter 4.
See also lecture slides. Alternative reading: [RN] 18.7, [DHS] 6.1-6.5. 

Optional Reading 
Going Deeper with Convolutions.
Deep Residual Learning for Image Recognition.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. 

Empirical/Programming Assignment 4  Project 4 and corresponding Data  12/4  
11/8; 11/13  Kernels, Dual Perceptron, Support Vector Machines 
[F] 7.3.
A Practical Guide to Support Vector Classification. C.-W. Hsu, C.-C. Chang, C.-J. Lin. Technical report, Department of Computer Science, National Taiwan University, July 2003. See also lecture slides. Alternative reading: [CST] pages 9-19 and 26-32; [RN] 18.9. 

Written Assignment 5  Assignment 5  12/4  
11/15  Active Learning 
Active Learning Literature Survey
See also lecture slides. 

Optional Reading 
The Robot Scientist Adam. IEEE Computer, Volume 42, Issue 8, 2009.
Support Vector Machine Active Learning with Applications to Text Classification. Simon Tong, Daphne Koller. JMLR 2(Nov):45-66, 2001. 

11/21  Aggregation Methods 
[F] Chapter 11.
See also lecture slides. Alternative reading: [A] Chapter 17 

Optional Reading 
Explaining AdaBoost. In: Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik, Springer, 2013.
An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. T. Dietterich. Machine Learning, 40(2):139-158, 2000.
Useful information about Random Forests.
Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. The Annals of Statistics, 26(5):1651-1686, 1998. (Source of the margin graphs in the slides; the introduction is informative and accessible.)
Improved Boosting Algorithms Using Confidence-Rated Predictions. Robert E. Schapire and Yoram Singer. Machine Learning, 37, 1999. (Source of the confidence-rated AdaBoost version in the slides.) 

11/27; 11/29  Computational learning theory 
[M] 7.1, 7.2, 7.3, 7.5,
and (for the Perceptron) [DHS] 5.5.2.
Topics covered: online learning, the Perceptron convergence theorem, weighted majority, PAC learning, agnostic PAC learning. Alternative reading: [RN] 18.5 and (for the Perceptron) [CST] 2.1.1 