|
Relation to Other Courses:
The course is gives an introduction to machine learning and is aimed
at upper level undergraduates and beginning graduate students. Some
mathematical aptitude is required, but the course emphasizes
practical aspects and baseline algorithms over mathematical
sophistication and analysis, or open research issues.
These are
explored in our other
advanced courses:
Information Theory for Machine Learning
,
Statistical Pattern Recognition
,
Computational Learning Theory
,
Learning, Planning and Acting in Complex Environments
,
Problems in Chemistry, and Bioengineering
.
New Course Spring 2009:
Comp 150SRL - Statistical Relational Learning
.
Prerequisites: Formal prerequisites are Comp 15 and Math 22 or consent of instructor. Comp 160, Algorithms, is highly recommended.
| Topic | Reading/Assignments | Due Date |
| Introduction to Machine Learning | Read Chapter 1 of [M] |   |
| Supervised Learning Basics: |   |   |
| Decision Trees | Read Chapter 3 of [M]. |   |
| Version Spaces |
Read Chapter 2 of [M].
Slide copy from class |
  |
| Supplemental Reading |
T. Dietterich, M. Kearns, and Y. Mansour
Decision Tree Learning and Boosting
Applying the Weak Learning Framework to Understand and Improve C4.5.
International Conference on Machine Learning, 1996.
(read at least section 3)
D. Page & S. Ray (2003). Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction International Joint Conference on Artificial Intelligence, 2003. |
  |
| Assignment 1 | Assignment 1 | 9/16 |
| Assignment 2 | Assignment 2 | 9/25 |
| Neural Networks | Read Chapter 4 of [M]. |   |
| Supplemental Reading |
D.P. Helmbold and J. Kivinen, and M. Warmuth
Relative Loss Bounds for Single Neurons
IEEE Transactions on Neural
Networks, Vol. 10(6), pp. 1291--1304, November 1999
(read introduction and experimental section)
Yoav Freund, Robert E. Schapire Large Margin Classification Using the Perceptron Algorithm , Machine Learning Journal 1999, (read sections 1, 2, and experiments) Roni Khardon and Gabriel Wachman Noise Tolerant Variants of the Perceptron Algorithm Journal of Machine Learning Research (JMLR) 8(Feb):227--248, 2007. (optional reading: skim per variant algorithms and results) Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer Online Passive-Aggressive Algorithms Journal of Machine Learning Research (JMLR) 7(Mar):551--585, 2006. (optional reading: read sections 2,3,10) |
  |
| Background on Probability | Read Appendix C.1-4 in Cormen et. al. Algorithms text, or introductory parts of any probability text. |   |
| Evaluating Hypotheses and Algorithms | Read Chapter 5 of [M]. |   |
| Supplemental Reading (only partly discussed in class) |
Foster Provost, Tom Fawcett, Ron Kohavi
The Case Against Accuracy Estimation for Comparing Induction
Algorithms
Proc. 15th International Conf. on
Machine Learning, 1998.
T. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms Neural Computation 10(7), 1998. Stephen Salzberg On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach Data Mining and Knowledge Discovery, 1997. |
  |
| Assignment 3 | Assignment 3 | 10/14 |
| Exam1 10/16 | Information for first class exam |   |
| Unsupervised Learning: |   |   |
| Clustering |
Read Sections 10.6, 10.7, 10.9 of [DHS].
Douglas Fisher Knowledge acquisition via incremental conceptual clustering Machine Learning 1987, Vol 2: 139-172. (Read at least sections 3.1 and 4.2) Optional reading: [WF] Sections 4.8 and 6.6 |
  |
| Final Projects | Information for projects | 12/8 |
| Statistical Models |
Maximum likelihood and Bayesian estimation:
Section 6.2 of [M].
Optional reading: [DHS] 3.1-3.4.
EM Algorithm: Slide copies from class; Section 6.12 of [M]. |
  |
| Statistical Models for Classification | [M] Sections 6.2, 6.9, 6.10. |   |
| Supplemental Reading | Text Classification Using Labeled and Unlabeled Documents using EM Nigam et. al, Machine Learning Volume 39, pages 103-134, 2000. (Read all; can skip section 5.3) |   |
| Assignment 4 | Assignment 4 | 11/18 |
| Instance based learning | [M] Sections 8.1-8.4. |   |
| Feature Selection | Wrappers for Feature Subset Selection Ron Kohavi, George H. John Artificial Intelligence, 1996. (Read at least portion till section 3.2 inclusive.) |   |
| Association Rules | Mining Association Rules between Sets of Items in Large Databases Rakesh Agrawal, Tomasz Imielinski, Arun Swami Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993. |   |
| Supplemental Reading |
Real World Performance of Association Rule Algorithms
Zijian Zheng, Ron Kohavi, Llew Mason
Proceedings of the Seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 2001.
(optional reading; check out dataset charcteristics and performance analyssis of
different algorithms)
Dynamic Itemset Counting and Implication Rules for Market Basket Data Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur. SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, 1997. (optional reading; introduces conviction as measure of interestingness) Discovering All Most Specific Sentences D. Gunopulos, R. Khardon , H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharma, ACM Transactions on Database Systems , Vol 28, pages 140-174, June 2003. (optional reading; introduces transversals based algorithm) |
  |
| Learning Relational Rules | [M] Sections 10.1-10.5. |   |
| Supplemental Reading |
Fast Effective Rule Induction
, William W. Cohen,
Proc. of the 12th International Conference on Machine Learning, 1995.
(optional reading; detailed study of growing and prunning)
Learning Horn Expressions with LogAn-H , Marta Arias, Roni Khardon and Jerome Maloberti, Journal of Machine Learning Research (JMLR) 8(Mar):549--587, 2007. (optional reading; bottom up method; not covered in class) |
  |
| Kernel Methods and Support Vector Machines |
Please refer to slides distributed in the lecture
[CST] pages: 9-19 and 26-32 (you may skip the proofs on pages 14,17) |
  |
| Supplemental Reading |
A practical guide to support vector classification
C.-W. Hsu, C.-C. Chang, C.-J. Lin.
Technical report, Department of Computer Science,
National Taiwan University. July, 2003.
(optional reading)
[CST] Cahpter 6. (optional reading) A tutorial on support vector machines for pattern recognition Christopher J. C. Burges, Data Mining and Knowledge Discovery, 1998. (optional reading) |
  |
| Reinforcement learning | [M] chapter 13 |   |
| Aggregation Methods (Bagging, Boosting) |
Please refer to slides distributed in the lecture
An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Dietterich, T. Machine Learning, 40 (2) 139-158, 2000. |
  |
| Supplemental Reading |
A short introduction to boosting.
Yoav Freund and Robert E. Schapire.
Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
(optional reading; a good accessible overview)
Boosting the margin: A new explanation for the effectiveness of voting methods. Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. The Annals of Statistics, 26(5):1651-1686, 1998. (optional reading; source of margin graphs in slides; the introduction is informative and accessible) Improved boosting algorithms using confidence-rated predictions Robert E. Schapire and Yoram Singer Machine Learning, 37, 1999. (optional reading; source of confidence rated version in slides) |
  |
| Computational learning theory | [M] sections 7.1-7.3.1 |   |
| Exam2 12/4 | Information for second class exam |   |