(12/19) Exam 2 and the project have been graded and can be
picked up in the CS main office.
Syllabus:
Description:
The course covers the main paradigms in machine learning, including
supervised learning, unsupervised learning, and reinforcement learning.
The focus is on practical aspects: ideas underlying various methods,
design of algorithms using these ideas, and their empirical
evaluation. We will discuss well-established techniques as well as
new developments from recent research.
The course mark will be determined by a combination of the following:
Homework assignments (30%)
These will include both "pencil and paper" exercises reviewing the
material and experimental machine learning work. The latter will
include both programming assignments and use of existing machine
learning software.
Rules for late submissions: All work must be turned in on the date
specified. Please notify me of special circumstances
at least two days in advance.
Otherwise, if you haven't
finished an assignment, turn in what you have on the due date, and it will
be evaluated for partial credit.
Final project (30%)
A large experimental project, done individually or in a group.
The project will be graded based on the quality of
work/experiments/programming as well as a final project report.
Details to be announced.
In-class exam (Tuesday, Oct 25, 20%)
In-class exam (Thursday, Dec 8, 20%)
Collaboration:
Unless you are doing a group project, all work must be done and
written up individually. However,
I encourage discussion among students on the topics of exercises as
this often improves the learning experience. If you discuss the work
with other students, please state briefly but clearly,
at the top of your writeup, whom you discussed the
work with and to what extent.
Tentative List of Topics (8/05)
[We are likely to skip one or two sub-topics]
Supervised Learning Basics: Introduction, decision trees, linear
threshold elements and neural networks. Experimental evaluation.
Unsupervised learning and clustering: simple clustering
algorithms, statistical (maximum likelihood and Bayesian) models of
learning, k-means clustering, and the EM algorithm.
Related supervised methods: the Naive Bayes classifier and
instance-based learning.
Data Mining: association rules
Supervised Learning Revisited:
Utilizing relations among examples.
Learning rules and Inductive Logic Programming.
Kernel methods and support vector machines.
Attribute selection.
Multi-class problems.
Boosting and related methods.
Reinforcement learning: Markov Decision Processes. Temporal difference
and Q learning.
Textbooks and Material Covered
No text covers all the material.
The Mitchell text covers a large portion
and I will be assigning readings from it.
Some of the material is from recent research articles
(see reading list below).
Tom M. Mitchell.
Machine Learning.
McGraw-Hill, 1997.
N. Cristianini and J. Shawe-Taylor.
An Introduction to Support Vector Machines and Other Kernel-Based
Learning Methods.
Cambridge University Press, 2000.
Ian H. Witten and Eibe Frank.
Data Mining: Practical Machine Learning Tools and Techniques with
Java Implementations.
[Describes algorithms and background on the Weka system.]
R. Duda, P. Hart, and D. Stork.
Pattern Classification (2nd edition).
John Wiley & Sons, 2001.
Software
Weka: we will be using the Weka Java package for various things.
Weka is accessible on our Linux and Sun machines.
You can also download
and install it on your own computer from
the weka home page.
Using Weka: To set up the java and weka environments
run "source /comp/150ML/files/setup/setup.weka.sun"
or "source /comp/150ML/files/setup/setup.weka.linux".
See the bottom of the setup files for examples of running the system.
Here is a local version (with added local instructions) of the
README
file of the system.
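As a small sanity check of your setup, here is a minimal sketch of calling
Weka directly from Java (this assumes a Weka 3.4-style package layout; the
class name WekaSmokeTest and the file weather.arff are only placeholders):

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.util.Random;

  import weka.classifiers.Evaluation;
  import weka.classifiers.trees.J48;
  import weka.core.Instances;

  // Load an ARFF file, build a J48 decision tree, and report
  // 10-fold cross-validation results.
  public class WekaSmokeTest {
      public static void main(String[] args) throws Exception {
          // "weather.arff" is a placeholder; use any ARFF file you have.
          Instances data =
              new Instances(new BufferedReader(new FileReader("weather.arff")));
          data.setClassIndex(data.numAttributes() - 1);  // last attribute is the class

          J48 tree = new J48();
          Evaluation eval = new Evaluation(data);
          eval.crossValidateModel(tree, data, 10, new Random(1));
          System.out.println(eval.toSummaryString());
      }
  }

Equivalent runs from the command line are shown at the bottom of the setup
files mentioned above.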
(topic 6) Unsupervised learning and clustering + related supervised methods
We are not following any text very closely.
Here are the topics covered, in order, and readings for them:
Clustering: sections 10.6, 10.7, 10.9 of [Duda et al] text.
[Optional additional reading: weka book section 6.6]
Maximum likelihood and Bayesian estimation:
Section 6.2 of Mitchell's text.
[Optional additional reading [Duda et al] sections 3.1-3.4.]
The EM algorithm: Slide copies from class; Section 6.12 of Mitchell's text.
Bayes/ML methods for classification problems: Sections 6.2, 6.9, 6.10 of Mitchell's text.
[Optional (not covered in class): MAP decisions over labels (in contrast with
hypotheses): Sections 6.3, 6.7 of Mitchell's text.]
Instance based learning: Sections 8.1-8.4 of Mitchell's text.
Association Rules
Read (at least) pages 1-16 of the following survey:
Survey on Frequent Pattern Mining
B. Goethals.
Manuscript, 2003
You should at least skim through the following article that introduced
the association rules problem:
Mining Association Rules between Sets of Items in Large Databases
Rakesh Agrawal, Tomasz Imielinski, Arun Swami
Proceedings of the 1993 ACM SIGMOD International Conference on
Management of Data, 1993.
Several other sources mentioned in class are:
Real World Performance of Association Rule Algorithms
Zijian Zheng, Ron Kohavi, Llew Mason
Proceedings of the Seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 2001.
Dynamic Itemset Counting and Implication Rules for Market Basket Data
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur.
SIGMOD 1997, Proceedings ACM SIGMOD International Conference on
Management of Data, 1997.
Mining the Most Interesting Rules
R. J. Bayardo Jr. and R. Agrawal.
In Proc. of the 5th ACM SIGKDD Int'l Conf. on Knowledge Discovery and
Data Mining, pages 145-154, August 1999.
The algorithm using transversals is from the following paper (see pages 17-22):
Discovering All Most Specific Sentences
D. Gunopulos, R. Khardon, H. Mannila, S. Saluja,
H. Toivonen, and R. S. Sharma.
ACM Transactions on Database Systems, Vol 28, pages
140-174, June 2003.
Learning Rules and Inductive Logic Programming
Reading: Chapter 10 of Mitchell's text.
Some of the material discussed in class is from
I. Bratko and S.H. Muggleton,
Applications of Inductive Logic
Programming.
Communications of the ACM, 38(11):65-70, 1995.
Additional Information about RIPPER (special pruning) can be found in
William W. Cohen,
Fast Effective Rule Induction
Proc. of the 12th International Conference on Machine Learning, 1995.
Additional Information about FOIL (greedy method) can be found in
Ross Quinlan,
Learning First-Order Definitions of Functions,
JAIR vol 5, 139-161, October 1996.
Additional Information about LogAnH (LGG based method) can be found in
Marta Arias, Roni Khardon and Jerome Maloberti,
Learning Horn Expressions with LogAn-H
Technical Report (Long version of papers from ICML 2000 and ILP 2004)
Additional Information about Progol (Inverse Entailment method)
can be found in
S.H. Muggleton,
Inverse entailment and Progol.
New Generation Computing, 13:245-286, 1995.
Kernel Methods and Support Vector Machines
Reading: the N. Cristianini and J. Shawe-Taylor text listed above (on
reserve in the library), pages 9-19 and 26-32 (you may skip the proofs
on pages 14 and 17).
You may also want to skim Chapter 8, which describes various
applications.
Additional information, tutorials, etc. are available through the
kernel-machines site.
Topics for Projects:
The projects are intended to provide experience with the complete
life-cycle of experimental machine learning work.
So you should
design experiments, carry them out and analyze the results.
Different types of projects are possible.
You could introduce a new problem or data set (ideally one that
requires thinking and experimentation about the best representation
and/or task for it) and apply some machine learning methods to it.
You could investigate some less well understood properties or variants
of learning algorithms.
You could try to reproduce results from the literature.
Scanning through recent ICML papers (proceedings linked above)
can be a good source for inspiration.
You can also choose one of the projects listed below.
I can help you explore ideas to find a project topic and delimit
a reasonable scope for it. Please come talk to me to discuss your
project.
Proposal:
Once we agree on the topic and scope, you should write a short "project
proposal" (at most 1/2 page of text) describing the work you intend to
do, and hand in a copy
by Thursday, Nov 10 (preferably earlier, but no later than this date).
Project Presentations:
The last one or two classes will be devoted to project presentations
where you will give a short talk to describe your project and results.
Project Reports:
Your project writeup (5-10 pages) is due at noon on Monday, Dec 12.
The report should explain what problem you were
studying, what you were trying to test, discover, or verify, the
software written for this purpose, and the results. You should provide
numerical results (e.g. accuracies) as well as a qualitative
evaluation/summary of these. Wherever possible you should qualify
comparisons (e.g. "algorithm A is better than B for this task") using
statistical tests/intervals (a small example is sketched below).
In addition, you should submit copies of your programs, and samples of
data where appropriate.
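As one simple illustration of such a comparison, a paired t test over
per-fold accuracies can be computed directly. The fold accuracies below are
made up purely for illustration; the critical value 2.262 is for a two-sided
test at the 0.05 level with 9 degrees of freedom.

  // Paired t test over per-fold accuracies of two algorithms
  // (use the same cross-validation folds for both algorithms).
  public class PairedTTest {
      public static void main(String[] args) {
          double[] accA = {0.91, 0.88, 0.90, 0.93, 0.89, 0.92, 0.90, 0.87, 0.91, 0.90};
          double[] accB = {0.89, 0.86, 0.90, 0.90, 0.88, 0.89, 0.91, 0.85, 0.88, 0.87};

          int n = accA.length;
          double meanDiff = 0.0;
          for (int i = 0; i < n; i++) meanDiff += accA[i] - accB[i];
          meanDiff /= n;

          double var = 0.0;
          for (int i = 0; i < n; i++) {
              double d = (accA[i] - accB[i]) - meanDiff;
              var += d * d;
          }
          var /= (n - 1);

          double t = meanDiff / Math.sqrt(var / n);
          // t distribution, 9 degrees of freedom, two-sided 0.05 level: 2.262.
          System.out.println("t = " + t + (Math.abs(t) > 2.262
              ? "  (difference is significant at the 0.05 level)"
              : "  (difference is not significant at the 0.05 level)"));
      }
  }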
Project Option I: feature selection
In this project you will experiment with two methods for feature
subset selection. The first method applies the following
simple idea or a variant of it: we can compute the information gain
for each feature on the data (N.B. you will have to decide how to handle
numeric data; one option is to translate each such feature into a set
of binary features, based on plausible split points,
and run the method using the binary features).
Then we choose the top n features and
run the algorithm only with these. The number of features to use, n, can
be chosen using some iterative method and cross-validation
with a validation set (not the test set); a Weka-based sketch of this
step is given at the end of this option.
The second method uses the wrapper approach, first proposed in:
Irrelevant Features and the Subset Selection Problem
George H. John, Ron Kohavi, Karl Pfleger
International Conference on Machine Learning, 1994.
and expanded on in
Wrappers for Feature Subset Selection
Ron Kohavi, George H. John
Artificial Intelligence, 1996.
You should pick one variant (forward/backward search, the type of
search) and run experiments with it.
You should pick several datasets (from WEKADATA, ones used in the
homeworks, or others of your choice).
In order to test the sensitivity of the methods and/or learning
algorithms, you can artificially add irrelevant features to these
datasets.
You should then test the
effect of feature selection
on the Perceptron and EG (your code from homework 2) and two
additional algorithms (using Weka).
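As a rough sketch of the first (information gain) method using Weka's
attribute selection classes (assuming a Weka 3.4-style API; the file name
train.arff and the choice of 20 features are placeholders, and n should
really be tuned on a validation set as described above):

  import java.io.BufferedReader;
  import java.io.FileReader;

  import weka.attributeSelection.AttributeSelection;
  import weka.attributeSelection.InfoGainAttributeEval;
  import weka.attributeSelection.Ranker;
  import weka.core.Instances;

  // Rank features by information gain and keep the top n.
  public class InfoGainSelect {
      public static void main(String[] args) throws Exception {
          Instances data =
              new Instances(new BufferedReader(new FileReader("train.arff")));
          data.setClassIndex(data.numAttributes() - 1);

          AttributeSelection selector = new AttributeSelection();
          InfoGainAttributeEval evaluator = new InfoGainAttributeEval();
          Ranker search = new Ranker();
          search.setNumToSelect(20);          // placeholder value of n
          selector.setEvaluator(evaluator);
          selector.setSearch(search);
          selector.SelectAttributes(data);    // note the capital S in Weka's method name

          Instances reduced = selector.reduceDimensionality(data);
          System.out.println(reduced.numAttributes()
              + " attributes kept (including the class).");
      }
  }

Note that InfoGainAttributeEval discretizes numeric attributes internally,
whereas the "simple idea" above asks you to binarize them yourself. The
wrapper method can be set up along the same lines by substituting
WrapperSubsetEval (which takes a base classifier) and a subset search such
as BestFirst for the Ranker.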
Project Option II: study discretization methods
Repeat and/or modify some of the experiments reported in
Supervised and unsupervised discretization of continuous features.
James Dougherty, Ron Kohavi, and Mehran Sahami.
International Conference on Machine Learning, 1995.
You may also want to look at the following for reference:
Error-Based and Entropy-Based Discretization of Continuous Features.
Ron Kohavi and Mehran Sahami,
Knowledge Discovery in Databases 1996.
In particular, you should compare Fayyad and Irani's entropy-based
method (section 3.3) with equal frequency (see the first paragraph of
section 2 and then cf. section 3.1), or with a method of your choice.
For methods that require the number of partitions as a parameter
(as in equal frequency), you should use an iterative method
to choose the best value using a
validation set (not the test set).
You should then test the
effect of feature discretization on Naive Bayes (using Weka),
the Perceptron and EG (your code from homework 2), and one
additional algorithm (using Weka); a sketch of applying the Weka
discretization filters is given below.
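Purely as a sketch of how the two discretization schemes might be applied
with Weka filters (the file name and the number of bins are placeholders;
I am assuming a Weka release whose unsupervised Discretize filter supports
equal-frequency binning via setUseEqualFrequency):

  import java.io.BufferedReader;
  import java.io.FileReader;

  import weka.core.Instances;
  import weka.filters.Filter;

  // Apply entropy-based (Fayyad & Irani) and equal-frequency discretization.
  public class DiscretizeExample {
      public static void main(String[] args) throws Exception {
          // In a real experiment the filters should be built on the
          // training data only and then applied to the test data.
          Instances data =
              new Instances(new BufferedReader(new FileReader("train.arff")));
          data.setClassIndex(data.numAttributes() - 1);

          // Supervised (entropy/MDL) discretization.
          weka.filters.supervised.attribute.Discretize entropy =
              new weka.filters.supervised.attribute.Discretize();
          entropy.setInputFormat(data);
          Instances dataEntropy = Filter.useFilter(data, entropy);

          // Unsupervised equal-frequency discretization; choose the number
          // of bins with a validation set, not the test set.
          weka.filters.unsupervised.attribute.Discretize eqFreq =
              new weka.filters.unsupervised.attribute.Discretize();
          eqFreq.setUseEqualFrequency(true);
          eqFreq.setBins(10);                 // placeholder value
          eqFreq.setInputFormat(data);
          Instances dataEqFreq = Filter.useFilter(data, eqFreq);

          System.out.println(dataEntropy.numAttributes() + " "
              + dataEqFreq.numAttributes());
      }
  }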
Project Option IV: online algorithms for spam filtering
Devise a setup, feature construction, and training algorithms for spam
filtering with small storage and training costs; one possible direction
is sketched below.
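Purely as an illustration and not as the required design: one option is to
combine hashed bag-of-words features (so memory is bounded by a fixed table
size rather than by the vocabulary) with a mistake-driven online update such
as the perceptron. All names below are hypothetical.

  import java.util.HashSet;
  import java.util.Set;

  // Illustrative sketch: an online perceptron over hashed binary
  // bag-of-words features with bounded memory.
  public class HashedPerceptron {
      private final double[] w;

      public HashedPerceptron(int tableSize) {
          w = new double[tableSize];
      }

      // Map a token to a fixed-size table ("hashing trick").
      private int index(String token) {
          return (token.hashCode() & 0x7fffffff) % w.length;
      }

      public double score(String[] tokens) {
          double s = 0.0;
          Set<Integer> seen = new HashSet<Integer>();
          for (String t : tokens) {
              int i = index(t);
              if (seen.add(i)) s += w[i];     // binary features
          }
          return s;
      }

      // One online update; label is +1 for spam, -1 for non-spam.
      public void update(String[] tokens, int label) {
          int prediction = score(tokens) >= 0 ? 1 : -1;
          if (prediction != label) {          // mistake-driven update
              Set<Integer> seen = new HashSet<Integer>();
              for (String t : tokens) {
                  int i = index(t);
                  if (seen.add(i)) w[i] += label;
              }
          }
      }
  }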
Assignments
(Weeks 1/2) Please familiarize yourself with using the Weka system.
Read the README file above and try the system on the given
examples to make sure you can use it.
Homework Assignment 1 (due 9/27):
Text Classification using Decision Trees:
postscript,
pdf
Homework Assignment 2 (due 10/18):
GD and EG algorithms for single neural unit
postscript,
pdf
Homework Assignment 3 (due 11/3):
Evaluating hypotheses and algorithms
postscript,
pdf