Course Web Page (this page):
http://www.cs.tufts.edu/comp/135/
Announcement(s):
- (8/19) Initial Course Information Posted
Syllabus:
Description:
The course covers the main paradigms in machine learning including
supervised learning, unsupervised learning and reinforcement learning.
The focus is on practical aspects: ideas underlying various methods,
design of algorithms using these ideas, and their empirical
evaluation. We will discuss well established techniques as well as
new developments from recent research.
Relation to Other Courses:
The course gives an introduction to machine learning and is aimed
at upper-level undergraduates and beginning graduate students. Some
mathematical aptitude is required, but the course emphasizes
practical aspects and baseline algorithms over mathematical
sophistication and analysis, or open research issues.
These are explored in our other advanced courses:
Information Theory for Machine Learning;
Statistical Pattern Recognition;
Computational Learning Theory;
Learning, Planning and Acting in Complex Environments;
Problems in Chemistry, and Bioengineering.
Prerequisites:
Formal prerequisites are Comp 15 and Math 22 or consent of instructor.
Comp 160, Algorithms, is highly recommended.
Class Times:
Tuesday and Thursday, 12:00-1:15, Halligan Hall 106
Instructor:
Roni Khardon
Office: Halligan 230
Phone: 1-617-627-5290
Fax: 1-617-627-3220
Dept.: 1-617-627-3217
Email: roni@cs.tufts.edu
Course Work and Marking
The course mark will be determined by a combination of
-
Homework assignments (30%)
- These will include both exercises reviewing the
material and experimental machine learning work. The latter will
include both programming assignments and use of existing machine
learning software.
Rules for late submissions: All work must be turned in on the date
specified. Please notify me of special circumstances
at least two days in advance.
Otherwise, if you haven't
finished an assignment, turn in what you have on the due date, and it will
be evaluated for partial credit.
-
Final project (30%)
-
A large individual or group-run experimental project.
The project can apply some machine learning methods to
real world data, or empirically investigate some core machine learning issue.
The project will be graded based on the quality of
work/experiments/programming as well as a final project report.
Details to be announced.
-
In-class exam (Date TBA, 20%)
-
In-class exam (Date TBA, 20%)
Collaboration:
Unless you are doing a group project, all work must be done
individually and written up individually. However,
I encourage discussion among students on the topics of exercises as
this often improves the learning experience. If you discuss the work
with other students, please state briefly but clearly,
at the top of your writeup, whom you discussed the
work with and to what extent.
Please see the booklet "Academic
Integrity" available from the Dean of Students' Office.
Tentative List of Topics
[We are likely to skip a few sub-topics]
- Supervised Learning Basics: Introduction, decision trees, linear
threshold elements and neural networks. Experimental evaluation.
- Unsupervised learning and clustering: simple clustering
algorithms, statistical (maximum likelihood and Bayesian) models of
learning, k-means as clustering. The EM algorithm. Spectral
clustering.
-
Unsupervised Data Mining: Association rules.
-
Supervised Learning Algorithms:
Naive-Bayes classifier,
Logistic Regression,
Instance based learning,
Learning rules and Inductive Logic Programming,
Kernel methods and support vector machines,
Aggregation methods and boosting.
-
Methodology:
Attribute selection, normalization and discretization.
Multi-class problems.
- Variants and Extensions:
Active Learning,
Semi-supervised learning,
Utilizing relations among examples,
Computational Learning Theory.
-
Reinforcement learning: Markov Decision Processes. Temporal difference
and Q-learning.
Textbooks and Material Covered
All books listed below should be on reserve for the course in the
Tisch Library.
No single text covers all the material for this course.
The Mitchell text covers a reasonable portion
and is an excellent reference to have; we will be using this text as
our default textbook.
Some of the material is from research articles.
Detailed reading assignments and links to material will be posted.
-
[M]: Machine Learning. Tom M. Mitchell, McGraw-Hill, 1997
-
[CST]: An Introduction to Support Vector Machines and Other Kernel-based
Learning Methods.
N. Cristianini and J. Shawe-Taylor.
Cambridge University Press, 2000.
-
[WF]: Ian H. Witten, Eibe Frank.
Data Mining: Practical Machine Learning Tools and Techniques.
2nd Edition, 2005.
[Describes algorithms and background on the Weka system]
- [DHS]: Pattern Classification (2nd edition), by R. Duda, P. Hart, and
D. Stork, John Wiley & Sons, 2001.
Software
-
Weka: we will be using the Weka Java package for various things.
Weka is accessible on our Linux and Sun machines.
You can also download and install it on your own computer from
the Weka home page.
-
Using Weka on CS Department Computers:
To set up the Java and Weka environments,
run "source /comp/150ML/files/setup/setup.weka".
See the bottom of the setup file for examples of running the system.
Here is a local version (with added local instructions) of the
README file of the system.
Resources
Reading, References, and Assignments