150ML: Information for second class exam 12/8/05
The exam will have aims and structure similar to the previous one.
It aims to test whether you have grasped the main concepts,
problems, ideas, and algorithms, and the intuition behind them.
As before, you will not be asked to develop massive formulas, but
you should be able to cope with simple cases.
Material for the exam includes everything since the beginning
of the semester but emphasis will be given to material covered
since the first exam.
The material includes everything discussed in class, and the textbook
reading up to and including kernels.
[Boosting and other aggregation methods are not included.]
For topics not in the text, I do not expect you to know every detail in
the papers, but I do expect you to know the portions discussed in class.
Here is a list of topics we covered since the previous exam:
- Clustering: evaluation measures, why clustering is ill-defined,
  hierarchical clustering, k-means, "soft" k-means.
- Probability estimates: joint probabilities, independence, marginal
  distributions, Bayes' rule, prior and posterior probabilities, the
  likelihood function, maximum likelihood and MAP estimates. Estimation
  with hidden variables: basic ideas of the EM algorithm, what it
  maximizes, and how one derives a concrete EM algorithm.
- Naive Bayes: its probabilistic assumptions, m-estimates, relation to
  linear threshold elements.
- Instance-based learning (IBL): k-nearest neighbors, weak points and
  variants.
- Association rules: what they are, the problem of frequent sets,
  lattice structure and algorithms using it. The Apriori algorithm.
  [Other algorithms are not included in the exam.] Ranking rules:
  confidence, lift, and conviction.
- Learning rules: sequential covering algorithms, growing a single rule,
  gain/info formulas for evaluating conditions. Relational learning:
  problem setup using B, H, E, gain/information criterion, least general
  generalizations, the Progol algorithm.
- Kernel methods: dual perceptron and kernel perceptron, what a kernel
  is, kernel examples, kernel k-nearest neighbors.
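
As a review aid for the kernel perceptron item in the list above, here is a
minimal sketch in Python. It is not taken from the lectures or the textbook:
the function names, the degree-2 polynomial kernel, the epoch limit, and the
XOR toy data are all assumptions chosen for illustration. The point it tries
to show is that the dual form stores one mistake count per training example
and touches the data only through kernel evaluations.

```python
# Illustrative sketch of a kernel (dual) perceptron. All names are hypothetical.

def poly_kernel(x, z, degree=2):
    """Polynomial kernel K(x, z) = (1 + <x, z>)^degree (an assumed example kernel)."""
    return (1 + sum(a * b for a, b in zip(x, z))) ** degree


def score(x, X, y, alpha, kernel):
    """Dual-form score: sum over training examples of alpha_i * y_i * K(x_i, x)."""
    return sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, X, y))


def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Return alpha, where alpha[i] counts the mistakes made on example i."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            # A score of zero is treated as a mistake, so the first pass updates.
            if yi * score(xi, X, y, alpha, kernel) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: stop early
            break
    return alpha


def predict(x, X, y, alpha, kernel):
    """Classify x as +1 or -1 using the learned dual coefficients."""
    return 1 if score(x, X, y, alpha, kernel) > 0 else -1


# XOR is not linearly separable in the input space, but it is separable
# in the feature space induced by the degree-2 polynomial kernel.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X, y, poly_kernel)
predictions = [predict(x, X, y, alpha, poly_kernel) for x in X]
```

On this toy data the predictions should match the labels once training stops.
Replacing poly_kernel with a plain dot product gives the ordinary dual
perceptron, which cannot fit XOR.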