Displaying publications 1 to 10 of 10 publications associated with the Machine Learning Group in 1997:
Authors: R. Khardon
Technical Report, TR-10-97, Harvard University
Authors: Lane, T. and Brodley, C. E.
Proceedings of the National Information Systems Security Conference
pp. 366-380 1997 October
Abstract: The anomaly detection problem has been widely studied in the computer security literature. In this paper we present a machine learning approach to anomaly detection. Our system builds user profiles based on command sequences and compares current input sequences to the profile using a similarity measure. The system must learn to classify current behavior as consistent or anomalous with past behavior using only positive examples of the account's valid user. Our empirical results demonstrate that this is a promising approach to distinguishing the legitimate user from an intruder.
Authors: Brodley, C.E. and Smyth, P.
Statistics and Computing
vol. 7, no. 1
Abstract: In this paper we present a perspective on the overall process of developing classifiers for real-world classification problems. Specifically, we identify, categorize and discuss the various problem-specific factors that influence the development process. Illustrative examples are provided to demonstrate the iterative nature of the process of applying classification algorithms in practice. In addition, we present a case study of a large scale classification application using the process framework described, providing an end-to-end example of the iterative nature of the application process. The paper concludes that the process of developing classification applications for operational use involves many factors not normally considered in the typical discussion of classification models and algorithms.
Authors: Friedl, M. and Brodley, C. E.
Remote Sensing of Environment
vol. 61, no. 3
Abstract: Decision tree classification algorithms have significant potential for land cover mapping problems and have not been tested in detail by the remote sensing community relative to more conventional pattern recognition techniques such as maximum likelihood classification. In this paper we present several types of decision tree classification algorithms and evaluate them on three different remote sensing data sets. The decision tree classification algorithms tested include a univariate decision tree, a multivariate decision tree, and a hybrid decision tree capable of including several different types of classification algorithms within a single decision tree structure. Classification accuracies produced by each of these decision tree algorithms are compared with both maximum likelihood and linear discriminant function classifiers. Results from this analysis show that the decision tree algorithms consistently outperform the maximum likelihood and linear discriminant function classifiers in terms of classification accuracy. In particular, the hybrid tree consistently produced the highest classification accuracies for the data sets tested. More generally, the results from this work show that decision trees have several advantages for remote sensing applications by virtue of their relatively simple, explicit, and intuitive classification structure. Further, decision tree algorithms are strictly non-parametric and therefore make no assumptions regarding the distribution of input data, and are also flexible and robust with respect to non-linear and noisy relationships among input features and class labels.
Authors: Stough, T. and Brodley, C. E.
Proceedings of the Third International Conference on Knowledge Discovery and Data Mining
pp. 255-258 1997 August
Abstract: In order to be of use to scientists, large image databases need to be analyzed to create a catalogue of the objects of interest. One approach is to apply a multiple tiered search algorithm that uses reduction techniques of increasing computational complexity to select the desired objects from the database. The first tier of this type of algorithm is the focus of attention (FOA) algorithm. FOA selects candidate regions from the image data and passes them to the next tier of the algorithm. In this paper we present a new approach to FOA that employs multiple matched filters (MMF), one for each object prototype, to detect the regions of interest. The MMF are formed using k-means clustering on a set of example image patches identified by experts. An innovation of the approach is to radically reduce the dimensionality of the feature space used by the k-means algorithm by spoiling the sample image patches. This approach was motivated by the need to accurately detect small volcanoes in the Magellan probe data from Venus. An empirical evaluation of the approach illustrates that MMF perform better than a single matched filter for high true detection rates.
Authors: Moss, J. E., Utgoff, P., Cavazos, J., Precup, D., Stefanovic, D., Brodley, C. E. and Scheeff, D.
Neural Information Processing Systems
Abstract: Program execution speed on modern computers is sensitive, by a factor of two or more, to the order in which instructions are presented to the processor. To realize potential execution efficiency, an optimizing compiler must employ a heuristic algorithm for instruction scheduling. Such algorithms are painstakingly hand-crafted, which is expensive and time-consuming. We show how to cast the instruction scheduling problem as a learning task, obtaining the heuristic scheduling algorithm automatically. Our focus is the narrower problem of scheduling straight-line code (also called basic blocks of instructions). Our empirical results show that just a few features are adequate for quite good performance at this task for a real modern processor, and that any of several supervised learning methods perform nearly optimally with respect to the features used.
Authors: Lane, T. and Brodley, C. E.
AAAI-97 Workshop on AI Approaches to Fraud Detection and Risk Management
Abstract: Two problems of importance in computer security are to 1) detect the presence of an intruder masquerading as the valid user and 2) detect the perpetration of abusive actions on the part of an otherwise innocuous user. We have developed an approach to these problems that examines sequences of user actions (UNIX commands) to classify behavior as normal or anomalous. In this paper we explore the matching function needed to compare a current behavioral sequence to a historical profile. We discuss the difficulties of performing matching in human-generated data and show that exact string matching is insufficient for this domain. We demonstrate a number of partial matching functions and examine their behavior on user command data. In particular, we explore two methods for weighting scores by adjacency of matches as well as two growth functions (polynomial and exponential) for scoring similarities. We find, empirically, that the optimal similarity measure is user dependent but that measures based on the assumption of causal linkage between user commands are superior for this domain.
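Illustrative note: the abstract above does not give the matching functions themselves, so the following is only a minimal sketch of what an adjacency-weighted partial match between two equal-length command sequences might look like. It is not the authors' implementation. The function names, the run-length weighting scheme, and the particular growth functions (run squared and 2 to the power of the run) are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def polynomial_growth(run: int) -> float:
    """Hypothetical polynomial growth: a match is worth the square of its run length."""
    return float(run ** 2)


def exponential_growth(run: int) -> float:
    """Hypothetical exponential growth: a match is worth 2 to the power of its run length."""
    return float(2 ** run)


def similarity(current: Sequence[str],
               stored: Sequence[str],
               growth: Callable[[int], float]) -> float:
    """Score position-wise matches, rewarding matches that are adjacent to earlier matches."""
    if len(current) != len(stored):
        raise ValueError("sequences must have the same length")
    score = 0.0
    run = 0  # length of the current streak of consecutive matches
    for a, b in zip(current, stored):
        if a == b:
            run += 1
            score += growth(run)
        else:
            run = 0  # a mismatch breaks the streak
    return score


def profile_similarity(current: Sequence[str],
                       profile: Sequence[Sequence[str]],
                       growth: Callable[[int], float]) -> float:
    """Compare a sequence to a profile by taking its best match against any stored sequence."""
    return max(similarity(current, stored, growth) for stored in profile)


if __name__ == "__main__":
    current = ["ls", "cd", "vi", "make"]
    adjacent = ["ls", "cd", "cat", "gcc"]    # two matches, next to each other
    scattered = ["ls", "rm", "vi", "gcc"]    # two matches, separated by a mismatch
    for growth in (polynomial_growth, exponential_growth):
        print(growth.__name__,
              similarity(current, adjacent, growth),
              similarity(current, scattered, growth))
```

In this sketch, two adjacent matches score higher than two scattered matches under either growth function, which is one simple way to encode the assumption that neighboring commands are related. Unlike exact string matching, a single differing command lowers the score rather than reducing it to zero.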