Displaying publications 1 to 7 of 7 publications associated with the Machine Learning Group in 1998:
Authors: Lane, T. and Brodley, C. E.
The Fourth International Conference on Knowledge Discovery and Data Mining
pp. 259-263 1998 January
Abstract: The task in the computer security domain of anomaly detection is to characterize the behaviors of a computer user (the `valid', or `normal', user) so that unusual occurrences can be detected by comparison of the current input stream to the valid user's profile. This task requires an online learning system that can respond to concept drift and handle discrete non-metric time sequence data. We present an architecture for online learning in the anomaly detection domain and address the issues of incremental updating of system parameters and instance selection. We demonstrate a method for measuring direction and magnitude of concept drift in the classification space and present and evaluate approaches to the above-stated issues which make use of the drift measurement.
Authors: Shyu, C., Brodley, C. E., Kak, A., Kosaka, A., Aisen, A. and Broderick, L.
Proceedings of the Workshop on Content-Based Access of Image/Video Library held in conjunction with CVPR98
Authors: Aizenstein, H., Blum, A., Khardon, R., Kushilevitz, A., Pitt, L., Roth, D.
SIAM Journal on Computing
vol. 27, no. 6, pp. 1505-1530
Authors: Bradford, J., Kunz, C., Kohavi, R., Brunk, C. and Brodley, C. E.
Tenth European Conference on Machine Learning
pp. 131-136 1998 April
Abstract: We describe an experimental study of pruning methods for decision tree classifiers in two learning situations: minimizing loss and probability estimation. In addition to the two most common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and two pruning variants based on Laplace corrections. We perform an empirical comparison of these methods and evaluate them with respect to the following three criteria: loss, mean-squared-error (MSE), and log-loss. We provide a bias-variance decomposition of the MSE to show how pruning affects the bias and variance. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods, both for loss minimization and for estimating probabilities. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results that were on par with other methods in terms of the evaluation criteria. The main advantage of pruning was in the reduction of the decision tree size, sometimes by a factor of 10. While no method dominated others on all datasets, even for the same domain different pruning mechanisms are better for different loss matrices. We show this last result using Receiver Operating Characteristics (ROC) curves.
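Editorial illustration (not part of the abstract): the Laplace correction mentioned above replaces the raw class frequencies at a leaf with smoothed estimates, (n_c + 1) / (N + k) for k classes. The sketch below assumes a leaf is summarized by a dictionary of per-class counts; the function name and data layout are hypothetical and are not taken from the paper.

```python
def laplace_leaf_probabilities(class_counts):
    """Smoothed class probabilities for one leaf.

    class_counts maps each class label to the number of training
    instances of that class reaching the leaf (assumed layout).
    Every class gets one pseudo-count, so no estimate is exactly 0 or 1.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        label: (count + 1) / (total + num_classes)
        for label, count in class_counts.items()
    }


# A pure leaf with 3 positive and 0 negative instances.
probs = laplace_leaf_probabilities({"pos": 3, "neg": 0})
print(probs)
```

With raw frequencies this leaf would predict "pos" with probability 1.0; the corrected estimate is less extreme, which matters for criteria such as log-loss that penalize overconfident predictions heavily.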
Authors: Kapadia, N. H., Brodley, C. E., Fortes, J. A. B., and Lundstrom, M. S.
Proceedings of the 1998 Workshop on Advances in Parallel and Distributed Systems (APADS)
Abstract: This paper reports on an application of artificial intelligence to achieve demand-based scheduling within the context of a network-computing infrastructure. The described AI system uses tool-specific, run-time input to predict the resource-usage characteristics of runs. Instance-based learning with locally weighted polynomial regression is employed because of the need to simultaneously learn multiple polynomial concepts and the fact that knowledge is acquired incrementally in this domain. An innovative use of a two-level knowledge base allows the system to account for short-term variations in compute-server and network performance and exploit temporal and spatial locality of runs. Instance editing allows the approach to be tolerant to noise and computationally feasible for extended use. The learning system was tested on three tools during normal use of the Purdue University Network Computing Hubs. Results indicate that the described instance-based learning technique using locally weighted regression with a locally linear model works well for this domain.
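Editorial illustration (not part of the abstract): locally weighted regression with a locally linear model fits a separate weighted least-squares line around each query point, with nearby stored instances counting more than distant ones. The sketch below is a minimal one-dimensional version under stated assumptions: a Gaussian kernel, a single input feature, and a hypothetical `bandwidth` parameter. The system described in the paper may differ in all of these choices.

```python
import math


def locally_weighted_linear_predict(xs, ys, query, bandwidth=1.0):
    """Predict y at `query` from stored instances (xs, ys).

    Each instance is weighted by a Gaussian kernel on its distance
    to the query (an assumed kernel), then a weighted least-squares
    line is fitted and evaluated at the query.
    """
    weights = [
        math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs
    ]
    weight_sum = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / weight_sum
    mean_y = sum(w * y for w, y in zip(weights, ys)) / weight_sum
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(
        w * (x - mean_x) * (y - mean_y)
        for w, x, y in zip(weights, xs, ys)
    )
    # Fall back to the weighted mean when the inputs have no spread.
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (query - mean_x)


# Hypothetical data: input size versus measured run time.
sizes = [0.0, 1.0, 2.0, 3.0]
run_times = [1.0, 3.0, 5.0, 7.0]
print(locally_weighted_linear_predict(sizes, run_times, query=1.5))
```

Because the fit is recomputed per query from the stored instances, new observations can be added incrementally without retraining a global model, which matches the incremental-knowledge setting the abstract describes.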
Authors: Lane, T. and Brodley, C. E.
Fifth ACM Conference on Computer and Communications Security
pp. 150-158 1998 November
Abstract: The anomaly-detection problem can be formulated as one of learning to characterize the behaviors of an individual, system, or network in terms of temporal sequences of discrete data. We present an approach on the basis of instance-based learning (IBL) techniques. To cast the anomaly-detection task in an IBL framework, we employ an approach that transforms temporal sequences of discrete, unordered observations into a metric space via a similarity measure that encodes intra-attribute dependencies. Classification boundaries are selected from an a posteriori characterization of valid user behaviors, coupled with a domain heuristic. An empirical evaluation of the approach on user command data demonstrates that we can accurately differentiate the profiled user from alternative users when the available features encode sufficient information. Furthermore, we demonstrate that the system detects anomalous conditions quickly, an important quality for reducing potential damage by a malicious user. We present several techniques for reducing data storage requirements of the user profile, including instance-selection methods and clustering. An empirical evaluation shows that a new greedy clustering algorithm reduces the size of the user model by 70%, with only a small loss in accuracy.
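Editorial illustration (not part of the abstract): one way to compare fixed-length sequences of discrete tokens, such as shell commands, is to count position-wise matches while rewarding runs of adjacent matches more than isolated ones. The sketch below shows a measure of that kind together with a nearest-neighbor lookup against a stored profile. The abstract does not spell out the measure, so the scoring rule, function names, and example commands here are assumptions.

```python
def adjacent_match_similarity(seq_a, seq_b):
    """Similarity of two equal-length token sequences.

    A match at position i contributes the length of the current run
    of consecutive matches ending at i, so adjacent matches score
    higher than scattered ones (an assumed scoring rule).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    score = 0
    run_length = 0
    for token_a, token_b in zip(seq_a, seq_b):
        run_length = run_length + 1 if token_a == token_b else 0
        score += run_length
    return score


def profile_similarity(profile, window):
    """Similarity of `window` to its most similar stored instance."""
    return max(adjacent_match_similarity(stored, window) for stored in profile)


# Hypothetical user profile and observed command window.
profile = [["ls", "cd", "vi"], ["rm", "cd", "vi"]]
print(profile_similarity(profile, ["ls", "cd", "vi"]))
print(adjacent_match_similarity(["ls", "cd", "vi"], ["ls", "rm", "vi"]))
```

A window whose profile similarity falls below a chosen threshold would be flagged as anomalous; how that threshold is set is outside this sketch.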