Displaying publications 1 to 7 of 7 publications associated with the Machine Learning Group in 2006:
Authors: D. Sculley and C. Brodley
DCC 2006: Data Compression Conference 2006 Proceedings
Abstract: The use of compression algorithms in machine learning tasks such as clustering and classification has appeared in a variety of fields, sometimes with the promise of reducing the problems of explicit feature selection. The theoretical justification for such methods has been founded on an upper bound on Kolmogorov complexity and an idealized information space. An alternative view shows that compression algorithms implicitly map strings into feature space vectors, and that compression-based similarity measures compute similarity within these feature spaces. Thus, compression-based methods are not a “parameter free” magic bullet for feature selection and data representation, but are instead concrete similarity measures within defined feature spaces, and are therefore akin to explicit feature vector models used in standard machine learning algorithms. To underscore this point, we find theoretical and empirical connections between traditional machine learning vector models and compression, encouraging cross-fertilization in future work.
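As a concrete point of reference, one widely used compression-based similarity measure is the normalized compression distance (NCD), which estimates how much structure two strings share from their compressed lengths. The sketch below is an illustration only, not the specific measure or feature-space mapping analyzed in this paper; it assumes zlib at its highest compression level as the compressor C, although any real compressor could be substituted.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(data): length in bytes of the zlib-compressed data (an assumed choice of compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 indicate that x and y compress well together (similar);
    values near 1 indicate little shared structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    c = bytes(range(256)) * 4
    print(f"NCD(a, b) = {ncd(a, b):.3f}")  # smaller: a and b share long substrings
    print(f"NCD(a, c) = {ncd(a, c):.3f}")  # larger: a and c share almost no structure
```

Note that swapping the compressor changes which repeated substrings count as shared structure, which is consistent with the abstract's point that such measures are not parameter free.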
Authors: R. Lomasky, C. E. Brodley, S. Bencic, M. Aernecke, and D. Walt
NIPS Workshop: Testing of Deployable Learning and Decision Systems
Abstract: For some supervised learning tasks, researchers can control the data generation process. In such cases, it would be beneficial to have feedback during learning to guide future data collection. Our research is motivated by a real-world problem: discrimination of vapors with an “artificial nose”. The nose’s accuracy is vital, because it will be deployed to detect harmful gases in critical situations, such as an airport or a subway. We address how to improve accuracy if insufficient examples have been observed to accurately define the classes’ decision boundaries. This problem differs from situations in which active learning is applicable. Active learning either requests labels for existing data or explicitly queries the feature space. In contrast, our task allows us to ask for additional examples from specific classes. In this paper we propose an adaptive heuristic to identify from which classes instances should be added during the learning process. We evaluate our methods on the artificial nose data and show significant improvement over random sampling.
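The abstract does not specify the adaptive heuristic, so the following is only a hypothetical illustration of the problem setting: instead of choosing points to label, as active learning would, the learner chooses which class to request more examples from. The rule used here (request the class with the lowest cross-validated recall, breaking ties toward the class with fewer examples) is an assumption for illustration, not the authors' method, and the class names are placeholders.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each class's examples that the current model labels correctly
    (e.g. using cross-validated predictions)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: correct[cls] / n for cls, n in totals.items()}


def choose_class_to_sample(y_true, y_pred):
    """Hypothetical selection rule: request more examples of the class with the
    lowest recall; on ties, prefer the class with fewer observed examples."""
    recall = per_class_recall(y_true, y_pred)
    counts = Counter(y_true)
    return min(recall, key=lambda cls: (recall[cls], counts[cls]))


if __name__ == "__main__":
    # Placeholder vapor classes "A", "B", "C" with cross-validated predictions.
    y_true = ["A", "A", "A", "B", "B", "C", "C", "C"]
    y_pred = ["A", "A", "B", "B", "A", "C", "C", "C"]
    print(per_class_recall(y_true, y_pred))
    print(choose_class_to_sample(y_true, y_pred))  # B (recall 0.5)
```

In an iterative setting, one would retrain after each requested batch arrives and recompute the recalls, so the chosen class can change as the decision boundaries sharpen.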
Authors: Jacob, N. and Brodley, C. E.
22nd Annual Computer Security Applications Conference
Abstract: Signature-matching intrusion detection systems can experience significant decreases in performance when the load on the IDS host increases. We propose a solution that off-loads some of the computation performed by the IDS to the graphics processing unit (GPU). Modern GPUs are programmable stream processors capable of high-performance computing, and in recent years they have been used for non-graphical computing tasks. The major operation in a signature-matching IDS is matching values seen in operation against known black-listed values; accordingly, our solution implements the string matching on the GPU. The results show that as the CPU load on the IDS host system increases, the performance of PixelSnort, our GPU-based implementation, is significantly more robust, and it outperforms conventional Snort by up to 40%.
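String matching suits a GPU because checking whether any signature begins at a given payload offset does not depend on any other offset. The following CPU-side Python sketch shows that data-parallel decomposition; it is not PixelSnort's implementation, and the function names and example signatures are hypothetical. On a GPU, each iteration of the outer loop could become one independent thread.

```python
def match_at(payload: bytes, offset: int, signatures: list[bytes]) -> list[int]:
    """Work for a single offset: indices of the signatures that begin there."""
    return [i for i, sig in enumerate(signatures) if payload.startswith(sig, offset)]


def find_signatures(payload: bytes, signatures: list[bytes]) -> list[tuple[int, int]]:
    """Return (offset, signature index) pairs for every match in the payload.
    Each offset is checked independently, so this loop is the part that could be
    spread across GPU threads (one offset per thread)."""
    hits = []
    for offset in range(len(payload)):
        hits.extend((offset, i) for i in match_at(payload, offset, signatures))
    return hits


if __name__ == "__main__":
    # Illustrative black-listed byte strings, not real Snort rules.
    signatures = [b"/cgi-bin/phf", b"cmd.exe"]
    payload = b"GET /cgi-bin/phf?Qalias=x HTTP/1.0"
    print(find_signatures(payload, signatures))  # [(4, 0)]
```

A production IDS would typically use a multi-pattern automaton such as Aho-Corasick rather than this brute-force check; the brute-force form is used here only because it makes the per-offset independence easy to see.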
Authors: Khardon, R., Arias, M., Servedio, R. A.
Information and Computation
vol. 204, pp. 816-834
Authors: Ozdoganoglu, H., Jalote, A., Vijaykumar, T. N., Brodley, C. E., and Kuperman, B.
IEEE Transactions on Computers
vol. 55, no. 10, pp. 1271-1285
Abstract: A buffer overflow attack is perhaps the most common attack used to compromise the security of a host. This attack can be used to change the function return address and redirect execution to the attacker's code. We present a hardware-based solution, called SmashGuard, to protect against all known forms of attack on the function return addresses stored on the program stack. With each function call instruction, the current return address is pushed onto a hardware stack. A return instruction compares its target address with the return address at the top of the hardware stack, and an exception is raised if the two do not match. Because the stack operations and checks are done in hardware in parallel with the usual execution of instructions, our best-performing implementation scheme has virtually no performance overhead (because we are modifying hardware, it is impossible to guarantee zero overhead without an actual hardware implementation). While previous software-based approaches' average performance degradation for the SPEC2000 benchmarks is only 2.8 percent, their worst-case degradation is up to 8.3 percent. Apart from the lack of robustness in performance, the software approaches' key disadvantages are less security coverage and the need for recompilation of applications. SmashGuard, on the other hand, is secure and does not require recompilation of applications.
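The call/return check described in this abstract can be modeled in a few lines of software. The following Python class is only a behavioral simulation under simplifying assumptions: an unbounded stack, and no handling of cases a real design would need to address, such as a finite hardware stack or non-local control transfers. It is not SmashGuard's hardware design, and the class and method names are hypothetical.

```python
class ReturnAddressMismatch(Exception):
    """Stands in for the exception the hardware raises on a mismatch."""


class ShadowReturnStack:
    """Behavioral model of a return-address check: calls push, returns compare."""

    def __init__(self) -> None:
        self._stack: list[int] = []

    def on_call(self, return_address: int) -> None:
        # A call pushes the address it will return to onto the protected stack.
        self._stack.append(return_address)

    def on_return(self, target_address: int) -> None:
        # A return compares its target with the top of the protected stack.
        if not self._stack:
            raise ReturnAddressMismatch("return with no matching call")
        expected = self._stack.pop()
        if target_address != expected:
            raise ReturnAddressMismatch(
                f"return to {target_address:#x}, expected {expected:#x}"
            )


if __name__ == "__main__":
    rs = ShadowReturnStack()
    rs.on_call(0x400123)
    rs.on_return(0x400123)  # legitimate return: no exception
    rs.on_call(0x400456)
    try:
        # Simulates a return address overwritten by a buffer overflow.
        rs.on_return(0xDEADBEEF)
    except ReturnAddressMismatch as err:
        print("attack detected:", err)
```

In this model the check runs sequentially when the return executes; per the abstract, the hardware version performs it in parallel with normal instruction execution, which is why its overhead is close to zero.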
Authors: M. Arias and R. Khardon
Journal of Computer and System Sciences
vol. 72, no. 1, pp. 72-94