Algorithms and Representations for Mining Massive Collections of Time Series and Shapes
To date, the vast majority of research on time series and shape data mining has focused on similarity search and clustering. I believe that these problems should now be regarded as essentially solved. In particular, there are now fast exact techniques for searching and clustering patterns under both the Euclidean distance and Dynamic Time Warping, the two most useful distance measures. However, from a knowledge discovery viewpoint, there are much more interesting problems: the detection of previously unknown patterns and relationships in time series and shape databases. Two concrete examples are finding the most unusual objects (discord discovery) and finding repeated objects (motif discovery).
While there are many representations that can be used to solve these problems (e.g. wavelets and Fourier methods), in this talk I argue that solutions that scale to massive datasets will require symbolic representations. The talk will be illustrated with examples from anthropology, law enforcement, biology and the mining of historical texts.
BIO: Dr. Keogh's research interests are in Data Mining, Machine Learning and Information Retrieval. He has published more than 80 papers, including ten papers in SIGKDD and ten papers in IEEE ICDM. Several of his papers have won “best paper” awards. In addition, he has won several teaching awards. He is the recipient of a 5-year NSF Career Award for “Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases” and a grant from Aerospace Corp to develop a time series visualization tool for monitoring space launch telemetry. Dr. Keogh has given well-received tutorials on time series, machine learning and data mining all over the world.