Students are encouraged to contribute links for this page
Handbook of Data Compression, by David Salomon and Giovanni Motta, 5th edition, Springer, 2009
Introduction to Data Compression, by Khalid Sayood, 4th edition, Morgan Kaufmann, 2012
An Introduction to Kolmogorov Complexity and Its Applications, Ming Li and Paul Vitanyi, Third Edition, Springer Verlag, 2008
Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison-Wesley, 2006.
Applied Data Mining, by Paolo Giudici and Silvia Figini, Wiley, 2009.
Data Mining: Practical Machine Learning Tools and Techniques, by Ian Witten and Eibe Frank, Morgan Kaufmann, 2005.
The Fourth Paradigm: Data-Intensive Scientific Discovery, edited by Tony Hey, Stewart Tansley, and Kristin Tolle
Dataclysm: Who We Are (When We Think No One's Looking), by Christian Rudder, Crown, 2014.
Big Data in Education
Video with Mona Vernon and Una-May O'Reilly
Coursera course by Ryan Baker
Wall Street Journal Article
PRWeb article about RANDA project
Stanford Social Innovation Review article
MindShift blog article
Brief history of education big data debate
Campus Technology article
Big Data Landscape article about apps for higher education
Collegestats graphic on big data in education
Elements of Effective E-learning, by Dusti Howell
The Design of Adaptive E-Learning System Based on Students' Learning Styles, by Herman Dwi Surjono
Article on Learning Styles and elearning by Larry McNutt and Marie Brennan
Netflix and Big Data
Real-time processing of Big Data
Thread pools and the worker-queue model
The history of Storm
Documentation for Storm
Blog article by Storm's author
Storm at Wayfair
Apache Kafka documentation
Comparison of Storm and S4 (from an S4 point of view)
Articles on Genetic Algorithms
Bellingham, Richard. "Using Big Data Analytics and Genetic Algorithms to Predict Street Crime and Optimise Crime Reduction Measures." Economic and Social Research Council. ESRC, 1 June 2013.
Formoso, Carl. "Genetic Search Algorithms for Large Problems." (n.d.): 1-5. Washington State Department of Social and Health Services.
Marczyk, Adam. "Genetic Algorithms and Evolutionary Computation." Genetic Algorithms and Evolutionary Computation. The TalkOrigins Archive, 2004.
Solon, Olivia. "How Big Data Analysts Reappropriate Algorithms from Evolution and Warfare (Wired UK)." Future Science, Culture & Technology News & Reviews. Wired UK, 6 Jan. 2012.
Stuart, Keith Douglas, and Maciej Majewski. "Artificial Creativity in Linguistics Using Evolvable Fuzzy Neural Networks." SpringerLink, 2008.
Verma, Gunjan, and Vineeta Verma. "Role and Applications of Genetic Algorithm in Data Mining." International Journal of Computer Applications 48.17 (2012): 5-8. IJCA, June 2012.
Xu Yang, Mingming Zeng, Quanhui Liu, and Xiaofeng Wang. "A Genetic Algorithm Based Multilevel Association Rules Mining for Big Datasets." Mathematical Problems in Engineering. Hindawi Publishing Corporation, 1 July 2014.
The Lambda Architecture
Dealing With Dirty Data
Best Practices in Data Cleaning, by Jason W. Osborne, SAGE Publications,
Stanford lecture on data cleaning
Wikipedia article on imputing missing values
KDD lecture on Scaling Out Big Data Missing Value Imputations, by Christos Anagnostopoulos
Testing Big Data Applications
Controlled Experiments at Large Scale
Effective Testing Strategies for MapReduce Applications
Testing Hadoop Applications
Big Data: Testing Approach to Overcome Quality Challenges
Webinar - Performance Testing Approach for Big Data Applications (Sponsored by Impetus)
Online Controlled Experiments at Large Scale
What are best methods for testing big data applications?
How many web pages do people visit per day? (blog)
Storm Testing API Demo
Big Data in Healthcare
Groves et al., The ‘big data’ revolution in healthcare, McKinsey Quarterly
Cottle, Hoover, et al., Transforming HealthCare Through Big Data, Institute for Health Technology Transformation (2012).
Jeremy Ginsberg et al., Detecting influenza epidemics using search engine query data, Nature.
Lazer, Kennedy, King, et al. The Parable of Google Flu: Traps in Big Data Analysis, Science.
HIPAA privacy rules
World Bank statistics on health expenditures
Article about big data and heart disease
Healthcare Hashtag Project
Big Data and Robotics
Survey of Research on Cloud Robotics and Automation. Ben Kehoe, Sachin Patil, Pieter Abbeel, Ken Goldberg
Robots with Their Heads in the Clouds, Aspen Ideas Festival.
NY Times article on surgical robots
NY Times interview with Ken Goldberg
RoboBrain Project at Cornell
Atlantic article on cloud robotics
DARPA robotics challenge
IEEE Transactions on Automation Science and Engineering special issue on cloud robotics
SXSW Panel on Cloud Robotics and Automation
Recommender Systems Handbook, by Ricci, Rokach, Shapira, and Kantor, Springer, 2011.
A Survey of Collaborative Filtering Techniques, by Su and Khoshgoftaar, Advances in Artificial Intelligence, 2009.
Evaluation of Item-Based Top-N Recommendation Algorithms, by George Karypis, 10th Conference on Information and Knowledge Management
Data Preprocessing and Cleaning
on smoothing by Rafael A. Irizarry and Hector Corrada Bravo
Paper on random forests by Leo Breiman
EM tutorial by Jeff Bilmes
PowerPoint presentation on data preprocessing from NYU
Data preprocessing lecture by Han, Kamber, and Pei
Data mining lecture by Tan, Steinbach, and Kumar
Supervised Learning and Data Mining
web-based tools and diseases, Fabricio F. Costa, Drug Discovery Today, 2013
Individual genomes and personalized medicine, Christos Katsios and Dimitrios H. Roukos, Personalized Medicine, 2010.
The path to personalized medicine, Margaret A. Hamburg, NEJM, 2010.
The Impact of Online Networks and Big Data in Life Sciences, Ruchita Gujarathi and Fabricio F. Costa, Social Networking, 2014.
Big data in biomedicine, Fabricio F. Costa, Drug Discovery Today, 2014.
DrugBank 4.0, Vivian Law et al., Nucleic Acids Research, 2014.
The Gene Ontology (GO) database and informatics resource, The Gene Ontology Consortium, Nucleic Acids Research, 2004.
KEGG: Kyoto Encyclopedia of Genes and Genomes, Minoru Kanehisa and Susumu Goto, Nucleic Acids Research, 2000.
Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Ron Edgar, Michael Domrachev, and Alex E. Lash, Nucleic Acids Research, 2002.
Lean Big Data integration in systems biology and systems pharmacology, Avi Ma’ayan et al., Trends in Pharmacological Sciences, 2014.
Whole genome sequencing as a diagnostic test: challenges and opportunities, Caitlin C. Chrystoja and Eleftherios P. Diamandis, Clinical Chemistry, 2014.
Biomedical cloud computing with Amazon Web Services, Vincent A. Fusaro et al., PLoS Computational Biology, 2011.
Dynamic Clinical Data Mining: Search Engine-Based Decision Support, Leo Anthony Celi et al., JMIR Medical Informatics, 2014.
Big Data and Psychology
psychology manifesto, G. Miller, Perspectives on Psychological Science,
Crowdsourcing for Cognitive Science: The Utility of Smartphones, Brown HR et al., PLoS ONE, 2014.
Wijnand IJsselsteijn, Human-Technology Interaction Group, Eindhoven University of Technology
paper on MapReduce by Dean and Ghemawat
MapReduce on MongoDB
Hadoop main page
Hadoop and MongoDB Use Cases
Survey paper on Parallel Data Processing with MapReduce, by Lee, Lee, Choi, Chung, and Moon
Google Dataflow, possible successor to MapReduce
Apache Spark main page
Sort Benchmark Home Page