Course Aims and Description

This is a data science elective aimed at upper-level undergraduates and graduate students. We will present computational methods and analyses in the context of bioinformatics and biomedical data. Upon completing the course, students will be able to:

These aims will be achieved through problem sets, in-class exercises and discussions, readings, and a course project. There will be weekly assignments throughout the term.

Grading will be based on five problem sets, a course project divided into several additional homework assignments, online quizzes (probably not quite weekly), and class participation.

Course Staff:

Professor Donna Slonim (she/her) is the course instructor.

Computer Science PhD student Hao Zhu will be our primary teaching assistant.

Email addresses are firstname dot lastname at tufts dot edu, but you can reach all the course staff via Piazza at all times.

Office Hours:

Course Requirements:


Course Materials

For homework assignments, slides, and other class information, go to the private course materials page. You will need to log in using your CS department account and password. An account will be created for every student registered for the course in SIS who does not already have one.

Tentative Course Schedule:

DATE TOPICS READING DUE ON FRIDAY (Unless otherwise stated)
Weds., Sept. 7 Course introduction. Generative models. Probability distribution functions in R. Discrete and continuous distributions.
Mon., Sept. 12 Poisson and normal distribution functions; uses in Monte Carlo simulation. Extreme values. Holmes and Huber (all readings are Chapter/Section numbers in this textbook), 1.1-1.3
Weds., Sept. 14 R vectors, lists, data frames. Problem set basics. Empirical cumulative distribution function. Multinomials. Plagiarism workshop from University of Guelph
Mon., Sept. 19
(ADD DATE: Sept. 20)
Simulations to estimate power. Quantiles. Fitting distributions with VCD. 1.4-1.5
Weds., Sept. 21 Course project introduction. Data science principles. Fitting distributions to data in VCD. 3.1-3.3 Problem Set 1
Mon., Sept. 26 Grammars. Graphics: base R and ggplot introduction. Maximum likelihood intro.
Data file for class: MFgenes.txt
2.1-2.5, 3.4-3.6
Weds., Sept. 28 MAP vs MLE, Bayes' theorem. More on maximum likelihood. 2.9 Project Introduction
Mon., Oct. 3 Mixture models. Zero-inflation. Non-parametric bootstrap. 4.1-4.4
Weds., Oct. 5 Incidentalome. Paper discussion questions. Bootstrapping. Incidentalome paper Problem Set 2
Weds., Oct. 12 Problem Set 3 prep: grep, RDS; transcriptomics intro. Classexercise.rds and instructions. Also, dbtable.txt, Bootstrap exercise instructions Project Methods and Data
Mon., Oct. 17 Dimension-reducing projections. Hypothesis testing. 7.1-7.5; 6.1-6.9
Weds., Oct. 19 RNA sequencing data; variance-stabilizing transformation; heteroskedasticity.
Files for class exercises: vardata.txt, pcaExercise.pdf, data2filter.txt
4.4.3-4.4.4; 8.1-8.5 Problem Set 3
Mon., Oct. 24 RNA-sequencing, design matrices
DESeq exercise, ratcounts.txt, ratmorphineannot.csv
Weds., Oct. 26 Quiz 4 review; flow cytometry / PS 4 prep. Data filtering with dplyr.
Instructions and data2filter.txt for possible in-class exercise.
R/stats Memes; Project Methods and Data Revisions (only if requested)
Mon., Oct. 31 dplyr and tidy data; faceting.
diabetes.rds and db.instructions.pdf
polls.txt, dailyshow.rds
Weds., Nov. 2 Problem set 3. Project time management. Functional enrichment. Problem Set 4
Mon., Nov. 7 Functional enrichment; Linear regression. 10.3; 7.3-7.4
Weds., Nov. 9 Linear regression. Project planning. Effective data presentation.
Longley economic data and instructions for in-class exercise.
DUE WEDNESDAY, 11/9: Project schedule/implementation plan
Mon., Nov. 14 Logistic regression, CNVs
diabetes.rds, logistic regression exercise instructions
Weds., Nov. 16 Multitable canonical correlation analysis; research and project discussion 9.6; (optional: Kashyap paper) Initial Project Results done by Friday
Mon., Nov. 21 PS 4 follow-up; correspondence analysis
preterm.txt and instructions for in-class exercise
9.4-9.5 DUE MONDAY, 11/21: Preliminary integrated paper draft (combine and edit previous submissions as needed to ensure coherence; include online initial project results)
Mon., Nov. 28 Correspondence analysis trajectories, temporal gradients
In-class poster design questions
Poster template critique
Weds., Nov. 30 Linear discriminant analysis and more about classification
wine.RData; instructions
12.1-12.4 Problem Set 5
Mon., Dec. 5 Outlier detection, Quiz 6, more on functional programming
Weds., Dec. 7 Problem set follow-up; Experimental design. Moving forward. 13.1-13.6 DUE WEDNESDAY, 12/7: Posters
Mon., Dec. 12 Poster session DUE MONDAY, 12/12: Completed paper revisions, including remaining project results