Course Aims and Description

This is a data science elective aimed at upper-level undergraduates and graduate students. We will present computational methods and analyses in the context of bioinformatics and biomedical data. Upon completing the course, students will be able to:

These aims will be achieved through problem sets, in-class exercises and discussions, readings, and a course project. There will be weekly assignments throughout the term.

Grading will be based on five problem set homework assignments, a course project broken up into several other homework assignments, and class participation.

Course Staff:

Professor Donna Slonim (she/her) is the course instructor.

Computer Science PhD student Gizem Cicekli will be our primary teaching assistant, assisted by Nick Camarda from the MD/PhD program.

Email addresses are firstname dot lastname at tufts dot edu (Nick's first name is Nicholas for this purpose), but you can reach all the course staff via Piazza at all times.

Instructor Office Hours: Weds. 2-3:15 on Zoom; Thurs. 1:30-3 in person; or by appointment. Dr. Slonim's office is currently in Halligan 107B (until we move into the Cummings Center mid-semester).

TA Office Hours: Mon. (Gizem) 1-2PM on Zoom; Tue. (Nick) 3-4PM on Zoom; Fri. (Gizem) 12-1PM in person at Paige Hall 007; or by appointment.

Course Requirements:


Policies


Course Materials

For homework assignments, slides, and other class information, go to the private course materials page. You will need to log in using your CS department account and password. An account will be created for every student registered for the course in SIS who does not already have one.

Tentative Course Schedule:

| DATE | TOPICS | READING | DUE ON FRIDAY |
| Thurs., Sept. 9 | Course introduction. Generative models. Probability distribution functions in R. Discrete and continuous distributions. | | |
| Tues., Sept. 14 | Poisson and normal distribution functions; uses in Monte Carlo simulation. Extreme values. | Holmes and Huber (all readings are Chapter/Section numbers in this textbook), 1.1-1.3 | |
| Thurs., Sept. 16 | R vectors, lists, data frames. Problem set basics. Empirical cumulative distribution function. Multinomials. | | |
| Tues., Sept. 21 | Simulations to estimate power. Quantiles. Fitting distributions with VCD. | 1.4-1.5 | |
| Thurs., Sept. 23 | Course project introduction. Data science principles. Fitting distributions to data in VCD. | 3.1-3.3 | Problem Set 1 |
| Tues., Sept. 28 | Grammars. Graphics: base R and ggplot introduction. Maximum likelihood intro. Data file for class: MFgenes.txt | 2.1-2.5, 3.4-3.6 | |
| Thurs., Sept. 30 | MAP vs MLE, Bayes' theorem. More on maximum likelihood. | 2.9 | Project Introduction |
| Tues., Oct. 5 | The incidentalome. Mixture models. | 4.1-4.2, Incidentalome paper | |
| Thurs., Oct. 7 | More about mixture models. Zero-inflation. Dirichlet mixtures. In-class exercise on mixtures. | 4.2; optional reading: Brown, et al. on Dirichlet mixture models for protein families. | Problem Set 2 |
| Tues., Oct. 12 | Infinite mixtures; non-parametric bootstrap. Files for class exercises: dbtable.txt and instructions. | 4.3 | |
| Thurs., Oct. 14 | Variance stabilizing transformations; problem set 3. Files for class exercises: vardata.txt, classexercise.rds, and instructions. | | Project Methods and Data |
| Tues., Oct. 19 | Single cell sequencing intro; hypothesis testing. | 4.4.3-4.4.4 | |
| Thurs., Oct. 21 (Matt Zunitch) | Dimension reducing projections. | 5.3, 7.1-7.5 | Problem Set 3 |
| Tues., Oct. 26 | Modeling: linear regression. Data filtering. Files for possible in-class exercises: diabetes.rds, Longley economic data, and data2filter.txt. | 7.3-7.4 | |
| Thurs., Oct. 28 | Dplyr, tidy data. Logistic regression. | | R/stats Memes; Project Methods and Data Revisions (only if requested) |
| Tues., Nov. 2 | Copy number variation. More on dplyr and the tidyverse. | | |
| Thurs., Nov. 4 | RNA-sequencing: intro and normalization. | 8.7-8.8 | Problem Set 4 |
| Tues., Nov. 9 | CNV, CNV+PCA, RNA-sequencing. | 9.1-9.3 | DUE TUESDAY: Project schedule/implementation plan |
| Thurs., Nov. 11 | NO CLASS: VETERANS' DAY | | |
| Tues., Nov. 16 | RNA-sequencing, design matrices. | 8.9-8.10 | |
| Thurs., Nov. 18 | Flow cytometry. Communicating across boundaries. | | Initial project results |
| Tues., Nov. 23 | Correspondence analysis; presentations. | 9.5 | Preliminary integrated paper draft (combine and edit previous submissions as needed to ensure coherence) |
| Thurs., Nov. 25 | NO CLASS: THANKSGIVING BREAK | | |
| Tues., Nov. 30 | Functional enrichment; canonical correlation analysis. | 9.6 | |
| Thurs., Dec. 2 | Linear discriminant analysis. | 12.1-12.3 | Problem Set 5 |
| Tues., Dec. 7 | Experimental design. Communicating visually. | 13.1-13.6 | |
| Thurs., Dec. 9 | Problem sets 4 and 5; moving forward. | | Posters |
| Tues., Dec. 14 | Poster session. | | Remaining project results and completed paper revisions, submitted by last day of class |