Introduction to Computational Biology


Comp167, Spring 2020

Prof. Donna Slonim


slonim_AT_cs.tufts.edu

Monday and Wednesday, 10:30-11:45am, Halligan Room 111A

http://www.cs.tufts.edu/~slonim

Office hours: now on Zoom (see the class materials page for the link), Mondays 1:15-2:30 and Fridays 10-11, or by appointment.

TA office hours and location: scheduled on Piazza as needed.




Course Schedule:

DATE TOPICS READING OPTIONAL READING
Weds., Jan. 15 Class overview and administrivia.
Introduction to sequences and sequence comparison.
This course syllabus.
Zvelebil & Baum (ZB): Chapter 1 and Section 4.1
For CS students new to biology: Larry Hunter's article, Molecular Biology for Computer Scientists.
For bio or BME students or others with less formal CS background: either Cormen, Leiserson, Rivest, and Stein, Chapters 2 and 3, or Jones and Pevzner, Chapter 2: Big O notation, NP-completeness.
Mon., Jan. 20 NO CLASS
Weds., Jan. 22 Sequence alignment:
Global alignment. Dynamic programming. Local alignment.
ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2
Global alignment: Durbin, pp. 17-22.
Local alignment: Durbin, pp. 23-24, 29-30
Mon., Jan. 27 Sequence alignment: gaps, scoring matrices. Hwk 1 out.
ZB: Sections 4.3, 4.4, 5.1
Weds., Jan. 29; ADD date Database search, BLAST, FASTA algorithms
Significance of alignment scores.
ZB: 4.6-4.7, 5.3 (except the section on suffix trees).
Mon., Feb. 3 Database search: BLAST. Significance of alignment scores; information content.
ZB: 5.4
Altschul's tutorial on statistics of sequence similarity scores. Altschul's slides on information theory, scoring matrices, and E-values. Compressive BLAST.
Weds., Feb. 5 DNA motifs, profiles. Gibbs sampling. Iterative search. Hwk 1 due; Hwk 2 out.
ZB: 6.1, 6.6
Original paper on the Gibbs sampler for local multiple alignment.
Original paper on the MEME algorithm.
Mon., Feb. 10 Multiple sequence alignment: star alignment, NP-completeness.
Ron Shamir's MSA notes
ZB: 4.5 (pp. 90-93), 6.4-6.5; Durbin, 6.1-6.4
Weds., Feb. 12 Multiple sequence alignment and profiles.
Mon., Feb. 17 NO CLASS; Hwk 2 due by midnight.
Weds., Feb. 19; DROP date Compressive BLAST. Midterm review.
Compressive BLAST
THURSDAY, Feb. 20 MIDTERM 1
Mon., Feb. 24 Sequence assembly: introduction. De Bruijn graphs and Eulerian paths.
Weds., Feb. 26 Sequence assembly: evaluating assemblies. Overlap graphs, Hamiltonian paths. Hwk 3 out.
ZB: 5.3 (pp. 141-143, on suffix trees)
Schuster's review article on sequencing methods; Mardis' more detailed article about "next generation" sequencing technologies. The paper about the SOAPdenovo assembler.
GAGE: Evaluating short-read assemblies; this may be useful in defining terms needed to complete homework 3.
Mon., Mar. 2 Suffix trees for overlap graphs and their other applications. Hidden Markov Model (HMM) intro.
Rabiner handout, pp. 257-266.
Weds., Mar. 4 Gene finding and intro to Hidden Markov Models (HMMs).
ZB: 9.2-9.7
Rabiner handout, pp. 257-266.
Mon., Mar. 9 Guest lecture: Anselm Blumer on read mapping via the Burrows-Wheeler transform. Hwk 3 due.
TBA reading
TBA reading
Weds., Mar. 11 Hidden Markov Models; EM algorithms.
ZB: 10.2-10.8, short paper on EM algorithms
Durbin: Chapter 3
Mon., Mar. 16 NO CLASS: SPRING BREAK
Weds., Mar. 18 NO CLASS: SPRING BREAK
Mon., Mar. 23 NO CLASS: EXTENDED SPRING BREAK
Zoom test session: 10:30-11:00
Weds., Mar. 25 Gene finding including HMMs. EM algorithms. Follow-up on Burrows-Wheeler. Hwk 4 out.
ZB: 10.2-10.8, short paper on EM algorithms
Durbin: Chapter 3
Mon., Mar. 30 Gene expression: technology, normalization, detecting differential expression
ZB: 15.1, 16.1
Slonim review article
Weds., Apr. 1 Gene expression: normalization, detecting differential expression
Hwk 4 due
ZB: 16.2-16.3, 16.5
Golub and Slonim et al., on leukemia classification.
Mon., Apr. 6 Gene expression: clustering and classification.
ZB: 16.4
Weds., Apr. 8 Functional interpretation: gene set analysis, Gene Ontology, functional enrichment; midterm review. Midterm 2 out after class.
Gene Set Enrichment Analysis
Fri., Apr. 10 NO CLASS; Midterm 2 due
Mon., Apr. 13 Introduction to phylogeny.
ZB: 7.1, 7.3
Mona Singh's phylogeny notes
Weds., Apr. 15 Phylogeny. Hwk 5 out.
ZB: 8.1-8.4
Mon., Apr. 20 NO CLASS: Patriots' Day
Weds., Apr. 22 Protein interaction networks, properties; centrality, pathway centrality, bottlenecks, function from structure, PPI networks, modules. Hwk 5 due.
Module finding challenge
Yu et al., on bottlenecks in protein networks; Przytycka, Singh, and Slonim review of network dynamics; Pathway Centrality paper.
Mon., Apr. 27 Anomaly detection for precision medicine. Class wrap-up.
Take-home final project out
Noto et al., 2015, on anomaly detection
Pietras et al., 2020, on temporal anomaly detection
Weds., May 6 Take-home final project due

Course Description

Course aims:
This is a computer science elective aimed at upper-level undergraduates and graduate students. Upon completion of the course, students will be able to:

These aims will be achieved through readings, problem sets, and implementation of some of the algorithms we discuss. About half of the course will focus on molecular sequences and sequence manipulation, while the rest will focus on issues of interpretation, which require more complex data and methods. We will discuss scalability and how and when approximate solutions are appropriate. Finally, we will introduce ongoing areas of research in bioinformatics and computational biology.

Updated on 3/21/20 after the campus closure: Grading will be based on homework assignments (both exercises and programming), one in-class midterm, one take-home midterm (to be completed during a two-hour period of your choice within a two-day window), and a final data science project to be done at home and due by our previously assigned exam time. There will no longer be a final exam.

Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.

Course Staff:

The teaching assistant for the course is Sophia Jannetty (sophia dot jannetty at tufts dot edu). Office hours are to be determined.

Course Requirements:


Policies



Course Materials

For homework assignments, slides, and other class information, go to the private course materials page.
Last updated April 27, 2020.