Introduction to Computational Biology


Comp167, Fall 2017

Prof. Donna Slonim


slonim_AT_cs.tufts.edu

Monday and Wednesday, 10:30-11:45am, Halligan Room 111A

http://www.cs.tufts.edu/~slonim

Office hours: Halligan 107B, Tuesdays 1:30-2:30 and Fridays, 11-12; or by appointment.

TA office hours: Mondays, 1-2:30pm (Hannah Voelker) and Tuesdays, 8-9:30pm (Dan Meyer), or by appointment.
Location in Halligan TBD.


Course Description

 Policies

Course Materials


Tentative Schedule:

DATE TOPICS READING OPTIONAL READING
Weds., Sep. 6 Class overview and administrivia.
Introduction to sequences and sequence comparison.
Syllabus handout.
Zvelebil & Baum (ZB): Chapter 1 and Section 4.1
For CS students new to biology: Larry Hunter's article, Molecular Biology for Computer Scientists.
For bio or BME students or others with less formal CS background: either Cormen, Leiserson, Rivest and Stein, Chapters 2 + 3, or Jones and Pevzner, Chapter 2: Big O notation, NP-completeness.
Mon., Sep. 11 Sequence alignment:
Global alignment. Dynamic programming. Local alignment.
ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2 Global alignment: Durbin, pp. 17-22.
Local alignment: Durbin, pp. 23-24, 29-30
Weds., Sep. 13 Sequence alignment: gaps, scoring matrices. Hwk 1 out ZB: Sections 4.3, 4.4, 5.1
Mon., Sep. 18 Database search, BLAST, FASTA algorithms
Significance of alignment scores.
ZB: 4.6-4.7, 5.3 (except the section on suffix trees).
Weds., Sep. 20 Database search: Significance of alignment scores, Information Content, compressive BLAST ZB: 5.4 Altschul's tutorial on statistics of sequence similarity scores. Warren Gish's webpage on information theory and alignment scoring statistics. Compressive BLAST
Mon., Sep. 25 DNA motifs, profiles. Gibbs sampling. Iterative search. Hwk 1 due; Hwk 2 out ZB: 6.1, 6.6 Original paper on the Gibbs sampler for local multiple alignment
Original paper on MEME algorithm
Weds., Sep. 27 Finish Gibbs sampling, MEME. Multiple sequence alignment: optimal method Ron Shamir's MSA notes ZB: 4.5 (pp. 90-93), 6.4-6.5; Durbin, 6.1-6.4
Mon., Oct. 2 Multiple sequence alignment: progressive methods, NP-completeness, star alignment
Weds., Oct. 4 Finish MSA; midterm review Hwk 2 due
Mon., Oct. 9 NO CLASS
Weds., Oct. 11 MIDTERM 1
Mon., Oct. 16 Sequence assembly: Introduction. De Bruijn graphs and Eulerian paths.
Weds., Oct. 18 Sequence assembly: Evaluating assemblies. Overlap graphs, Hamiltonian paths Hwk 3 out ZB: 5.3 (pp. 141-143, on suffix trees) Schuster's review article on next generation sequencing; Mardis' more detailed article about next generation sequencing technologies. The paper about the SOAPdenovo assembler.
GAGE: Evaluating short-read assemblies; this may be useful in defining terms needed to complete homework 3.
Mon., Oct. 23 Suffix trees for overlap graphs, other applications of them. Gene finding intro ZB: 9.2-9.7
Weds., Oct. 25 Hidden Markov Models (HMMs). Rabiner handout, pp. 257-266.
Mon., Oct. 30 Finish HMMs and their use in gene finding. EM algorithms. Hwk 3 due ZB: 10.2-10.8, short paper on EM algorithms Durbin: Chapter 3
Weds., Nov. 1 Gene expression: technology, normalization, detecting differential expression
Hwk 4 out
ZB: 15.1, 16.1 Slonim review article
Mon., Nov. 6 Gene expression: clustering and classification. ZB: 16.2-16.3, 16.5 Golub and Slonim et al., on leukemia classification
Weds., Nov. 8 Gene expression and function: gene set analysis, Gene Ontology, functional enrichment ZB: 16.4 Gene Set Enrichment Analysis
Mon., Nov. 13 Introduction to phylogeny. Hwk 4 due ZB: 7.1, 7.3 Mona Singh's phylogeny notes
Weds., Nov. 15 Phylogeny ZB: 8.1-8.4
Mon., Nov. 20 MIDTERM 2
Weds., Nov. 22 NO CLASS: THANKSGIVING BREAK
Mon., Nov. 27 Profile HMMs, posterior decoding, protein families. Hwk 5 out TBD
Weds., Nov. 29 Anomaly detection for precision medicine; Noto, et al., 2015 on anomaly detection
Mon., Dec. 4 Protein interaction networks, properties Alm and Arkin review of biological networks Yu, et al., on bottlenecks in protein networks; Przytycka, Singh, and Slonim review of network dynamics.
Weds., Dec. 6 Protein function prediction in PPI networks. Hwk 5 due
Mon., Dec. 11 Disease and gene networks. Class wrap up
Tues., Dec. 19, 3:30-5:30pm FINAL EXAM

Course Description

Course aims:
This is a computer science elective aimed at upper level undergraduates and graduate students. Upon the completion of the course, students will be able to:

These aims will be achieved through readings, problem sets, and implementation of some of the algorithms we discuss. About half of the course will focus on molecular sequences and sequence manipulation, while the rest will focus more on issues of interpretation, which require more complex data and methods. We will talk about scalability and how and when approximate solutions are appropriate. Finally, we will introduce ongoing areas of research in the fields of bioinformatics and computational biology.

Grading will be based on homework assignments (both exercises and programming), two in-class midterms, and a final exam to be held at 3:30pm on Dec. 19th (the E+ block final exam slot). Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.

Course Staff:

The teaching assistants for the course are Hannah Voelker and Dan Meyer. They will hold office hours on Mondays, 1-2:30pm, and Tuesdays, 8-9:30pm, somewhere in Halligan. Look for them in the 2nd floor collaboration room, in Halligan 107 during the day, or in the computer labs. Additional office hours may be scheduled before exams or before particular homework assignments are due.

Course Requirements:


Policies



Course Materials

For homeworks, slides, and other class information, go to the private course materials page.
Last updated September 12, 2017.