Introduction to Computational Biology


Comp167, Spring 2017

Prof. Donna Slonim

slonim_AT_cs.tufts.edu

Monday and Wednesday, 10:30-11:45am, Halligan Room 111A

http://www.cs.tufts.edu/~slonim

Office hours: Halligan 107B, Tuesdays 1:30-2:30 and Fridays, 11-12; or by appointment.

TA Office hours: Halligan 107, time TBA or by appt.



Tentative Schedule:

DATE TOPICS READING OPTIONAL READING
Mon., Jan. 23 Class overview and administrivia.
Introduction to sequences and sequence comparison.
Syllabus handout.
Zvelebil & Baum (ZB): Chapter 1 and Section 4.1
For CS students new to biology: Larry Hunter's article, Molecular Biology for Computer Scientists.
For bio or BME students or others with less formal CS background: either Cormen, Leiserson, Rivest and Stein, Chapters 2 + 3, or Jones and Pevzner, Chapter 2: Big O notation, NP-completeness.
Weds., Jan. 25 Sequence alignment:
Global alignment. Dynamic programming. Local alignment.
ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2 Global alignment: Durbin, pp. 17-22.
Local alignment: Durbin, pp. 23-24, 29-30
Mon., Jan. 30 Sequence alignment: gaps, scoring matrices ZB: Sections 4.3, 4.4, 5.1
Weds., Feb. 1 Database search, BLAST, FASTA algorithms
Significance of alignment scores.
ZB: 4.6-4.7, 5.3 (except the section on suffix trees).
Mon., Feb. 6 Russ Altman talk at noon in Nelson Aud.!
Database search: Significance of alignment scores, Information Content, compressive BLAST
ZB: 5.4 Altschul's tutorial on statistics of sequence similarity scores. Warren Gish's webpage on information theory and alignment scoring statistics. Compressive BLAST
Weds., Feb. 8 DNA motifs, profiles. Gibbs sampling. Iterative search ZB: 6.1, 6.6 Original paper on the Gibbs sampler for local multiple alignment
Original paper on MEME algorithm
Mon., Feb. 13 Multiple sequence alignment: star alignment, NP completeness Ron Shamir's MSA notes ZB: 4.5 (pp. 90-93), 6.4-6.5; Durbin, 6.1--6.4
Weds., Feb. 15 Multiple sequence alignment: iterative and progressive methods
Mon., Feb. 20 NO CLASS
Weds., Feb. 22 Introduction to phylogeny ZB: 7.1, 7.3 Mona Singh's phylogeny notes
THURSDAY, Feb. 23
TUFTS Monday
Phylogeny ZB: 8.1-8.4
Mon., Feb. 27 MIDTERM 1
Weds., Mar. 1 Finish phylogeny, Sequence assembly: Intro and Overlap graphs ZB: 5.3 (pp. 141-3) Schuster's review article on next generation sequencing; Mardis' more detailed article about next generation sequencing technologies. The paper about the SOAPdenovo assembler.
GAGE: Evaluating short-read assemblies; this may be useful in defining terms needed to complete homework 3.
Mon., Mar. 6 Sequence assembly: deBruijn graphs and Eulerian paths; suffix tree intro
Weds., Mar. 8 More on suffix trees for overlap graphs. Gene finding intro ZB: 9.2-9.7
Mon., Mar. 13 Hidden Markov Models (HMMs) Rabiner handout, pp. 257-266.
Weds., Mar. 15 Finish HMMs and their use in gene finding. EM algorithms. ZB: 10.2-10.8, short paper on EM algorithms Durbin: chapter 3
Mon., Mar. 20 NO CLASS: SPRING BREAK
Weds., Mar. 22 NO CLASS: SPRING BREAK
Mon., Mar. 27 Gene expression: technology, normalization, detecting differential expression ZB: 15.1, 16.1 Slonim review article
Weds., Mar. 29 Gene expression and SNPs: clustering and classification, scalability of methods ZB: 16.2-16.3, 16.5 Golub and Slonim et al., on leukemia classification
Mon., Apr. 3 Gene expression and function: gene set analysis methods, the Gene Ontology, functional enrichment ZB: 16.4 Gene Set Enrichment Analysis
Weds., Apr. 5 Protein interaction networks Alm and Arkin review of biological networks Yu, et al., on bottlenecks in protein networks; Przytycka, Singh, and Slonim review of network dynamics.
Mon., Apr. 10 Anomaly detection for precision medicine; networks and systems biology ZB: chapter 17 Noto, et al., 2015 on anomaly detection
Weds., Apr. 12 MIDTERM 2
Mon., Apr. 17 NO CLASS: Patriots' Day
Weds., Apr. 19 Hescott: network alignment and phenotypes
Mon., Apr. 24 Introduction to protein structure prediction ZB: chapter 2; 11.1, 11.4-11.5
Weds., Apr. 26 Predicting secondary and super-secondary structure, evaluation ZB: 13.2-13.5
Mon., May 1 Wrap-up; computational themes revisited
Mon., May 8, 3:30-5:30pm FINAL EXAM

Course Description

Course aims:
This is a computer science elective aimed at upper level undergraduates and graduate students. Upon the completion of the course, students will be able to:

These aims will be achieved through readings, problem sets, and implementation of some of the algorithms we discuss. About half of the course will focus on molecular sequences and sequence manipulation, while the rest will focus more on issues of interpretation, which require more complex data and methods. We will talk about scalability and how and when approximate solutions are appropriate. Finally, we will introduce ongoing areas of research in the fields of bioinformatics and computational biology.

Grading will be based on homework assignments (both exercises and programming), two in-class midterms, and a final exam to be held at 3:30pm on May 8th (the E+ block final exam slot). Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.

Course Staff:

The graduate teaching assistant for the course is Jake Crawford, who can be reached at John Crawford _AT_ tufts.edu.
TA office hours: Halligan 107, time TBA.

Course Requirements:


Policies



Course Materials

For homeworks, slides, and other class information, go to the private course materials page.
Last updated Jan 19, 2017