Comp167, Spring 2016
Prof. Donna Slonim
Introduction to Computational Biology
Monday and Wednesday, 1:30-2:45pm, Halligan Room 111B
Office hours: Halligan 107B, Mondays 3:00-4:30; Fridays, 11-12; or by appointment.
TA Office hours: Halligan 107, Tuesdays, 4-5:30 or by appt. Piazza site
|Mon., Jan. 25|| Class overview and administrivia.
Introduction to sequences and sequence comparison.
| Syllabus handout.
Zvelebil & Baum (ZB): Chapter 1 and Section 4.1
| For CS students new to biology: Larry Hunter's article,
Molecular Biology for Computer Scientists.
For BME students or others with less formal CS background: either Cormen, Leiserson, Rivest and Stein Chapters 2 + 3, or Jones and Pevzner, Chapter 2: Big-O notation, NP-completeness.
|Weds., Jan. 27|| Sequence alignment:
Global alignment. Dynamic programming. Local alignment.
|ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2|| Global alignment: Durbin, pp. 17-22.
Local alignment: Durbin, pp. 23-24, 29-30
|Mon., Feb. 1||Sequence alignment: gaps, scoring matrices||ZB: Sections 4.3, 4.4, 5.1|
|Weds., Feb. 3|| Database search, BLAST, FASTA
Significance of alignment scores.
|ZB: 4.6-4.7, 5.3-5.4||Altschul's tutorial on statistics of sequence similarity scores. Warren Gish's webpage on information theory and alignment scoring statistics.|
|Mon., Feb. 8||Multiple sequence alignment||Ron Shamir's MSA notes||ZB: 4.5 (pp. 90-93), 6.4-6.5; Durbin, 6.1-6.4|
|Weds., Feb. 10||DNA motifs, profiles. Gibbs sampling.||ZB: 6.1, 6.6, short paper on EM algorithms||Original paper on the Gibbs sampler for local multiple alignment|
|Mon., Feb. 15||NO CLASS: Tufts Holiday|
|Weds., Feb. 17||Gene finding, intro to HMMs||ZB: 9.2-9.7; 10.2-10.8||Rabiner handout, pp. 257-266.|
| THURSDAY, Feb. 18:
|Finish HMMs and their use in gene finding.||ZB: 4.8-4.9; ZB: 9.2-9.7; 10.2-10.8||Durbin: chapter 3|
|Mon., Feb. 22||Profile HMMs; introduction to sequence assembly||ZB: 6.2, Nagarajan and Pop's Sequencing overview||Eric Green's historical review article on genomic sequencing methods|
|Weds., Feb. 24||Sequence assembly, overlap graphs and suffix trees||GAGE: Evaluating short-read assemblies; this will be useful in completing homework 3.|| Schuster's review article on next generation sequencing;
Mardis' more detailed article about
next generation sequencing technologies.
The paper about the SOAPdenovo assembler.
|Mon., Feb. 29||Sequence assembly, de Bruijn graphs and Eulerian paths||ZB: 5.3|
|Weds., Mar. 2||EXAM 1|
|Mon., Mar. 7||Gene expression: technology, normalization, detecting differential expression||ZB: 15.1, 16.1||Slonim review article|
|Weds., Mar. 9||Gene expression: gene set analysis methods, the Gene Ontology, functional enrichment||ZB: 16.4||Gene Set Enrichment Analysis|
|Mon., Mar. 14||Gene expression: clustering and classification||ZB: 16.2-16.3, 16.5|| Golub and Slonim et al., on
|Weds., Mar. 16||Introduction to phylogeny||ZB: chapter 7||Mona Singh's phylogeny notes|
|Mon., Mar. 21||NO CLASS: SPRING BREAK|
|Weds., Mar. 23||NO CLASS: SPRING BREAK|
|Mon., Mar. 28||Phylogeny||ZB: 8.1-8.4|
|Weds., Mar. 30||Protein interaction networks||Alm and Arkin review of biological networks||Yu, et al., on bottlenecks in protein networks ; Przytycka, Singh, and Slonim review of network dynamics.|
|Mon., Apr. 4||Networks and systems biology||ZB: chapter 17 or TBA|
|Weds., Apr. 6||EXAM 2|
|Mon., Apr. 11||(Daniels) Big data and Compressive BLAST|
|Weds., Apr. 13||(Daniels) Sublinear search techniques for big data|
|Mon., Apr. 18||NO CLASS: Patriots' Day|
|Weds., Apr. 20||Introduction to protein structure prediction||ZB: chapter 2; 11.1, 11.4-11.5|
|Mon., Apr. 25||Predicting secondary and super-secondary structure, evaluation||ZB: 13.2-13.5|
|Weds., Apr. 27||Project presentations|
|Mon., May 2||Project presentations|
In this course, students will develop an understanding of the key computational challenges in molecular biology, or any field in which the onslaught of data requires sophisticated algorithms and data structures for scalability. We will discuss algorithms used to solve some of these problems, and we will introduce ongoing areas of research in the fields of bioinformatics and computational biology. Grading will be based on homework assignments (both written and computer-based), two in-class exams, and a written course project. Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.
We are likely to have some guest lecturers this semester, including:
Prerequisites: Comp 15 and at least one 100-level computer science course, or graduate standing in Computer Science, or permission of the instructor. Graduate standing in a related field (Biomedical Engineering, Biology, Genetics) may be sufficient with no further prerequisites; check with the instructor. Comfort writing programs in some language is essential, as homework assignments will include some implementation projects. In the past, students have successfully used Perl, Python, Java, C, and C++. If you have another preference, please discuss your choice of language with the TA.
Readings: The course textbook is Understanding Bioinformatics by Marketa Zvelebil and Jeremy O. Baum, published by Garland Science (a subsidiary of Taylor & Francis Group). Copies of the text should be available in the Medford campus bookstore, or you can order it online.
Readings from this text will be listed in the syllabus where appropriate, along with supplementary readings from the literature or from some of the recommended textbooks listed below. If you have no biology background, you may also want to supplement the readings with a good introductory molecular biology text. (Several online texts are available for looking up occasional details.)
Other recommended books:
Computational resources: You will need access to a computer with an internet connection and support for whatever programming language / tools you intend to use. The computer science department will provide you with an account on our systems for this purpose, though you are welcome to use other machines as well. If you need help in obtaining computational resources, please contact the instructor or teaching assistant as soon as possible.
All sources used should be cited. In other words, if you discuss a homework problem with a classmate, you should list that classmate as one of your references for that problem. A special note about finding solutions on the web: be warned that not everything you read online is correct. (This is true of print sources as well, but the risk increases greatly online.) Even data from supposedly reputable sources, such as slides posted by faculty at Tufts or other universities, may not have been reviewed by an editor and might contain crucial typos. For this reason, I'd like to discourage you from using Google to tackle the problem sets, but if you choose to do so, you must cite the URL(s) that you used. Directly copying text or code from any source without attribution is plagiarism and will be dealt with accordingly.