Comp167, Fall 2017
Prof. Donna Slonim
Introduction to Computational Biology
Monday and Wednesday, 10:30-11:45am, Halligan Room 111A
Office hours: Halligan 107B, Tuesdays, 1:30-2:30, and Fridays, 11-12; or by appointment.
TA Office hours: Monday, 1-2:30pm (Hannah Voelker), Tuesday, 8-9:30 pm (Dan Meyer), or by appt.
|Weds., Sep. 6||Class overview and administrivia. Introduction to sequences and sequence comparison.||Syllabus handout. Zvelebil & Baum (ZB): Chapter 1 and Section 4.1||For CS students new to biology: Larry Hunter's article, Molecular Biology for Computer Scientists. For bio or BME students or others with less formal CS background: either Cormen, Leiserson, Rivest, and Stein, Chapters 2 and 3, or Jones and Pevzner, Chapter 2 (Big O notation, NP-completeness).|
|Mon., Sep. 11||Sequence alignment: global alignment, dynamic programming, local alignment.||ZB: Sections 4.2, 4.5 (pp. 87-89 only), 5.2||Global alignment: Durbin, pp. 17-22. Local alignment: Durbin, pp. 23-24, 29-30.|
|Weds., Sep. 13||Sequence alignment: gaps, scoring matrices. Hwk 1 out||ZB: Sections 4.3, 4.4, 5.1|
|Mon., Sep. 18||Database search: BLAST and FASTA algorithms. Significance of alignment scores.||ZB: 4.6-4.7, 5.3 (except the section on suffix trees)|
|Weds., Sep. 20||Database search: Significance of alignment scores, Information Content, compressive BLAST||ZB: 5.4||Altschul's tutorial on statistics of sequence similarity scores. Warren Gish's webpage on information theory and alignment scoring statistics. Compressive BLAST|
|Mon., Sep. 25||DNA motifs, profiles. Gibbs sampling. Iterative search. Hwk 1 due; Hwk 2 out||ZB: 6.1, 6.6||Original paper on the Gibbs sampler for local multiple alignment. Original paper on the MEME algorithm.|
|Weds., Sep. 27||Finish Gibbs sampling, MEME. Multiple sequence alignment: optimal method||Ron Shamir's MSA notes||ZB: 4.5 (pp. 90-93), 6.4-6.5; Durbin, 6.1-6.4|
|Mon., Oct. 2||Multiple sequence alignment: progressive methods, NP-completeness, star alignment|
|Weds., Oct. 4||Finish MSA; midterm review. Hwk 2 due|
|Mon., Oct. 9||NO CLASS|
|Weds., Oct. 11||MIDTERM 1|
|Mon., Oct. 16||Sequence assembly: introduction. De Bruijn graphs and Eulerian paths.|
|Weds., Oct. 18||Sequence assembly: evaluating assemblies. Overlap graphs, Hamiltonian paths. Hwk 3 out||ZB: 5.3 (pp. 141-143, on suffix trees)||Schuster's review article on next-generation sequencing; Mardis' more detailed article about next-generation sequencing technologies; the paper about the SOAPdenovo assembler; GAGE: Evaluating short-read assemblies (this may be useful in defining terms needed to complete homework 3).|
|Mon., Oct. 23||Suffix trees for overlap graphs and their other applications. Gene finding intro||ZB: 9.2-9.7|
|Weds., Oct. 25||Hidden Markov Models (HMMs).||Rabiner handout, pp. 257-266.|
|Mon., Oct. 30||Finish HMMs and their use in gene finding. EM algorithms. Hwk 3 due||ZB: 10.2-10.8, short paper on EM algorithms||Durbin: Chapter 3|
|Weds., Nov. 1||Gene expression: technology, normalization, detecting differential expression. Hwk 4 out||ZB: 15.1, 16.1||Slonim review article|
|Mon., Nov. 6||Gene expression: clustering and classification.||ZB: 16.2-16.3, 16.5|| Golub and Slonim et al., on
|Weds., Nov. 8||Gene expression and function: gene set analysis, Gene Ontology, functional enrichment||ZB: 16.4||Gene Set Enrichment Analysis|
|Mon., Nov. 13||Introduction to phylogeny. Hwk 4 due||ZB: 7.1, 7.3||Mona Singh's phylogeny notes|
|Weds., Nov. 15||Phylogeny||ZB: 8.1-8.4|
|Mon., Nov. 20||MIDTERM 2|
|Weds., Nov. 22||NO CLASS: THANKSGIVING BREAK|
|Mon., Nov. 27||Profile HMMs, posterior decoding, protein families. Hwk 5 out||TBD|
|Weds., Nov. 29||Anomaly detection for precision medicine||Noto et al., 2015, on anomaly detection|
|Mon., Dec. 4||Protein interaction networks and their properties||Alm and Arkin review of biological networks||Yu et al. on bottlenecks in protein networks; Przytycka, Singh, and Slonim review of network dynamics.|
|Weds., Dec. 6||Protein function prediction in PPI networks. Hwk 5 due|
|Mon., Dec. 11||Disease and gene networks. Class wrap up|
|Tues., Dec. 19, 3:30-5:30pm||FINAL EXAM|
This is a computer science elective aimed at upper-level undergraduates and graduate students. Upon completion of the course, students will be able to:
These aims will be achieved through readings, problem sets, and implementation of some of the algorithms we discuss. About half of the course will focus on molecular sequences and sequence manipulation, while the rest will focus more on issues of interpretation, which require more complex data and methods. We will talk about scalability and how and when approximate solutions are appropriate. Finally, we will introduce ongoing areas of research in the fields of bioinformatics and computational biology. Grading will be based on homework assignments (both exercises and programming), two in-class midterms, and a final exam to be held at 3:30pm on Dec. 19th (the E+ block final exam slot). Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.
The teaching assistants for the course are Hannah Voelker and Dan Meyer. They will hold office hours on Mondays, 1-2:30pm (Hannah), and Tuesdays, 8-9:30pm (Dan), somewhere in Halligan. Look for them in the 2nd floor collaboration room, in Halligan 107 during the day, or in the computer labs. Additional office hours may be scheduled before exams or before particular homework assignments are due.
Comp 15 and at least one 100-level computer science course, or graduate standing in Computer Science, or permission of the instructor.
No biology background required!
Graduate standing in a related field (Biomedical Engineering, Biology, Genetics) may be sufficient with no further prerequisites; check with the instructor.
Comfort writing complex programs from scratch in some programming language is essential, as homework assignments will include several implementation projects. In the past, students have successfully used Perl, Python, Java, C, and C++. If you have another preference, please discuss your choice of language with the TAs.
The course textbook is Understanding Bioinformatics by Marketa Zvelebil and Jeremy O. Baum, published by Garland Science (a subsidiary of Taylor & Francis Group). Copies of the text should be available in the Medford campus bookstore, or you can order it online.
Readings from this text will be listed in the syllabus where appropriate, as will supplementary readings from the literature or from some of the recommended textbooks listed below. If you have no biology background, you may also want to supplement the readings with a good introductory molecular biology text. (Several online texts are available for looking up occasional details.)
Other recommended books:
You will need access to a computer with an internet connection and support for whatever programming languages and tools you intend to use. The computer science department will provide you with an account on our systems for this purpose, though you are welcome to use other machines as well. If you need help obtaining computational resources, please contact the instructor or a teaching assistant as soon as possible. You will need to use your computer science department account to submit your code through our "provide" system and to ensure that it runs correctly on our system.
Turning things in on time is important both for consistency in grading and because it allows us to discuss homeworks in class, so that students who did the work on time can have their questions about it answered before the relevant exams.
Therefore, please try to keep on top of deadlines. If you know in advance that you have to be away for a legitimate reason (e.g. grad school interviews that couldn't be moved to a non-class day), please talk to me in advance. If you have a serious illness or personal issue that you feel warrants an exception to this policy, please have your academic dean contact me and we will work something out.
All sources used should be cited. In other words, if you discuss a homework problem with a classmate, you should list that classmate as one of your references for that problem. A special note about finding solutions on the web: be warned that not everything you read online is correct. (This is true of print sources as well, but the risk increases greatly online.) Even data from supposedly reputable sources, such as slides posted by faculty at Tufts or other universities, may not have been reviewed by an editor and might contain crucial typos. For this reason, I'd like to discourage you from using Google to tackle the problem sets, but if you choose to do so, you must cite the URL(s) that you used. Directly copying text or code from any source without attribution is plagiarism and will be dealt with accordingly.