Comp167, Spring 2020
Prof. Donna Slonim
Introduction to Computational Biology
Monday and Wednesday, 10:30-11:45am, Halligan Room 111A
Office hours: now on Zoom (see class materials page for link), Mondays 1:15-2:30 and Fridays, 10-11; or by appointment.
TA Office hours and location: scheduled on Piazza as needed
|Weds., Jan. 15|| Class overview and administrivia.
Introduction to sequences and sequence comparison.
| This course Syllabus.
Zvelebil & Baum (ZB): Chapter 1 and Section 4.1
| For CS students new to biology: Larry Hunter's article,
Molecular Biology for Computer Scientists.
For bio or BME students or others with less formal CS background: either Cormen, Leiserson, Rivest and Stein Chapters 2 + 3, or Jones and Pevzner, Chapter 2: Big O notation, NP-completeness.
|Mon., Jan. 20||NO CLASS|
|Weds., Jan. 22|| Sequence alignment:
Global alignment. Dynamic programming. Local alignment.
|ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2|| Global alignment: Durbin, pp. 17-22.
Local alignment: Durbin, pp. 23-24, 29-30
|Mon., Jan. 27||Sequence alignment: gaps, scoring matrices. Hwk 1 out||ZB: Sections 4.3, 4.4, 5.1|
|Weds., Jan. 29; ADD date|| Database search, BLAST, FASTA algorithms
Significance of alignment scores.
|ZB: 4.6-4.7, 5.3 (except the section on suffix trees).|
|Mon., Feb. 3||Database search: BLAST. Significance of alignment scores, Information Content||ZB: 5.4||Altschul's tutorial on statistics of sequence similarity scores. Altschul's slides on information theory, scoring matrices, and E-values. Compressive BLAST|
|Weds., Feb. 5||DNA motifs, profiles. Gibbs sampling. Iterative search. Hwk 1 due; Hwk 2 out||ZB: 6.1, 6.6|| Original
paper on the Gibbs sampler for local multiple alignment
Original paper on MEME algorithm
|Mon., Feb. 10||Multiple sequence alignment: star alignment, NP-completeness||Ron Shamir's MSA notes||ZB: 4.5 (pp. 90-93), 6.4-6.5; Durbin, 6.1-6.4|
|Weds., Feb. 12||Multiple sequence alignment and profiles|
|Mon., Feb. 17||NO CLASS; Hwk 2 due by midnight|
|Weds., Feb. 19; DROP date||Compressive BLAST. Midterm review||Compressive BLAST|
|THURSDAY, Feb. 20||MIDTERM 1|
|Mon., Feb. 24||Sequence assembly: Introduction. de Bruijn graphs and Eulerian paths.|
|Weds., Feb. 26||Sequence assembly: Evaluating assemblies. Overlap graphs, Hamiltonian paths. Hwk 3 out||ZB: 5.3 (pp. 141-3, on suffix trees)|| Schuster's review article on sequencing methods;
Mardis' more detailed article about
"next generation" sequencing technologies.
The paper about the SOAPdenovo assembler.
GAGE: Evaluating short-read assemblies; this may be useful in defining terms needed to complete homework 3.
|Mon., Mar. 2||Suffix trees for overlap graphs, other applications of them. Hidden Markov Model (HMM) intro||Rabiner handout, pp. 257-266.|
|Weds., Mar. 4||Gene finding and intro to Hidden Markov Models (HMMs)||ZB: 9.2-9.7||Rabiner handout, pp. 257-266.|
|Mon., Mar. 9||Guest Lecture: Anselm Blumer on read mapping via the Burrows-Wheeler transform. Hwk 3 due||TBA reading||TBA reading|
|Weds., Mar. 11||Hidden Markov Models; EM algorithms.||ZB: 10.2-10.8, short paper on EM algorithms||Durbin: chapter 3|
|Mon., Mar. 16||NO CLASS: SPRING BREAK|
|Weds., Mar. 18||NO CLASS: SPRING BREAK|
|Mon., Mar. 23|| NO CLASS: EXTENDED SPRING BREAK
Zoom test session: 10:30-11:00
|Weds., Mar. 25||Gene finding including HMMs. EM algorithms. Follow-up on Burrows-Wheeler. Hwk 4 out||ZB: 10.2-10.8, short paper on EM algorithms||Durbin: chapter 3|
|Mon., Mar. 30||
Gene expression: technology, normalization, detecting differential expression
||ZB: 15.1, 16.1||Slonim review article|
|Weds., Apr. 1|| Gene expression: normalization, detecting differential expression
Hwk 4 due
|ZB: 16.2-16.3, 16.5|| Golub and Slonim et al., on
|Mon., Apr. 6||Gene expression: clustering and classification.||ZB: 16.4|
|Weds., Apr. 8||Functional interpretation: gene set analysis, Gene Ontology, functional enrichment; midterm review. Midterm 2 out after class.||Gene Set Enrichment Analysis|
|Fri., Apr. 10||NO CLASS; Midterm 2 due|
|Mon., Apr. 13||Introduction to phylogeny.||ZB: 7.1, 7.3||Mona Singh's phylogeny notes|
|Weds., Apr. 15||Phylogeny. Hwk 5 out||ZB: 8.1-8.4|
|Mon., Apr. 20||NO CLASS: Patriots' Day|
|Weds., Apr. 22||Protein interaction networks, properties; centrality, pathway centrality, bottlenecks, function from structure, PPI networks, modules. Hwk 5 due||Module finding challenge||Yu, et al., on bottlenecks in protein networks; Przytycka, Singh, and Slonim review of network dynamics; Pathway Centrality paper.|
|Mon., Apr. 27|| Anomaly detection for precision medicine.
Take-home final project out
|Noto, et al., 2015 on anomaly detection||Pietras, et al., 2020 on temporal anomaly detection|
|Weds., May 6||Take-home final project due|
This is a computer science elective aimed at upper-level undergraduates and graduate students. Upon the completion of the course, students will be able to:
These aims will be achieved through readings, problem sets, and implementation of some of the algorithms we discuss. About half of the course will focus on molecular sequences and sequence manipulation, while the rest will focus more on issues of interpretation, which require more complex data and methods. We will talk about scalability and how and when approximate solutions are appropriate. Finally, we will introduce ongoing areas of research in the fields of bioinformatics and computational biology.
Updated on 3/21/20 after campus closing: Grading will be based on homework assignments (both exercises and programming), one in-class midterm, one take-home midterm (to be completed during a two-hour period of your choice out of a two-day window), and a final data science project to be done at home, due by our previously assigned exam time. There will no longer be a final exam.
Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.
The teaching assistant for the course is Sophia Jannetty (sophia dot jannetty at tufts dot edu). Office hours are to be determined.
Comp 15 and at least one 100-level computer science course, or graduate standing in Computer Science, or permission of the instructor.
No biology background required!
Graduate standing in a related field (Biomedical Engineering, Biology, Genetics) may be sufficient with no further prerequisites; check with the instructor.
Comfort writing complex programs from scratch in some programming language is essential, as homework assignments will include several implementation projects. We allow some flexibility on what language you choose for implementation; select something that you are comfortable with and that seems suitable for the task. The most common computer languages students have used successfully are Python, C++, C, and Java. If you have another preference, please discuss your choice of language with the TA.
The course textbook is Understanding Bioinformatics by Marketa Zvelebil and Jeremy O. Baum, published by Garland Science (a subsidiary of Taylor & Francis Group). Copies of the text should be available in the Medford campus bookstore, or you can order it online.
Readings from this text will be listed in the syllabus where appropriate. Supplementary readings from the literature or from some of the recommended textbooks listed below will be listed as well. If you have no biology background, you may want to supplement the readings as well by getting a good introductory molecular biology text. (Several online texts are available for looking up occasional details).
Other recommended books:
You will need access to a computer with an internet connection and support for whatever programming language / tools you intend to use. The computer science department will provide you with an account on our systems for this purpose, though you are welcome to use other machines as well. If you need help in obtaining computational resources, please contact the instructor or teaching assistant as soon as possible. Any code you write for your homeworks will be graded based on its ability to run on the machine "homework." You will also need a Gradescope account to submit your work and receive feedback on it.
Turning work in on time is important both for consistency in grading and because it lets us discuss homework in class, so that students who did the work on time can have their questions answered before the relevant exams.
Therefore, please try to keep on top of deadlines. If you know in advance that you have to be away for a legitimate reason (e.g. grad school interviews that couldn't be moved to a non-class day), please talk to me in advance. If you have a serious illness or personal issue that you feel warrants an exception to this policy, please have your academic dean contact me and we will work something out.
All sources used should be cited. In other words, if you discuss a homework problem with a classmate, you should list that classmate as one of your references for that problem. A special note about finding solutions on the web: be warned that not everything you read online is correct. (This is true of print sources as well, but the risk increases greatly online.) Even data from supposedly reputable sources, such as slides posted by faculty at Tufts or other universities, may not have been reviewed by an editor and might contain crucial typos. For this reason, I'd like to discourage you from using Google to tackle the problem sets, but if you choose to do so, you must cite the URL(s) that you used. Directly copying text or code from any source without attribution is plagiarism and will be dealt with accordingly.