Course Aims and Description
- Comprehend the biological background, nature, and relevance of computational problems in molecular biology.
- Assess the efficiency of computational methods for handling data-rich problems in the field.
- Understand computational techniques and probabilistic models for working effectively with large data sets.
- Discuss and evaluate tradeoffs involved in choosing how to tackle hard computational problems.
- Develop experience applying theoretical CS material in a practical setting.
Mastery of these aims will be achieved and assessed through readings, discussions, problem sets, algorithm implementation or data analysis assignments, two in-class midterms, and a final exam. About half of the course will focus on molecular sequences and sequence manipulation; the rest will focus on issues of interpretation, which require more complex data and methods. We will talk about scalability and how and when approximate solutions are appropriate. Finally, we will introduce ongoing areas of research in the fields of bioinformatics and computational biology.
Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.
Course Staff and office hours:
Professor Donna Slonim (she/her) is the course instructor.
CS PhD student Gizem Cicekli (she/her) will be our graduate teaching assistant. TA office hours: Mondays and Wednesdays, 2:30-4 in the JCC 3rd floor kitchen area.
Email addresses are firstname dot lastname at tufts dot edu, but you can reach all the course staff at once via Piazza at all times.
Instructor Office Hours: Thurs., 1:30-3pm; Fri., 1-2:30pm, or by appointment. In-person office hours will be held in Cummings 322; Zoom office hours (any day that we have classes online, or by request) and online appointments will be at my personal Zoom room (see private course page for links).
Course Requirements
Prerequisites: CS 15 and at least one 100-level computer science course, or graduate standing in Computer Science, or permission of the instructor.
No biology background required!
Graduate standing in a related field (Biomedical Engineering, Biology, Genetics) may be sufficient provided your computer programming background is strong enough; check with the instructor, and read the following paragraphs first.
Homework assignments will include several implementation projects in Python. We will learn about algorithms in class and in the readings, but you will then be expected to implement them from scratch and apply them, without any formal help in designing the code.
Also essential will be some basic understanding of algorithm analysis, as is typically covered in CS 15. You should be familiar with asymptotic analysis of algorithmic running times and Big O notation, at least at an introductory level. CS 160 (Algorithms) is helpful but not essential as a prerequisite; material used here will help you when you take Algorithms if you have not yet done so.
Readings: The course textbook is Understanding Bioinformatics by Marketa Zvelebil and Jeremy O. Baum, published by Garland Science (a subsidiary of Taylor & Francis Group). Copies of the text should be available in the Medford campus bookstore, or you can order it online. The cost of renting an online version for the duration of the semester (through the final exam date of May 8th) was $35 when we last checked in January 2023; online orders are available immediately.
Readings from this text will be listed in the schedule where appropriate. Supplementary readings from the literature or from some of the recommended textbooks listed below appear on the schedule as well.
If you have no biology background, you may also want to supplement the readings with a good introductory molecular biology text. (Several online texts are available for looking up occasional details.)
Some material will be presented through a collective "journal club" activity every few weeks. Please read the journal club papers listed in the schedule before class on the indicated day. During class, you will join a group of students, and each group will be given a slide with questions on it about some aspect of the paper. Your group will collaboratively edit the slide with answers to the questions on that slide. We will then have each team present their slide in order, culminating in a presentation covering the key points of the whole paper.
Other recommended books:
- Bioinformatics and Functional Genomics, by Jonathan Pevsner. A readable introduction to the field. Aimed primarily at biologists, provides somewhat less detail than the course text but may be slightly more approachable.
- The Cartoon Guide to Genetics by Larry Gonick and Mark Wheelis. A surprisingly good and serious introduction to the biological concepts covered in this course.
- An Introduction to Bioinformatics Algorithms, by N. Jones and P. Pevzner. A new algorithms text focusing on examples motivated by computational biology. Helpful if you've never taken an algorithms class; provides a more gentle introduction to selected topics than the following book.
- Introduction to Algorithms, by T. Cormen, C. Leiserson, R. Rivest, and C. Stein. The canonical algorithms textbook. Has nothing to do with biology, but should be on every computer scientist's bookshelf.
- Introduction to Computational Molecular Biology, by J. Setubal and J. Meidanis. A detailed text focused on computational biology algorithms, aimed at computer scientists, from 1997.
- Biological Sequence Analysis, by R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. A good computational biology text focusing on sequence analysis, HMMs, and phylogeny. Includes an excellent whirlwind introduction to statistics.
- Molecular Biology, by David Freifelder. A general introductory molecular biology text. Easy to read, a gentle introduction to the topic.
- Molecular Biology of the Gene, by J. Watson, N. Hopkins, J. Roberts, J. Steitz, and Alan Weiner. A more advanced and detailed molecular biology text. A very thorough index makes this a good reference book.
Computational resources:
If you need help in obtaining computational resources, you need an account but never received email about its creation, or you are a non-traditional student or auditor who may not be enrolled in SIS, please contact the instructor or teaching assistant as soon as possible. If you didn't receive email but did take a CS course in a prior semester, try resetting your password first; most of the time the account still exists and this will work.
Any code you write for your homework will be graded based on its ability to run on the machine homework.cs.tufts.edu. Please test your code there; just because it works on your laptop does not mean it will work on a different machine or platform!
Policies
Grading: Grades will be based on five homework assignments, each of which will include both written and programming components (45%), two in-class midterms (15% each), a final exam (20%), and class participation (5%).
Late policy: Submissions are due by midnight on the indicated date; Gradescope's timestamp is official. For late work, we are going to use a token policy in this class this semester. You will have 10 tokens for the term. You may use up to 2 tokens per assignment; each token gets you an extra day (24 hours as counted by Gradescope). You don't need to tell anyone, just submit and we will count the number of late days as the number of tokens used. It is your job to keep track of your token usage. Beyond the 10 tokens, we will not accept late submissions; submit what you have by the deadline for partial credit. Turning work in on time is important for consistency in grading, because it allows us to discuss homeworks in class in a timely fashion, and because content builds on previous material so it is important to figure out quickly if you are lost.
As usual, in the case of serious illness or other truly exceptional circumstances (e.g., situations where your Academic Dean is involved), let us know and we will work something out.
Diversity, Inclusion, and Collegiality: Tufts, the Computer Science Department, and the course staff intend to create a welcoming environment in which all students feel supported and believe that their learning needs and perspectives are valued. We intend to present materials in ways that are respectful to students of any background, ethnicity, race, culture, gender, sexual orientation, or age. We welcome your suggestions on how to improve course effectiveness for yourself or others. If you have religious conflicts with class meetings or requirements, please connect with the course staff.
In this class, we will encourage questions, discussions, and some assignments that involve interacting in groups. While disagreements and differing opinions can be an important part of the learning experience, we expect all students to treat each other with collegiality and respect. Please reach out to course staff if there are any issues with inter-student interactions. While we do not expect this will be necessary, please be reminded that we will, if needed, follow the steps outlined in Tufts' sexual misconduct and non-discrimination policies.
Please also be aware that Tufts faculty are "mandated reporters": if we see, hear, or learn about any kind of discrimination or sexual misconduct, we are required to report it to the university. If you would like to access confidential counseling for an issue, you can find relevant resources here.
Accommodation for Students with Disabilities: Tufts University values the diversity of our students, staff, and faculty, recognizing the important contribution each student makes to our unique community. Tufts is committed to providing equal access and support to all qualified students through the provision of reasonable accommodations, so that each student may fully participate in the Tufts experience.
If you have a disability that requires reasonable accommodations, please contact the Student Accessibility and Academic Resources (StAAR) Center or call 617-627-4539 to make an appointment with a StAAR representative to determine appropriate accommodations. Please be aware that accommodations cannot be enacted retroactively, making timeliness a critical aspect for their provision.
In addition to following the standard procedures, if you have a disability and would like to discuss how we can better support your learning, please feel free to set up an appointment with course staff.
Academic Integrity: The Tufts academic integrity policy and code of conduct appear here. In particular, plagiarism will not be tolerated. Submitting as your own any written work or code that you did not write yourself, without the help of any other person or entity, is a violation of the academic integrity process.
Please see our collaboration policy below describing what is and is not acceptable in the context of this course. If you are not certain what constitutes plagiarism, please see the academic integrity resources at the link above.
Please be aware that if Tufts faculty find evidence of academic misconduct, we are required to report it to the university. Penalties can be truly draconian. The time you save in using someone else's work will be lost ten times over as you work through the academic integrity process. So please, don't put yourself through it. We are eager to help you learn what you need to in order to complete the work yourself.
Collaboration Policy: All written work and code submitted should be your own unless you obtain prior permission to collaborate. You are free to discuss assignments with others in the class unless specifically asked not to, but you must write up your answers and code yourself.
We reserve the right to use computational tools to identify instances of plagiarism or materials (text or code) first written by someone - or something - else, whether published online or previously or concurrently submitted at Tufts. We may make use of plagiarism or similarity detection tools such as TurnItIn, Moss, GPTZero, or other methods to detect inappropriate conduct. We also reserve the right to ask you to verbally explain, in person, any content in your written homework.
All sources used should be cited. In other words, if you discuss a homework problem with a classmate, you should list that classmate as one of your references for that problem. Please also be warned that not everything you read online is correct. (This is true of print sources as well, but the risk increases greatly online.) Even data from supposedly reputable sources, such as slides posted by faculty at Tufts or other universities, may not have been reviewed by an editor and might contain crucial typos. For this reason, I'd like to discourage you from using Google to tackle the problem sets, but if you choose to do so, you must cite the URL(s) that you used. Directly copying text or code from any source without attribution is plagiarism and will be dealt with accordingly.
Course Materials
For homeworks, slides, and other class information, go to the private course materials page. You will need to log in using your CS department account and password. An account will be created for all students registered for the course in SIS who do not already have one.
Tentative Course Schedule:
Updates will occur during the term: check back frequently. Shaded rows refer to past dates.
DATE | TOPICS | READING | OPTIONAL READING | |
Thurs., Jan. 19 | Class overview and administrivia. Introduction to sequences and sequence comparison. | This course syllabus. Zvelebil & Baum (ZB): Chapter 1 and Section 4.1 | For CS students new to biology: Larry Hunter's article, Molecular Biology for Computer Scientists. For bio or BME students or others with less formal CS background: either Cormen, Leiserson, Rivest, and Stein, Chapters 2 + 3, or Jones and Pevzner, Chapter 2: Big O notation, NP-completeness. | |
Tues., Jan. 24 | Sequence alignment: Global alignment. Dynamic programming. | This course syllabus. ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2 (pp. 127-135 only) | Global alignment: Durbin, pp. 17-22. | |
Thurs., Jan. 26 | Local alignment. Scoring schemes, gaps. | ZB: Sections 4.4, 5.2 (pp. 135-140) | Local alignment: Durbin, pp. 23-24, 29-30 | |
Tues., Jan. 31 | Sequence alignment: scoring matrices, PAM and BLOSUM. Hwk 1 out | ZB: Sections 4.3, 5.1 | ||
Thurs., Feb. 2 | Database search: introduction, BLAST, significance of alignment scores | ZB: 4.6-4.7, 5.4 | Altschul's tutorial on statistics of sequence similarity scores. Altschul's slides on information theory, scoring matrices, and E-values. | |
Mon., Feb. 6 | Hwk 1 part 1 due | |||
Tues., Feb. 7 | Multiple sequence alignment: introduction, star alignment, scoring, NP-completeness | Ron Shamir's MSA notes | ZB: 4.5 (pp. 90-93), 6.4-6.5; Durbin, 6.1-6.4 | |
Thurs., Feb. 9 | MSA iterative/progressive; DNA motifs, profiles. | ZB: 6.1, 6.6 | ||
Mon., Feb. 13 | Hwk 1 part 2 due | |||
Tues., Feb. 14 | Gibbs sampling. Iterative search methods. Hwk 2 out | Original paper on the Gibbs sampler for local multiple alignment | Durbin, 6.1-6.4; Original paper on MEME algorithm | |
Thurs., Feb. 16 | Compressive BLAST journal club, sublinear search | Compressive BLAST paper | ||
Fri., Feb. 17 | Hwk 2 part 1 due | |||
Tues., Feb. 21 | Midterm 1 | |||
Thurs., Feb. 23 | NO CLASS: Presidents' Day | |||
Fri., Feb. 24 | Hwk 2 part 2 due | |||
Tues., Feb. 28 | Sequence assembly: de Bruijn graphs, Eulerian paths; handling errors and repeats | The paper about the SOAPdenovo assembler. | ||
Thurs., Mar. 2 | Hwk 3 out. Overlap graphs, Hamiltonian paths, OLC assembly. | ZB: 5.3 (pp. 141-3) | ARACHNE paper on overlap-based whole genome assembly. | |
Tues., Mar. 7 | Sequencing / read-mapping journal club on MAQ | MAQ paper on read mapping and variant calling | ||
Thurs., Mar. 9 | Gene finding and Markov models | ZB: 9.2-9.7 | ||
Fri., Mar. 10 | Hwk 3 part 1 due | |||
Tues., Mar. 14 | Hidden Markov Models (HMMs) | Rabiner handout, pp. 257-266. | ||
Thurs., Mar. 16 | Hidden Markov Models; EM algorithms. HMM uses in gene finding. | ZB: 10.2-10.8, short paper on EM algorithms | Durbin: chapter 3 | |
Fri., Mar. 17 | Hwk 3 part 2 due | |||
Tues., Mar. 21 | NO CLASS: Spring Break | |||
Thurs., Mar. 23 | NO CLASS: Spring Break | |||
Tues., Mar. 28 | Hwk 4 out. Finish Hidden Markov Models; Baum-Welch/EM algorithms. HMM uses in gene finding. Profile HMMs. Hwk 4 intro | ZB: 10.2-10.8, short paper on EM algorithms | Durbin: chapter 3 | |
Thurs., Mar. 30 | Gene expression: detecting differential expression, multiple testing | ZB: 15.1, 16.1, 16.4 | Slonim review article | |
Mon., Apr. 3 | Hwk 4 part 1 due | |||
Tues., Apr. 4 | Gene expression: clustering and classification, functional enrichment | ZB: 16.2-16.3, 16.5 | Golub and Slonim et al., on leukemia classification | |
Thurs., Apr. 6 | Midterm 2 | |||
Mon., Apr. 10 | Hwk 4 part 2 due | |||
Tues., Apr. 11 | Functional interpretation: gene set enrichment analysis journal club | Gene Set Enrichment Analysis | ||
Thurs., Apr. 13 | Introduction to phylogeny | ZB: 7.1, 7.3 | Mona Singh's phylogeny notes | |
Tues., Apr. 18 | Hwk 5 out. Phylogeny: distance-based tree inference methods | ZB: 8.1-8.4 | ||
Thurs., Apr. 20 | Phylogeny: finish parsimony methods. Bioinformatics ethics discussion | |||
Mon., Apr. 24 | Hwk 5 part 1 due | |||
Tues., Apr. 25 | Bioinformatics ethics discussion | |||
Thurs., Apr. 27 | Anomaly detection for precision medicine | |||
Fri., Apr. 28 | Hwk 5 part 2 due | |||
Date and time TBA: | Final exam review (Please fill out Piazza poll!) | |||
Mon., May 8, 12pm-2pm | Final Exam (in our usual classroom) |