How To Get To Class

Welcome to Comp 167!

We encourage students to come to class via Sococo, the department's virtual collaboration space; if you do, you can join the class Zoom meeting without knowing the Zoom link or password. Just place your avatar in the room "Cummings 475" on the lower left of the main floor of "Virtual Halligan," and the Zoom link opens automatically during class time. You do need Tufts two-factor authentication (2FA) to log into Sococo; see the description under "Computational resources" below.

If you'd rather connect directly through Zoom, the class Zoom link is on the private course materials page, which is also described under "Computational resources." You need a CS account to log in to that page. Both a CS account and Tufts 2FA are available to anyone enrolled in the class; talk to course staff if you need help with either of these.

Course Aims and Description

This is a computer science elective aimed at upper-level undergraduates and graduate students. Upon completion of the course, students will be able to:

Mastery of these aims will be achieved and assessed through readings, problem sets, algorithm implementation or data analysis assignments, and in-class quizzes (which replace the longer, more formal exams typical of non-pandemic years). About half of the course will focus on molecular sequences and sequence manipulation; the rest will focus on issues of interpretation, which require more complex data and methods. We will discuss scalability and how and when approximate solutions are appropriate. Finally, we will introduce ongoing areas of research in bioinformatics and computational biology.

Students will also be expected to contribute to class discussion and group activities, to do the assigned reading, and to read supplementary background materials as they find necessary.

Course Staff:

Professor Donna Slonim is the course instructor.

CS PhD student James Mattei will be our graduate teaching assistant. Office hours: Mondays 1:30-3:00 and Wednesdays 10:30-12:00. Office hours will be held via the Zoom link posted on Piazza and on the private course page.

Email addresses are firstname dot lastname at tufts dot edu, but you can reach all the course staff at once via Piazza at all times.

Instructor Office Hours: Weds., 4:15-5:30 and Fri., 2:30-3:45, or by appointment. Office hours will be held in the Slonim office on the lower left of the main floor in Sococo, or in the instructor's personal Zoom room (see the private course page for links).

Course Requirements

Prerequisites: Comp 15 and at least one 100-level computer science course, or graduate standing in Computer Science, or permission of the instructor.

No biology background required!

Graduate standing in a related field (Biomedical Engineering, Biology, Genetics) may be sufficient with no further prerequisites; read the following paragraphs first, then check with the instructor.

Comfort writing complex programs from scratch in some programming language is essential, as homework assignments will include several implementation projects. We allow some flexibility on what language you choose for implementation; select something that you are comfortable with and that seems suitable for the task. The most common computer languages students have used successfully are Python, C++, and Java. If you have another preference, please discuss your choice of language with the TA.

Also essential will be some basic understanding of algorithm analysis, as covered in Comp 15. You should be familiar with asymptotic analysis of algorithmic running times and Big O notation, at least at an introductory level. Comp 160 (Algorithms) is helpful but not essential as a prerequisite; material covered here will help you when you take Algorithms if you have not yet done so.

Readings: The course textbook is Understanding Bioinformatics by Marketa Zvelebil and Jeremy O. Baum, published by Garland Science (a subsidiary of Taylor & Francis Group). Copies of the text should be available in the Medford campus bookstore, or you can order it online. The cost of renting an online version for the duration of the semester was $35 when I last checked in January 2021; online orders are available immediately.

Readings from this text will be listed in the schedule where appropriate. Supplementary readings from the literature or from some of the recommended textbooks listed below appear on the schedule as well.

If you have no biology background, you may also want to supplement the readings with a good introductory molecular biology text. (Several online texts are available for looking up occasional details.)

We are going to try something new this term that will help us connect with each other in this online format: collective "journal clubs." Please read the journal club papers listed in the schedule before class on the indicated day. During class, you will join a group in a breakout room, and each group will be given a slide with questions about some aspect of the paper. Your group will use its breakout room time to edit the slide with answers to those questions. We will then return to the full group, where each team will present its slide in order, culminating in a presentation covering the key points of the whole paper.

Other recommended books:

Computational resources:

A note on privacy: This semester, we expect class attendance to be more variable than usual. Some students may even be in other time zones. Accordingly, please understand that we intend to record class meetings on Zoom and share them via the private course materials page. Links to the class materials will only be available to those with CS department accounts (this will include auditors with the approval of course staff). Office hours will not be recorded unless you are informed otherwise in particular cases (e.g. a review session on a particular topic that students have asked us to record). Please talk to course staff if you have concerns about this.


Policies

Grading: Grades will be based on five homework assignments (60%), which will include both written and programming components; five in-class quizzes (32%; the lowest quiz grade will be dropped); and course participation (8%), which will include participating in in-class exercises, discussion, and journal clubs, and contributing on Piazza.

Late policy: Submissions are due by midnight on the indicated date; Gradescope's timestamp is official. For late work, we will use a token policy this semester. You will have 10 tokens for the term. You may use up to 3 tokens per assignment; each token gets you an extra day (24 hours as counted by Gradescope). You don't need to tell anyone: just submit, and we will count the number of late days as the number of tokens used. It is your job to keep track of your token usage. Once your 10 tokens are used up, we will not accept late submissions; submit what you have for partial credit. Turning work in on time is important both for consistency in grading and because it allows us to discuss homeworks in class in a timely fashion.
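If you'd like to track your tokens in code, here is a minimal Python sketch of the arithmetic. It is an illustration, not an official tool, and it rests on assumptions. It assumes that any lateness within a 24-hour window uses one token, so partial days round up. It also treats each Gradescope submission as one assignment; check with course staff whether homework parts 1 and 2 count separately. The names `tokens_needed` and `track_tokens` are hypothetical. The course staff's count, based on Gradescope timestamps, is what matters.

```python
import math

TOTAL_TOKENS = 10        # tokens available for the whole term
MAX_PER_ASSIGNMENT = 3   # at most 3 tokens on any one assignment


def tokens_needed(hours_late: float) -> int:
    """Tokens used by a submission that is `hours_late` hours past the deadline.

    Assumption: each token covers one 24-hour day, so any partial day
    rounds up to a whole token (e.g. 1 hour late -> 1 token, 25 hours -> 2).
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def track_tokens(hours_late_per_submission):
    """Return (tokens used per submission, tokens remaining).

    Assumption: each entry is one Gradescope submission counted as one
    assignment. Raises ValueError when a submission would exceed the
    per-assignment cap or the term total, since such work is not accepted late.
    """
    used = []
    remaining = TOTAL_TOKENS
    for hours in hours_late_per_submission:
        n = tokens_needed(hours)
        if n > MAX_PER_ASSIGNMENT:
            raise ValueError(
                f"{hours} hours late needs {n} tokens; the cap is {MAX_PER_ASSIGNMENT}"
            )
        if n > remaining:
            raise ValueError(f"{n} tokens needed but only {remaining} left")
        used.append(n)
        remaining -= n
    return used, remaining


if __name__ == "__main__":
    # On time, 2 hours late, and 30 hours late.
    used, left = track_tokens([0, 2, 30])
    print(used, left)  # [0, 1, 2] 7
```

Rounding up with `math.ceil` reflects the rule that a token buys a whole 24-hour day. Being one minute late therefore costs the same as being 23 hours late.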

As usual, in the case of serious illness or other truly exceptional circumstances (e.g., situations where your Academic Dean is involved), let us know and we will work something out.

Diversity, Inclusion, and Collegiality: Tufts, the Computer Science Department, and the course staff intend to create a welcoming environment in which all students feel supported and believe that their learning needs and perspectives are valued. We intend to present materials in ways that are respectful to students of any background, ethnicity, race, culture, gender, sexual orientation, or age. We welcome your suggestions on how to improve course effectiveness for yourself or others. If you have religious conflicts with class meetings or requirements, please connect with the course staff.

In this class, we will encourage questions, discussions, and some assignments that involve interacting in groups. While disagreements and differing opinions can be an important part of the learning experience, we expect all students to treat each other with collegiality and respect. Please reach out to course staff if there are any issues with inter-student interactions. While we do not expect this will be necessary, please be reminded that we will, if needed, follow the steps outlined in Tufts' sexual misconduct and non-discrimination policies.

Please also be aware that Tufts faculty are "mandated reporters": if we see, hear, or learn about any kind of discrimination or sexual misconduct, we are required to report it to the university. If you would like to access confidential counseling for an issue, you can find relevant resources here.

Accommodation for Students with Disabilities: Tufts University values the diversity of our students, staff, and faculty, recognizing the important contribution each student makes to our unique community. Tufts is committed to providing equal access and support to all qualified students through the provision of reasonable accommodations, so that each student may fully participate in the Tufts experience.

If you have a disability that requires reasonable accommodations, please contact the Student Accessibility Services (SAS) office at Accessibility@tufts.edu or 617-627-4539 to make an appointment with an SAS representative to determine appropriate accommodations. Please be aware that accommodations cannot be enacted retroactively, making timeliness a critical aspect for their provision.

You can find more information on Tufts accessibility policies and procedures here.

In addition to following the standard procedures, if you have a disability and would like to discuss how we can better support your learning, please feel free to set up an appointment with course staff.

Academic Integrity: The Tufts academic integrity policy and code of conduct appear here. In particular, plagiarism will not be tolerated.

Please see our collaboration policy below describing what is and is not acceptable in the context of this course. If you are not certain what constitutes plagiarism, please see the academic integrity resources at the link above.

Please be aware that if Tufts faculty find evidence of academic misconduct, we are required to report it to the university.

Collaboration Policy: All written work and code submitted should be your own unless you obtain prior permission to collaborate. You are free to discuss assignments with others in the class unless specifically asked not to, but you must write up your answers and code yourself. We reserve the right to use computational tools to identify instances of plagiarism or of materials (text or code) first written by someone else, whether published online or submitted at Tufts, previously or concurrently.

All sources used should be cited. For example, if you discuss a homework problem with a classmate, you should list that classmate as one of your references for that problem. Please also be warned that not everything you read online is correct. (This is true of print sources as well, but the risk increases greatly online.) Even material from supposedly reputable sources, such as slides posted by faculty at Tufts or other universities, may not have been reviewed by an editor and might contain crucial typos. For this reason, I'd like to discourage you from using Google to tackle the problem sets, but if you choose to do so, you must cite the URL(s) that you used. Directly copying text or code from any source without attribution is plagiarism and will be dealt with accordingly.


Course Materials

For homeworks, slides, and other class information, go to the private course materials page. You will need to log in using your CS department account and password. An account will be created for all students registered for the course in SIS who do not already have one.

Tentative Course Schedule:

Updates will occur during the term: check back frequently. Shaded rows refer to past dates.
DATE TOPICS READING OPTIONAL READING
Mon., Feb. 1 Class overview and administrivia.
Introduction to sequences and sequence comparison.
This course syllabus.
Zvelebil & Baum (ZB): Chapter 1 and Section 4.1
For CS students new to biology: Larry Hunter's article, Molecular Biology for Computer Scientists.
For bio or BME students or others with less formal CS background: either Cormen, Leiserson, Rivest, and Stein, Chapters 2 and 3, or Jones and Pevzner, Chapter 2: Big O notation, NP-completeness.
Weds., Feb. 3 Sequence alignment:
Global alignment. Dynamic programming.
ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2
Global alignment: Durbin, pp. 17-22.
Local alignment: Durbin, pp. 23-24, 29-30
Mon., Feb. 8 Finish global alignment. Local alignment. Scoring schemes.
Hwk 1 out
ZB: Sections 4.2, 4.5 (pp. 87-89 only); 5.2
Global alignment: Durbin, pp. 17-22.
Local alignment: Durbin, pp. 23-24, 29-30
Weds., Feb. 10 Sequence alignment: gaps, scoring matrices, PAM and BLOSUM
ZB: Sections 4.3, 4.4, 5.1
Fri., Feb. 12 Hwk 1 part 1 due
Mon., Feb. 15 NO CLASS (Holiday)
TUES., Feb. 16
(Monday schedule)
DB search, BLAST, Significance of alignment scores
Quiz 1
ZB: 4.6, 4.7, 5.4
Altschul's tutorial on statistics of sequence similarity scores.
Altschul's slides on information theory, scoring matrices, and E-values.
Weds., Feb. 17 Compressive BLAST (journal club)
Compressive BLAST paper
Fri., Feb. 19 Hwk 1 part 2 due
Mon., Feb. 22 DNA motifs, profiles.
ZB: 6.1, 6.6
Original paper on the Gibbs sampler for local multiple alignment
Original paper on MEME algorithm
Weds., Feb. 24 Gibbs sampling. Iterative search methods. Multiple sequence alignment: scoring, optimal methods, star alignment
Hwk 2 out
Ron Shamir's MSA notes; ZB: 4.5 (pp. 90-93)
Durbin, 6.1-6.4
Mon., Mar. 1 Multiple sequence alignment: iterative and progressive methods
ZB: 6.4-6.5
Tues., Mar. 2 Hwk 2 part 1 due
Weds., Mar. 3 Sequence assembly: introduction. De Bruijn graphs and Eulerian paths.
Quiz 2
GAGE: Evaluating short-read assemblies
Fri., Mar. 5 Hwk 2 part 2 due
Mon., Mar. 8 De Bruijn graphs, Eulerian paths
ZB: 5.3 (pp. 141-3)
Dan Gusfield's introduction to suffix trees
Weds., Mar. 10 OLC (overlap-layout-consensus) sequence assembly (journal club)
ARACHNE paper on overlap-based whole genome assembly; supplemental methods text on k-mer sorting
Schuster's review article on sequencing methods; Mardis' more detailed article about "next generation" sequencing technologies; the paper about the SOAPdenovo assembler.
Mon., Mar. 15 Overlap graphs, Hamiltonian paths. Suffix trees for overlap detection.
Hwk 3 out
ZB: 5.3 (pp. 141-3)
Dan Gusfield's introduction to suffix trees
Weds., Mar. 17 Gene finding and intro to Hidden Markov Models (HMMs)
ZB: 9.2-9.7
Rabiner handout, pp. 257-266.
Fri., Mar. 19 Hwk 3 part 1 due
Mon., Mar. 22 Hidden Markov Models (HMMs)
Quiz 3
Rabiner handout, pp. 257-266.
Weds., Mar. 24 NO CLASS: break day
Fri., Mar. 26 Hwk 3 part 2 due
Mon., Mar. 29 Hidden Markov Models
Durbin: Chapter 3
Weds., Mar. 31 Finish Hidden Markov Models; EM algorithms. HMM uses in gene finding.
ZB: 10.2-10.8; short paper on EM algorithms
Mon., Apr. 5 Gene expression: technology, normalization, detecting differential expression
Hwk 4 out
ZB: 15.1, 16.1, 16.4
Slonim review article
Weds., Apr. 7 Gene expression: clustering and classification.
ZB: 16.2-16.3, 16.5
Golub, Slonim, et al., on leukemia classification
Fri., Apr. 9 Hwk 4 part 1 due
Mon., Apr. 12 Functional interpretation: gene set analysis, Gene Ontology, functional enrichment (journal club)
Gene Set Enrichment Analysis
Weds., Apr. 14 Introduction to phylogeny.
Quiz 4
ZB: 7.1, 7.3
Mona Singh's phylogeny notes
Fri., Apr. 16 Hwk 4 part 2 due
Mon., Apr. 19 NO CLASS: Patriots' Day
Weds., Apr. 21 Phylogeny
Hwk 5 out
ZB: 8.1-8.4
Mon., Apr. 26 Bioinformatics ethics discussion
Hwk 5 part 1 due
Belmont Report
Weds., Apr. 28 Bioinformatics ethics discussion
Quiz 5
Mon., May 3 Anomaly detection for precision medicine
Hwk 5 part 2 due
Noto et al., 2015, on anomaly detection
Pietras et al., 2020, on temporal anomaly detection