This lab has three goals. First, you will be introduced to the editor for writing Python scripts and will use it to write a short program using your understanding of variables and strings. Second, you will begin to use and interpret BLAST results, in part to investigate the origins of antibiotic resistance in human pathogens. Finally, you will become familiar with writing electronic lab notes about your computational work.
You can find most of the web links you need for this lab under the "links" tab on the course web site.
What you will submit for lab assignments
In each lab, you will be asked to do some bioinformatics analysis and answer some specific questions. You may also write programs. At the end of the lab period, you should submit, at a minimum, a document with the answers to these questions and a separate Python program file for each programming question, through Trunk. Make sure you and your lab partner have your names in all submitted documents.
In addition, you should create an electronic lab notebook documenting your bioinformatics analysis efforts and perhaps your program development efforts. Most people combine this notebook file with the file containing the answers to the lab questions, but you may choose to keep it separate. Discuss with your lab partner which you prefer.
You may record your lab notes and responses in any way that you choose so long as we are able to read them. A plain text file is fine: you may edit such files using Notepad++ on the lab machines. Alternatively, Microsoft Office is also available on the lab machines.
Students often ask how much detail to include in electronic lab notebooks. The answer is that you should include enough detail that someone else could reproduce your efforts. That said, it needn't be written expansively; sentence fragments and lists of options for online tools are fine for lab notebooks (although not for Project submissions).
When you are using online bioinformatics tools, you should document all options you chose and why you chose them. If you are programming in Python, your code may tell the whole story (if it is well written and includes comments), so you don't necessarily need to write anything in the notebook about programming. On the other hand, if you went through a process of trying different ideas in your program, you may want to document these design decisions or the process you followed.
Working in pairs
When working with a partner in lab, at any given time one person should be typing and the other should be closely observing and suggesting content. For programming in particular, it has been shown that working in pairs helps reduce frustration and identify errors more quickly. In a pedagogical setting, this works best when you swap roles frequently, so please remember to take turns. Counter-intuitively, it may also be more effective to have the person who is less confident about a particular part of the material be the one who types when you are working on that part.
Saving your files
Be sure to save your files in or below your home directory (drive Z:\ on the lab machines), not to the local desktop, to ensure that they will be saved once you log out and will be accessible from other lab machines.
When you are doing lab work with a partner, we strongly suggest that both of you save copies of your submissions. One simple way to do this would be to have the team member who is not logged in during lab be the one to submit the files via Trunk. Another would be to email a copy of the submission to that member of the team. Before leaving, or on another occasion, the recipient can log in to one of the lab machines, read the email or go to the Trunk site, and save the files to their own home directory.
Creating and editing Python programs
You will be writing and running your programs using a tool called IDLE. The advantage of IDLE is that it is simple to use, it offers all the functionality we need for this course, and it is available for both Mac and Windows platforms if you want to work on your own machines.
To start IDLE on the lab machines, go to Start -> All Programs -> Python 2.7, and run "IDLE (Python GUI)". This will start IDLE in the interpreter window.
To create a new Python file, click on the File menu and choose "New File". This will pop up a new editor window with a blank Python script. To save it, click on the File menu and select "Save". Type in the filename. For example, your file might be called "SomeName.py". The ".py" extension is important. It tells you, and various computer programs, that this text file contains a Python script. We suggest that you save it under your Z: directory so that it will remain available after you log out.
The first line of each Python script should be a comment line that gives both of your names. E.g.,
# This is the solution to Lab 2, problem 3, by Joe Shmoe and Penny Python
Running and testing your Python code
For now, we will run our programs within IDLE. Make sure your cursor is in the editor window. Find the Run menu at the top of the screen, and choose "Run Module" from the Run menu pulldown. After asking if it's okay to save the program first (click OK), IDLE will run your entire script in the interpreter window, where it will ask for any input and print any output.
When you try to run your script, it may not work at all if the program's syntax is incorrect. In this case, you should see an error message with a line number, or red highlighting on the line where IDLE thinks your syntax error is located. Note that IDLE is not always completely accurate when reporting which line contains the error; if the reported line really looks right to you, look at the lines just before and after it as well.
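To illustrate why the reported line can be off, here is a small sketch (not part of the assignment) that checks the syntax of a deliberately broken two-line snippet and reports where Python thinks the error is. The snippet text and the names broken_source and reported_line are made up for illustration. Depending on your Python version, the reported line may be the line with the missing parenthesis or the line after it.

```python
# A deliberately broken snippet: line 1 is missing its closing parenthesis.
broken_source = 'total = (1 + 2\nname = "Donna"\n'

reported_line = None
try:
    # compile() checks the syntax of the text without running it.
    compile(broken_source, "<example>", "exec")
except SyntaxError as err:
    # err.lineno is the line the parser blames, which may not be
    # the line that actually needs fixing.
    reported_line = err.lineno
    print("Syntax error reported on line %d" % reported_line)
```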
Revise your program and continue to test it until you are satisfied with the results.
To start, you will create a Python script from scratch. You may want to review the Codecademy lesson on Strings and Console Output for reminders about syntax. The section on String Methods may help you figure out how to get string length or convert to uppercase, and the section on Advanced Printing shows some ways of printing both strings and variables. There are several ways of solving this problem; for now, any solution is fine.
firstname = "Donna"
lastname = "Slonim"
Then, create a new variable called totalLen. Have the program use the variables you created to compute the sum of the length of your first name and the length of your last name, and store that value in the variable totalLen. For example, "Donna" has 5 letters, and "Slonim" has 6, so the value of totalLen should be 11.
Print the value of all three variables. For example, the output in my case would look like:
Donna Slonim 11
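As a reminder of the pieces involved, here is a minimal sketch of the same idea using a different pair of strings. The variable names genus, species, and total_len are illustrative assumptions, not names the lab requires; the built-in len() function returns the number of characters in a string.

```python
# Two example strings (taken from the organism name used later in this lab).
genus = "Streptococcus"
species = "agalactiae"

# len() gives the number of characters in each string; add the two lengths.
total_len = len(genus) + len(species)

# Print each value on its own line.
print(genus)
print(species)
print(total_len)
```

Each print call takes a single value, so this sketch behaves the same way under Python 2.7 and Python 3.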
Following the instructions under "Running and testing your Python code", above, test your program until it runs and produces the correct results. Then continue to modify this same program as described in steps ii-iv, below.
Print the value of this new variable too. Run and test your code again.
Again, run and test your code until it does the right thing.
At this point, the output of your program (which now prints the values of all five variables) might look like the following:
Donna Slonim 11 Donna Slonim DONNA SLONIM
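The last two values in that output are a combined name and an uppercase copy of it. One way to build strings like these, sketched here with different example strings and assumed variable names (full_name, shouting_name), is to join strings with the + operator and to call the upper() string method:

```python
genus = "Streptococcus"
species = "agalactiae"

# The + operator joins strings end to end; the " " adds a space between them.
full_name = genus + " " + species

# upper() returns a new, uppercase copy; full_name itself is unchanged.
shouting_name = full_name.upper()

print(full_name)
print(shouting_name)
```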
You want to compute what percentage of the total length of your first and last names is accounted for by your first name. Use variables that hold the length of each and figure out what mathematical operations you need to do to compute the percentage. For example, in my case, my 5-letter first name is approximately 45.45% of the total length of 11 letters. (Python will print more digits after the decimal point; we'll learn later how to get exactly the desired amount of precision.) Print this percentage out after the other variables. For example:
Donna Slonim 11 Donna Slonim DONNA SLONIM 45.454545454545453
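If your percentage comes out as 0, the likely cause is integer division. The sketch below is one possible illustration, not the required solution; the names first_len, total_len, truncated, and percentage are assumptions. It uses the // operator to show the truncating behavior explicitly, and float() to get the fractional result.

```python
first_len = 5    # e.g. the length of "Donna"
total_len = 11   # e.g. the combined length of "Donna" and "Slonim"

# Floor division discards the fractional part. In Python 2.7 the plain "/"
# operator behaves this way too when both operands are integers.
truncated = 100 * (first_len // total_len)

# Converting one operand to a float forces division that keeps the fraction.
percentage = 100 * float(first_len) / total_len

print(truncated)
print(percentage)
```

The exact number of digits printed for the percentage depends on the Python version.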
Submit this program with your writeup. In your writeup, document any issues you had with writing the code, and explain whether you got the answer you were expecting when computing the percentage. If not, why not? Explain what you had to do to get the correct result.
Download the file Lab1.unknownseq.fasta from the online version of this lab. This is a text file containing the sequence of a known and well-characterized protein in FASTA format. You will use BLAST to figure out what protein it is.
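For orientation, a FASTA file consists of a header line that begins with ">" followed by one or more lines of sequence. The following is a minimal sketch of reading that layout in Python; the function name parse_fasta and the example record are made up for illustration, and the record is not the contents of the lab file. You do not need this code for the lab, since BLAST accepts FASTA text directly.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header ends the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    # Store the final record after the loop ends.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Made-up record for illustration only.
example_text = ">example_protein hypothetical header\nMKTAYIAK\nQRQISFVK\n"
records = parse_fasta(example_text)
print(records[0][0])       # the header, without the leading ">"
print(len(records[0][1]))  # the length of the joined sequence
```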
Run a protein BLAST search against the "Reference proteins" (RefSeq) database to identify this sequence. Recall from your reading in Chapter 2 that RefSeq is a database that provides canonical representative sequences for each transcript and protein product, and that mRNA sequences have RefSeq IDs starting with "NM" (e.g., NM_000518.4), while protein sequences have RefSeq IDs starting with "NP" (e.g., NP_000509.1). Use the defaults for all other options.
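The RefSeq prefix convention described above can be expressed as a short check on the start of the ID. This is only a sketch: the function name refseq_molecule_type is hypothetical, and it handles just the two prefixes mentioned here, labeling everything else "other".

```python
def refseq_molecule_type(accession):
    """Classify a RefSeq ID by its prefix (only NM_ and NP_ are handled)."""
    if accession.startswith("NM_"):
        return "mRNA"
    if accession.startswith("NP_"):
        return "protein"
    return "other"

print(refseq_molecule_type("NM_000518.4"))
print(refseq_molecule_type("NP_000509.1"))
```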
(In case NCBI BLAST is not working, choose the EBI BLAST server link from the Links page, select the Protein link under the header "NCBI BLAST," and compare to the "uniprotkb_reference_proteomes" database. To see the alignments, you will then need to look at the box marked "Apply to selection:" to the left of the summary table, and click the "Show" button under the heading "Alignments.")
In your lab writeup, document any choices you needed to make, and answer the following questions:
Download the file Lab1.resistgene.fasta from the online version of this lab. This is the ermB gene, which confers erythromycin resistance in Streptococcus agalactiae.
Start a new nucleotide BLAST search. Use the nr/nt database, limit the organism to bacteria, and check the Exclude box (the one on the line below the organism line) to exclude sequences from uncultured/environmental samples. (These are bacterial sequences from environmental samples where we don't necessarily know what organisms they came from; such hits won't help us address our question, so we are excluding them.) Use the defaults for all other options.
Answer the following questions:
Submitting your work:
First, if you are working with a partner, please be sure that both of your names are in all the files you intend to upload.
You will submit your work on Trunk.
Log in to Trunk if you haven't yet done so. Select the Assignments tab, and select Lab 1. You should be able to submit from this page. Simply attach all of your files, including your lab notes, and submit. You should receive email confirmation of your successful submission.
Run the query, and look at some of the lower-scoring alignments shown. Are these proteins homologous to the query sequence? Do you think they are close enough to have been horizontally transferred?
>>> i = 2
>>> city = "Boston"
>>> city
>>> len(city)+i
What output do you see if you type the same code into a new Python program file, save it, and run it? What would you have to change about your program to see the same output that you saw in the interpreter window?
>>> dog = raw_input("Please type in your dog's name: ")
Type "Snoopy" at the prompt. If you then type "print dog" into the interpreter window, you will see that the variable dog now holds the string "Snoopy". (More documentation on the raw_input function appears here.)
Modify your script myname.py to ask users to input their first and last names. Specifically, have the program print a message like "please enter your first name:", read the input from the user, and store it in the firstname variable. Do the same for the last name. Then print the same output: the sum of the string lengths, the concatenated name, and the uppercase name.
Initials are DS
To do this, recall that you can access the letters of a string by putting the index of the position you want in square brackets. For example, if the variable dog holds the value "Snoopy", then
print dog[1]
prints the letter 'n', because computers count starting at zero.
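A short sketch of string indexing, using the dog and city strings from earlier in this lab; the name first_letters is an assumption added for illustration. Index 0 is the first character, index 1 is the second, and single characters can be joined with + like any other strings.

```python
dog = "Snoopy"
city = "Boston"

print(dog[0])  # the first character, at index 0
print(dog[1])  # the second character, at index 1

# Join the first character of each string into a new two-letter string.
first_letters = dog[0] + city[0]
print(first_letters)
```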