Comp 7 / Bio 40 Lab 7
Turning it around: dictionaries and loops

November 9, 2016

Purpose of this lab

To give students practice working with Python dictionaries and more complex control structures.

Lab problems

  1. What's in a name?

    In this problem, you will create your own dict to translate the common names of various species to their Latin names.

    Write a Python script containing a dict whose keys are strings containing the common names of animals, and whose values are strings containing the associated Latin names. Pick any four species you would like, and in your program, add these four species to your dict.

    The program should then use the dict to obtain a list of the included species, and should print out a message telling the user what species are available. Then it should ask the user to choose one to look up. If the species that the user enters is in the dict, the program should print out the corresponding Latin name. If not, it should print a message saying so.

    A while loop can iterate until the user inputs something that is not in the dict.

    A sample run of your code might look like this:

    The available animals are 
    house wren 
    striped bass
    Which would you like to look up? 
    >>>  cheetah
    The Latin name for the cheetah is Acinonyx jubatus.
    Which would you like to look up? 
    >>>  house wren
    The Latin name for the house wren is Troglodytes aedon.
    Which would you like to look up? 
    >>>  done
    Okay, bye!
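
    The loop described above could be structured along the following lines. This is only a sketch, not the expected solution: the function name run_lookups and the list of canned answers are assumptions made so that the example runs without a keyboard. In your own script the answers would come from raw_input. The two species are the ones that appear in the sample run.

```python
# Example data: the two species whose Latin names appear in the sample run.
latin_names = {
    "cheetah": "Acinonyx jubatus",
    "house wren": "Troglodytes aedon",
}

def run_lookups(names, answers):
    """Answer lookups until an answer is not a key in names.

    answers stands in for what the user would type (hypothetical helper
    argument); the messages are collected in a list and returned.
    """
    messages = []
    answers = iter(answers)
    choice = next(answers, None)
    # Keep going only while the choice is a key of the dict.
    while choice in names:
        messages.append("The Latin name for the %s is %s." % (choice, names[choice]))
        choice = next(answers, None)
    messages.append("Okay, bye!")
    return messages

print("The available animals are")
for common_name in sorted(latin_names):
    print(common_name)

for line in run_lookups(latin_names, ["cheetah", "house wren", "done"]):
    print(line)
```

    Testing membership with "in" before indexing avoids the KeyError that names[choice] would raise for a species that is not in the dict.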

  2. Translating an APOE coding region

    Save the file from the labs web page. This file contains a dict that maps codons to single-letter amino acid codes. (Stop codons will be represented by the character '-' once you enter them.)

    Also save the file apoe_cds.txt from the labs web page. This is a file that contains the nucleotide sequence of part of the coding region of the APOE gene.

    Your goal in this problem is to translate the APOE coding region to its amino acid sequence.

    You can write your program in the file so that it can use the dict that is already in there. (Or you may copy this over into a new file.)

    First, add the stop codons to the definition of the dict codon in your file. The corresponding coding sequences are 'TAA', 'TGA', and 'TAG'. They should have the value '-' instead of a single-letter amino acid code.

    Next, define a string variable called proteinseq. Read the APOE sequence from the file, and use the dict to convert each trio of letters, in order, into the appropriate amino acid (assume that your reading frame starts from the very first character in the sequence). Add each translated character to proteinseq.

    Print out the translated sequence to a file named "apoe_protein.txt". Submit this file with your writeup.
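
    The codon-by-codon translation described above might be sketched as follows. The dict here is a small hypothetical subset of the full codon table, the function name translate is an assumption, and the short test sequence is made up for illustration; it is not the APOE sequence. When you read the real sequence from a file, you will probably also need to strip newline characters before translating.

```python
# Hypothetical subset of the codon table; the dict in the lab file is larger.
codon = {
    "ATG": "M", "AAG": "K", "GTT": "V", "CTG": "L", "TGG": "W",
    "TAA": "-", "TGA": "-", "TAG": "-",   # stop codons map to '-'
}

def translate(dna, table):
    """Translate dna three letters at a time, starting at the first character."""
    proteinseq = ""
    # Stopping at len(dna) - 2 skips a trailing partial codon of 1 or 2 letters.
    for start in range(0, len(dna) - 2, 3):
        triplet = dna[start:start + 3]
        proteinseq += table[triplet]
    return proteinseq

example = "ATGAAGGTTCTGTGGTAA"   # made-up sequence, six codons long
print(translate(example, codon))
```

    Stepping the index by 3 keeps the reading frame fixed at the first character, as the problem requires.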

  3. Student name lookup: turning it around.

    Save the file studentnames.txt from the labs web page. This is a file containing a list of exactly 50 student names and id numbers, where each id is a number from 1 to 50 (and each number is used exactly once). This file is identical to the one that we used in lab6 for student ID lookup.

    Last time you wrote a Python script that opened and read the tab-delimited list of names from a file, stored the names in a list whose index is the id number of the corresponding name, asked the user for a student ID number, and printed out the name corresponding to that id number.

    This time, you will use a dictionary to turn it around: you will ask the user for a name, and print out the id number to which that name corresponds. Note that the spacing and capitalization of the name matter, so when you are asking for "Larry Bird", be careful to type just that and not "larry bird". If the name is not in your list, print "Bye!" and exit.

    We suggest that you start with your program from lab 6, so that you don't have to rewrite all the code to open and read data from the file. If you can't find it or never got it to work, you can use ours here. Save a copy of this program (or better yet, your own program from lab 6) as "", or some other name with a ".py" file extension, and then edit this copy in the IDLE editor to solve the problem. You may need to edit the location or name of the input file, depending on how you save it.

    Recall that the easiest way to write a loop that repeats until something specific happens is to use a while loop. If you did this in some other way last week, this is a good time to replace that code with a while loop. Note that our sample code also uses the "with ... as" format for opening a file, as in the Codecademy File I/O assignment. Ask us questions if you don't understand how this works.

    An example run of your program (with a different student list) might look like:

    Please enter a name:  Tom Brady
    Please enter a name:  Derek Jeter
    Please enter a name:  Larry Bird
    Please enter a name:  Fred Astaire
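
    Turning the lookup around amounts to using the name as the dict key and the id as the value. The sketch below assumes that each line of the file has the form id, tab, name; check the actual column order in studentnames.txt before relying on it. The function name build_id_lookup and the two sample lines (with made-up ids) are hypothetical.

```python
def build_id_lookup(lines):
    """Build a dict mapping each student name to its id number.

    Assumes each line looks like "id<TAB>name" (an assumption about the file).
    """
    ids_by_name = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                      # skip blank lines
        student_id, name = line.split("\t")
        ids_by_name[name] = int(student_id)
    return ids_by_name

# Made-up lines standing in for what iterating over the open file would yield.
sample_lines = ["1\tLarry Bird\n", "2\tTom Brady\n"]
ids_by_name = build_id_lookup(sample_lines)

print(ids_by_name.get("Larry Bird"))    # the id stored for this exact name
print("larry bird" in ids_by_name)      # capitalization matters for dict keys
```

    Because dict keys are compared exactly, "larry bird" is not found even though "Larry Bird" is, which is why the problem warns about spacing and capitalization.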

Going Further:

  1. Modify your program from question 2 above to read the codon data from a file and create the dict mapping codons to amino acid codes yourself.

    To do this, save the file codon_chart.txt from the labs web page. Read all the codons and the corresponding amino acid letters from the codon_chart.txt file and store them in a dictionary. Confirm that you can still translate the sequence as before.
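
    Reading the chart into a dict could look roughly like the sketch below. The layout of codon_chart.txt is not described here, so the code assumes one codon and one amino acid letter per line, separated by whitespace; adjust the parsing to match the real file. The function name read_codon_chart is hypothetical, and a tiny temporary file stands in for the real chart so that the example runs on its own.

```python
import os
import tempfile

def read_codon_chart(path):
    """Read "codon letter" pairs from a text file into a dict.

    Assumes whitespace-separated columns, codon first (an assumption).
    """
    table = {}
    with open(path) as chart:
        for line in chart:
            fields = line.split()
            if len(fields) >= 2:          # ignore blank or malformed lines
                table[fields[0]] = fields[1]
    return table

# Write a three-line stand-in chart so the sketch does not need the real file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ATG M\nTGG W\nTAA -\n")
    tmp_path = tmp.name

codon = read_codon_chart(tmp_path)
os.remove(tmp_path)                       # clean up the temporary file
print(sorted(codon.items()))
```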

  2. Find all the names on Coca-Cola bottles that start with "C".

    Save the file cocacola_names.txt from the labs web page. This file contains all the first names that have appeared on Coca-Cola bottle labels. You will read the names from the file and store them in a Python dictionary.

    Before you read the names from the file, define a Python dictionary called names. You will need to initialize names so that it has A-Z as keys and empty lists as values. You can do this with the following code snippet:

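    import string

    # ascii_uppercase is "ABC...XYZ" and exists in both Python 2 and Python 3
    alphabet = string.ascii_uppercase
    names = {}

    # give every capital letter its own empty list
    for c in alphabet:
        names[c] = []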

    The code shown above will create a Python dictionary that has 26 keys (all the capital letters) and an empty list value for each key. Now add the names in the file to the dictionary by appending each name to the list whose key is the name's initial. For example, if you want to add "Aaron" to the dictionary you can do:

    name = "Aaron"
    firstc = name[0]               # the initial, here "A"
    names[firstc].append(name)     # add the name to that letter's list

    Finally, read an uppercase letter from the terminal using the raw_input function, and print out all the names that have that initial letter. Repeat this process until the user enters a string that is not an uppercase letter. (Use a while loop for this.)

    An example run might look like:

    Enter an uppercase letter: A
    ['Aarav', 'Aaron', 'Adam', 'Adrian', 'Alana', 'Alex', 'Alice', 'Amanda', 'Amy', 'Andrea', 'Angela', 'Anna', 'Anthony', 'Ash']

    Enter an uppercase letter: B
    ['Belinda', 'Bella', 'Ben', 'Bianca', 'Brad', 'Brandon', 'Brett', 'Brittany', 'Brooke']

    Enter an uppercase letter: C
    ['Caitlin', 'Callum', 'Cameron', 'Casey', 'Cass', 'Catherine', 'Chelsea', 'Chris', 'Christian', 'Claire', 'Corey', 'Courtney', 'Craig']

    Enter an uppercase letter: 6
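
    The grouping and lookup steps described above can be combined roughly as in the sketch below. The function name group_by_initial, the four sample names, and the canned list of letters are assumptions used so that the example runs without the real file or keyboard input; in your script the letters would come from raw_input.

```python
import string

def group_by_initial(all_names):
    """Return a dict with keys A-Z, each holding the names with that initial."""
    names = {}
    for c in string.ascii_uppercase:
        names[c] = []
    for name in all_names:
        firstc = name[0]
        if firstc in names:               # skip names that do not start with A-Z
            names[firstc].append(name)
    return names

sample = ["Aaron", "Ben", "Adam", "Claire"]   # stand-in for the names in the file
names = group_by_initial(sample)

# Canned answers standing in for what the user would type.
for letter in ["A", "C", "6"]:
    if letter not in names:
        break                             # stop at the first non-letter answer
    print(names[letter])
```

    Because every capital letter already has a list, a letter with no names simply prints an empty list instead of causing an error.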


  3. Two levels deep

    Now modify program 1 above to create another dict, whose keys are the Latin names of the animals and whose values are the conservation status (as reported if you query the animal name in Wikipedia). You can even make up some values if you can't find them; "Unknown" is also an acceptable status for this problem, but in most cases the species' summary record will include more detail. (E.g., the cheetah is described as "Vulnerable (Population decreasing)," while the striped bass is doing just fine, with status described as "Least Concern.")

    Modify the lookup part of the program to ask the user for an animal's common name, use both dicts to find the associated conservation status, and print that out.
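
    Chaining the two dicts could look something like the sketch below. The function name conservation_status is hypothetical. The cheetah's status is the one quoted above; the house wren's status is left as the "Unknown" placeholder that the problem allows, not a looked-up value.

```python
# First level: common name -> Latin name (species from the sample run).
latin_names = {
    "cheetah": "Acinonyx jubatus",
    "house wren": "Troglodytes aedon",
}

# Second level: Latin name -> conservation status.
status_by_latin = {
    "Acinonyx jubatus": "Vulnerable (Population decreasing)",
    "Troglodytes aedon": "Unknown",       # placeholder, not a looked-up value
}

def conservation_status(common_name):
    """Follow both dicts; return None if the common name is not known."""
    latin = latin_names.get(common_name)
    if latin is None:
        return None
    # Fall back to "Unknown" if the second dict has no entry for this species.
    return status_by_latin.get(latin, "Unknown")

print(conservation_status("cheetah"))
print(conservation_status("unicorn"))
```

    The value from the first dict serves as the key into the second, so the user only ever needs to type the common name.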