CBE647 Bioinformatics Laboratory 1

The goal of this lab is to give you practical experience with the NCBI interface: how to navigate the website and how to perform basic and advanced searches. You will need to draw on what you learned in lectures, as well as the online help available through the website itself. Becoming experienced with these sites will help you in the future as you try to identify novel genes or perform your own research.

A. Finding Public Biological Databases

The 2010 Database Issue of Nucleic Acids Research (NAR) is the sixteenth in a series dedicated to factual biological databases. Such databases are an essential resource for working biologists. The compilation describes the most important of these databases and introduces newly compiled databases that provide specialist information in the biological area. NAR Online contains hotlinks to all of the databases in the compilation as well as brief summaries of their content.

Go to the NAR database compilation (http://www.oxfordjournals.org/nar/database/a/). Visit databases of your own selection to see how they are accessed and what information is available. Include GenBank (Nucleotide Sequence Databases), KEGG (Metabolic Pathways), UniProtKB (Proteins), OMIM and GO (Gene Ontology) in your visit.

B. NCBI Entrez and Searching Biological Databases

Problem: Triose Phosphate Isomerase

We are going to investigate the human triose phosphate (or triosephosphate) isomerase 1 gene. The enzyme encoded by this gene catalyzes the reaction that converts dihydroxyacetone phosphate to glyceraldehyde-3-phosphate in glycolysis. Glycolysis is the pathway in which cells break down a simple sugar (glucose) into two pyruvate molecules, which are then used to generate energy for the cell. Much is known about this gene, and when it is deficient, severe problems can occur. A loss-of-function mutation in this gene would be lethal.

First, visit the NCBI website (http://ncbi.nlm.nih.gov/) and go to the All Databases page.

1. What would be a good search query to use for this gene that would specify both the name of the gene and the organism? For your answer here, don't worry about the difference between "triose phosphate" and "triosephosphate"; however, you may have to use both spellings in your searches for the rest of the lab. Use this search query in the Gene, Protein, and Nucleotide sections of Entrez. You should observe different results in each, although they will contain much of the same information.

2. What are the RefSeq accession numbers for this gene in the mRNA form and in the protein form?
3. On what chromosome is this gene found?
4. How many amino acids are in the protein chain? What are the first five?

One of the useful abilities of Entrez is to cross-reference recent publications that relate to this gene. A recently published paper implicates the triose phosphate isomerase protein in the disease lupus.

5. Who were the three authors of this paper? What is this paper's unique PubMed ID?
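For readers who later want to script searches like the one in question 1, the sketch below shows one way a fielded Entrez query might be assembled and turned into an E-utilities esearch URL. The helper names build_entrez_query and esearch_url are made up for illustration, the [Gene Name] and [Organism] field tags are only one possible choice, and the gene symbol is borrowed from Part C rather than being the answer to this part. No request is sent; the code only builds strings.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_query(gene, organism):
    """Combine a gene name and an organism into one fielded query.

    Hypothetical helper: the field tags used here are an assumption about
    what makes a reasonable query, not the only valid form.
    """
    return f'"{gene}"[Gene Name] AND "{organism}"[Organism]'


def esearch_url(db, term, retmax=5):
    """Return the esearch URL for one database and query term (hypothetical helper)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


query = build_entrez_query("HBA2", "Homo sapiens")
print(query)

# "nuccore" is the E-utilities name of the Nucleotide database.
for db in ("gene", "protein", "nuccore"):
    print(esearch_url(db, query))
```

Pasting the printed query into the Entrez search box should behave the same way as opening one of the printed URLs, except that the URL returns XML instead of a web page.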

C. Determination of the Open Reading Frame (ORF) of the Hemoglobin Alpha 2 (HBA2) Gene

In this exercise you will learn how to determine an open reading frame (ORF) and the gene product of the ORF. A reading frame is a way of dividing the sequence of nucleotides in a nucleic acid (DNA or RNA) molecule into a set of consecutive, non-overlapping triplets. Where these triplets equate to amino acids or stop signals during translation, they are called codons. A single strand of a nucleic acid molecule has a phosphoryl end (called the 5'-end) and a hydroxyl end (called the 3'-end). These define the 5' to 3' direction. An open reading frame (ORF) is the part of a reading frame that contains no stop codons.

1. Retrieve the alpha 2 globin mRNA sequence (NM_000517) from the GenBank database. Can you manually identify the open reading frame (ORF), i.e., the coding sequence (for example, after pasting the sequence into Notepad or WordPad)? Proceed by determining the start and stop codons (use the genetic code table). Note that the sequence contains triplets of nucleotides that look like start or stop codons but are not the true start and stop codons. Why is that?
2. Once you have determined the ORF of the HBA2 gene, translate the first 10 codons into the amino acid sequence (use the genetic code table).

3. Are the ORF and the amino acid sequence confirmed by the NM_000517 annotation in the GenBank database?

4. For the automatic determination of putative ORFs you can also use the ORF Finder at the NCBI site. Go to the ORF Finder and copy and paste the NM_000517 sequence, or just type in the accession number (the program is linked to the GenBank database). The results list the ORFs found in all six reading frames. The longest ORF is most probably the one that is translated into the protein. Clicking on the longest ORF shows the corresponding translation. Is this correct?
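The search for start and stop codons described in this part can be sketched in a few lines of Python. The version below is a simplified illustration and not the algorithm used by the NCBI ORF Finder: it assumes an ORF begins at ATG and ends at the first in-frame TAA, TAG or TGA, it scans three frames on each strand, and it runs on a made-up toy sequence rather than NM_000517. All function names and the frame numbering (offset plus one) are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset):
    """Yield (start, end) index pairs of ATG...stop stretches in one frame."""
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i                 # first in-frame start codon
        elif start is not None and codon in STOPS:
            yield start, i + 3        # include the stop codon
            start = None


def find_orfs(seq):
    """Return (strand, frame, orf_sequence) tuples, longest ORF first."""
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset):
                found.append((strand, offset + 1, s[start:end]))
    return sorted(found, key=lambda orf: len(orf[2]), reverse=True)


toy = "ccatgaatgacccgtaagg"   # invented sequence, not a real record
for strand, frame, orf in find_orfs(toy):
    print(strand, frame, orf)   # + 3 ATGAATGACCCGTAA
```

Only one complete ORF is reported for the toy sequence, on the forward strand in the third frame. Comparing this output with the raw sequence is a quick way to check a manual reading before moving on to the real mRNA.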

D. Sequence Extraction

This part of the lab guides you through the process of getting DNA sequences using the NCBI GenBank database as a source.

STEP 1
Go to the NCBI website.

STEP 2
Choose your search type (Nucleotide) and enter your search item in the box. Some examples of search items are:
- ara h2
- opsins
You will get a lot of results, but for the purpose of this lab, find the following links and click on them:
- Ara h2 ==> AY158467
- Opsins ==> NM_020061

STEP 3
The new page contains the DNA coding sequence for the protein at the bottom, below ORIGIN. Click and drag the cursor to highlight the entire sequence, right-click the highlighted sequence and select Copy to store it.
- Ara h2 ==> from 1 atggc to tactaa
- Opsins ==> from cggctgccgt to ccaa
*** Also copy this sequence and store it in a .txt file. Remember to delete the numbers at the beginning of each row. ***

STEP 4
Open the Expasy page to view the translation tool. This tool will do in seconds what would take you hours to do by hand. It reads the codons in the sequence and translates them into amino acids.

STEP 5
Right-click in the box below "Please enter DNA" and select Paste to enter your gene sequence. To the right of "Output format", select "Includes nucleotide sequence" from the drop-down menu and click "Translate Sequence". Your results in 5'3' Frame 1 should show the amino acid (protein) sequence of the gene in capital letters below the corresponding codons of the gene. Notice that:
- Ara h2 ==> the gene starts with atg and the corresponding amino acid is M for methionine
- Opsins ==> the gene starts with cgg and the corresponding amino acid is R for arginine
The other frames translate the same sequence from a different starting position (5'3' Frames 2 and 3) or from the opposite strand (the 3'5' frames).

STEP 6
Click on the 5'3' Frame 1 link to open another window with just the protein sequence. Click and drag the cursor to highlight the entire sequence, right-click the highlighted sequence and select Copy to store it. Now we are going to BLAST the sequence! BLAST is a tool that matches your sequence to other similar sequences and gives you a description of what your gene is or does. Go to http://www.ncbi.nlm.nih.gov/BLAST to open BLAST.

STEP 7
Click "protein blast" and right-click to paste your protein sequence into the large text box. Click BLAST. You have just asked the BLAST program to search the entire NCBI protein database for matches to your sequence. The BLAST results page can be a lot to take in, but the colour-coded graph shows the most similar sequence in red and other, less similar sequences in magenta, green, blue and black. Under the graph, click on one of the links with a high score. On the resulting page, look for a DEFINITION or TITLE that will give you information about your gene sequence. For the examples we have been using, one of them is a peanut allergen and the other is an eye gene related to long-wave sensitivity and colour blindness. Can you tell which is which?
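Steps 3 to 5 can also be reproduced offline. The sketch below assumes the sequence was copied from the ORIGIN block with its row numbers still attached; it strips the numbers and spaces, then translates a forward frame with the standard genetic code. The function names are hypothetical, and the short sample input is invented for illustration (real GenBank rows are longer); it is not the AY158467 or NM_020061 record.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean_origin(text):
    """Drop row numbers, spaces and newlines; return uppercase letters only."""
    return "".join(ch for ch in text if ch.isalpha()).upper()


def translate(seq, frame=1):
    """Translate one forward frame (1, 2 or 3).

    Stop codons are shown as '*'; codons not in the table become 'X'.
    """
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Invented sample in the style of an ORIGIN block (shortened rows).
sample = """
        1 atggcgtttc cgaaatgctg
       21 gtaa
"""

dna = clean_origin(sample)
print(dna)              # ATGGCGTTTCCGAAATGCTGGTAA
print(translate(dna))   # MAFPKCW*
```

Saving the cleaned string to a .txt file gives the same result as deleting the row numbers by hand, and calling translate with frame=2 or frame=3 corresponds to the other two forward frames shown by the translation tool.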

Submission Instructions: Please submit your individual laboratory report in the proper format by Wednesday (2nd April 2014) for EH222 8A or Thursday (3rd April 2014) for EH222 8B.
