
BIOINFORMATICS

MAPÚA
School of Chemical Engineering and Chemistry
Excellence at the Highest Level

MODULE IN BIOINFORMATICS

Databases for the Storage and “Mining” of Genome Sequences

Using Databases to Compare and Identify Related Protein Sequences



Visualizing Three-Dimensional Protein Structures

Enzyme Inhibitors and Rational Drug Design

Metabolic Enzymes, Microarrays, and Proteomics



MODULE 1: Databases for the Storage and “Mining” of Genome Sequences

This is an introduction to nucleotides, nucleic acids (DNA and RNA), and the processes of
transcription and translation. The exercises below are designed to introduce you to some of the
relevant databases and the tools they contain for examining and comparing different bits of
information (see Sections 3-4C and 3-4D). Biological databases are an important resource for the
study of biochemistry at all levels. These databases contain huge amounts of information about the
sequences and structures of nucleic acids (DNA and RNA) and proteins. They also contain software
tools that can be used to analyze the data. Some of the software—called web applications—can be
used directly from a web browser. Other software—called freestanding applications—must be
downloaded and installed on your local computer.

1. Finding Databases. We'll start with finding databases.


a. What major online databases contain DNA and protein sequences?
b. Which databases contain entire genomes?
c. Using your textbook and online resources (http://www.google.com), make sure you
understand the meaning of the following terms: BLAST, taxonomy, gene ontology, phylogenetic
trees, and multiple sequence alignment. Once you have defined these terms, find resources on the
Internet that enable you to study them.

2. TIGR (The Institute for Genomic Research). Open the TIGR site (http://www.tigr.org/db.shtml)
and find the Comprehensive Microbial Resource.
a. What 2001 publication describes the Comprehensive Microbial Resource at TIGR?
b. How many completed genomes from Pseudomonas species have been deposited at TIGR?
c. Which Pseudomonas species are these?
d. Identify the primary reference for Pseudomonas putida KT2440.
e. Find the link on the Comprehensive Microbial Resource home page for restriction digests. Perform
a computer-generated restriction digest on Pseudomonas putida KT2440 with BamHI. How many
fragments form and what is the average fragment size? (See Section 3-4A for a discussion of
restriction endonucleases.)
f. In addition to microbial genomes, TIGR also contains the genomes of many higher organisms.
Identify five eukaryotic genomes that are available at TIGR.

3. Analyzing a DNA Sequence. Using high-throughput methods, scientists are now able to sequence
entire genomes in a very short period of time. Sequencing a genome is quite an accomplishment in
itself, but it is really only the beginning of the study of an organism. Further study can be done both at
the wet lab bench and on the computer. In this problem, you will use a computer to help you identify
an open reading frame, determine the protein that it will express, and find the bacterial source for that
protein. Here is the DNA sequence: Click here for text version

a. First, try to find an open reading frame in this segment of DNA. What is an open reading frame
(ORF)? You can find the answer in your textbook (Section 3-4D) or online with a simple Internet
search (http://www.google.com). You may also wish to try the bookshelf at PubMed
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books). In bacteria, an
open reading frame on a piece of mRNA almost always begins with AUG, which corresponds to
ATG in the DNA segment that codes for the mRNA. According to the standard genetic code (Table
26-1), there are three Stop codons on mRNA: UAA, UAG, and UGA, which correspond to TAA,
TAG, and TGA in the parent DNA segment. Here are the rules for finding an open reading frame in
this piece of bacterial DNA:
1. It must start with ATG. In this exercise, the first ATG is the Start codon. In a real gene search,
you would not have this information.
2. It must end with TAA, TAG, or TGA.
3. It must be at least 300 nucleotides long (coding for 100 amino acids).
4. The ATG Start codon and the Stop codon must be in frame. This means that the total number of
bases in the sequence from the Start to the Stop codon must be evenly divisible by 3 (see
Section 26-1A).
Hints: Try this search by pasting the DNA sequence into a word processing program, then
searching for the Start and Stop codons. Once you have found a pair, highlight the text of the
proposed ORF and use the program's Word Count function to count the number of characters
between (or including) the Start and Stop codons. This number must be evenly divisible by 3. You
can also use a fixed-width font such as Courier, enlarge the size of the text, and adjust the margins
so that each line holds just three characters (one codon). Once you find the first ATG, delete the
characters that precede it. Then search for a Stop codon that fits all on one line (is in the same
reading frame as the Start codon).
b. Admittedly, Part (a) is a tedious approach. Here is an easier one: Highlight the entire DNA
sequence again and copy it. Then go to the Translate tool on the ExPASy server
(http://www.expasy.org/tools/dna.html). Paste the sequence into the box entitled
“Please enter a DNA or RNA sequence in the box below (numbers and blanks are ignored).” Then
select “Verbose (“Met”, “Stop”, spaces between residues)” as the Output format and click on
“Translate Sequence.” The “Results of Translation” page that appears contains six different reading
frames. What is a reading frame and why are there six? (Refer to Section 26-1A, the Internet, or the
PubMed bookshelf for an answer.) Identify the reading frame that contains a protein (more than
100 continuous amino acids with no interruptions by a Stop codon) and note its name. Now go back
to the Translate tool page, leave the DNA sequence in the sequence box, but select “Compact (“M”,
“-”, no spaces)” as the Output format. Go to the same reading frame as before and copy the protein
sequence (by one-letter abbreviations) starting with “M” for methionine and ending in “-” for the
Stop codon. Save this sequence to a separate text file.
c. Now you will identify the protein and the bacterial source. Go to the NCBI BLAST page
(http://www.ncbi.nlm.nih.gov/BLAST/). What does BLAST stand for? You will do a
simple BLAST search using your protein sequence, but you can do much more with BLAST. You
are encouraged to work the Tutorials on the BLAST home page to learn more. On the BLAST
page, select “Protein-protein BLAST.” Enter your protein sequence in the “Search” box. Use the
default values for the rest of the page and click on the “BLAST!” button. You will be taken to the
“formatting BLAST” page. Click on the “Format!” button. You may have to wait for the results.
Your protein should be the first one listed in the BLAST output. What is the protein and what is the
source?
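
The four ORF rules from Part (a) can also be sketched in a few lines of Python. This is only an illustration, not the method used by the ExPASy Translate tool: the function names are made up for this example, the 300-nucleotide minimum is assumed to count both the Start and Stop codons, and the demo sequence in the usage note is synthetic. Searching the reverse complement as well covers the other three of the six reading frames discussed in Part (b).

```python
# Illustrative sketch of the ORF rules in Part (a); all names are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq):
    """Keep only A, C, G, T so that pasted numbers and blanks are ignored."""
    return "".join(base for base in seq.upper() if base in "ACGT")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_length=300):
    """Return (start, end, orf) tuples for the three frames of one strand.

    An ORF starts at ATG, ends at the first in-frame Stop codon, and must
    span at least min_length nucleotides (Start and Stop codons included).
    Positions are 0-based; end is exclusive.
    """
    seq = clean(seq)
    orfs = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] == "ATG":
                # Walk codon by codon until the first in-frame Stop codon.
                for end in range(pos + 3, len(seq) - 2, 3):
                    if seq[end:end + 3] in STOP_CODONS:
                        if end + 3 - pos >= min_length:
                            orfs.append((pos, end + 3, seq[pos:end + 3]))
                        pos = end  # resume scanning after this Stop codon
                        break
            pos += 3
    return orfs
```

For example, find_orfs("CC" + "ATG" + "GCT" * 100 + "TAA" + "G") reports a single ORF that begins at the ATG, and find_orfs(reverse_complement(sequence)) searches the opposite strand.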

Note to instructors: You can do this exercise with any DNA sequence. You can also start from a DNA
sequence directly in BLAST (use blastn) and find the genes that way. It is probably best to choose a
DNA segment that encodes only one protein.

4. Sequence Homology. You will use BLAST to look at sequences that are homologous to the protein
that you identified in Problem 3.
a. First, some definitions: What do the terms “homolog,” “ortholog,” and “paralog” mean? Go to the
NCBI BLAST page (http://www.ncbi.nlm.nih.gov/BLAST/) and choose
“Protein-protein BLAST.” Paste your protein sequence into the “Search” box. Before clicking on the
“BLAST!” button, narrow the search by kingdom. As you look down the BLAST page, you'll see
an Options section. Under “Limit by entrez query” (followed by an empty box) or “select from:”
(followed by a drop-down menu), select “Eukaryota.” Now click on the “BLAST!” button. Click
on the “Format!” button on the next page. Can you find a homologous sequence from yeast?
(Hint: Use your browser's Find tool to search for the term “Saccharomyces.”) Note the Score and E
value given at the right of the entry.
Can you find a homologous sequence from humans?
(Hint: Search for the term “Homo.”) Note its Score and E value.
Most biochemists consider 25% identity the cutoff for sequence homology, meaning that if two
proteins are less than 25% identical in sequence, more evidence is needed to determine whether
they are homologs. Click on the Score values for the yeast and human proteins to see each sequence
aligned with the Yersinia pestis sequence and to see the percent sequence identity. Are the yeast
and human sequences homologous to the Yersinia pestis sequence?
b. Use the BLAST online tutorial
(http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html)
to discover the meaning of the Score and E value for each sequence that is reported. What is the
difference between an identity and a conservative substitution? Provide an example of each from
the comparison of your sequence and a homologous sequence obtained from BLAST (see Section
5-4A for a discussion of conservative substitution).
c. BLAST uses a substitution matrix to assign values in the alignment process, based on the analysis
of amino acid substitutions in a wide variety of protein sequences. Be sure you understand the
meaning of the term “substitution matrix.” What is the default substitution matrix on the BLAST
page? What other matrices are available? What is the source of the names for these substitution
matrices? Repeat the BLAST search in Problem 4(a) using a different substitution matrix. Do you
find different answers?
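
To make the distinction between identities and conservative substitutions concrete, here is a small Python sketch that tallies both for one pairwise alignment. It makes two assumptions that should be checked against your own BLAST output: percent identity is counted over every alignment column (gaps included), and "conservative" is decided by a simple, hand-picked grouping of similar side chains rather than by a substitution matrix, which is what BLAST actually uses. The function and group names are hypothetical.

```python
# Illustrative only: a hand-picked similarity grouping, NOT a substitution matrix.
CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("NQ"),    # amides
    set("ST"),    # small hydroxyl
]


def compare_alignment(aligned_a, aligned_b):
    """Tally identities and conservative substitutions in a gapped alignment.

    Both strings must have the same length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identities = 0
    conservative = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # a gap column is neither an identity nor a substitution
        if a == b:
            identities += 1
        elif any(a in group and b in group for group in CONSERVATIVE_GROUPS):
            conservative += 1
    length = len(aligned_a)
    return {
        "length": length,
        "identities": identities,
        "conservative": conservative,
        "percent_identity": 100.0 * identities / length if length else 0.0,
    }
```

Calling compare_alignment on the two aligned strings copied from a BLAST alignment gives a percent identity that can be compared with the 25% cutoff mentioned in Part (a).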

5. Plasmids and Cloning


a. REBASE is the Restriction Enzyme Database
(http://rebase.neb.com/rebase/rebase.html), which is supported by a number of
commercial restriction enzyme suppliers (restriction enzymes are described in Section 3-4A). Go to
the REBASE Enzymes page (http://rebase.neb.com/rebase/rebase.enz.html)
and find a restriction enzyme from Rhodothermus marinus (it starts with the letters Rma). What is
the abbreviation for this enzyme?
Click on the enzyme's abbreviation to be taken to the page for this enzyme. Follow the links there
to answer the following two questions. What is the recognition sequence for this enzyme? What are
the expected and actual frequencies of restriction enzyme recognition sites for this enzyme in
Bacillus halodurans C-125?
b. What is a plasmid? pBR322 was one of the first plasmids to be developed for experimental work.
Go to the Entrez site (http://www.ncbi.nlm.nih.gov/Entrez) and find the sequence of
pBR322 by searching for the terms “pBR322, complete genome.” You must select Nucleotide as
your search option on the Entrez main page.
Look through the Entrez description of pBR322 and identify one gene encoded by pBR322 and
name the antibiotic that it targets.
You can get Entrez to display your sequence in FASTA format by selecting this option next to the
“Display” button. (Here are two of many sites that describe the FASTA format:
http://ngfnblast.gbf.de/docs/fasta.html;
http://bioinformatics.ubc.ca/resources/faq/?faq_id=1). Save the pBR322
sequence in FASTA format.
c. Go to PubMedCentral and search for a 1978 article in Nucleic Acids Research about restriction
mapping of pBR322. Download the article in pdf format (use Adobe Acrobat to read it; you can get
this program at http://www.adobe.com). What is the size of the pBR322 plasmid in number
of base pairs?
How many cut sites are there for the restriction enzyme HaeIII on pBR322?
d. Some restriction enzymes generate “blunt ends,” and some generate “sticky ends.” Explain the
meaning of those terms and provide an example of each.
e. Go to the RESTRICT site at the Pasteur Institute
(http://bioweb.pasteur.fr/seqanal/interfaces/restrict.html). Enter your
email address at the top, then input the pBR322 sequence file. Scroll down to the “Required
section” and note that you have a Minimum recognition site length of four nucleotides and you
have selected all the enzymes available in REBASE to digest pBR322 at the same time. Click on
the “Run Restrict” button.
On the output screen, click on the “outfile.out” link. This takes you to a simple text page that lists
all the cuts that were made in the pBR322 plasmid. How many pBR322 fragments did “all” the
enzymes generate? (Look for the “HitCount” number on the outfile.out page.)
What happens to the number of fragments when the minimum recognition site length is changed to
six nucleotides? Why did the number change?
f. Now change the enzyme name from “all” to “BamHI” in the enzymes box under the Required
section on the RESTRICT page. How many fragments are generated? How many fragments are
obtained using AvaI? What is the size of the restriction site for AvaI? How many fragments are
obtained using Eco47III? What is the size of the restriction site for Eco47III?
g. How many pBR322 fragments are produced when the three different enzymes are combined
(separate the enzyme names by commas)? How large are the fragments?
h. Use a mixture of the restriction enzymes BamHI, AvaI, and PstI to construct a restriction map of
pUC18 similar to the one shown in Fig. 3-25. How does this procedure for restriction mapping
differ from that used in Problem 10 at the end of Chapter 3?
i. For the adventurous: Find an enzyme or combination of enzymes that will produce 10 fragments
from pUC18. Draw a restriction map of your results.
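
The logic behind a computer-generated digest can be sketched in Python as follows. This is a simplified illustration and not how the RESTRICT program works internally: the function names are invented, only exact (non-degenerate) palindromic sites such as the BamHI site GGATCC are handled, each cut is placed at the start of the site plus an optional offset, and the expected-frequency estimate assumes a random sequence with equal amounts of A, C, G, and T.

```python
# Illustrative sketch of a restriction digest on a circular plasmid.
def find_sites(plasmid, site):
    """Return 0-based start positions of a recognition site on a circular sequence."""
    plasmid = plasmid.upper()
    site = site.upper()
    # Append the first few bases so a site spanning the origin is still found.
    wrapped = plasmid + plasmid[:len(site) - 1]
    return [i for i in range(len(plasmid)) if wrapped.startswith(site, i)]


def circular_fragments(plasmid_length, cut_positions):
    """Fragment sizes after cutting a circular molecule at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # an uncut circle stays in one piece
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment runs from the final cut, through the origin, to the first cut.
    sizes.append(plasmid_length - cuts[-1] + cuts[0])
    return sizes


def digest(plasmid, site, cut_offset=0):
    """Fragment sizes for one enzyme; cut_offset is the cut position within the site."""
    cuts = [(start + cut_offset) % len(plasmid) for start in find_sites(plasmid, site)]
    return circular_fragments(len(plasmid), cuts)


def expected_site_count(sequence_length, site_length):
    """Expected number of sites in a random sequence with equal base frequencies."""
    return sequence_length / 4 ** site_length
```

Note that n cuts in a circular plasmid give n fragments, whereas n cuts in a linear molecule give n + 1. The expected_site_count function also shows why raising the minimum recognition site length from four to six nucleotides changes the number of cuts so sharply: each additional base in the site makes a match four times less likely.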

---END OF MODULE 1---



MODULE 2: Using Databases to Compare and Identify Related Protein Sequences

1. Obtaining Sequences from BLAST. Triose phosphate isomerase is an enzyme that occurs in a
central metabolic pathway called glycolysis (see Chapter 14). It is also known as an enzyme that
demonstrates catalytic perfection (see Section 12-1B). For this problem, you'll start with the sequence
of triose phosphate isomerase from rabbit muscle and look for related proteins in the online databases.
Here is the sequence of rabbit muscle triose phosphate isomerase in FASTA format: Click here for
text version

a. Go to http://www.ncbi.nlm.nih.gov/BLAST and follow the link to Protein-protein
BLAST (blastp) under Protein. Perform a BLAST search using the triose phosphate isomerase
sequence by copying and pasting it into the Search box. Find a human homolog of rabbit muscle
triose phosphate isomerase.
The first item in this record (gi|4507645|ref|NP_000356.1|) is a link to another database where this
protein is described in more detail (items that begin with “gi” lead to GenBank records). The next
item (triosephosphate isomerase 1 [Ho ) is a description of the protein. Next is the score (493)
followed by the E value (e-138). In bioinformatics, two proteins are called “homologs” if they
arose from a common ancestor; the two proteins are called “orthologs” if they arose from a
common ancestor and perform the same function in two different species. Does the NP_000356.1
entry represent a human ortholog of rabbit muscle triose phosphate isomerase? What is the percent
identity between the two enzymes?
Find another human homolog to the rabbit muscle enzyme. Click on the link on the left side of the
record to bring up its GenBank entry. Select “FASTA” as the display format and click on the
“Display” button. Copy the FASTA text and save it to a text file (if you are using a word processor,
be sure to save the file in “text only” format). Save the text file (suggested name: TIM_FASTA.txt)
for later use.
b. Instead of trying to look through the entire BLAST output to find triose phosphate isomerase
homologs from plants, bacteria, and archaea, you can use some options in BLAST to narrow your
search. Return to the protein-protein BLAST page and paste the rabbit muscle sequence into the
“Search” box. This time, look down the BLAST page for an option to select “Archaea” and then
perform the BLAST search. Select one of the resulting sequences and save it in FASTA format.
Repeat this process to get FASTA-formatted sequences for triose phosphate isomerases from a
bacterial and plant (Viridiplantae) source. Combine the five FASTA-formatted sequences (rabbit,
human, archaea, bacterial, and plant) in a single file (suggested name: TIM_5_FASTA.txt). This
must be a simple text file with individual sequences separated by a blank line.
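
If you prefer to assemble the combined file with a script instead of a text editor, the following Python sketch reads FASTA-formatted text and writes the records back out with a blank line between sequences. The function names and the 60-character line width are choices made for this example, and the headers and sequences in any test input are placeholders, not real database entries.

```python
# Illustrative FASTA helpers; names and line width are arbitrary choices.
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(records, width=60):
    """Write records as plain text, one blank line between sequences."""
    blocks = []
    for header, seq in records:
        lines = [">" + header]
        lines += [seq[i:i + width] for i in range(0, len(seq), width)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Reading each saved file with parse_fasta, concatenating the resulting lists, and writing format_fasta(all_records) to TIM_5_FASTA.txt produces a simple text file in the layout described above.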

2. Multiple Sequence Alignment. Multiple sequence alignment is a tool to identify highly conserved
residues in homologous proteins. A program called CLUSTALW will perform multiple sequence
alignments on protein sets that are submitted in FASTA format. CLUSTALW is available as a
command line program to be executed in a UNIX environment (not very user-friendly). Fortunately,
the European Bioinformatics Institute has a web interface that performs CLUSTALW alignments:
http://www.ebi.ac.uk/clustalw/.

[Note to instructors: The EBI asks that you alert them via email if you are using this resource for your
course. Respond to http://www.ebi.ac.uk/support/.]
a. Go to the EBI site and submit your text file containing the five triose phosphate isomerase
sequences in FASTA format on the input form page. There are many options for refining the
alignment, but for now, use the default values. Be sure to enter your email address. The output of
CLUSTALW can be accessed in many ways. The simplest version will be described here, but you
are encouraged to explore other options (especially Jalview). In the simple text output, the
sequences are optimally aligned and annotated: Residues that are identical in all chains are marked
with an asterisk (*), those that are highly conserved are marked with a colon (:), and those that are
semiconserved are marked with a period (.). From your multiple sequence alignment, how many
identical residues did you find? Identify the residues, using the single-letter amino acid
abbreviations found in Table 4-1. Classify these “identity” sites as polar, nonpolar, acidic, and basic
amino acids. Do most of the “identities” fall into a single class of amino acids? If you plan to
continue to Part (b), keep your browser open or bookmark the results page. You can learn more
about CLUSTALW at a tutorial provided by EBI
(http://www.ebi.ac.uk/2can/tutorials/protein/clustalw.html).
b. Figure 5-23 of the textbook shows a phylogenetic tree, which is described as “a diagram that
indicates the ancestral relationships among organisms that produce the protein.” There are useful
tutorials on phylogenetic trees at the Los Alamos National Laboratory web site
(http://www.hiv.lanl.gov/content/hiv-db/TREE_TUTORIAL/Tree-tutorial.html),
at the EBI help page
(http://www.ebi.ac.uk/clustalw/tree_frame.html), and at the NCBI site
(http://www.ncbi.nlm.nih.gov/About/primer/phylo.html). Complete one or all of
these tutorials.
Scroll down the output page from the CLUSTALW program at EBI to the tree representations of
the alignments. What is the difference between a cladogram and a phylogram tree? What do these
trees tell you about triose phosphate isomerase from the five different species? The tree image on
the EBI site is a dynamic image, meaning that you can't just cut and paste it. If you would like to
capture this image, you can use the PrintScreen button on your computer and paste the image into a
simple Paint program (with Mac OS X, use the program Grab for screen capture).
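
The asterisk annotation described in Part (a) can be reproduced with a short Python sketch: a column earns an asterisk only when every aligned sequence has the same residue there and none has a gap. The sketch also tallies the identical residues by side-chain class. The function names are hypothetical, and the class assignments follow one common textbook grouping; compare them with Table 4-1 before relying on them.

```python
from collections import Counter

# One common grouping of the 20 standard amino acids (an assumption; check Table 4-1).
CLASSES = {
    "nonpolar": "GAVLIMPFW",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
RESIDUE_CLASS = {res: name for name, members in CLASSES.items() for res in members}


def identical_columns(aligned_seqs):
    """Return (index, residue) for columns where all sequences share one residue."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i, column in enumerate(zip(*aligned_seqs)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            hits.append((i, column[0]))
    return hits


def class_counts(hits):
    """Count identical positions by amino acid class."""
    return Counter(RESIDUE_CLASS.get(res, "other") for _, res in hits)
```

The input is the list of gapped sequences exactly as they appear in the alignment output, one string per species.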

3. One-Dimensional Electrophoresis. Electrophoresis is a laboratory technique that is used to separate
proteins and DNA molecules on the basis of size and charge. The principles of one-dimensional
electrophoresis (1DE) are explained in Sections 3-4B and 5-2D. Sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE) is the most common form of 1DE. In this technique,
proteins are mixed with a reducing agent (usually dithiothreitol or 2-mercaptoethanol) and a detergent
(SDS), heated for 5 minutes, then separated on an acrylamide gel. You can explore 1DE at the
Electrophoresis Simulation site
(http://www.rit.edu/~pac8612/electro/Electro_Sim.html), which contains a
Java applet that enables you to compare the migration of an unknown protein (you can choose from
seven unknowns) with a series of standards. Visit the site and report the molecular weights you
determine for each of the unknowns. Also, experiment with the controls (voltage, % acrylamide,
animation speed). You can learn how to use the applet by clicking on the “How To” button.
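
Molecular weights are usually read from an SDS-PAGE gel by plotting the logarithm of the molecular weight of each standard against its relative mobility and fitting a straight line. The Python sketch below shows that calculation under the assumption that the relationship is linear over the range of your standards; the function names are invented for this example, and the applet itself may use a different procedure.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(standards, unknown_mobility):
    """Estimate a molecular weight from a standard curve.

    standards is a list of (relative_mobility, molecular_weight) pairs.
    The fit is log10(molecular weight) versus relative mobility.
    """
    mobilities = [mobility for mobility, _ in standards]
    log_weights = [math.log10(weight) for _, weight in standards]
    slope, intercept = fit_line(mobilities, log_weights)
    return 10 ** (slope * unknown_mobility + intercept)
```

Measure the migration distance of each standard and of the unknown relative to the dye front, pass the standards as (mobility, molecular weight) pairs, and call estimate_mw with the mobility of the unknown.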

4. Two-Dimensional Electrophoresis. Two-dimensional electrophoresis (2DE) is also described in
Section 5-2D. In the first dimension, proteins are separated by isoelectric focusing; that is, they move
to a position in a pH gradient according to their isoelectric point (pI, the pH at which the net charge of
the protein is 0). Then they are separated according to molecular weight by SDS-PAGE in the second
dimension, as described in Problem 3. For this part of the exercise you will need to retrieve the text
file containing the triose phosphate isomerase sequences from five different species (Problem 1;
TIM_5_FASTA.txt).
a. The ExPASy Proteomics Server contains many tools for analyzing data from two-dimensional
electrophoresis gels, as well as a catalog of gels themselves. Go to the Primary structure analysis
tools section of the ExPASy Proteomics server at
http://us.expasy.org/tools/#primary. These tools will compute and predict values
for a protein based only on its primary structure.
Select the Compute pI/Mw tool (http://us.expasy.org/tools/pi_tool.html). Enter
the sequence for one of the five triose phosphate isomerase proteins in the data entry box. Be sure
to enter only one amino acid sequence and do not include its FASTA header (e.g.,
>gi|17389815|gb|AAH17917.1| Triosephosphate isomerase 1 [Homo sapiens]), because the
program will attempt to calculate pI and Mw values for each term entered. Record the predicted pI
and molecular weight for the protein. Repeat these steps for the other four protein sequences. How
similar are the pI and Mw values for the triose phosphate isomerases from the five different
organisms?
b. Now try to find triose phosphate isomerase on a published gel. Go to the SWISS-2DPAGE search
page (http://us.expasy.org/cgi-bin/ch2d-search-de) and search for triose
phosphate isomerase. Select the entry for the human enzyme. How do the reported values for pI and
Mw compare with the theoretical values you obtained in Part (a)? If the values are different, can
you suggest an explanation?
c. The best way to identify a protein spot on a 2DE gel is to use mass spectrometry (see Section 5-
3D). The ExPASy Proteomics server has tools to predict fragmentation patterns based on the
primary sequence of a protein and also to identify proteins based on fragmentation patterns from
actual mass spectra. Go to the Protein identification and characterization tool section at the
ExPASy Proteomics server (http://us.expasy.org/tools/#proteome). Select the
“PeptideMass” tool. Paste in one of your triose phosphate isomerase sequences, verify that
“trypsin” is selected as the enzyme, and leave the other options at their default settings. Click on
the “Perform” button and record the four largest fragments that you would obtain if you digested
the protein with trypsin. Do the same for the four other sequences. Are any of the fragmentation
patterns identical between the species?
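
A tryptic digest can be approximated in Python with the usual rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P). The sketch below is a rough illustration with several assumptions: no missed cleavages, masses computed for neutral peptides from rounded average residue masses, and invented function names. The PeptideMass tool may report a different mass type, so expect small differences from its output.

```python
# Rounded average residue masses in daltons (one water is added per peptide).
AVERAGE_RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.015


def tryptic_peptides(sequence):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    sequence = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in ("K", "R") and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Average mass of a neutral peptide."""
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in peptide) + WATER


def largest_peptides(sequence, count=4):
    """Return the heaviest tryptic peptides as (peptide, mass) pairs."""
    masses = [(peptide, peptide_mass(peptide)) for peptide in tryptic_peptides(sequence)]
    return sorted(masses, key=lambda item: item[1], reverse=True)[:count]
```

Pass the sequence without its FASTA header; largest_peptides returns the four heaviest fragments, which can then be compared between species.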

MODULE 3: Visualizing Three-Dimensional Protein Structures

There are a number of useful free visualization tools available on the Internet. Each has strengths and
weaknesses. For this exercise you will use a tool called Rasmol that you can download from
http://www.bernstein-plus-sons.com/software/rasmol/README.html. Rasmol
is available for Macintosh, Windows, and Linux/UNIX operating systems. Install Rasmol on your
computer according to the instructions on the Rasmol site
(http://www.bernstein-plus-sons.com/software/rasmol/INSTALL.html). As you go through the exercises below, you
are encouraged to visit any of a number of excellent Rasmol tutorials on the Internet: Gale Rhodes's
tutorial at the University of Southern Maine
(http://www.usm.maine.edu/~rhodes/RasTut/), Eric Martz's tutorial at the University
of Massachusetts Amherst
(http://www.umass.edu/microbio/rasmol/rasquick.htm), and David Hackney's
tutorial (adapted to HTML by Will McClure) at Carnegie Mellon University
(http://www.bio.cmu.edu/Courses/BiochemMols/RasMolTutorial/RasTut.html).
If you are interested in exploring additional visualization tools, you can obtain free software via
the Internet for Protein Explorer, KING (Kinemage), DeepView, CN3D, Chime, Jmol, and BioEditor.

1. Obtaining Structural Information. Review the discussion of protein secondary structure (Section 6-
1). Secondary structures in proteins include alpha helices, beta sheets, and beta turns.
a. Many programs have been written to predict secondary structures based only on the primary
structure (amino acid sequence) of a protein. Here is a list of such programs that are available
online:
1. PredictProtein (http://www.embl-heidelberg.de/predictprotein/predictprotein.html).
You can request this
site to predict secondary structure from seven different web servers online. If this site is
available, it will enable you to complete this problem by clicking on two or more of the optional
services.
2. JPred (http://www.compbio.dundee.ac.uk/~www-jpred/). If you use the JPred
server, be certain to check the box under #4 to avoid comparison to known PDB structure files.
3. NNPredict (http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html)
For this exercise, you will use the sequence of rabbit muscle triose phosphate isomerase, given
here: Click here for text version

Submit this sequence to two of the servers listed above. You may have to wait several minutes for
the results. Compare the results you receive from the different servers. Can you identify segments
where the predictions are not consistent between servers?
b. The structure of rabbit muscle triose phosphate isomerase has been determined by X-ray
crystallography (Section 6-2A). Go to the Protein Data Bank web server
(http://www.rcsb.org/pdb) and search for 1R2R (the PDB ID for this protein). Once you
reach the Structure Explorer page for 1R2R, click on the link for Sequence Details. Scroll down the
page to the section entitled “Sequence and Secondary Structure.” The results shown here for the
secondary structure are based on an analysis of the actual (not predicted) three-dimensional
structure, using the principles developed by Kabsch and Sander [see
http://www.rcsb.org/pdb/help-results.html#sequence_details and
Biopolymers 22, 2577–2637 (1983)]. The secondary structure assignments are H = helix; B =
residue in isolated beta bridge; E = extended beta strand; G = 3₁₀ helix; I = pi helix; T = hydrogen
bonded turn; S = bend. Compare your predicted secondary structure results from Part (a) with the
results presented on the PDB site.
Note that the Protein Data Bank web site is undergoing revision so that some of the web addresses
and specific instructions provided here may vary somewhat. For example, in the new site, you can
go to the 1R2R main page and access the secondary structure information by clicking on the
“Sequence Details” link on the right side of the screen under the image of the structure.
c. Follow the link on the PDB site for “Download File.” You can download the file in a number of
formats, but it is best to download the file in PDB format for use with Rasmol. Save the structure
file as 1R2R.pdb on your computer (suggested folder: My Documents/PDB Files). Open the
Rasmol program, then use the drop-down menu File > Open to open 1R2R.pdb. You will initially
see a wireframe model that simply displays all the bonds in the structure as lines. Perform the
following steps to get a more informative view:
Select Display > Cartoons from the drop-down menu.
Select Colours > Structure.
Now you should be able to see the alpha helix and beta sheet structures in rabbit muscle triose
phosphate isomerase. How many chains are shown in the structure? What is the dominant structural
feature of this protein? How does your structure compare with Fig. 6-30c? Take time to experiment
with the other drop-down menu options on Rasmol.
In addition to drop-down menus, Rasmol also has a “command line” window that enables you to
select specific atoms or parts of a structure (amino acid residues, for example) and change the way
they appear. There are more details in the tutorials listed above. Eric Martz has also prepared a
helpful command list
(http://www.umass.edu/microbio/rasmol/distrib/rasman.htm#chcomref).
Bring up the command line window and note the effects of entering the following commands:
1. select hetero and not water (selects nonprotein parts of the structure excluding water)
2. spacefill (a van der Waals radius representation)
3. color cpk (standard chemistry color scheme)
What heteroatoms do you see in this structure?
Hint: Click on them to see their identities in the command line. You can also find the hetero
atoms in a structure by looking at the summary information page of the pdb file. Are any
substrates or inhibitors represented in this structure? Now try more commands:
4. select protein
5. cartoon off
6. select sheet
7. wireframe 30
8. spacefill 100 (these combined commands yield a ball-and-stick structure)
Can you see the sheet structure now? If not, type the command “cartoon.” What do you see?
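
The same information that the “select hetero and not water” command highlights can be read directly from the PDB file, which is plain text. The Python sketch below scans the ATOM and HETATM records, using the fixed column positions of the PDB format (residue name in columns 18-20, chain identifier in column 22), and reports the chain identifiers and the names of nonwater hetero groups. The function name is made up, and only HOH is treated as water here, which is a simplification of what Rasmol does.

```python
def summarize_pdb(lines):
    """Return (chain identifiers, nonwater hetero group names) from PDB-format lines."""
    chains = set()
    hetero = set()
    for line in lines:
        if len(line) < 22:
            continue  # too short to hold a chain identifier
        record = line[0:6].strip()
        if record == "ATOM":
            chains.add(line[21])  # column 22 holds the chain identifier
        elif record == "HETATM":
            res_name = line[17:20].strip()  # columns 18-20 hold the residue name
            if res_name != "HOH":
                hetero.add(res_name)
    return sorted(chains), sorted(hetero)
```

To try it on your downloaded file, open 1R2R.pdb with Python's built-in open function and pass the file object to summarize_pdb; the first list answers the question about the number of chains, and the second lists the hetero groups to look for in Rasmol.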

2. Exploring the Protein Data Bank. In the first problem, you visited the Protein Data Bank (PDB).
You can explore that site in more detail now (see also Section 6-6). The Protein Data Bank web site
(http://www.rcsb.org/pdb) is undergoing revision so that some of the web addresses
provided in this set of exercises may become outdated. The new site incorporates a “Site Search”
button that will enable you to search the PDB site for teaching materials and tutorials, in addition to
the standard “Search” box that can be used to find specific structures in the PDB. You may be able to
find any of the materials described below using the “Site Search” button. You can also use the
extensive Help files that are accessible from any page on the new PDB site.

The PDB is a repository of macromolecular structures. Perhaps the most important skill for a PDB site
user is the ability to find a particular structure. There is a query tutorial at
http://www.rcsb.org/pdb/query_tut.html that provides instructions on finding
structures in the PDB. On the new PDB site, the query tutorial is contained in the Help files.

Each structure in the PDB is assigned a PDB ID (or PDBid), a four-character alphanumeric code that
uniquely identifies that structure. So, for example, 4HHB is a hemoglobin structure and 8GCH is a
chymotrypsin structure. If you know the PDB ID, then you can use that to search the PDB. You can
obtain PDB IDs from research publications. Most scientists who determine macromolecular structures
are highly motivated to publish their findings in journals such as Science, Nature, Journal of
Biological Chemistry, Journal of Molecular Biology, and Protein Science. These journals have an
agreement with the PDB that requires authors to submit their structures to the PDB before the journal
will publish the results. Most of the figures in the text that contain a molecular structure include the
PDB ID (PDBid) for that structure in the figure legend.
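
If you collect PDB IDs from several papers, a quick format check can catch typing mistakes before you search. The following is a minimal sketch that assumes only what is stated above, namely that an ID is four alphanumeric characters; the function name normalize_pdb_id is made up for this illustration.

```python
import re

# Assumption: a PDB ID is exactly four ASCII letters or digits, as described in the text.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def normalize_pdb_id(text):
    """Return the ID in upper case, or raise ValueError if it is not four alphanumeric characters."""
    candidate = text.strip()
    if PDB_ID_PATTERN.fullmatch(candidate) is None:
        raise ValueError(f"not a four-character PDB ID: {text!r}")
    return candidate.upper()


# The two example IDs mentioned above, typed in mixed case.
for raw in ("4hhb", "8GCH"):
    print(raw, "->", normalize_pdb_id(raw))
```

The check is purely about format; it cannot tell you whether a structure with that ID actually exists in the PDB.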

For your first PDB search, you will find a PDB ID in a journal article, then find that structure on the
PDB site. Go to the Journal of Biological Chemistry web site (http://www.jbc.org) and search
for this paper:
Parthasarathy, S., Eaazhisai, K., Balaram, H., Balaram, P., and Murthy, M.R.N., Structure of
Plasmodium falciparum triose-phosphate isomerase-2-phosphoglycerate complex at 1.1-Å
resolution. J. Biol. Chem. 278, 52461–52470 (2003).
Go to the footnotes section and find the four-character PDB ID code for the Plasmodium protein. Next
go to the Protein Data Bank main page. Type the PDB ID in the search box and click on the "Search"
button to find the "Structure Explorer" page for this enzyme. You can investigate the linked resources
on this page by completing the following exercises:
a. Download the PDB (structure) file for this protein to your computer (suggested folder: My
Documents/PDB Files; suggested name: 1o5x.pdb). You will need this file for Problem 3, studying
the protein's structure using Rasmol.
b. Download the protein sequence in FASTA format.
c. Find some still images of this protein on the PDB site. You can look under the "View Structure"
link on the left side of the 1o5x "Structure Explorer" page. Scroll down the "View Structure" page
until you come to "Still Images." To save an image, just right click on it (Mac users: Control click)
and select the option that lets you save the file (in Internet Explorer, the command is "Download
image to disk"; in Firefox, the command is "Save Image As...").
d. Return to the "Structure Explorer" page for 1o5x. Click on the "Other Sources" link on the left side
of the page. Follow the links for 1o5x to the sites at PDBSum and the IMB Jena Image Library.
Collect still images from each of these sites, and be sure to keep a record of where you found each
image. Suggest ways in which you can use such downloaded images.
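
The download in part (a) can also be scripted. The sketch below only builds the download address and the suggested local file name. The address pattern (files.rcsb.org/download/) is an assumption that does not come from this exercise and may change when the PDB site is revised; the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: RCSB serves structure files at this address. It is not part of the exercise text.
DOWNLOAD_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id):
    """Build the (assumed) download address for a PDB entry."""
    return DOWNLOAD_TEMPLATE.format(pdb_id=pdb_id.upper())


def local_pdb_path(pdb_id, folder="PDB Files"):
    """Build the local file name suggested in part (a), e.g. 'PDB Files/1o5x.pdb'."""
    return Path(folder) / f"{pdb_id.lower()}.pdb"


print(pdb_download_url("1o5x"))
print(local_pdb_path("1o5x"))
```

To fetch the file, you could pass these two values to urllib.request.urlretrieve from the Python standard library, after creating the target folder.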

3. Using Rasmol. In Problem 2, you saved the PDB file for 1o5x, entitled "Plasmodium falciparum
TIM complexed to 2-phosphoglycerate." You can use Rasmol to explore this structure, focusing on
identifying secondary structures and looking at the active site.
a. Open the Rasmol program on your computer. (If you have not installed it already, please see the
opening paragraph of the exercises for Chapter 6.) Open the file 1o5x.pdb. You will see a wireframe
image: All the bonds in the PDB structure file are shown as thin wires, colored according to
Corey-Pauling-Koltun (CPK) coloring rules (oxygen is red, nitrogen is blue, hydrogen is white, and
carbon is gray). There are seven drop-down menus in Rasmol: File, Edit, Display, Colours, Options,
Export, and Help. Spend a few minutes trying each command in each of the menus. Perform the
following operations:
1. File...Information. This identifies the protein structure by name and PDB ID.
2. Display...Backbone. This shows the protein backbone; the bonds actually connect alpha carbons.
3. Display...Cartoon. This shows an image of the protein that clearly displays helices and sheets.
Leave your image in cartoon format and move to the Colours menu.
4. Colours...Structure. This shows alpha helices in magenta, beta sheets in yellow, and turns in pale
blue.
5. Options...Labels. This command labels all selected atoms. The view will not look good right
now because all atoms are selected. If you select a single atom and then use the label command,
you can attach text to that atom (call it by its name, or give it another label such as "inhibitor").
6. Export...GIF. This function enables you to export a still image of the structural view you just
created. Export your image as 1o5x.gif (store it somewhere accessible, such as the Desktop),
then view it in a simple image viewer (Paint in Windows; Preview in OSX).
7. Help...User Manual. This is really a critical tool for using Rasmol. In order for this to work, the
help file (Rasmol.hlp in Windows) must be stored in the same directory as the Rasmol program.
Even then, Rasmol may ask you to find it on your computer system. The Help file is searchable.
The Table of Contents has links to major features of the program, including frequently used
items such as Command Reference, Atom Expressions, and Colour Schemes. The manual is also
available at
http://info.bio.cmu.edu/Courses/BiochemMols/RasFrames/TOC.HTM.
b. Open the Rasmol command line window, if it is not already visible. You will use this window to
enter specific commands for viewing the structure, including highlighting the small molecules
(3-hydroxypyruvic acid and 2-phosphoglyceric acid) that are bound to triose phosphate isomerase in
the 1o5x file. But first you'll need to learn a little bit about viewing a PDB file.
Go to the Structure Explorer page for 1o5x at the PDB website
(http://www.rcsb.org/pdb/cgi/explore.cgi?job=summary&pdbId=1O5X&page=&pid=190431099321055).
Click on "Download/Display File" on the left side of the page.
Under "Display the Structure File," select the "HTML" option. This shows you the complete PDB
file. There is a lot of information in this file, but you'll only look at a few items. For more details,
you can go to the PDB Format Description page
(http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html) or click on any links you
see on the HTML page for 1o5x.
Each line in a PDB file is called a "record," and the first six characters on that line tell what kind of
"record" it is. In your browser, search for SEQRES. As explained in the PDB Format Description,
SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the
macromolecule. Hence you can see the sequence of your protein there. For 1o5x, the first few lines
of the SEQRES section are given in Table I (below). Each line contains 13 amino acid residues
listed by their three-letter abbreviations. So residue #27 in chain A is PHE (phenylalanine). The
12th character (counting spaces) in each record is a chain identifier. If a protein contains more than
one polypeptide chain, the chains are identified with a letter (in this file, there are two chains: A and
B).

Table I.
SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 A  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 A  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN

Anything in a PDB file that is not either protein or nucleic acid is considered a heterogen atom and
is referred to with the prefix "het." So HETNAM is the label for a record that contains the name of a
nonprotein, non-nucleic acid group. Search the HTML version of the 1o5x file for "HETNAM."
What are the heterogen groups in this structure?
c. Now you can display the heterogen groups in 1o5x. Go to the command line window of Rasmol and
enter the command "select hetero and not water." This command selects all the heterogen atoms
excluding water. The current view of 1o5x should be a cartoon diagram of the structure. To show
the heterogen molecules differently, enter the command "spacefill on" and then "color cpk." This
creates a space-filling representation of the two molecules and colors them according to the CPK
conventions.
d. As the last part of this exercise, you will display the active site residues, based on Figure 4 of the
primary citation for this structure (see Problem 2). Figure 4a shows three residues that interact with
the 2-phosphoglycerate: glutamate 165, lysine 12, and histidine 95. Select these residues by entering
the command "select lys12,his95,glu165" in the Rasmol command line window. Then use the
drop-down menu in the structure window to show the residues in a ball-and-stick format
(Display...Ball & Stick). Finally, enter the command "color cpk" so that you can distinguish the
atoms of the structure.
To get a better look at the intermolecular interactions, you can zoom in on the structure by pressing
the "Shift" key as you move the mouse. To zoom in, drag the mouse up in the structure window. To
move the image side-to-side or up-and-down, use the right-click on your mouse (Mac users: use the
"Option" key) and drag the image where you want to go. Using a combination of Shift-mouse and
right-click (or Option-Mouse), you can get a close-up view of the binding site for
2-phosphoglycerate.
To complete this exercise, identify and display additional residues that interact with the
2-phosphoglycerate in 1o5x. A couple of hints: Use Figures 3 and 4 from the primary citation. Also,
for some reason, Rasmol won't select 2-phosphoglycerate using the command "select 2PG," but you
can select it using "select 4400", which is the second way 2PG is identified in 1o5x.
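
Part (b) of this problem describes how each record is identified by its first six characters and how SEQRES and HETNAM records are laid out. The sketch below shows one way to read those records in Python. It assumes the standard fixed-column PDB layout (chain identifier in column 12, residue names from column 20 onward, heterogen ID in columns 12-14). The three SEQRES lines are taken from Table I; the HETNAM line was typed by hand for illustration and is not copied from the 1o5x file.

```python
SAMPLE_RECORDS = [
    "SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS",
    "SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER",
    "SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL",
    # Illustrative HETNAM line written for this sketch, not copied from the file.
    "HETNAM     2PG 2-PHOSPHOGLYCERIC ACID",
]


def record_type(line):
    """The first six characters of a line name the kind of record."""
    return line[:6].strip()


def read_sequences(lines):
    """Collect the three-letter residue names of each chain from SEQRES records."""
    chains = {}
    for line in lines:
        if record_type(line) == "SEQRES":
            chain_id = line[11]            # 12th character of the record
            residues = line[19:].split()   # residue names start at column 20
            chains.setdefault(chain_id, []).extend(residues)
    return chains


def read_het_names(lines):
    """Map each heterogen ID to its name, joining continuation lines with a space."""
    names = {}
    for line in lines:
        if record_type(line) == "HETNAM":
            het_id = line[11:14].strip()
            text = line[15:70].strip()
            names[het_id] = (names.get(het_id, "") + " " + text).strip()
    return names


chains = read_sequences(SAMPLE_RECORDS)
print("Residue 27 of chain A:", chains["A"][26])
print("Heterogen names:", read_het_names(SAMPLE_RECORDS))
```

To run this on the real file, you could replace SAMPLE_RECORDS with the lines of the 1o5x.pdb file you saved in Problem 2.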

4. Protein Families. The goal of this exercise is to identify a protein that shares structural homology
with triose phosphate isomerase (as seen in PDB ID 1o5x) but catalyzes a different reaction. Because
1o5x is a fairly recently described structure, it is not well documented in other structural databases.
Therefore you will use an earlier PDB entry on triose phosphate isomerase from Plasmodium
falciparum, PDB ID 1ydv. You will explore two resources, CATH and SCOP (see Section 6-6).

a. CATH. You can access the CATH homepage at http://www.biochem.ucl.ac.uk/bsm/cath/,
but perhaps the easiest way to get to
these resources is through the "Other Sources" link on the Protein Data Bank Structure Explorer
page for 1ydv. Click on the CATH link to open a new browser window containing the CATH main
page. Review the introduction to CATH before proceeding.
CATH describes proteins in a hierarchical fashion. What information is found in CATH under the
following headings?
1. Class
2. Architecture
3. Topology
4. Homologous superfamily
Return to the PDB "Other Sources" page for 1ydv. Click on the "CATH" link for 1ydv (right side
of the screen). Click on the "1ydvA0" link to find a list of proteins that are members of this
superfamily. Explore the page to find five enzymes in this superfamily that catalyze reactions
different from the reaction catalyzed by triose phosphate isomerase.
b. SCOP. Once again, you can go directly to the SCOP homepage
(http://scop.mrc-lmb.cam.ac.uk/scop/), but it may be easier to get to these resources in SCOP by going
through the "Other Sources" link on the Protein Data Bank Structure Explorer page for 1ydv. Click
on the "SCOP" link to open a new browser window containing the SCOP main page. Read the
synopsis before continuing. Then return to the PDB page and click on the "SCOP" link for 1ydv.
SCOP provides a lineage for each protein that is classified. Follow the lineage links at the Fold
level to identify five proteins that are related to triose phosphate isomerase. Are any of these
proteins also in your list for Problem 4(a)?
c. The final part of this exercise is to identify other resources that help you find proteins related to
triose phosphate isomerase from Plasmodium falciparum. You are encouraged to follow other links
from the PDB Other Sources page for 1ydv. You may also be able to find other resources by
searching the Internet using the PDB ID codes. List and summarize three other resources that you
find.

---END OF MODULE 3---



MODULE 4: Enzyme Inhibitors and Rational Drug Design

1. Dihydrofolate Reductase. In Section 12-4, enzyme inhibitors are identified as the second largest
class of drugs. The first exercise for this chapter is to find the structure of an enzyme that has a
competitive inhibitor bound to its active site. First, look in the Protein Data Bank for the enzyme
dihydrofolate reductase (DHFR; see Section 22-3A). How many structures do you find for DHFR?
Since there are too many to analyze all at once, you can limit your search to DHFR complexed with
an inhibitor. On the PDB results page, select "Refine your query" from the "Pull down to select
option" menu. Enter the words "inhibitor and x-ray" and click on the option for a full text search. Go
down the list to find DHFR complexed with NADPH (a substrate) and an inhibitor.

Download the PDB file and use Rasmol to visualize this structure (see the exercises for Chapter 6).
Display the protein in Ribbons format, colored by structure. Then select NADPH and the inhibitor
(using the command ―select hetero and not water‖) and show them as space-filling models with CPK
coloring.

2. HIV Protease. There is a description of structure-based drug design in Section 12-4A, and Box 12-
4 describes two drugs that are directed at the protease from human immunodeficiency virus (HIV
protease). In fact, these were some of the very first drugs to be created using structure-based drug
design. The first drugs directed against HIV targeted the reverse transcriptase (Boxes 12-4 and 24-2),
but the virus quickly developed resistance to these drugs. Researchers in academia and at
pharmaceutical companies began studying HIV protease when initial results indicated that it might be
useful as an additional drug target to delay the onset of AIDS. You are encouraged to look for recent
reviews on PubMedCentral (http://www.pubmedcentral.gov) or at your local library (try
journals such as Current Opinion in Chemical Biology and Trends in Biochemical Sciences) to find
more information on structure-based drug design.
a. Go to GenBank (http://www.ncbi.nlm.nih.gov/) and search for the protein sequence of
HIV protease. Many mutant forms of HIV protease have been sequenced, so you may find a mutant
sequence. To get the native sequence, you'll have to reverse the mutation. For example, if you have
an L90M mutant sequence, you'll simply need to replace the M (methionine) at position 90 with L
(leucine). Save the sequence to a file in FASTA format.
b. Do a BLAST search with this sequence to find homologous proteins. HIV protease is a member of
the aspartic protease family, which includes pepsin (Box 12-4). What other proteases also appear in
your BLAST search?
c. Now search the Protein Data Bank for HIV protease structures. How many do you find? Rather
than searching through these structures to find particular inhibitors, start a new search using the
name of an inhibitor, such as saquinavir or ritonavir, both of which are mentioned in Box 12-4.
Search for those terms. Does either appear in the Protein Data Bank?
d. To take a closer look at the HIV protease complex with ritonavir, download the PDB file, 1HXW,
and open it in Rasmol. Use the drop-down menu to display the protein structure as a cartoon. Then
color by Structure.
The drug ritonavir is identified on the PDB Structure Explorer site for 1HXW by the abbreviation
"RIT." Bring up the Rasmol command line window. Type in "select RIT" and hit return. Then type
in "wireframe 100." You should see the ritonavir in wireframe format with CPK coloring. Now you
can get a better look at how ritonavir affects HIV protease. Look at the picture of ritonavir in Box
12-4. Identify the feature (in red) that mimics the geometry of the transition state. Find the
tetrahedral carbon atom in Rasmol. Note that when you click on an atom in the Rasmol structure
window, the atom is identified in the command line window. You may want to make the protein
disappear temporarily by entering "select protein; cartoon off" in the command line window (you
can make it reappear with the command "cartoon on"). When you click on the correct carbon atom
(in the backbone between the two phenyl groups), it will be listed in the command line window:
Rasmol > Atom: C13 1865 Hetero: RIT 301.
Now identify the two aspartate residues in HIV protease that interact with ritonavir. You can use
the within command to do this: Enter "select asp and within (5.0, atomno=1865)." This command
selects all aspartate residues within 5.0 angstroms of atom number 1865 in the PDB structure file.
Which aspartate residues are close to the carbon atom you identified above? Identify these residues
by number and chain.
e. Find the names of additional HIV protease inhibitors (using Google, for example) and see whether
they occur in structures in the PDB. Explore the structures using Rasmol and identify interactions
between the hydrophobic side chains on the inhibitors and the surface of HIV protease.
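
Two steps in this problem can be sketched in a few lines of Python: reversing a point mutation such as L90M (part a) and the distance test behind the within command (part d). This is only an illustration under stated assumptions. The four-residue sequence, the atom list, and every coordinate below are invented, and the function names are hypothetical; only the atom number 1865 and the RIT 301 label come from the text. The sketch mimics the idea of the Rasmol command with a plain distance cutoff and is not Rasmol's own implementation.

```python
import math


def revert_point_mutation(sequence, mutation):
    """Undo a substitution written like 'L90M': native residue, 1-based position, mutant residue."""
    native, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1
    if sequence[index] != mutant:
        raise ValueError(f"expected {mutant} at position {position}, found {sequence[index]}")
    return sequence[:index] + native + sequence[index + 1:]


def select_within(atoms, residue_name, cutoff, center_serial):
    """Return atoms of the named residue type within `cutoff` angstroms of one chosen atom."""
    center = next(atom for atom in atoms if atom["serial"] == center_serial)
    return [
        atom for atom in atoms
        if atom["resname"] == residue_name
        and math.dist(atom["xyz"], center["xyz"]) <= cutoff
    ]


# Invented four-residue sequence with a made-up "L4M" mutation.
print(revert_point_mutation("ACDM", "L4M"))

# Invented atoms: residue numbers and coordinates are placeholders, not values from 1HXW.
ATOMS = [
    {"serial": 1865, "resname": "RIT", "chain": "A", "resseq": 301, "xyz": (0.0, 0.0, 0.0)},
    {"serial": 10, "resname": "ASP", "chain": "A", "resseq": 1, "xyz": (3.0, 0.0, 0.0)},
    {"serial": 20, "resname": "ASP", "chain": "B", "resseq": 2, "xyz": (0.0, 3.0, 3.0)},
    {"serial": 30, "resname": "ASP", "chain": "B", "resseq": 3, "xyz": (6.0, 0.0, 0.0)},
    {"serial": 40, "resname": "GLY", "chain": "A", "resseq": 4, "xyz": (1.0, 1.0, 1.0)},
]

nearby = select_within(ATOMS, "ASP", 5.0, 1865)
print(sorted({(atom["resseq"], atom["chain"]) for atom in nearby}))
```

With real data, the atom list would come from the ATOM and HETATM records of the PDB file, and the printed residue numbers and chains would answer the last question in part (d).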

3. Pharmacogenomics and Single Nucleotide Polymorphisms. PubMedCentral contains an excellent
article that reviews the role of pharmacogenomics in medicine and drug discovery. Go to
http://www.pubmedcentral.gov and search for "pharmacogenomics review". One of the
articles that you should find is Chiche, J.-D., Cariou, A., and Mira, J.-P., Bench-to-bedside review:
Fulfilling promises of the Human Genome Project, Critical Care 6, 212–215 (2002). Oak Ridge
National Laboratory (ORNL) also has an excellent site on pharmacogenomics at
http://www.ornl.gov/sci/techresources/Human_Genome/medicine/pharma.shtml.
Pharmacogenomics is a very broad and rapidly expanding field. This exercise is a general
guide to introduce you to some relevant database and literature resources.
a. Using the article above and the ORNL site, define the following terms: pharmacogenomics, single
nucleotide polymorphism (SNP), and cytochrome P450. What is the significance of SNPs in the
function of cytochrome P450 and drug metabolism (see also Section 12-4D)?
b. Look in Entrez (http://www.ncbi.nlm.nih.gov) for cytochrome P450. You will have a
number of options to explore. Explore PubMed, PubMed Central, Books, and OMIM to find
documents relating to cytochrome P450 and single nucleotide polymorphisms.
c. Return to the "Entrez" page. This time, explore the databases for information on cytochrome P450
and single nucleotide polymorphisms. Suggested sites are the Protein sequence database and SNP
database. Describe the results of your exploration.

---END OF MODULE 4---

MODULE 5: Metabolic Enzymes, Microarrays, and Proteomics

Here is a list of a few useful and reliable online resources about metabolism:
The Biology Project at the University of Arizona:
http://www.biology.arizona.edu/biochemistry/biochemistry.html
Metabolic Pathways of Biochemistry at George Washington University:
http://www.gwu.edu/~mpb/
Chemistry Biology Information Center at ETH Zurich:
http://www.infochembio.ethz.ch/links/en/biochem_metabolismus.html
Main Metabolic Pathways on the Internet:
http://home.wxs.nl/~pvsanten/mmp/main.htm
Kyoto Encyclopedia of Genes and Genomes (KEGG):
http://www.genome.ad.jp/kegg/metabolism.html
Enzyme Structures Database:
http://www.ebi.ac.uk/thornton-srv/databases/enzymes/

1. Metabolic Enzymes. In Chapter 12, you looked at the role of enzyme inhibitors as drugs. In this
exercise, you will use some online resources to learn more about the enzymes involved and the
pathways that are affected.
a. Look in your textbook for dihydrofolate reductase (DHFR). Write out the reaction catalyzed and
the pathway involved (see Section 22-3B and Box 22-1).
b. Go to the Enzyme search page at the KEGG site
(http://www.genome.jp/dbget-bin/www_bfind?enzyme) and search for the enzyme (by name). Now look for links that lead
to pathways that include DHFR. Where does DHFR appear in each pathway? Is this consistent with
your findings in the textbook?
c. Go to the Enzyme Structures Database
(http://www.ebi.ac.uk/thornton-srv/databases/enzymes/) and find dihydrofolate reductase. Every enzyme with a known
reaction is classified in a hierarchy: EC #.#.#.# where # represents a number (see Section 11-1A).
What is the enzyme classification for dihydrofolate reductase? Explore the hierarchy for
dihydrofolate reductase. What does each of the numbers in the hierarchy represent?

2. Microarrays. Malcolm Campbell at Davidson College has done a remarkable job of making a
high-end technology (microarrays; see Section 13-4C) available to researchers (students and faculty)
at the undergraduate level.
a. Visit Malcolm Campbell's site at Davidson and go through the following web exercise: DNA
Microarray Methodology (a FLASH animation) at
http://www.bio.davidson.edu/courses/genomics/chip/chipQ.html.
b. For more advanced background on microarrays, visit Manish Patel's microarray tutorial at
http://www.ucl.ac.uk/oncology/MicroCore/HTML_resource/tut_frameset.htm.
c. Visit PubMed Central or PubMed and find a review article on the use of microarrays to study one
of the following diseases: breast cancer, lymphoma, hypertension, atherosclerosis, or a disease that
is of particular interest to you. Provide the citation for the article you found and explain how
microarray technology was applied.

3. Proteomics. Proteomics (Section 13-4D) is the study of all the proteins expressed in an organism or
tissue under a specific set of conditions. To gain a broader understanding of proteomics, read the
following article: Graves, P.R. and Haystead, T.A., "Molecular biologist's guide to proteomics,"
Microbiol. Mol. Biol. Rev. 66, 39–63 (2002), which is available at PubMed Central. After reading this
article and reviewing other available resources, answer the following questions:
a. What analytical techniques are used most commonly to separate proteins in proteomics?
b. How can proteins be identified with certainty in proteomics?
c. What is meant by the phrase "the dynamic range of protein expression"? You will need to find an
additional source to answer this question; it is not addressed in the "Molecular Biologist's Guide to
Proteomics." Can you find quantitative values in the literature to further define this term? Leigh
Anderson has published some informative articles on the human plasma proteome; this would be a
good place to look.

4. Two-Dimensional Gel Electrophoresis. A major proteomics tool is two-dimensional gel
electrophoresis (2DE; see Problem 4 in the Bioinformatics exercises for Chapter 5). One of the
techniques you encountered in Problem 3(a) was 2DE. One of the best bioinformatics sites on the web
is the ExPASy server in Geneva, Switzerland (http://www.expasy.org), which has a
database/tools site for 2DE called Swiss-2DPAGE.
a. Open the Swiss-2DPAGE site (http://www.expasy.org/ch2d/). Follow the link to search
the site by description and find out how many human proteins are catalogued there (enter "human"
as the search keyword).
b. Return to the search page and search for dihydrofolate reductase. How many listings do you find?
Is a human version of DHFR catalogued here?
c. Start over and search for E. coli proteins (enter "E. coli" as the search keyword). Note that if you
use "E. coli," you'll get much different results than if you just use "coli." Now look for
dihydrofolate reductase in E. coli. (Use the find function in your browser to search the E. coli
results.) Follow the links to E. coli DHFR to answer the following questions:
1. What is the theoretical molecular weight (Mw) and isoelectric point (pI) for E. coli DHFR?
2. What actual values were obtained by 2DE?
3. What peptide fragment was used to identify E. coli DHFR? Use the BLAST server to find out
where this peptide is located in the E. coli DHFR sequence.
d. Search the Protein Data Bank for structures of DHFR. Has the three-dimensional structure of this
enzyme been determined? Keep searching to see if there are three-dimensional structures of E. coli
DHFR complexed with methotrexate. What is methotrexate and how is it used in the treatment of
disease? (Consult Box 22-1 and other resources for the answer.)
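
Part (c) asks you to use the BLAST server to locate a peptide within a protein sequence. When the peptide matches the sequence exactly, a plain substring search gives the same position, so the sketch below swaps in that simpler technique in place of BLAST. The sequences are placeholders: the protein string is the first SEQRES line of 1o5x from Module 3 written in one-letter code, and the peptide is invented for illustration. Neither is the E. coli DHFR data you will find in Swiss-2DPAGE.

```python
def locate_peptide(protein, peptide):
    """Return the 1-based (start, end) of the first exact match, or None if there is no match."""
    start = protein.find(peptide)
    if start == -1:
        return None
    return start + 1, start + len(peptide)


# Placeholder data: residues 1-13 of the 1o5x chain in one-letter code, and an invented peptide.
PROTEIN = "MARKYFVAANWKC"
PEPTIDE = "YFVAANWK"

print(locate_peptide(PROTEIN, PEPTIDE))
```

A substring search finds only exact matches. BLAST remains the better choice when the peptide may differ from the database sequence by a substitution or a gap.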

---END OF MODULE 5---
