Protein Structure
&
Molecular Modelling
Version 1.5
BIOSCIENCE IT SERVICES
e-mail: molbio.support@bbsrc.ac.uk
February 2004
Introduction
This document is the property of BBSRC Bioscience IT Services (BITS) and may not be
reproduced, either wholly, or in part, or transmitted in any form, or by any means,
electrical or mechanical including photocopying, or any information storage or retrieval
system, without prior permission from the author(s).
1. Introduction
Abbreviations
Useful Information
Groups of amino acids
Bond Geometry
Keyboard Shortcuts in Deep View
Introduction to the course
Course summary
Hardware and Software requirements
Deep View version 3.72
Getting Help
Presentation Slides
2. Fetching and understanding protein structure files
The Protein Databank (PDB)
Searching for PDB files
Accession code
Text searching
The PDB annotation page
Summary Page
Data Retrieval
Viewing the structure
Analysing a PDB file
3. Viewing and analysing structures in Deep View
What is Deep View?
How can I get Deep View?
Configuring Deep View
General Preferences (Preferences → General)
Loading Preferences (Preferences → Loading Protein)
Swiss Model Preferences (Preferences → Swiss-Model)
Network Preferences (Preferences → Network)
Help
Viewing and manipulating a structure
Importing a molecule
Loading a molecule
Initial view
Moving around the molecule
Changing the view of the molecule
Layers Infos Window
Control Panel
The Header
Group List
Colours
Changing attributes
View Mode
Changing how the molecule is displayed
Slab mode
Exercises for changing the view of a molecule
Selecting Groups
None / All
Inverse selection
Visible Groups
Pick on screen
Group Kind
Group Property
Secondary Structure
Accessible amino acids
Groups with the same colour as
Extend to other layers
Neighbours of Selected aa
Groups close to another chain
Groups close to another layer
Amino acids making clashes ( / with backbone)
Sidechains lacking proper H-bonds
Reconstructed amino acids
Keyboard Modifiers when selecting
Control Panel keyboard modifiers
Example selections
Working with multiple structures
Opening multiple files
Automated Alignment
Iterative Magic fit (Fit → Iterative Magic Fit)
Structural Sequence Alignments
Explore alternate fits
Aligning around specific groups
Looking at a complete family alignment
Looking for differences between layers
Supplementary Exercises with Multiple Structures
Splitting Layers
Changing layer names and properties
4. Calculations within SPDBV
Motif Searching
Distances and Angles
Distance measure
Angle measures
Angle information
Identifying Residues
Hydrogen Bonds
Electrostatics and Molecular surfaces
Van der Waals surface
Accessible surface
Molecular surface
Setting up surfaces
Electrostatic Potentials
A bit of (simplified) theory!
Viewing Electrostatic Fields
Molecular Surfaces
Other Molecular Surfaces
Contact Surface
Mutations and Torsions
Mutations
Torsions
Energy minimisation
When to use energy minimisation
1. Introduction
Abbreviations
CA      Alpha Carbon
CB      Beta Carbon
CP      Control Panel
LI      Layers Infos
PDBSEQ  A protein database containing only sequences of proteins for which a
        tertiary structure has been experimentally determined
TM      Trans-Membrane
Useful Information
This section simply provides some reference information which you may find useful throughout
the course.
Bond Geometry
Bond        Distance (Angstroms)    Energy (kJ / mol)
Covalent    <2                      200-400
Hydrogen    1-3                     12-29
Protein structure analysis and modelling were, for many years, the sole preserve of
specialists who had access to extremely powerful computing systems and state-of-the-art
software. With the advent of more powerful desktop systems, and the production of new
software for these platforms, it is now possible for a non-specialist to perform many of these
analyses.
This course will not miraculously turn you into a structural biologist in two days. It will,
however, show you how you can use protein modelling and structural analysis to suggest new
directions for your research, or to help explain existing results.
Course summary
The course will initially look at the Deep View package, and show how this can be used
to examine and analyse single and multiple existing structures. Since this package will be used
when preparing and analysing models, it is a good idea to get used to how it works at this stage.
Having got used to how the package works, we will look at performing protein modelling.
This will be done using the advanced submission features available in Deep View.
Having generated a model, we will then look at the various ways in which the reliability
of the returned model can be assessed, and how mistakes in the original submission could be
rectified.
Finally, the forms of analysis which can be performed on proteins unsuitable for
modelling will be considered. Also, the exporting of images as high quality output for
publication will be explained.
The majority of this course is based around Deep View 3.7. This package is freely
available for both PC and Macintosh platforms. The PC version requires a 486 DX or better, and
the Mac version requires a Power Macintosh. Whilst these are the absolute minimum
requirements, a machine with these specifications will perform some operations extremely
slowly. A more realistic specification would be a Pentium II (or equivalent) with at least 64MB
RAM.
For the production of presentation graphics the POV-Ray package is used, along with the
MegaPOV patch. These too are free packages, and are available for both PC and Mac.
In addition, much of the course uses web-based tools. Accessing these requires a
Java-enabled web browser such as Netscape 4.x or IE 4, together with the Chime plug-in for
online viewing of molecular structures.
Each of these programs can be installed at no extra cost. This course was explicitly
designed around free software. Instructions for downloading and installing software are situated
in the appropriate chapters of the manual.
Using PDB Files 13
• Recentering view and rotation now has its own tool button. This is positioned at top
left:
Getting Help
In addition to this manual there are a few extra sources of information which you may
find useful:
http://www.molbiol.bbsrc.ac.uk/protein_struc/introduction.html
POV-Ray tutorial:
http://www.students.tut.fi/~warp/povVFAQ/
You can, of course, also contact the BBSRC molecular biology support line. This is a free
service to BBSRC institute researchers, and can be contacted by phone on:
The PDB database is accessible by web interfaces, which can be found at:
From these sites it is possible to retrieve the molecular co-ordinates of structures either by
entering an accession code or by text searching the annotation of the entries.
It is a good idea to search the database of existing structures before you go off
modelling your protein sequence. There is often as much information to be
gained from studying related structures as there is from generating a model.
There are currently nearly 23,800 structures in the PDB and this number is
increasing exponentially. It is worth checking the database every month to see
whether a structure for a protein similar to that on which you are working has
been deposited.
Unlike protein or nucleotide sequence data, there is no requirement that the co-
ordinates of a structure should be deposited with the PDB in order that the
structure can be published. Most groups will routinely deposit their structural
data, but there is no compulsion on them to do so.
For the purposes of this tutorial we shall be using the RCSB Cambridge mirror site
interface to the PDB. SRS can also be used; it offers similar functionality but a different
layout. You will probably find that the Cambridge mirror site, which is in the U.K., is usually
faster than the RCSB site in the U.S.A. during the afternoon.
We shall examine the different ways in which a PDB file can be accessed.
If you point your browser at the Cambridge mirror PDB site you should see the
following:
http://pdb.ccdc.cam.ac.uk/pdb
Accession code
To find a PDB file, check the “query by PDB id only” button, type in the accession code
and press the “search” button.
Often, if you see a structure within a paper you will have a quoted accession code. You
can retrieve the structure by entering this code here. You should note that, unlike protein
sequences, structures do not have to be released to the public once they have been published.
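As a rough illustration of what an accession code lookup involves, the Python sketch below checks that a string has the shape of a PDB accession code and builds a download address from it. The code layout (one digit followed by three letters or digits), the function names and the `files.rcsb.org` address pattern are assumptions made for this example. They describe the present-day RCSB file server rather than the Cambridge mirror used in this course.

```python
import re

# Assumed layout of a classic PDB accession code: one digit (1-9)
# followed by three letters or digits, e.g. 1B0O.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(code):
    """Return True if code looks like a four-character PDB accession code."""
    return PDB_ID_PATTERN.fullmatch(code.strip()) is not None


def download_url(code):
    """Build a download address (assumed URL pattern, not from this manual)."""
    if not is_pdb_id(code):
        raise ValueError(f"not a valid PDB accession code: {code!r}")
    return f"https://files.rcsb.org/download/{code.strip().upper()}.pdb"


print(is_pdb_id("1b0o"))
print(download_url("1b0o"))
```

Checking the shape of the code first catches typing mistakes before any request is sent. It cannot tell you whether an entry with that code actually exists or has been released.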
Text searching
Whilst searching by Entry code is fine if you have seen a structure in a paper, you will
often just wish to find structures matching a certain text pattern. From the RCSB page there are
a few different search options available, but for most text searching "SearchLite" should be
sufficient.
This page uses a fairly simple search system (which is explained further down the page).
You enter one or more text queries into the box and press enter to search. All terms are combined
with AND by default, and you can use a star as a wildcard if you wish to search with part of a
term (e.g. *globin would find haemoglobin and myoglobin).
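The search behaviour described above (every term must match, and a star stands for any run of characters) can be imitated in a few lines of Python. This is only a sketch of the matching logic, not of how SearchLite itself is implemented. The function name and the three entry titles are hypothetical and were made up for the example.

```python
from fnmatch import fnmatchcase


def matches(title, terms):
    """True if every term matches at least one word of the title (AND logic)."""
    words = title.lower().split()
    return all(
        any(fnmatchcase(word, term.lower()) for word in words)
        for term in terms
    )


# Hypothetical entry titles, invented for this example.
titles = [
    "Human haemoglobin",
    "Sperm whale myoglobin",
    "Bovine beta-lactoglobulin",
]

hits = [title for title in titles if matches(title, ["*globin"])]
print(hits)
```

Note that "*globin" does not pick up beta-lactoglobulin, because that word ends in "globulin". A wildcard search only finds what the pattern literally describes, so it is worth trying more than one spelling or word stem.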
You will notice that you can limit your search by the way the structure was determined.
Crystal diffraction and NMR are both experimental ways of solving structures. The "Theory"
option contains a small number of modelled proteins. When looking at these structures you
should always bear in mind that they are not experimentally determined, and may not exactly
reflect the real 3D structure.
Enter one or more search terms into the white box, and press return to see a list of hits
generated. The examples shown here were generated using "*globin" as a search term, and
searching all possible entries, including theoretical structures.
Be careful about using plurals. You may want a list of all lipocalins, but
try this compared to the singular (lipocalin) and see what happens!
Depending on the type of search you performed and the number of hits found, you may
see a summary screen which is shown below. If you don't see this then don't worry - you have
probably just gone straight to one of the data retrieval screens.
From this screen, you can either go back and add or remove terms from your query, or
view a PDB file by clicking on “Explore”. You can select a file of specific relevance by placing
a checkmark in the box. To view only these checked files, use the pull-down option “Show only
selected queries”.
By selecting the “Download Structure or Sequences” option you will see something like
this:
Clicking on any of these options will allow a batch download of the PDB files that you
have selected.
At this point, please feel free to go back and try out the search interface using
different keywords and restrictions. If you find a structure which is interesting to you,
remember that the only thing you need in order to find it again in future is the
four-character Accession Code.
Each PDB entry has a page of information associated with it. Whenever you view a PDB
file through one of the web interfaces you will be presented with this page. The information is
taken from the annotation in the header of the actual PDB file. PDB files are extremely well
annotated and this information page will give you a lot of information about the protein, and will
provide cross-references to other databases.
Summary Page
When you click on the “Explore” button the PDB file will open into its Summary page:
This provides information as to the origin of the protein. It will specify its function, and
will tell you if the structure contains a substrate (ligand) in addition to the main protein. This area
will also list the authors, state how the structure was solved and provide details as to its
resolution. The resolution of a structure is quoted in angstroms (Å), and the lower the value
the better: a smaller number means that finer detail can be distinguished. Most structures
have resolutions of <4Å and the best structures are <1Å.
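When comparing several candidate structures, the resolution can also be read directly from the text of each PDB file. The sketch below assumes that the value is given on a "REMARK 2 RESOLUTION." line, which is the usual convention for crystal structures. The two header fragments and the entry names are hypothetical and were invented purely to show the comparison.

```python
import re

# REMARK 2 conventionally carries the resolution of a crystal structure.
RESOLUTION = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS")


def resolution(pdb_text):
    """Return the resolution in angstroms, or None if no value is found."""
    for line in pdb_text.splitlines():
        match = RESOLUTION.match(line)
        if match:
            return float(match.group(1))
    return None


# Hypothetical header fragments, invented for this example.
entries = {
    "entry_a": "REMARK   2 RESOLUTION.    2.50 ANGSTROMS.",
    "entry_b": "REMARK   2 RESOLUTION.    1.80 ANGSTROMS.",
}

# The smallest number is the best-resolved structure.
best = min(entries, key=lambda name: resolution(entries[name]))
print(best)
```

Structures solved by NMR have no resolution value, so `resolution` returns None for them. Such entries would need to be filtered out before calling `min`.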
You can access the rest of the information within the PDB file by clicking on any title
contained within the blue/purple column. Please explore some of these options.
Data Retrieval
Click on “Download/Display file”. This section allows the retrieval of the PDB data; you
can choose to either display or download the structure file.
To download the sequence, click on “Sequence Details” in the blue/purple column. You
can then view the sequence with its secondary structure and download it in FASTA format.
Click on “View Structure”. This section uses the information contained in the PDB file to
provide a view of the structure. There are a lot of different programs available, and some are
much better than others.
It is worth spending a short while trying out these programs and seeing which you like,
and which provide information in which you may be interested. The functionalities of many of
these programs overlap, and it is a matter of personal taste which (if any) you prefer to use.
Remember that the sequence analysis and structural views offered here can usually be
performed from your own computer. Many of the pages presented run programs from within the
web pages, which can be extremely slow.
Probably the most useful tool in the “View Structure” section is the ability to preview the 3D
structure from within a web page. This will give you a quick impression of the structural
organisation of the protein and the positioning of any ligands. To see this view click on the link
which says "FirstGlance".
In order to use the interactive structural viewer you will need to install a program called
Chime. This allows structures to be displayed within a browser window. This software has been
installed on the training machines already.
NB: Chime will only work with browsers Netscape 4.7X and higher and Microsoft Internet
Explorer 5.5SP2 and higher.
Chime is written by a company called MDL, and is distributed free of charge. For more
information on Chime, look at:
http://www.mdli.com/chime/
You'll hear this several times throughout this manual - but please check with your
computing centre before installing new software on any of your machines. You will
not have the required privileges to make system alterations and you will need
someone from computing to do it for you.
When you have looked around the search tools and are happy with the various ways in
which information about the PDB files can be presented to you, go back and view the text
remarks of one of the PDB files (the examples are for 1b0o).
You will see that the file is divided into sections with the section name being specified on
the left-hand side. The various sections are set out below. Most of the important information
from the header will have been reproduced in the information sheet you first saw, but it can be
especially worthwhile reading the remarks section to get extra information about the structure
with which you are working.
Record            Description
HEADER            Classification for the entry, date of deposition, id
TITLE             Name of the entry, relating to experiment
CAVEAT            Serious errors in an entry
COMPND            Description of the make-up of the protein; definition of the
                  biological unit
SOURCE            Details the biological source of the molecule, including
                  expression system (if applicable)
KEYWDS            For indexing to assist text searching for entries
EXPDTA            Method by which the structure was solved
AUTHOR            Who deposited the entry
REVDAT            History of revisions to this entry
JRNL              Main literature citation. Usually the one in which the
                  structure was solved
REMARK            Optional. May contain a variety of information on experimental
                  details, other publications, expansion of other records
DBREF             Cross reference to the same sequence in another database
SEQADV            Indicates any conflicts between the PDB sequence and that
                  contained in the database specified in DBREF
SEQRES            Amino acid (or nucleic acid) sequence for each chain
MODRES            Modifications to residues in the entry
HET               Non-protein atoms in the entry, e.g. inhibitors, ions, etc.
HETNAM            Describes the non-protein atoms
HETSYN            Synonyms for above to assist searching
FORMUL            Molecular formula for hetatoms, so that a molecular weight can
                  be calculated, for example
HELIX/SHEET/TURN  Positions of secondary structure elements
SSBOND            Positions of disulphide bonds if present
LINK              Describes bonds between HET groups or HET groups and protein
HYDBND            Hydrogen bonds in the entry
SLTBRG            Salt bridges in the entry
CISPEP            Indicates any residues in cis, rather than the more usual
                  trans, configuration
SITE              Important site in the structure, but see also REMARK record
CRYST/ORIG/SCALE/MTRIX/TVECT
                  Crystallographic details
ATOM              Coordinates for the atoms of all residues in the structure
                  Format: atom number/atom type/residue number/coordinates
HETATM            Coordinates for non amino acid atoms
CONECT            Bonding between hetatoms, distinct from that between whole
                  groups
MASTER            Number of lines in the file for bookkeeping purposes
END               Unambiguously marks the end of the entry
The REMARK field often provides very useful information as this is the opportunity for
the crystallographer to include any details which they feel would be useful to people examining
the file.
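Because every line of a PDB file starts with its record name, the sections listed above are easy to separate with a short script. The following Python sketch groups the lines of a file by the record name found in the first six columns. The function name is hypothetical, and the sample lines are header lines for 1b0o as quoted later in this section, with the column spacing simplified.

```python
from collections import defaultdict


def group_records(pdb_text):
    """Group the lines of a PDB file by record name (the first six columns)."""
    records = defaultdict(list)
    for line in pdb_text.splitlines():
        name = line[:6].strip()
        if name:
            records[name].append(line)
    return dict(records)


# Header lines for 1b0o as quoted in this manual (column spacing simplified).
sample = """\
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: BETA-LACTOGLOBULIN;
DBREF 1B0O 2 162 SWS P02754 LACB_BOVIN 18 178
SEQADV 1B0O LEU 1 SWS P02754 LEU 1 DISORDERED
END
"""

records = group_records(sample)
print(sorted(records))
print(len(records["COMPND"]))
```

With a complete file loaded into `pdb_text`, `records["REMARK"]` would give all of the remark lines in one list, which is a convenient way to read them without scrolling past the coordinates.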
Because a crystal structure can take months, or even years, to solve, the amount of
information contained in the header of a PDB file is usually large. It is a big event submitting a
new structure, and someone will have worked long and hard on it. This means they are already
likely to have done a lot of the work you would want to do.
So, working our way down the header of 1b0o, there are several points which you would
want to have read if you were working on this protein.
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: BETA-LACTOGLOBULIN;
COMPND 3 CHAIN: NULL;
COMPND 4 BIOLOGICAL_UNIT: PREDOMINANTLY DIMERIC
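The COMPND lines above follow a simple "KEY: VALUE;" pattern, so they can be turned into a lookup table. The Python sketch below is one possible way of doing this. The function name is hypothetical, and the code assumes that each field fits on a single line, which is true of the lines shown here but not of every PDB file.

```python
import re

# "COMPND", an optional continuation number, then "KEY: VALUE" with an
# optional trailing semicolon.
COMPND_FIELD = re.compile(r"^COMPND\s+(?:\d+\s+)?(\w+):\s*(.*?);?\s*$")


def parse_compnd(lines):
    """Collect the KEY: VALUE pairs from COMPND lines into a dictionary."""
    fields = {}
    for line in lines:
        match = COMPND_FIELD.match(line)
        if match:
            key, value = match.groups()
            fields[key] = value
    return fields


compnd = parse_compnd([
    "COMPND MOL_ID: 1;",
    "COMPND 2 MOLECULE: BETA-LACTOGLOBULIN;",
    "COMPND 3 CHAIN: NULL;",
    "COMPND 4 BIOLOGICAL_UNIT: PREDOMINANTLY DIMERIC",
])
print(compnd["MOLECULE"])
print(compnd["BIOLOGICAL_UNIT"])
```

The BIOLOGICAL_UNIT field is the one to note here: the file contains a single chain, but the annotation says that the protein is predominantly dimeric.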
There follow some additional literature citations (possibly useful) and crystallographic
details (probably not, for modellers anyway).
REMARK 465 indicates missing residues, which we might need to account for:
We note the SwissProt reference with accession code and a problem with one residue
(though not apparently very significant):
DBREF 1B0O 2 162 SWS P02754 LACB_BOVIN 18 178
SEQADV 1B0O LEU 1 SWS P02754 LEU 1 DISORDERED
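The DBREF line can be used to relate residue numbers in the structure to residue numbers in the SwissProt entry. The Python sketch below splits the line on whitespace, which is a simplification: real PDB files use fixed columns, and this approach assumes that the chain identifier column is blank, as it is in the 1b0o line quoted above. The class and function names are hypothetical.

```python
from typing import NamedTuple


class DbRef(NamedTuple):
    pdb_id: str
    seq_begin: int
    seq_end: int
    database: str
    accession: str
    db_id: str
    db_begin: int
    db_end: int


def parse_dbref(line):
    """Parse a DBREF line by splitting on whitespace.

    Assumes the chain identifier column is blank, as in the 1b0o example.
    """
    tokens = line.split()
    if len(tokens) != 9 or tokens[0] != "DBREF":
        raise ValueError(f"unexpected DBREF layout: {line!r}")
    (_, pdb_id, seq_begin, seq_end,
     database, accession, db_id, db_begin, db_end) = tokens
    return DbRef(pdb_id, int(seq_begin), int(seq_end),
                 database, accession, db_id, int(db_begin), int(db_end))


ref = parse_dbref("DBREF 1B0O 2 162 SWS P02754 LACB_BOVIN 18 178")
offset = ref.db_begin - ref.seq_begin
print(ref.accession, offset)
print(ref.seq_end - ref.seq_begin + 1, ref.db_end - ref.db_begin + 1)
```

For this entry both ranges cover 161 residues and the numbering differs by 16. If the mapping has no gaps, residue n in the structure corresponds to residue n + 16 in the SwissProt sequence. This is worth remembering when comparing positions quoted in a paper with those seen in the structure.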