Introduction

Protein Structure
&
Molecular Modelling
Version 1.5

BIOSCIENCE IT SERVICES

Molecular Biology Support


Scientific Applications Section
BBSRC Bioscience IT Services
Harpenden, Herts
AL5 2JE

e-mail: molbio.support@bbsrc.ac.uk

Tel : 01582 714 904


Fax : 01582 714 901

February 2004

This document is the property of BBSRC Bioscience IT Services (BITS) and may not be
reproduced, either wholly, or in part, or transmitted in any form, or by any means,
electrical or mechanical including photocopying, or any information storage or retrieval
system, without prior permission from the author(s).

Copyright © Bioscience IT Services 2001

If you would like a copy of this manual, please contact molbio.support@bbsrc.ac.uk


1. Introduction
Abbreviations
Useful Information
Groups of amino acids
Bond Geometry
Keyboard Shortcuts in Deep View
Introduction to the course
Course summary
Hardware and Software requirements
Deep View version 3.72
Getting Help
Presentation Slides
2. Fetching and understanding protein structure files
The Protein Databank (PDB)
Searching for PDB files
Accession code
Text searching
The PDB annotation page
Summary Page
Data Retrieval
Viewing the structure
Analysing a PDB file
3. Viewing and analysing structures in Deep View
What is Deep View?
How can I get Deep View?
Configuring Deep View
General Preferences (Preferences → General)
Loading Preferences (Preferences → Loading Protein)
Swiss Model Preferences (Preferences → Swiss-Model)
Network Preferences (Preferences → Network)
Help
Viewing and manipulating a structure
Importing a molecule
Loading a molecule
Initial view
Moving around the molecule
Changing the view of the molecule
Layers Infos Window
Control Panel
The Header
Group List
Colours
Changing attributes
View Mode
Changing how the molecule is displayed
Slab mode
Exercises for changing the view of a molecule
Selecting Groups
None / All
Inverse selection
Visible Groups
Pick on screen
Group Kind
Group Property
Secondary Structure
Accessible amino acids
Groups with the same colour as
Extend to other layers
Neighbours of Selected aa
Groups close to another chain
Groups close to another layer
Amino acids making clashes ( / with backbone)
Sidechains lacking proper H-bonds
Reconstructed amino acids
Keyboard Modifiers when selecting
Control Panel keyboard modifiers
Example selections
Working with multiple structures
Opening multiple files
Automated Alignment
Iterative Magic fit (Fit → Iterative Magic Fit)
Structural Sequence Alignments
Explore alternate fits
Aligning around specific groups
Looking at a complete family alignment
Looking for differences between layers
Supplementary Exercises with Multiple Structures
Splitting Layers
Changing layer names and properties
4. Calculations within SPDBV
Motif Searching
Distances and Angles
Distance measure
Angle measures
Angle information
Identifying Residues
Hydrogen Bonds
Electrostatics and Molecular surfaces
Van der Waals surface
Accessible surface
Molecular surface
Setting up surfaces
Electrostatic Potentials
A bit of (simplified) theory!
Viewing Electrostatic Fields
Molecular Surfaces
Other Molecular Surfaces
Contact Surface
Mutations and Torsions
Mutations
Torsions
Energy minimisation
When to use energy minimisation
When not to use energy minimisation
Setting up energy minimisation
Seeing the initial energy state of your molecule
Limitations of energy minimisation
5. Preparing a protein sequence for modelling
Background
Locating Domains
Sequence headers
Databases
Other information
6. Using the Swiss-Model Web interface
Preparing your sequence
Making a submission
7. Using SPDBV to perform protein modelling
Formatting your sequence
Loading your sequence
Isolating a domain
Removing unwanted groups
Finding a template
Other search programs
Saving and incorporating the template
Threading your sequence
Sequence Alignment
Pairwise alignments
Multiple Sequence Alignment
Using Structural Alignments
Motif and Domain Searching
Optimising threading energy
Threading energy display
Handy hints for altering threading alignments
Submitting a modelling request to Swiss-Model
8. Evaluating and Optimising Models
Files received from Swiss Model
The trace file
The model file
Saving Attachments
Viewing your model
B - factors in models
Looking for problems
B-factors
Phi-Psi problems
What Check Reports
Force field energies
Improving your model
Identifying incorrect templates
Optimising your alignment
Altering your alignment within SPDBV
Submitting an Optimise request
9. Incorporating Substrates into Models
Locating your active site
Look at your template
Look at your multiple sequence alignment
Look for motifs
Look for shapes
Finding a structure for your substrate
The 3D Structure Database
Hetero Compound Information Centre
Klotho Biochemical Compounds Database
Fitting your substrate into your active site
Manual Fitting
Transferring a substrate from one structure to another
10. Predicting Secondary Structure and TM Regions
Secondary Structure Prediction
Visual inspection
Software approaches
The Jpred Server
Why would I want to do a secondary structure prediction ?
I couldn't find a homologue to make a model
I want to try fold recognition
I want to help refine my alignment for homology modelling
Transmembrane Region Prediction
Problems of transmembrane protein structure
Predicting transmembrane regions
Hydrophobicity profiles
Incorporating extra Information
Topology
Transmembrane prediction servers
Modelling Transmembrane Proteins
Restraint Modelling
Swiss Model G-Protein coupled receptor (GPCR) mode
11. Making Pictures with POV-Ray
Installing POV-Ray
Setting up DV to create POV-Ray files
Using DV to create POV-Ray files
Considerations
Setting up POV-Ray
Objects, camera, lights
Rendering a scene
So what is anti-aliasing?
Rendering
Errors
How big do my pictures need to be?
Getting at your pictures
Output
Putting Pictures into Documents
12. Glossary
13. Some relevant WWW links
Our Sites
Tutorial Sites
Software Links
Modelling Servers
Secondary Structure Prediction
Transmembrane region prediction
Databases
Protein structures
Other structures
Motifs
Sequences
1. Introduction
Abbreviations

CA Alpha Carbon

CB Beta Carbon

CP Control Panel

DPI Dots per inch

DV Deep View, aka Swiss PDB Viewer

EBI European Bioinformatics Institute

EXP3D Experimentally determined 3D structures. Database of protein sequences for which a
tertiary structure is known. Essentially the same as PDBSEQ.

GPCR G-protein coupled receptor

HBND Hydrogen Bond

HDST Distance of Hydrogen Bond

HTM Helical Trans Membrane domain

LI Layers Infos

PDB Protein Databank. A database of molecular structures maintained by the Research
Collaboratory for Structural Bioinformatics

PDBSEQ A protein database containing only sequences of proteins for which a tertiary structure
has been experimentally determined

PGDS Prostaglandin D2 Synthase

RMS Root Mean Squared

RCSB Research Collaboratory for Structural Bioinformatics

SPDBV Swiss PDB Viewer, aka Deep View

TM Trans-Membrane

VDW Van der Waals


Useful Information

This section provides some reference information which you may find useful throughout
the course.
Groups of amino acids

Polar                 Non-Polar
Arginine              Alanine
Asparagine            Isoleucine
Aspartic Acid         Leucine
Cysteine              Methionine
Glutamine             Phenylalanine
Glutamic Acid         Proline
Glycine               Tryptophan
Histidine             Tyrosine
Lysine                Valine
Serine
Threonine

Acidic                Basic
Aspartic Acid         Arginine
Glutamic Acid         Lysine
                      Histidine

H-bonding side chains
Glutamic Acid
Aspartic Acid
Threonine
Glutamine
Lysine
Histidine
Tryptophan
Arginine
Serine

Bond Geometry

Bond        Distance (Angstroms)    Energy (kJ / mol)
Covalent    <2                      200-400
Hydrogen    1-3                     12-29
Keyboard Shortcuts in Deep View

Key Combination                     Window                        Action

Return or Enter                     All                           Resets main view to show only the currently selected groups
Right Mouse                         View Window                   Translation, i.e. lateral movement of structure
Left Mouse & Right Mouse together   View Window                   Zoom
Right Mouse                         Control Panel                 Changes an attribute for all groups in a structure
Left Mouse                          Control Panel, Layers Info    Changes an attribute for one group (CP), or one sequence (LI)
Left Mouse                          Control Panel Header          Activates an attribute for all currently selected groups.
                                                                  Deactivates all other groups
Control & Left Mouse                Control Panel Header          Activates an attribute for all currently selected groups.
                                                                  Does not affect other groups
Shift & Control & Left Mouse        Control Panel Header          Deactivates an attribute for all currently selected groups.
                                                                  Does not affect other groups
+ (on numeric keypad)               All                           Toggles visibility of currently selected groups
Shift & Left Mouse                  Control Panel                 Allows the continuous selection of all groups between two points
Control & Left Mouse                Control Panel                 Allows the discontinuous selection of multiple groups
F5                                  When rotating or translating  Restrict operation to x-axis
F6                                  When rotating or translating  Restrict operation to y-axis
F7                                  When rotating or translating  Restrict operation to z-axis
Shift                               When using the Select Menu    Apply selection to all layers
Alt & /                             View Window                   Toggle slab mode
Alt & -                             View Window                   Remove all distance measures and bond angles
Using PDB Files 12

Introduction to the course

This course is intended to be an introduction to protein structure analysis and protein
modelling. It assumes no specific knowledge about the subject, although a basic biological
background is assumed (i.e. knowing what proteins / amino acids are, etc.).

Protein structure analysis and modelling was, for many years, the sole preserve of
specialists, who had access to extremely powerful computing systems, and state of the art
software. With the advent of more powerful desktop systems, and the production of new
software for these platforms, it is now possible for a non-specialist to perform many of these
analyses.

This course will not miraculously turn you into a structural biologist in two days. It will
however show you how you can use protein modelling and structural analysis to suggest new
directions for your research, or to help to explain existing results.

Course summary

The course will initially look at the Deep View package, and show how this can be used
to examine and analyse single and multiple existing structures. Since this package will be used
when preparing and analysing models, it is a good idea to get used to how it works at this stage.

Having got used to how the package works, we will look at performing protein modelling.
This will be done using the advanced submission features available in Deep View.

Having generated a model, we will then look at the various ways in which the reliability
of the returned model can be assessed, and how mistakes in the original submission could be
rectified.

Finally, the forms of analysis which can be performed on proteins unsuitable for
modelling will be considered. Also, the exporting of images as high quality output for
publication will be explained.

Hardware and Software requirements

The majority of this course is based around Deep View 3.7. This package is freely
available for both PC and Macintosh platforms. The PC version requires a 486 DX or better, and
the Mac version requires a Power Macintosh. Whilst these are the absolute minimum
requirements, a machine with these specifications will perform some operations extremely
slowly. A more realistic specification would be a Pentium II (or equivalent) with at least 64MB
RAM.
For the production of presentation graphics the POV-Ray package is used, along with the
MegaPOV patch. These too are free packages, and are available for both PC and Mac.

In addition, much of the course uses web-based tools. Accessing these requires a
Java-enabled web browser such as Netscape 4.x or IE 4. The tools also require the Chime
plug-in for online viewing of molecular structures.

Each of these programs can be installed at no extra cost. This course was explicitly
designed around free software. Instructions for downloading and installing software are situated
in the appropriate chapters of the manual.

Deep View version 3.72

The following changes have been implemented since version 3.63.

• Recentering view and rotation now has its own tool button. This is positioned at top
left:

• Right mouse button is now used for translation (lateral movement) of structure in
view window.
• Left and right mouse buttons together for zoom.
• Rotate/zoom/translate tools restricted within view window. If you move outside and
release left button, tool action continues back inside view window until repeat click.
• Rendering attributes tool button is gone. Display related attributes now accessed from
Preferences → Display.
• Various minor changes to menu layouts.
• Scripting capability. For more information on this take a look at:
http://www.expasy.ch/spdbv/text/script.htm

Getting Help

In addition to this manual there are a few extra sources of information which you may
find useful:

BBSRC Protein course online:

http://www.molbiol.bbsrc.ac.uk/protein_struc/introduction.html

Swiss PDB Viewer online manual:


http://www.expasy.ch/spdbv/

Swiss PDB Viewer tutorials:


http://www.expasy.ch/spdbv/text/tutorial.htm
http://www.usm.maine.edu/~rhodes/SPVTut/index.html

POV-Ray tutorial:
http://www.students.tut.fi/~warp/povVFAQ/

You can, of course, also contact the BBSRC molecular biology support line. This is a free
service to BBSRC institute researchers, and can be contacted by phone on:

01582 714 904

or by e-mail on: molbio.support@bbsrc.ac.uk


2. Fetching and understanding protein structure files

The Protein Databank (PDB)

When the structure of a protein is solved, either by crystallography or NMR, it is
described as a series of atomic co-ordinates. There is a central repository for these data
maintained by the "Research Collaboratory for Structural Bioinformatics" (RCSB) which is
known as the Protein Databank (PDB). The protein sequences of structures held in the PDB are
also maintained as a separate database called PDBSEQ.

The PDB database is accessible by web interfaces, which can be found at:

http://www.rcsb.org/pdb/ (Main site - held in USA)

http://pdb.ccdc.cam.ac.uk/pdb (Mirror site - held in U.K.)

From these sites it is possible to retrieve the molecular co-ordinates of structures either by
entering an accession code or by text searching the annotation of the entries.

It is a good idea to search the database of existing structures before you go off
modelling your protein sequence. There is often as much information to be
gained from studying related structures as there is from generating a model.
There are currently nearly 23,800 structures in the PDB and this number is
increasing exponentially. It is worth checking the database every month to see
whether a structure for a protein similar to that on which you are working has
been deposited.

Unlike protein or nucleotide sequence data, there is no requirement that the co-
ordinates of a structure should be deposited with the PDB in order that the
structure can be published. Most groups will routinely deposit their structural
data, but there is no compulsion on them to do so.

For the purposes of this tutorial we shall be using the RCSB Cambridge mirror site
interface to the PDB. SRS can also be used, with similar functionality but a different layout.
You will probably find that the Cambridge mirror site, which is in the U.K., is usually faster
than the RCSB site in the U.S.A. during the afternoon.

Searching for PDB files

We shall examine the different ways in which a PDB file can be accessed.

If you point your browser at the Cambridge mirror PDB site you should see the
following:

http://pdb.ccdc.cam.ac.uk/pdb

Accession code

To find a PDB file, check the “query by PDB id only” button, type in the accession code
and press the “search” button.

Often, if you see a structure within a paper you will have a quoted accession code. You
can retrieve the structure by entering this code here. You should note that, unlike protein
sequences, structures do not have to be released to the public once they have been published.
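
As a minimal sketch, a quoted accession code can be checked and turned into a download address in Python. The code pattern used here (one digit followed by three letters or digits) and the `files.rcsb.org` address are assumptions about the present-day RCSB service, not part of this course, and the function name `pdb_download_url` is hypothetical.

```python
import re

# Assumption: a PDB accession code is one digit (1-9) followed by three
# letters or digits, e.g. 1b0o. Case is not significant.
_PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def pdb_download_url(accession: str) -> str:
    """Return a download address for a PDB entry (hypothetical helper)."""
    code = accession.strip()
    if not _PDB_ID.match(code):
        raise ValueError(f"not a valid PDB accession code: {accession!r}")
    # Assumption: current RCSB file server layout, not the 2004 mirror sites.
    return f"https://files.rcsb.org/download/{code.upper()}.pdb"


print(pdb_download_url("1b0o"))
```

Rejecting malformed codes before building the address catches simple typing errors early; whether an entry with that code actually exists can only be confirmed by the server.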

Text searching

Whilst searching by Entry code is fine if you have seen a structure in a paper, you will
often just wish to find structures matching a certain text pattern. From the RCSB page there are
a few different search options available, but for most text searching "SearchLite" should be
sufficient.

This page uses a fairly simple search system (which is explained further down the page).
You enter one or more text queries into the box and press enter to search. All terms are combined
with AND by default, and you can use a star as a wildcard if you wish to search with part of a
term (e.g. *globin would find haemoglobin and myoglobin).
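
The matching rules described above can be imitated locally to get a feel for them. The sketch below is not SearchLite's own code: it assumes each query term is compared against whole words of an entry's text, it uses Python's `fnmatch` for the star wildcard, and the function name and sample titles are made up for illustration.

```python
from fnmatch import fnmatch


def matches(query: str, text: str) -> bool:
    """True if every query term matches at least one word in text (AND)."""
    words = text.lower().split()
    terms = query.lower().split()
    return all(any(fnmatch(word, term) for word in words) for term in terms)


# Made-up entry titles, for illustration only.
titles = [
    "human haemoglobin",
    "sperm whale myoglobin",
    "bovine beta-lactoglobulin",
]
print([t for t in titles if matches("*globin", t)])
print([t for t in titles if matches("*globin human", t)])
```

Because terms are combined with AND, adding a second term can only shorten the list of hits. Under the whole-word assumption, a plural such as "lipocalins" would not match an entry that only contains "lipocalin".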

You will notice that you can limit your search by the way the structure was determined.
Crystal diffraction and NMR are both experimental ways of solving structures. The "Theory"
option contains a small number of modelled proteins. When looking at these structures you
should always bear in mind that they are not experimentally determined, and may not exactly
reflect the real 3D structure.

Enter one or more search terms into the white box, and press return to see a list of hits
generated. The examples shown here were generated using "*globin" as a search term, and
searching all possible entries, including theoretical structures.

Be careful about using plurals. You may want a list of all lipocalins, but
try this compared to the singular (lipocalin) and see what happens!

Depending on the type of search you performed and the number of hits found, you may
see a summary screen which is shown below. If you don't see this then don't worry - you have
probably just gone straight to one of the data retrieval screens.

From this screen, you can either go back and add or remove terms from your query, or
view a PDB file by clicking on “Explore”. You can select a file of specific relevance by placing
a checkmark in the box. To view these checked files, use the pull-down option to “Show only
selected queries”.

By selecting the “Download Structure or Sequences” option you will see something like
this:

Clicking on any of these options will allow a batch download of the PDB files that you
have selected.

At this point, please feel free to go back and try out the search interface using
different keywords and restrictions. If you find a structure which is interesting to you
then remember that the only part you need in order to find it again in future is the
four character Accession Code.

The PDB annotation page

Each PDB entry has a page of information associated with it. Whenever you view a PDB
file through one of the web interfaces you will be presented with this page. The information is
taken from the annotation in the header of the actual PDB file. PDB files are extremely well
annotated and this information page will give you a lot of information about the protein, and will
provide cross-references to other databases.

Summary Page

When you click on the “Explore” button the PDB file will open into its Summary page:

This provides information as to the origin of the protein. It will specify its function, and
will tell you if the structure contains a substrate (ligand) in addition to the main protein. This area
will also list the authors, state how the structure was solved and provide details as to its
resolution. The resolution of a structure is quoted in angstroms (Å), and the lower the value
the better. Most structures have resolutions of <4Å and the best structures are <1Å.

You can access the rest of the information within the PDB file by clicking on any title
contained within the blue/purple column. Please explore some of these options.

Data Retrieval

Click on “Download/Display file”. This section allows the retrieval of the PDB data; you
can choose to either display or download the structure file.

To download the sequence, click on “Sequence Details” in the blue/purple column. You
can then view the sequence with its secondary structure and download it in Fasta format.

Viewing the structure

Click on “View Structure”. This section uses the information contained in the PDB file to
provide a view of the structure. There are a lot of different programs available, and some are
much better than others.

It is worth spending a short while trying out these programs and seeing which you like,
and which provide information in which you may be interested. The functionalities of many of
these programs overlap, and it is a matter of personal taste which (if any) you prefer to use.

Remember that the sequence analysis and structural views offered here can usually be
performed from your own computer. Many of the pages presented run programs from within the
web pages, which can be extremely slow.

Probably the most useful tool in the View Structure section is the ability to preview the 3D
structure from within a web page. This will give you a quick impression of the structural
organisation of the protein and the positioning of any ligands. To see this view click on the link
which says "FirstGlance".

In order to use the interactive structural viewer you will need to install a program called
Chime. This allows structures to be displayed within a browser window. This software has been
installed on the training machines already.

NB: Chime will only work with browsers Netscape 4.7X and higher and Microsoft Internet
Explorer 5.5SP2 and higher.

Chime is written by a company called MDL, and is distributed free of charge. For more
information on Chime, look at:

http://www.mdli.com/chime/

You'll hear this several times throughout this manual - but please check with your
computing centre before installing new software on any of your machines. You will
not have the required privileges to make system alterations and you will need
someone from computing to do it for you.

Analysing a PDB file

When you have looked around the search tools and are happy with the various ways in
which information about the PDB files can be presented to you, go back and view the text
remarks of one of the PDB files (the examples are for 1b0o).

You will see that the file is divided into sections with the section name being specified on
the left-hand side. The various sections are set out below. Most of the important information
from the header will have been reproduced in the information sheet you first saw, but it can be
especially worthwhile reading the remarks section to get extra information about the structure
with which you are working.

Record                        Description

HEADER                        Classification for the entry, date of deposition, id
TITLE                         Name of the entry, relating to experiment
CAVEAT                        Serious errors in an entry
COMPND                        Description of the make up of the protein; definition of the biological unit
SOURCE                        Details the biological source of the molecule, including expression system
                              (if applicable)
KEYWDS                        For indexing to assist text searching for entries
EXPDTA                        Method by which the structure was solved
AUTHOR                        Who deposited the entry
REVDAT                        History of revisions to this entry
JRNL                          Main literature citation. Usually the one in which the structure was solved
REMARK                        Optional. May contain a variety of information on experimental details,
                              other publications, expansion of other records
DBREF                         Cross reference to the same sequence in another database
SEQADV                        Indicates any conflicts between the PDB sequence and that contained in the
                              database specified in DBREF
SEQRES                        Amino acid (or nucleic acid) sequence for each chain
MODRES                        Modifications to residues in the entry
HET                           Non-protein atoms in the entry, e.g. inhibitors, ions, etc.
HETNAM                        Describes the non-protein atoms
HETSYN                        Synonyms for above to assist searching
FORMUL                        Molecular formula for hetatoms, so that a molecular weight can be
                              calculated, for example
HELIX/SHEET/TURN              Positions of secondary structure elements
SSBOND                        Positions of disulphide bonds if present
LINK                          Describes bonds between HET groups or HET groups and protein
HYDBND                        Hydrogen bonds in the entry
SLTBRG                        Salt bridges in the entry
CISPEP                        Indicates any residues in cis, rather than more usual trans configuration
SITE                          Important site in the structure, but see also REMARK record
CRYST/ORIG/SCALE/MTRIX/TVECT  Crystallographic details
ATOM                          Coordinates for the atoms of all residues in the structure
                              Format: atom number/atom type/residue number/coordinates
HETATM                        Coordinates for non amino acid atoms
CONECT                        Bonding between hetatoms, distinct from that between whole groups
MASTER                        Number of lines in the file for bookkeeping purposes
END                           Unambiguously marks the end of the entry

The REMARK field often provides very useful information as this is the opportunity for
the crystallographer to include any details which they feel would be useful to people examining
the file.
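
Because every line begins with its record name, a PDB file is easy to sort into sections with a few lines of Python. The following is a rough sketch rather than a full parser: the function name `group_records` is hypothetical, and the sample lines are header records from 1b0o as printed in this manual, with their column spacing collapsed.

```python
from collections import defaultdict


def group_records(pdb_text: str) -> dict:
    """Group the lines of a PDB file by the record name that starts each line."""
    records = defaultdict(list)
    for line in pdb_text.splitlines():
        # The record name occupies the first six columns; taking the first
        # word of those columns also copes with space-collapsed text.
        name = line[:6].split()
        if name:
            records[name[0]].append(line)
    return dict(records)


sample = """\
HETNAM PLM PALMITIC ACID
FORMUL 2 PLM C16 H32 O2
SSBOND 1 CYS 66 CYS 160
SSBOND 2 CYS 106 CYS 119
"""
records = group_records(sample)
print({name: len(lines) for name, lines in records.items()})
```

With the records grouped like this, all the REMARK lines of a downloaded file, for instance, could be listed with `records["REMARK"]`.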

Because a crystal structure can take months, or even years, to solve, the amount of
information contained in the header of a PDB file is usually large. It is a big event submitting a
new structure, and someone will have worked long and hard on it. This means they are already
likely to have done a lot of the work you would want to do.

So, working our way down the header of 1b0o, there are several points which you would
want to have read if you were working on this protein.

In the COMPND record we see that the native molecule is a dimer:

COMPND MOL_ID: 1;
COMPND 2 MOLECULE: BETA-LACTOGLOBULIN;
COMPND 3 CHAIN: NULL;
COMPND 4 BIOLOGICAL_UNIT: PREDOMINANTLY DIMERIC
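
The COMPND record is a list of "KEY: value" tokens separated by semicolons, so it can be read into a dictionary. This is a sketch under assumptions: it works on the space-collapsed text printed above and treats a leading integer as the continuation number, whereas a real file keeps that number in fixed columns. The function name `parse_compnd` is hypothetical.

```python
import re


def parse_compnd(lines):
    """Collect the 'KEY: value' tokens of COMPND records into a dict."""
    text = ""
    for line in lines:
        # Drop the record name and the optional continuation number.
        match = re.match(r"COMPND\s+(?:\d+\s+)?(.*)", line)
        if match:
            text += " " + match.group(1).strip()
    fields = {}
    for token in text.split(";"):
        key, sep, value = token.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


compnd = [
    "COMPND MOL_ID: 1;",
    "COMPND 2 MOLECULE: BETA-LACTOGLOBULIN;",
    "COMPND 3 CHAIN: NULL;",
    "COMPND 4 BIOLOGICAL_UNIT: PREDOMINANTLY DIMERIC",
]
print(parse_compnd(compnd)["BIOLOGICAL_UNIT"])
```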

There follow some additional literature citations (possibly useful) and crystallographic
details (probably not, for modellers anyway).

REMARK 465 indicates missing residues, which we might need to account for:

REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE):
REMARK 465
REMARK 465 M RES C SSSEQI
REMARK 465 LEU 1

REMARK 800 will definitely be of interest as it refers to a binding site:

REMARK 800 SITE
REMARK 800 SITE_IDENTIFIER: PBS
REMARK 800 SITE_DESCRIPTION:
REMARK 800 FATTY ACID ( PALMITATE ) BINDING SITE

And the SITE record should be examined for further details:


SITE 1 PBS 11 LEU 46 PHE 105 MET 107 VAL 41
SITE 2 PBS 11 LYS 69 LYS 60 ILE 71 ILE 84
SITE 3 PBS 11 ILE 56 LEU 103 VAL 94
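
The three SITE lines above can be combined into a single list of binding-site residues. In this sketch the fields are assumed to be: line number, site identifier, declared number of residues, then name/number pairs. No chain identifier is expected, which matches this entry (CHAIN: NULL) but not PDB files in general. The function name `parse_site` is hypothetical.

```python
def parse_site(lines):
    """Return {site_id: [(residue_name, residue_number), ...]}."""
    sites, declared = {}, {}
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "SITE":
            continue
        site_id, count, rest = tokens[2], int(tokens[3]), tokens[4:]
        declared[site_id] = count
        # Assumption: no chain identifier, so residues are name/number pairs.
        pairs = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest) - 1, 2)]
        sites.setdefault(site_id, []).extend(pairs)
    # Check the collected residues against the count stated in the record.
    for site_id, residues in sites.items():
        if len(residues) != declared[site_id]:
            raise ValueError(f"site {site_id}: residue count mismatch")
    return sites


site_lines = [
    "SITE 1 PBS 11 LEU 46 PHE 105 MET 107 VAL 41",
    "SITE 2 PBS 11 LYS 69 LYS 60 ILE 71 ILE 84",
    "SITE 3 PBS 11 ILE 56 LEU 103 VAL 94",
]
print(parse_site(site_lines)["PBS"])
```

The count check is a cheap safeguard: the record states that site PBS has 11 residues, and the three lines together do list 11.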

We note the SwissProt reference with accession code and a problem with one residue
(though not apparently very significant):
DBREF 1B0O 2 162 SWS P02754 LACB_BOVIN 18 178
SEQADV 1B0O LEU 1 SWS P02754 LEU 1 DISORDERED
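
The DBREF line also shows that the residue numbering in the PDB file (2 to 162) differs from that of the SwissProt entry (18 to 178). A small sketch of working out the shift between the two is given below; it assumes the chain identifier is blank, as it is here, so that a simple split on spaces yields the fields in the order noted in the comment. The function name `dbref_offset` is hypothetical.

```python
def dbref_offset(line: str) -> int:
    """Return (database residue number) - (PDB residue number) for a DBREF line."""
    tokens = line.split()
    # Assumption: blank chain identifier, so the fields are
    # DBREF, idCode, seqBegin, seqEnd, database, dbAccession, dbIdCode,
    # dbseqBegin, dbseqEnd.
    pdb_begin, pdb_end = int(tokens[2]), int(tokens[3])
    db_begin, db_end = int(tokens[7]), int(tokens[8])
    if pdb_end - pdb_begin != db_end - db_begin:
        raise ValueError("PDB and database ranges differ in length")
    return db_begin - pdb_begin


offset = dbref_offset("DBREF 1B0O 2 162 SWS P02754 LACB_BOVIN 18 178")
print(offset)
print(66 + offset)  # SwissProt position of the residue numbered 66 in the PDB file
```

Keeping this shift in mind avoids confusion when comparing residue numbers quoted from the structure with those in the sequence database.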

This structure is complexed with palmitic acid:

HETNAM PLM PALMITIC ACID
FORMUL 2 PLM C16 H32 O2
FORMUL 3 HOH *105(H2 O1)
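
As noted in the record table, the FORMUL record allows a molecular weight to be calculated. A minimal sketch for the palmitic acid formula follows; the atomic masses are standard values, only the three elements needed here are included, and the function name `molecular_weight` is hypothetical. Two-letter element symbols written in capitals (as PDB files do) are not handled.

```python
import re

# Standard atomic masses (g/mol) for the elements in this example only.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molecular_weight(formula: str) -> float:
    """Molecular weight of a formula written like 'C16 H32 O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A symbol with no number after it counts as one atom.
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C16 H32 O2"), 2))
```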

And importantly, it contains two disulphide bonds:

SSBOND 1 CYS 66 CYS 160
SSBOND 2 CYS 106 CYS 119
