Proteins
Protein: from the Greek word PROTEUO, which means "to be first (in rank or influence)".
Why are proteins important to us:
- Proteins make up about 15% of the mass of the average person and maintain the structural integrity of the cell
- Enzymes act as biological catalysts
- Storage and transport: haemoglobin
- Antibodies
- Hormones: insulin
Introduction to proteins
Peptide Bond
The green region indicates the sterically permitted φ and ψ values for all residues except Gly and Pro. Yellow circles represent the conformational angles of several secondary structures: α-helix, parallel and antiparallel β-sheet.
Helices
H: α-helix
G: 3₁₀ helix
I: π-helix (extremely rare)
Secondary Structure
8 different categories (DSSP):
- H: α-helix (pitch 5.4 Å)
- G: 3₁₀ helix
- I: π-helix (extremely rare)
- E: β-strand
- B: β-bridge
- T: β-turn
- S: bend
- L: the rest
1.5 Å
PSSP Algorithms
There are three generations of PSSP algorithms:
- Early/first generation: based on statistical or rule-based information about single amino acids
- Second generation: based on windows (segments) of amino acids. Typically a window contains 11-21 amino acids
- Third generation: based on the use of windows on evolutionary information
- Statistical information
- Physico-chemical properties
- Sequence patterns
- Multi-layered neural networks
- Graph theory
- Multivariate statistics
- Expert rules
- Nearest-neighbour algorithms
- No Bayesian networks
- Prediction accuracy <70%
- Prediction accuracy for β-strand 28-48%
- Predicted segments are usually too short, which makes the predictions difficult to use
Use of evolutionary information:
1. Scan a database of known sequences with alignment methods to find similar sequences
2. Filter the resulting list with a threshold to identify the most significant sequences
3. Build amino acid exchange profiles based on the probable homologs (most significant sequences)
4. The profiles are used in the prediction, i.e. in building the classifier
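Steps 2 and 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the database scan of step 1 is assumed to have already produced hits aligned to the query, the function names `identity` and `build_profile` are hypothetical, and the 0.3 identity cutoff is an arbitrary placeholder rather than a recommended value.

```python
from collections import Counter


def identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_profile(query, aligned_hits, min_identity=0.3):
    """Step 2: keep hits above the threshold. Step 3: per-column residue frequencies.

    `aligned_hits` are assumed to be already aligned to `query` (same length,
    '-' for gaps). The cutoff value is a hypothetical placeholder.
    """
    kept = [query] + [h for h in aligned_hits if identity(query, h) >= min_identity]
    profile = []
    for column in zip(*kept):
        counts = Counter(aa for aa in column if aa != "-")
        n = sum(counts.values())
        profile.append({aa: c / n for aa, c in counts.items()})
    return profile


# Toy alignment (made-up sequences, for illustration only)
profile = build_profile("ACD", ["ACE", "GCD", "WWW"])
```

Each position of the query is then described by a frequency vector instead of a single residue, and that vector is what the classifier of step 4 would receive as input.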
Chou-Fasman
Garnier-Osguthorpe-Robson (GOR)
Chou-Fasman method
Uses table of conformational parameters (propensities) determined primarily from measurements of secondary structure.
Designations: H = Strong Former, h = Former, I = Weak Former, i = Indifferent, B = Strong Breaker, b = Breaker; P = Conformational Parameter
If you were asked to determine whether an amino acid in a protein of interest is part of an α-helix or a β-sheet, you might think to look in a protein database and see which secondary structures amino acids in similar contexts belonged to.
The Chou-Fasman method (1974) is a combination of such statistics-based methods and rule-based methods.
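$$P_{\text{helix}}(i) = \frac{P(i \mid \text{Helix})}{P(i)}, \qquad P_{\text{beta}}(i) = \frac{P(i \mid \text{Beta})}{P(i)}, \qquad P_{\text{turn}}(i) = \frac{P(i \mid \text{Turn})}{P(i)}$$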
Propensities > 1 mean that residue type i is more likely than average to be found in the corresponding secondary structure type.
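As a minimal sketch of how such a propensity could be computed, the ratio P(i | state) / P(i) can be estimated from residue counts in sequences with known secondary structure. The function name `propensities` and the toy sequence are hypothetical; real tables are derived from large structure databases.

```python
from collections import Counter


def propensities(sequence, states, target):
    """Return P(i | target) / P(i) for every residue type i seen in `sequence`.

    `states` holds one secondary-structure label per residue.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    total = Counter(sequence)
    in_target = Counter(aa for aa, s in zip(sequence, states) if s == target)
    n_all = len(sequence)
    n_target = sum(in_target.values())
    if n_target == 0:
        raise ValueError("no residues found in the target state")
    return {
        aa: (in_target[aa] / n_target) / (total[aa] / n_all)
        for aa in total
    }


# Toy data (made up, not real statistics): H = helix, C = everything else
seq = "AAEGAGPG"
states = "HHHHCCCC"
helix = propensities(seq, states, "H")
```

In this toy example A and E come out above 1 (helix-favouring) while G and P fall below 1, which is the qualitative pattern the definition is meant to capture.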
α-Helix
1.29 1.11 1.30 1.47 1.44 1.27 1.22 1.23 0.91 0.97 1.07 0.72 0.99 0.82 0.56 0.82 1.04 0.90 0.52 0.96
β-Sheet
0.90 0.74 1.02 0.97 0.75 0.80 1.08 0.77 1.49 1.45 1.32 1.25 1.14 1.21 0.92 0.95 0.72 0.76 0.64 0.99
Turn
0.78 0.80 0.59 0.39 1.00 0.97 0.69 0.96 0.47 0.51 0.58 1.05 0.75 1.03 1.64 1.33 1.41 1.23 1.91 0.88
Favors α-helix
Favors β-strand
Favors turn
2. Once the propensities are calculated, each amino acid is categorized using the propensities as one of: helix-former, helix-breaker, or helix-indifferent. (That is, helix-formers have high helical propensities, helix-breakers have low helical propensities, and helix-indifferent residues have intermediate propensities.)
Each amino acid is also categorized as one of: sheet-former, sheet-breaker, or sheet-indifferent. For example, it was found (as expected) that glycine and proline are helix-breakers.
3. Find nucleation sites, short stretches of residues likely to start a helix or sheet. These sites are found with some heuristic rule (e.g. "a sequence of 6 amino acids with at least 4 helix-formers, and no helix-breakers").
4. Extend the nucleation sites, adding residues at the ends, maintaining an average propensity greater than some threshold.
5. Step 4 may create overlaps; these overlaps are resolved using some heuristic rules.
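The nucleation, extension and overlap steps can be sketched for helices as follows. This is a simplified illustration, not the published procedure. The propensity values in `HELIX_P` are illustrative and are not taken from the table above. The three cutoffs are hypothetical. Stopping extension at a helix-breaker and merging overlapping segments are assumptions made here to keep the example short.

```python
# Illustrative helix propensities (assumed values, not the table in this document).
HELIX_P = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}
FORMER_MIN = 1.05   # hypothetical cutoff: propensity >= this counts as helix-former
BREAKER_MAX = 0.70  # hypothetical cutoff: propensity < this counts as helix-breaker
EXTEND_MIN = 1.00   # hypothetical threshold for the segment's average propensity


def is_breaker(aa):
    return HELIX_P[aa] < BREAKER_MAX


def nucleation_sites(seq, window=6, min_formers=4):
    """Windows of 6 residues with at least 4 formers and no breakers."""
    sites = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        formers = sum(HELIX_P[aa] >= FORMER_MIN for aa in segment)
        if formers >= min_formers and not any(is_breaker(aa) for aa in segment):
            sites.append((start, start + window))
    return sites


def extend(seq, start, end):
    """Grow a site residue by residue while the segment average stays above
    EXTEND_MIN. Assumption: extension never crosses a helix-breaker."""
    def mean(a, b):
        return sum(HELIX_P[aa] for aa in seq[a:b]) / (b - a)

    grown = True
    while grown:
        grown = False
        if start > 0 and not is_breaker(seq[start - 1]) and mean(start - 1, end) > EXTEND_MIN:
            start -= 1
            grown = True
        if end < len(seq) and not is_breaker(seq[end]) and mean(start, end + 1) > EXTEND_MIN:
            end += 1
            grown = True
    return start, end


def predict_helices(seq):
    """Extend every nucleation site, then merge overlapping segments."""
    extended = sorted(extend(seq, s, e) for s, e in nucleation_sites(seq))
    merged = []
    for s, e in extended:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Segments are returned as half-open `(start, end)` index pairs. A full implementation would run the same procedure for sheets and then decide between helix and sheet wherever the two predictions overlap.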
A helix propensity table contains information about propensity for residues at 17 positions when the conformation of residue j is helical. The helix propensity tables have 20 x 17 entries. Build similar tables for strands and turns. GOR simplification: The predicted state of AAj is calculated as the sum of the position-dependent propensities of all residues around AAj.
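The GOR simplification described here amounts to a table lookup and a sum. The sketch below assumes each state's table is stored as a dictionary keyed by `(residue, offset)` with offsets from -8 to +8, and that missing entries contribute zero. `TOY_TABLES` contains made-up numbers purely so that the example runs; real tables hold 20 × 17 entries per state derived from known structures.

```python
HALF_WINDOW = 8  # 17 positions in total: j-8 .. j+8


def gor_predict(seq, tables):
    """Predict one state per residue by summing position-dependent propensities.

    tables[state][(aa, offset)] is the contribution of residue `aa` found at
    `offset` from the residue being classified; missing entries count as 0.
    """
    prediction = []
    for j in range(len(seq)):
        scores = {}
        for state, table in tables.items():
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = j + offset
                if 0 <= k < len(seq):
                    total += table.get((seq[k], offset), 0.0)
            scores[state] = total
        # The state with the highest summed score wins.
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)


# Made-up toy tables (H = helix, E = strand, T = turn), for illustration only.
TOY_TABLES = {
    "H": {("A", o): 1.0 for o in range(-2, 3)},
    "E": {("V", o): 1.0 for o in range(-2, 3)},
    "T": {("G", 0): 1.0, ("P", 0): 1.0},
}
```

Positions near the ends of the sequence simply see fewer neighbours in this sketch; how to treat chain termini is a design choice that the text here does not specify.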
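Suppose a_j is the amino acid that we are trying to categorize. GOR looks at the residues a_{j-8}, ..., a_j, ..., a_{j+8}, the 17-position window centred on a_j.
Intuitively, it assigns a structure based on probabilities it has calculated from protein databases. These probabilities are of the form
$$P(S_j = s \mid a_{j-8}, \ldots, a_j, \ldots, a_{j+8}), \qquad s \in \{\text{helix}, \text{strand}, \text{turn}\}$$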
Accuracy
Both the Chou-Fasman and GOR methods have been assessed, and their accuracy is estimated at Q3 = 60-65%.
(Initially, higher scores were reported, but the experiments set up to measure Q3 were flawed, as the test cases included proteins used to derive the propensities!)
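Q3 is the percentage of residues whose predicted three-state label (helix, strand, other) matches the observed one. A minimal sketch, with a hypothetical function name and made-up label strings:

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy example: 3 of 4 residues agree
score = q3("HHEC", "HHCC")
```

To avoid the flaw mentioned above, the observed labels must come from proteins that were not used to derive the propensities.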
Steps
Computes MS Algorithm Computes the distance between homologous sections Example : NNSSP
SDV Hyperplanes DS manipulation
[Figure: machine-learning workflow. Labels: training data, test data, validation, new data, prediction]
Neural Networks
Single-sequence methods: train the network using sets of known proteins of certain types (all alpha, all beta, alpha+beta), then use it to predict the structure of the query sequence
NEURAL NETWORKS
- Inspired by the brain
- Traditional computers struggle to recognize and generalize patterns of the past for future actions
- The brain, as an information processing system, contains tens of billions of nerve cells (neurons), and each neuron is connected to other neurons through about 10,000 synapses
Brain
Interconnected network of neurons that collect, process and disseminate electrical signals via synapses
Neural Network
Interconnected network of units (or nodes) that collect, process and disseminate values via links
Brain: neurons, synapses
Neural network: nodes, links
Typical methodology used to train a feed-forward network for secondary structure prediction is based on Qian and Sejnowski, 1988
Feed-forward network: one or more hidden layers. Each layer receives input only from previous layers.
2-layer or 1-hidden layer fully connected network
Input layer
A network with feedback, where some of its inputs are connected to some of its outputs (discrete time).
Recurrent network
Input layer
Output layer
Features
A typical training set consists of 100 nonhomologous protein chains (15,000 training patterns)
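A net with an input window of 17, five hidden nodes in a single hidden layer and three outputs will have 357 input nodes (17 positions × 21 units per position: 20 amino acids plus a spacer) and 1,808 weights (357 × 5 + 5 × 3 = 1,800 connections plus 8 bias terms).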
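The node and weight counts can be checked with a short calculation, together with a sketch of how a 17-residue window could be turned into the input vector. The assumptions here are that each window position is one-hot encoded over 21 symbols (20 amino acids plus a spacer for positions beyond the chain ends) and that bias terms are counted among the weights. The helper names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPACER = "-"                      # assumed symbol for positions outside the chain
ALPHABET = AMINO_ACIDS + SPACER   # 21 symbols per window position


def network_size(window=17, symbols=21, hidden=5, outputs=3):
    """Return (input nodes, weights including biases) for one hidden layer."""
    inputs = window * symbols
    connections = inputs * hidden + hidden * outputs
    biases = hidden + outputs
    return inputs, connections + biases


def encode_window(seq, center, window=17):
    """One-hot encode the window centred on `center` as a flat list of 0/1."""
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        symbol = seq[pos] if 0 <= pos < len(seq) else SPACER
        one_hot = [0] * len(ALPHABET)
        one_hot[ALPHABET.index(symbol)] = 1
        vector.extend(one_hot)
    return vector


inputs, weights = network_size()          # 357 input nodes, 1808 weights
x = encode_window("ACDEFGHIK", center=0)  # window hangs off the N-terminus
```

With roughly 15,000 training patterns and about 1,800 free parameters, the network has only around eight patterns per weight, which is one reason the hidden layer is kept small.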
Two Ways: One is solely based on the construction principles of proteins associated with physico-chemical properties of amino acids. No concept of training is involved.
The other is to collect data sets with known structures, extract features and use machine learning algorithms for predictions.
Outline
1. Importance of Transmembrane Proteins
2. General Topologies
3. Methods (and challenges) for Structural Studies of TM Proteins
Transmembrane Proteins
Cellular roles include:
- Communication between cells
- Communication between organelles and cytosol
- Ion transport, nutrient transport
- Links to extracellular matrix
- Receptors for viruses
- Connections for cytoskeleton
Over 25% of proteins in complete genomes.
Key roles in diabetes, hypertension, depression, arthritis, cancer, and many other common diseases.
Targets for over 75% of pharmaceuticals.
- Approximately 30 Å thick
- Hydrophobic core + hydrophilic or charged headgroups
- Mixture of lipids that vary in type of head groups, lengths of acyl chains, number of double bonds
- (Some membranes also contain cholesterol)
In order to be stable in this environment, a polypeptide chain needs to (1) contain a lot of amino acids with hydrophobic sidechains, and (2) fold up to satisfy backbone H-bond propensity - How?
PDB = 1QHJ
PDB = 1RRC
Single helix or helical bundles (> 90% of TM proteins)
Examples:
- Human growth hormone receptor, insulin receptor
- ATP binding cassette family - CFTR
- Multidrug resistance proteins
- 7TM receptors - G protein-linked receptors
PDB = 1EK9
PDB = 2POR
Beta barrels - in the outer membrane of Gram-negative bacteria, and in some non-constitutive membrane-acting toxins
Examples: porins
Single helix or helical bundles, and beta barrels.
Both topologies result in hydrophobic surfaces facing the acyl chains of lipids.
The part protruding from the membrane can be a very short sequence (a few amino acids), a loop, or large, independently folding domains.
General Idea
We know what an alpha-helix or a beta strand looks like, so
(1) figure out which parts of the sequence are helices and which parts are strands
(2) figure out how they pack together
For soluble proteins, neither is well predicted. But for transmembrane proteins ...
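For helical transmembrane proteins, step (1) is commonly approached with a sliding-window hydropathy scan, since a membrane-spanning helix must be rich in hydrophobic side chains. The sketch below is one such approach and is not taken from the slides. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cutoff of 1.6. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; other scales exist).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge all windows scoring at or above the threshold into segments.

    Returns half-open (start, end) index pairs. Because whole windows are
    merged, a reported segment can be wider than the hydrophobic stretch itself.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score >= threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


# Toy sequence: charged flanks around a 20-residue hydrophobic stretch
hits = candidate_tm_segments("D" * 10 + "L" * 20 + "K" * 10)
```

This belongs to the first of the two approaches described earlier, using only physico-chemical properties with no training. Beta barrels are not detected this way because their strands alternate hydrophobic and polar residues.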
Summary
- Transmembrane proteins play many important roles in cellular processes in both health and disease
- Two general types of structure are found to cross membranes: beta-barrels and alpha-helices
- Structural studies of TM proteins are impeded by difficulties in overexpression, purification and crystallization
- However, the few dozen structures that have been determined have provided key information about channels (gating, selectivity, etc.), energetics, transport, and other transmembrane processes
- Analysis of helical transmembrane protein structures may lead to accurate predictions of protein structure from amino acid sequence for this type of protein
Protein Conformations
Predict protein 3D structure from (amino acid) sequence
Sequence → secondary structure → 3D structure → function
Protein Structure
- Protein 3D structure → biological function
- Lock & key model of enzyme function (docking)
- Folding problem: protein sequence → 3D structure
- Structure prediction and alignment
- Protein design, drug design, etc.
- The holy grail of bioinformatics
Can we predict the final 3D protein structure knowing only its amino acid sequence?
- Studied for 4 decades
- Primary motivation for bioinformatics
- Based on this 1-to-1 mapping of sequence to structure
- Still very much an OPEN PROBLEM
PSP: Goals
Accurate 3D structures. But not there yet.
Good guesses
- Working models for researchers
- Understand the FOLDING PROCESS: get into the black box
- Only hope for some proteins: 25% won't crystallize, too big for NMR
- Best hope for novel protein engineering: drug design, etc.
Comparative Modeling
Homology Modeling Threading
Template-Free Modeling
De novo/ab initio Methods
Physics-Based Knowledge-Based
Homology Modeling
Steps
Threading
- a library of protein folds (templates)
- a scoring function to measure the fitness of a sequence -> structure alignment
- a search technique for finding the best alignment between a fixed sequence and structure
- a means of choosing the best fold from among the best scoring alignments of a sequence to all possible folds
ab initio Methods
The ab initio approach (Figure 6.25) ignores sequence homology and attempts to predict the folded state from fundamental energetics or the modelling
How to define the energy of a PROTEIN?
How to find the conformations for which the energy is minimum?
Global Minimum
structure
To overcome this, reduce the resolution at which the potential function is calculated.
Instead of an atom-atom potential, a united-atom potential can be used as an approximation. This approximation is also called a pseudo-atom model.
Molecular Dynamics
Computation of the dynamics or motion of a protein.
1) The physical forces which influence the folding process are well represented by semi-empirical force fields.
2) The atoms of the protein move independently under the influence of the force fields. Their motion is calculated with Newton's law of motion, F = ma, evaluated at each femtosecond time step.
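The time-stepping idea can be illustrated with a single particle on a spring. The sketch below uses the velocity Verlet scheme, a common integrator in molecular dynamics, chosen here as an assumption since the text does not name one. A harmonic force stands in for a real force field, and the units are arbitrary rather than femtoseconds.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate F = m*a for one particle in one dimension.

    Returns the list of positions (the trajectory) and the final velocity.
    """
    trajectory = [x]
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity
        a = a_new
        trajectory.append(x)
    return trajectory, v


# Toy system: harmonic "bond" with spring constant k (stand-in for a force field)
k = 1.0
traj, v_final = velocity_verlet(
    x=1.0, v=0.0, force=lambda x: -k * x, mass=1.0, dt=0.01, steps=1000
)
total_energy = 0.5 * v_final ** 2 + 0.5 * k * traj[-1] ** 2
```

In a real simulation the same update runs over every atom in three dimensions, the force comes from the gradient of the force-field energy, and the small time step is what makes long folding trajectories so expensive.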
[Figure: trajectory on the energy landscape. Axes: energy, conformational space]
ROSETTA ALGORITHM
Break target sequence into fragments of 9 amino acids
Use fragments as starting point for optimisation, using:
- hydrophobic burial
- polar side-chain interactions
- hydrogen bonding between beta-strands
- hard sphere repulsion (van der Waals)
Create 1000 structures, and choose cluster centre as the best prediction
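Two of these steps are simple enough to sketch: cutting the target into 9-residue fragments and picking a cluster centre from the generated structures. The following is an illustration only and not the Rosetta implementation. It assumes overlapping fragments, and it assumes that the cluster centre is the structure with the most neighbours within a distance cutoff, given a precomputed pairwise distance matrix (for example RMSD). Both function names are hypothetical.

```python
def fragments(seq, size=9):
    """All overlapping fragments of `size` residues (overlap is an assumption)."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def cluster_centre(distances, radius):
    """Index of the structure with the most neighbours within `radius`.

    `distances` is a symmetric matrix of pairwise structure distances.
    """
    neighbours = [sum(d <= radius for d in row) for row in distances]
    return max(range(len(distances)), key=neighbours.__getitem__)


# Toy inputs: a made-up sequence and a made-up 3 x 3 distance matrix
frags = fragments("ABCDEFGHIJK")
centre = cluster_centre([[0, 1, 5], [1, 0, 1], [5, 1, 0]], radius=2)
```

The optimisation itself, which scores candidate structures with the four terms listed above, is the hard part and is not shown here.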