You are on page 1of 72

Molecular

dynamics

ECE 697S: Topics in


Computational Biology
Senthamizhselvi.K
May 30, 2007
Outline
• Why simulate motion of molecules?
• Energy model of conformation
• Two main approaches:
– Monte Carlo - stochastic
– Molecular dynamics - deterministic
• Applications of MD
• Speeding up MD
Why simulate motion?
• Predict structure
• Understand interactions
• Understand properties
• Learn about normal modes of vibration
• Design of bio-nano materials
• Experiment on what cannot be studied
experimentally
• Obtain a movie of the interacting
molecules
Structure implies function
• central dogma: sequence to
structure
• After protein is generated, it
folds without the help of
external agents (except
solvent)
QuickTime™ and a
TIFF (Uncompressed) decompressor
are needed to see this picture.
• Difficult problem for humans
to predict 3-d structure from
1-d protein sequence
• Organic chemists know how
structure implies function
for small molecular systems.
Proteins are much larger,
and more difficult. Open
problem. central dogma
Aquaporin simulation

2004 Winner of Visualization Challenge in Science and Engineering,


Organized by the National Science Foundation and Science Magazine.
Aquaporin simulation

QuickTime™ and a
YUV420 codec decompressor
are needed to see this picture.

Structure, Dynamics, and Function of


Aquaporins
http://www.ks.uiuc.edu/Research/aquaporins/
Method of Molecular
Dynamics
• Use physics to find the potential
energy between all pairs of atoms.
• Move atoms to the next state.
• Repeat.
Energy Function
• Target function that MD tries to
optimize
• Describes the interaction energies of
all atoms and molecules in the
system
• Always an approximation
– Closer to real physics --> more realistic,
more computation time (I.e. smaller
time steps and more interactions
increase accuracy)
Scale in Simulations
mesoscale continuum

Monte Carlo
Time Scale

10-6 S molecular
dynamics domain

quantum
10-8 S chemistry exp(-∆E/kT)

10-12 S F = MA

Ηψ = Εψ

10-10 M 10-8 M 10-6 M 10-4 M


Length Scale

Taken from Grant D. Smith


Department of Materials Science and Engineering
Department of Chemical and Fuels Engineering
University of Utah
http://www.che.utah.edu/~gdsmith/tutorials/tutorial1.ppt
The energy model
• Proposed by Linus
Pauling in the 1930s
• Bond angles and
lengths are almost
always the same
• Energy model broken
up into two parts:
– Covalent terms
• Bond distances (1-2
interactions)
• Bond angles (1-3)
• Dihedral angles (1-4)
– Non-covalent terms
• Forces at a distance
between all non- http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechani
cs_document.html
bonded atoms The NIH Guide to Molecular Modeling
The energy equation
Energy =
Stretching Energy +
Bending Energy +
Torsion Energy +
Non-Bonded Interaction Energy

These equations together with the data


(parameters) required to describe the behavior
of different kinds of atoms and bonds, is called a
force-field.
Bond Stretching Energy

kb is the spring constant of the bond. Unique kb and r0 assigned for each bond
r0 is the bond length at equilibrium. pair, i.e. C-C, O-H
Bending Energy

kθ is the spring constant of the bend. Unique parameters for angle bending are
assigned to each bonded triplet of atoms
θ0 is the bond length at equilibrium. based on their types (e.g. C-C-C, C-O-C, C-
C-H, etc.)
The “Hookeian” potential

kb and kθ broaden or steepen the slope of the parabola.

The larger the value of k, the more energy is required to deform an angle (or
bond) from its equilibrium value.
Torsion Energy

A controls the amplitude of the curve The parameters are determined from
curve fitting.
n controls its periodicity
Unique parameters for torsional rotation
φ shifts the entire curve along the are assigned to each bonded quartet of
rotation angle axis (τ). atoms based on their types (e.g. C-C-C-
C, C-O-C-N, H-C-C-H, etc.)
Torsion Energy
parameters

A is the amplitude.
n reflects the type symmetry in the dihedral angle.
φ used to synchronize the torsional potential to the initial rotameric state of
the molecule
Non-bonded Energy

A determines the degree the attractiveness A determines the degree the attractiveness
B determines the degree of repulsion B determines the degree of repulsion
q is the charge q is the charge
Non-bonded Energy
parameters
Energy Model in Action
Stochastic method
aka Monte Carlo

QuickTime™ and a
TIFF (Uncompressed) decompressor

Deterministic method
are needed to see this picture.

aka Molecular
Dynamics (some
ambiguity of how
term is used)

Goal: energy minimization


Simulating In A Solvent
• The smaller the system, the more
particles on the surface
– 1000 atom cubic crystal, 49% on surface

– 10^6 atom cubic crystal, 6% on surface

• Would like to simulate infinite bulk


surrounding N-particle system
• Two approaches:
– Implicitly
– Explicitly
• Ex. Periodic boundary
conditions
Schematic representation of periodic
boundary conditions.
http://www.ccl.net/cca/documents/molecular-
modeling/node9.html
Monte Carlo
Explore the energy surface by randomly
probing the configuration space
Metropolis method (avoids local minima):
1. Specify the initial atom coordinates.
• Select atom i randomly and move it by random displacement.
• Calculate the change of potential energy, ∆E corresponding to
this displacement.
• If ∆E < 0, accept the new coordinates and go to step 2.
• Otherwise, if ∆E ≥ 0, select a random R in the range [0,1] and:
• If e-∆E/kT < R accept and go to step 2
• If e-∆E/kT ≥ R reject and go to step 2
Properties of MC simulation
• System behaves as a Markov process
– Current state depends only on previous.
– Converges quickly.
• System is ergodic (any state can be
reached from any other state).
• Accepted more by physicists than by
chemists.
– Not deterministic and does not offer
time evolution of the system.
Deterministic Approach
• Provides us with a trajectory of the
system.

• Typical simulations of small proteins


including surrounding solvent in the
pico-seconds.

∂E r r
Fi = F =ma 
∂x i
Deterministic / MD
methodology
• From atom positions, velocities, and
accelerations, calculate atom positions
and velocities at the next time step.
• Integrating these infinitesimal steps
yields the trajectory of the system for
any desired time range.
• There are efficient methods for
integrating these elementary steps with
Verlet and leapfrog algorithms being
the most commonly used.
MD algorithm
{r(t+∆t), v(t+∆t)}

• Initialize system {r(t), v(t)}

– Ensure particles do not overlap in initial


positions (can use lattice)
– Randomly assign velocities.
• Move and integrate.
Leapfrog algorithm
Paper: The role of molecular
modeling in
bionanotechnology

• Molecular modeling for r+d of


bionanotechnology
• Three case studies:
– Polarizability of carbon nanotubes
– Nanopores for DNA sequencing
– Nanodiscs for experimenting on
membrane proteins
Towards an efficient description of
carbon nanotube-biomolecule
interaction

• Single-walled carbon nanotubes


(SWNT) can be used for in vivo
biosensor applications.
• SWNTs are highly-polarizable
• Used MD to develop accurate
description of the influence of
biomolecules on nanotube electronic
properties.
• Furnishes ability to test nanotube
electronic properties in realistic
biological environments.
Polarizable SWNT model
• Studied water-nanotube and ion-nanotube
interactions.
• Engineers seek to control the transport of ions or
charged molecules through SWNTs by means of
external electric fields.

(a) Top view (b) side view of an ideal 12-section SWNT. © Potential energy profiles for same SWNT
interacting with a water molecule of fixed orientation at various positions along tube axis.
Case study 2:
Nanopore device for high-
throughput sequencing of DNA

• With current technology, individual


genomes determined to 99.99%
accuracy within 2 months for 10
million dollars.
• Describes a device that may
provide alternative
Nanopore device
• 2-nm-diameter pore in
a 2-5nm thick
synthetic membrane.
• Sequence of a DNA
molecule can be read
by such a device
through a
semiconductor
detector in the pore
– A,C,G, and T
nucleotides differ by
only a few atoms
– Essential to
characterize
conformation of DNA
Typical translocation event

• A 1.4V bias applied to membrane.


• 20 base-pair fragment of double stranded DNA placed in
front of a nanopore.
• End of DNA nearest to the pore is pulled into the pore by its
charged backbone (a,b)
• System reaches a meta-stable state (c) and translocation
halts.
• Base pairs start to split. Some freed nucleotides adhere to
pore surface.
• Voltage increased momentarily to drive system out of
metastable state.
• DNA exits pore. One of the bases holds firmly to the pore
surface.
Case study 3:
Nanodiscs: discoidal protein-lipid
particles
• Nanodiscs provide a platform in which to
embed, solubilize, and study single membrane
proteins.
• Membrane proteins are difficult to study
experimentally.
• Often studied under non-native conditions,
with artifacts unknown.
MD simulation of assembly of nanodiscs
• Using an optimized
starting ratio of protein
to lipid, homogeneous
nanodiscs form with
discrete size and
composition.
• Predict structure of
nandodiscs by simulating
the experimental self-
assembly.
• To account for full
process, used Marrink et-
al coarse-grained lipid-
water model.
– Protein amino acid residues are
represented by 2 beads
– DPPC lipids represented by 12 beads.
– 4 water molecules are represented 1
bead.
Conclusions
• Protein folds from 10 to 100s of µs.
Longest simulations ~1 µs.
• Can we speed up MD?
• Would like:
– larger systems QuickTime™ and a
Video decompressor
are needed to see this picture.

– longer runs
– more trajectories
• approaches:
– Mix in experimental data Steered MD example
Bacteriorhodopsin
– Steered MD Steered molecular dynamics,
performed by applying a series of
– Massive parallel computation external forces to retinal, allow one to
extract retinal from bR once the Schiff
– Course-grained methods base bond to Lys216 is cleaved
Future of MD
• Review of deterministic method for
molecular dynamics
• Two examples of approaches for
making MD ‘faster’
• Questions
MD algorithm
• Initialize system
– Ensure particles do not overlap in initial
positions (can use lattice)
– Randomly assign velocities.
• Compute Energy
Etotal =
Ebond + Etorsion + Ebond-angle + Enon-bond
• Get velocities of atoms, then move for
timestep (~1fs)
∂E v i (t +
∆t) =
v i (t) +
ai (t) •
∆t
Fi = r r
∂x i F =ma  {r(t+∆t), v(t+∆t)}

{r(t), v(t)}
Two example approaches to
speeding up molecular
simulation

• Collect large number of


trajectories using traditional MD,
and see if folding rates agree with
lab results.
• Exploit geometry of the system to
remove number of nodes that
must be considered.
Paper: Absolute comparison of
simulated and experimental
protein-folding dynamics

• Statistically compared computational


predictions and experimentally
determined mean folding times for
small protein (BBA5).
• Used massively distributed cluster to
accumulate thousands of 5 to 20 ns
trajectories.
• Found computational predictions in
excellent agreement with
experimentally determined mean folding
times and equilibrium times.
• Similar to SETI@home, volunteers run
screensaver software to donate unused
CPU cycles.
• Since October 1, 2000, over 1,000,000
CPUs throughout the world have
participated in Folding@Home.
Approach to simulation
• Simulated two BBA5 mutants
– Considered fast-folding, mean folding time
10 µs
– 23 residues
– Has β-hairpin turn and α-helix motif
• Statistically, 10 out of 10,000 molecules
will be folded after 10 ns.
• Qualitatively reasonable but arbitrary
criteria used to discriminate folded
structures.
– Each final conformation aligned to Cα
positions of low energy BBA5 NMR structure
and calculated r.m.s.d.
– A conformation was folded if it had the
Results
•Acquired a total of 32500 trajectories
•Observed β-hairpin in 1100 trajectories
•Observed α-helix in 21000 trajectories
•Of 9000 double-mutant trajectories, 16 were folded
after 20ns.
Simulation Experimental
Mean 6 µs 7.5 ± 3.5 µs
folding time
Unfolding Slightly faster 4-times
rate than folding faster than
folding
Results
Limitations
• Criteria for discrimination of folded
structures is arbitrary
• Does not represent structure
prediction, because relies on Cα
positions of BBA5
• Native state flexibility may be
indication that force field is
insufficiently accurate
• Experimental folding rate uncertainty
large (instrument latencies)
Conclusions
• Contributes to body of knowledge about
rapid protein dynamics.
• Found following consistent between
simulation and experimental evidence :
– Helical structure in the unfolded state
– Rate of helix formation
– Rate of hairpin formation
– Rate for folding
• “The ability to investigate directly
folding dynamics and heterogeneity
with molecular dynamics will bring
molecular dynamics to the same
prominence in protein folding that it has
already achieved in structural
modeling.”
Paper: Constrained
geometric simulation of
diffusive motion in proteins
• Old method
– MD considers all 3N degrees of motion
– Accurate trajectories for atoms only obtained
with short time step (~femtosecond)
– 10ns requires weeks of CPU time to compute
– Most biologically relevant motions take place
on milliseconds to seconds timescale
• New method
– Geometric simulation for proteins
Differs from MD
• Not time-dependent, but distance
dependent
– Obtain collections of possible conformations,
which define a trajectory that respects stereo-
chemistry
– Measure of the progression of motion is
distance rather than time:
• Mobility is monitored by RMSD from a reference
conformer.
Two components to
geometric simulation
• FIRST - Find which regions are rigid
and which are flexible
• FRODA - generate sterically accurate
motions of a protein using rigidity
information
FIRST
• Floppy Inclusion and Rigid Substructure
Topography
• Given a structure (graph of atoms and
covalent bonds), identify rigid and
flexible regions
FRODA
• Framework Rigidity Optimized
Dynamic Algorithm
• Finds continuous pathways as it
traverses allowed regions within
conformational phase space.
• Evaluates non-covalent interaction
constraints (i.e. steric clashes and
hydrophobic tethers) simultaneously
with the geometric constraints.
Steps
1. Perform static rigidity analysis on
the protein (FIRST)
2. Perform geometric simulation
(FRODA)
Rigidity Analysis
• Protein is a network. All covalent
bond lengths and angles are
constrained.
Finding Rigid components
A bond-bending graph G is rigid in R3 iff
e ≤ 3n- 6 for every subgraph of G having
n graph vertices and e graph edges.
Ring of 6 atoms and bonds

G2 Bond Bending Graph:


|V|=6, |E| = 12
|E| = 3|V|-6
Constraints
• A subgraph is “over-constrained” if e > 3n-6
– If you remove an edge, component is still rigid
• A subgraph is “under-constrained” if e < 3n-6
– Component has 3n-6 - e degrees of freedom.

Under-constrained
e = 5 n = 4 3n - 6 = 9 over-constrained
e = 8 n = 4 3n - 6 = 5
Complexity of finding rigid
components
• Brute force method to find all rigid
subcomponents takes exponential time (2N
subgraphs in any graph).
• FIRST uses pebble game algorithm, which
takes O(N2) time, and scales linearly in
practice.
– Essence of pebble game: an algorithm for
distributing the degrees of freedom belonging
to the atoms (‘pebbles’) over the bonds
(‘constraints’) to determine the rigidity.
– More on pebble games at
http://flexweb.asu.edu
FIRST
• decomposes a protein down to its
rigid and flexible components
FRODA
• Uses geometrical constraints
• Claim: geometric simulation is an efficient
method to obtain the mobility that results
from flexibility.
• Uses creative algorithm for generating
motion while enforcing constraints
– Replace interatomic potentials by fictitious rigid
bodies (the ghost templates)
– Atoms bound to the vertices of ghost
templates.
Ghost templates
• A ghost template is a rigid body that
consists of set of mutually rigid
atoms
• Since they are rigid only 6 DOF
• Atoms are bound to vertices of ghost
templates
Ghost templates
• Defined by least-square fit of atoms
• Vertex corresponds to specific edge
Move set

Fit templates

Perturb

Fit atoms
Move set

The motion of an ethane molecule (a) initial atomic positions; (b) ghost templates; (c) random
atomic displacement; (d) fitting of ghost templates to atoms; (e) refitting of atoms to ghost
templates; (f) and (g) further iterations of (d) and (e); (h) until a new conformer is found.
Steric clashes
• Small bias to the
atomic motion during
the fitting (atoms to
templates) process
– Cutoff of 62% of radius
• Sterics are always
ensured (no
rejections)
• Similarly,
hydrophobic tethers
are ensured
– Bias given if two atoms
with hydrophobic
tether move further
apart than sum of radii
plus .5 Å
Algorithm
• Individual step ~.1
to ~.4Å RMSD
• If algorithm does
not converge in a
fixed number of
steps, restart from
earlier conformer
Mobility results
50 CPU minutes!
Barnase
110 residue

Above left: NMR ensemble of 20


conformers
Above right: single x-ray structure plus
FRODA conformers
Right: RMSD for each residue, determined
from figures above.
Directional motion
• Given a starting point and ending point
can we find a path?
– Rather than randomly perturb atoms, bias
towards desired structure
• Include RMSD to target conformer in ghost/atom
fitting
– Can generate a pathway from one conformer
to the other.
Directional motion results

DHFR closed to occluded


Enzymatic protein
Has three very similar conformations (open, closed, and occluded), most
significant differences in residues 14-24 (M-20 loop)/
Mobile loop in closed (red) and occluded (blue) positions.
Directional motion results

From closed to occluded


Directional motion results
Results
• FRODA can produce a sterically
correct pathway (all motions within
allowed regions).
• Does not show time-dependency.
• A flexible motion of a typical protein
(~100 residues) investigated in 10 to
100 min.
Conclusions
Massively parallel Geometric abstraction
traditional MD method
• Can find trajectories • Can find a full
that are at most ~10 sterically correct
to ~20ns time pathway from initial
duration. conformation to target
conformation
• Time dependent.
Most accurate • Not time dependent.
simulation of actual Not an accurate
simulation of motion
motion of each atoms
of each atom.
Questions
• Can we make some relaxations to
MD while minimizing loss of
accuracy?
• Can we make geometric simulation
time-dependent?
• Any other ideas for course-grained
methods?
References
• Understanding Molecular Simulation: from Algorithms to
Applications. Daan Frenkel, Berend Smith. 2002. Academic
Press. San Diego, California.
• Deyu Lu, Aleksei Aksimentiev, Amy Y. Shih, Eduardo Cruz-Chu,
Peter L. Freddolino, Anton Arkhipov, and Klaus Schulten. The role
of molecular modeling in bionanotechnology. Physical Biology,
3:S40-S53, 2006.
• Snow CD, Nguyen H, Pande VS, Gruebele M. Absolute comparison
of simulated and experimental protein-folding dynamics. Nature
420(6911):33-4, Nov 7 2002.
• Stephen Wells, Scott Menor, Brandon Hespenheide, and MF
Thorpe. Constrained geometric simulation of diffusive motion in
proteins. Physical Biology 2:S127-S136, 2005.
THANK YOU

You might also like