The life of molecules
Contents

1 Integration Algorithms
  1.1 Principles of Classical Mechanics
  1.2 Numerically calculating forces
    1.2.1 Forward Difference method
    1.2.2 Central Difference method
  1.3 Euler Algorithm
  1.4 Verlet Algorithm
  1.5 Leap Frog Algorithm
  1.6 Comparison

2 Force Fields
  2.1 Important Interactions
    2.1.1 Electrostatics
    2.1.2 Magnetostatics
    2.1.3 Gravity
    2.1.4 Dispersion interactions
    2.1.5 Pauli repulsions
    2.1.6 Chemical bonds
  2.2 Simple Force Field expressions
  2.3 Treatment of non-specific interactions

3 Environmental Controls
  3.1 Thermodynamic ensembles
  3.2 Boundary Conditions
  3.3 Heat bath
    3.3.1 Andersen thermostat
    3.3.2 Berendsen thermostat
    3.3.3 Extended Nose-Hoover thermostat
  3.4 Pressure control
    3.4.1 Berendsen barostat
    3.4.2 Andersen barostat

4 Principles of diffusion

5 Bash commands
1 Integration Algorithms
The key element of every Molecular Dynamics Engine is a stable integration algorithm that
accounts for correct time propagation of the system. As typical MD simulations propagate a
system for about 2 fs per step but simulation times of the order of 1 s shall be achieved, the
error per step must be very small to assure simulating a physically correct behavior.
The force acting on the i-th particle is obtained as the negative gradient of the potential energy U with respect to the particle's position:

\vec{F}_i(\vec{r}_i) = \begin{pmatrix} F_{x,i} \\ F_{y,i} \\ F_{z,i} \end{pmatrix} = - \begin{pmatrix} \partial_x U \\ \partial_y U \\ \partial_z U \end{pmatrix} = - \frac{\partial U(\vec{r}_i)}{\partial \vec{r}_i} = - \vec{\nabla} U(\vec{r}_i)
Having computed the forces on the ith particle Newtons second law can be used to calculate
the acceleration of the particle caused by the potential. In order to do so the inertial mass of
the particle is needed as a parameter of the model.
\vec{F}_i(\vec{r}_i) = \begin{pmatrix} F_{x,i} \\ F_{y,i} \\ F_{z,i} \end{pmatrix} = m_i \begin{pmatrix} a_{x,i} \\ a_{y,i} \\ a_{z,i} \end{pmatrix} = m_i \vec{a}_i(\vec{r}_i)
Acceleration is defined as the rate at which velocity changes over time, and velocity in turn is the rate at which position changes over time. Hence, acceleration is the second total derivative of position with respect to time. With this relation, the change in position of a particle can be related to the potential the particle feels at a given time t, which allows for estimating the particle's position at time t + Δt.
\frac{d^2}{dt^2} \vec{r}_i = \frac{1}{m_i} \vec{F}_i(\vec{r}_i) = - \frac{1}{m_i} \frac{\partial U(\vec{r}_i)}{\partial \vec{r}_i}
The only thing we still need to think about is how to integrate this equation of motion efficiently. This is what integration algorithms are used for.
1.2 Numerically calculating forces

The exact first derivative of a function f at a position a is defined via the limit

f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}
This expression is also called Differential Quotient. Instead of calculating the Differential
Quotient, the forward difference method approximates the first derivative by only computing
the Difference Quotient
Definition 1
Given an arbitrary function f : x ↦ f(x), the forward difference first derivative f'_{FD} at a given position x = a is given by

f'_{FD}(a) = \frac{f(a + h) - f(a)}{h}

for some constant h > \varepsilon_m, where \varepsilon_m is the machine precision of the machine used for the computation.
Of course this approximation of the first derivative yields a certain error. The error can be
estimated by Taylor expanding the expression for the forward difference derivative
f'_{FD}(x) = \frac{f(x + h) - f(x)}{h} = \frac{f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \ldots - f(x)}{h}
= f'(x) + \frac{h}{2} f''(x) + \ldots
For small h the error in this approximation is mostly determined by h itself and the second
derivative of the considered function. Assuming that the function is well-behaved, meaning
that it is sufficiently smooth, every derivative to any order is of about the same order of
magnitude as the function itself. Thus, we may conclude that the error is mostly determined
by the choice of h.
The trade-off in this method as in most of the methods for calculating first derivatives is that
on one hand h cannot be chosen arbitrarily small as we need to be significantly above the
machine precision but on the other hand h should be small enough to sufficiently decrease the
error in the computation.
It can be shown that for double precision the minimum error, being the sum of the round-off error and the approximation error, is about 3 \cdot 10^{-8} and is achieved for h \approx 4 \cdot 10^{-8}.
1.2.2 Central Difference method

Definition 2
Given an arbitrary function f : x ↦ f(x), the central difference first derivative f'_{CD} at a given position x = a is given by

f'_{CD}(a) = \frac{f(a + h/2) - f(a - h/2)}{h}

for some constant h > \varepsilon_m, where \varepsilon_m is the machine precision of the machine used for the computation.
It can be shown that the error made by this method is significantly smaller than the error
of the forward difference method. In order to do so the defining expression is again Taylor
expanded
f'_{CD}(x) = \frac{f(x + h/2) - f(x - h/2)}{h}
= \frac{\left[ f(x) + \frac{h}{2} f'(x) + \frac{h^2}{8} f''(x) + \frac{h^3}{48} f^{(3)}(x) + \ldots \right] - \left[ f(x) - \frac{h}{2} f'(x) + \frac{h^2}{8} f''(x) - \frac{h^3}{48} f^{(3)}(x) + \ldots \right]}{h}
= f'(x) + \frac{h^2}{24} f^{(3)}(x) + \ldots

Hence, the leading error of the central difference method scales as h^2 instead of h, while the number of function evaluations stays the same.
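A direct comparison of both methods illustrates this improved h² scaling; a small Python sketch, again with sin at a = 1.0 as an arbitrary test case:

```python
import math

def forward_diff(f, a, h):
    """Forward difference approximation of f'(a)."""
    return (f(a + h) - f(a)) / h

def central_diff(f, a, h):
    """Central difference: evaluate f symmetrically around a."""
    return (f(a + h / 2) - f(a - h / 2)) / h

a, exact = 1.0, math.cos(1.0)
for h in [1e-1, 1e-2, 1e-3]:
    e_fd = abs(forward_diff(math.sin, a, h) - exact)
    e_cd = abs(central_diff(math.sin, a, h) - exact)
    # e_fd shrinks roughly like h, e_cd roughly like h^2
    print(f"h = {h:.0e}   forward = {e_fd:.2e}   central = {e_cd:.2e}")
```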
[Figure omitted: panel (A) shows f'_{FD}(a) constructed from evaluations at a and a + h; panel (B) shows f'_{CD}(a) constructed from evaluations at a - h/2 and a + h/2.]

Figure 1: Derivative Methods: Depicted are the basic ideas behind the forward difference method (A) and the central difference method (B). Both can be used to approximate the first derivative of a given function, but based on the definition of the methods the central difference method performs significantly better than the forward difference method even though the number of evaluations is the same.
1.3 Euler Algorithm

The phase space point (r, v)(t + Δt) at time t + Δt can be calculated from the previous phase space point (r, v)(t) by simply Taylor expanding the expressions for later times.

r(t + \Delta t) = r(t) + \Delta t \frac{dr(t)}{dt} + \frac{(\Delta t)^2}{2} \frac{d^2 r(t)}{dt^2} + \ldots
v(t + \Delta t) = v(t) + \Delta t \frac{dv(t)}{dt} + \ldots
Note that truncating the sums after any arbitrary order of Δt is exact for infinitesimal Δt. The
occurring derivatives can be computed from the underlying physics
\frac{dr(t)}{dt} = v(t)
\frac{d^2 r(t)}{dt^2} = \frac{dv(t)}{dt} = a(t) = \frac{F(t)}{m}
where F(t) are the forces acting on the particle at time t. A numerical way to calculate the next position and velocity is now to consider a finite time step Δt but still truncate the sums of the Taylor expansion. This way a certain error is made, which is smaller the smaller Δt is chosen. However, in order to reach interesting time scales within a reasonable simulation time, Δt should not be chosen too small.
Given the position r_N and velocity v_N at step N of the simulation, positions and velocities can be updated via the following scheme

Definition 3
Euler Algorithm

r_{N+1} = r_N + \Delta t\, v_N + \frac{(\Delta t)^2}{2} \frac{F_N}{m}
v_{N+1} = v_N + \Delta t\, \frac{F_N}{m}
Note that effectively there is only one force evaluation required for this integration algorithm. Nevertheless, the error made by this algorithm in the position of the particle is of the order of (\Delta t)^3, as this is the first order that was neglected in the Taylor expansion.
The Euler method presented here is known to be a quite inaccurate integration algorithm and should only be used as an academic example. Even though the Euler method can be comparably fast, this algorithm is rather unstable.
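The instability can be made visible with a minimal sketch in Python; the harmonic oscillator with m = k = 1 is a toy system chosen here for illustration, whose exact dynamics would conserve the total energy of 0.5:

```python
def euler_step(r, v, force, m, dt):
    """One Euler step as in Definition 3."""
    f = force(r)
    r_new = r + dt * v + 0.5 * dt ** 2 * f / m
    v_new = v + dt * f / m
    return r_new, v_new

force = lambda r: -r        # harmonic oscillator, k = 1
r, v, m, dt = 1.0, 0.0, 1.0, 0.01
for _ in range(10000):      # 100 time units
    r, v = euler_step(r, v, force, m, dt)
energy = 0.5 * m * v ** 2 + 0.5 * r ** 2
print(f"total energy after 100 time units: {energy:.3f} (initially 0.5)")
```

The total energy grows systematically with simulation time instead of being conserved, which is exactly the instability described above.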
1.4 Verlet Algorithm

The Verlet algorithm is based on Taylor expansions of the position both forward and backward in time:

r(t + \Delta t) = r(t) + \Delta t \frac{dr(t)}{dt} + \frac{(\Delta t)^2}{2} \frac{d^2 r(t)}{dt^2} + \frac{(\Delta t)^3}{6} \frac{d^3 r(t)}{dt^3} + \ldots
r(t - \Delta t) = r(t) - \Delta t \frac{dr(t)}{dt} + \frac{(\Delta t)^2}{2} \frac{d^2 r(t)}{dt^2} - \frac{(\Delta t)^3}{6} \frac{d^3 r(t)}{dt^3} + \ldots

Both expressions can now be added and slightly rearranged, which leads to

r(t + \Delta t) = 2 r(t) - r(t - \Delta t) + (\Delta t)^2 \frac{d^2 r(t)}{dt^2} + \ldots
Again, note that truncating the sum at any arbitrary order of Δt still yields a true equation for infinitesimal Δt. However, for numerical simulations a finite Δt is required. By considering the current and the previous position of the particle, an equation could be derived which is exact up to terms of (\Delta t)^4, already one order in Δt better than the Euler Algorithm.
Substituting the second time derivative of the position with the underlying physics, the Verlet Algorithm can be effectively used to determine the position r_{N+1} of a particle at step N + 1 of a simulation.

Definition 4
Verlet Algorithm

r_{N+1} = 2 r_N - r_{N-1} + (\Delta t)^2 \frac{F_N}{m}
Note that the Verlet algorithm does not explicitly include velocities. However, velocities can
be calculated using the basic idea of the central difference method
v(t) \approx \frac{r(t + \Delta t) - r(t - \Delta t)}{2 \Delta t} \qquad \Rightarrow \qquad v_N = \frac{r_{N+1} - r_{N-1}}{2 \Delta t}
which allows for subsequent computation of the velocities up to an error of (\Delta t)^2. This will be very important for thermodynamically controlling the system, as we will see later on.
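A minimal sketch of the Verlet scheme in Python, again using a harmonic oscillator with m = k = 1 as an illustrative toy system (exact solution r(t) = cos t):

```python
import math

def verlet_step(r_curr, r_prev, force, m, dt):
    """Verlet update from Definition 4: r_{N+1} = 2 r_N - r_{N-1} + dt^2 F_N / m."""
    return 2 * r_curr - r_prev + dt ** 2 * force(r_curr) / m

def verlet_velocity(r_next, r_prev, dt):
    """Central difference velocity: v_N = (r_{N+1} - r_{N-1}) / (2 dt)."""
    return (r_next - r_prev) / (2 * dt)

force = lambda r: -r
dt = 0.01
r_prev, r_curr = 1.0, math.cos(dt)   # seed the two required positions
for _ in range(int(round(2 * math.pi / dt)) - 1):
    r_prev, r_curr = r_curr, verlet_step(r_curr, r_prev, force, 1.0, dt)
print(f"position after one period: {r_curr:.4f} (exact: 1.0)")
```

Note that the algorithm needs two seed positions instead of a position and a velocity, reflecting that velocities never appear explicitly.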
1.5 Leap Frog Algorithm

Using these two equations the Leap Frog algorithm can be formulated in the following way
Definition 5
Leap Frog Algorithm

r_{N+1} = r_N + \Delta t\, v_N + \frac{(\Delta t)^2}{2} \frac{F_N}{m}
v_{N+1} = v_N + \frac{\Delta t}{2m} \left[ F_N + F_{N+1} \right]
Note that even though the force appears to be evaluated twice during one iteration, the number of force evaluations reduces to one by storing the previously calculated results. Hence, the Leap Frog Algorithm also requires only one force evaluation per step.
In addition it can be shown that the Leap Frog Algorithm performs a symplectic integration, meaning that phase space volumes are conserved by this algorithm. This feature arises from the fact that the algorithm propagates positions and velocities on independent time grids. As the Hamiltonian of a single particle consists of one momentum dependent kinetic term and one position dependent potential term, the total energy shows no long-term drift when propagating a system with this algorithm. Having this inbuilt feature of energy conservation, we are already capable of simulating many particle systems in the microcanonical ensemble.
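The scheme of Definition 5 can be sketched in Python together with a check of the advertised energy conservation; the harmonic oscillator with m = k = 1 is again just a toy system:

```python
def leapfrog_step(r, v, f, force, m, dt):
    """One step of Definition 5; f is the stored force F_N from the
    previous step, so only one new force evaluation occurs per step."""
    r_new = r + dt * v + 0.5 * dt ** 2 * f / m
    f_new = force(r_new)                      # the single force evaluation
    v_new = v + dt / (2 * m) * (f + f_new)
    return r_new, v_new, f_new

force = lambda r: -r                          # harmonic oscillator, k = 1
r, v, dt = 1.0, 0.0, 0.01
f = force(r)
for _ in range(10000):                        # 100 time units
    r, v, f = leapfrog_step(r, v, f, force, 1.0, dt)
energy = 0.5 * v ** 2 + 0.5 * r ** 2
print(f"total energy after 100 time units: {energy:.6f} (initially 0.5)")
```

In contrast to the Euler example, the energy stays bounded close to its initial value, as expected for a symplectic integrator.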
1.6 Comparison
Aside from the already discussed numerical integration techniques there are many more integration algorithms that for example include higher orders in the Taylor expansion or show
better energy conservation behavior so that larger time steps are possible. However, these
techniques often include more force evaluations which might slow down the speed of the simulation significantly.
The main features a suitable simulation algorithm should have are
- symplecticity
- few force evaluations
- high accuracy
- high speed
- time reversibility
The Leap Frog integration algorithm comes along with all these features and is thus one of
the most commonly used integration algorithms in Molecular Dynamics engines.
[Figure omitted: panel (A) Euler, panel (B) Verlet, each showing an approximate trajectory against the true one.]

Figure 2: Integration Algorithms: Depicted are the basic principles of simple integration algorithms. The Euler Algorithm (A) only takes into account information about the previous position as well as the slope at the current position, while the Verlet Algorithm (B) uses more information and thus stays closer to the actual trajectory (black).
2 Force Fields
Considering a many particle system with interacting particles, the interactions between the
particles have to be modelled sufficiently accurate to assure correct simulation of the behavior of the system. However, to keep the algorithms simple and the parameter space small,
only important interactions shall be included into the model. In order to identify relevant
interactions in biological systems the order of magnitude of certain interactions in the regime
biomolecules exist in will be estimated first to then neglect interactions that are sufficiently
weak compared to others. Based on these considerations, an analytical expression for the
potential energy of a many particle system will be derived.
2.1.1 Electrostatics

The force on a charge q_i in the electric field E_j caused by another charge q_j follows the simple relation

F = q_i E_j

Modelling the charge q_j as a point charge¹, the electric field and thus the force can be expressed as follows

E_j = \frac{q_j}{4 \pi \varepsilon_0 r_{ij}^2} \qquad \Rightarrow \qquad F = \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}^2}
Considering the negatively charged phosphates in the backbone of DNA as an example, the distance between the charges r_{ij} is about 2 nm, while each phosphate carries one elementary charge. Given these numbers, the force between the two phosphates is of the order of

F_{elect} \approx 7\ \mathrm{pN}

Here we see that this force is well within the order of magnitude of forces that should be taken into account. Hence, electrostatic interactions are indeed important to model the behavior of biomolecules.
2.1.2 Magnetostatics

Nuclei in general carry a magnetic moment in addition to their electric moment. The nuclei with the greatest magnetic moment are protons, which are present in large number in biomolecules due to the large amount of hydrogen of which biomolecules comprise. Opposed to the electrostatic scenario, a magnetic moment \mu in the magnetic field B of another magnetic moment feels a torque acting on it. The torque depends on both the magnetic moment and the magnetic field and can be expressed as the product of a force F and a corresponding lever arm r.

M = \mu B = F r

Considering the magnetic moment of a proton \mu = 1.4 \cdot 10^{-26}\ \mathrm{N\,m/T} and a lever arm of 10 nm, the resulting force in a super-strong 1 T magnetic field is

F_{magnet} \approx 1.4 \cdot 10^{-7}\ \mathrm{pN}
Magnetic fields are usually weaker than the considered one and the force even with such a
strong magnetic field is much smaller than forces which are relevant for biological systems.
Hence, magnetic interactions do not need to be modelled in a Molecular Dynamics simulation.
¹ This assumption is not too bad because the diameter of the nucleus is negligibly small compared to the Bohr radius.
2.1.3 Gravity
Considering the forces on particles caused by gravity we need the typical mass of one of the
particles. Considering an entire protein, typically consisting of about 300 amino acids, we
arrive at a typical mass of about
m_{protein,typical} \approx 60 \cdot 10^{-21}\ \mathrm{g}
This mass already appears to be quite small. Furthermore, making the (very accurate) assumption that the gravitational potential for proteins on earth does not change along the size
of the protein, the force acting on the entire protein is of the order of
F_{gravitational,typical} = m_{protein,typical} \cdot g \approx 6 \cdot 10^{-10}\ \mathrm{pN}

where g = 9.81\ \mathrm{m\,s^{-2}} is the gravitational acceleration. Since gravitational forces are much smaller than the relevant force scale, gravity can be neglected in any Molecular Dynamics simulation.
2.1.4 Dispersion interactions
Dispersion forces are intramolecular forces arising from quantum-induces spontaneous polarization multipoles in molecules. A simple (but not accurate) picture of these types of
interactions is to think about two atoms with their electron clouds around their nuclei (compare figure 3). Assuming the existance of fluctuations, one electron cloud of one atom might
deviate such that the charge centers of positive and negative charges no longer coincide. Thus,
a dipole was created2 . The other atom is now located in the electric field of a dipole. As the
electron cloud and the nucleus react differently to the just arisen dipole the charge centers of
the other atom also separate creating another dipole.
Eventually there are two dipoles interacting through their electric fields. This classical picture allows for correct modelling of the spacial dependence of this interaction even though
a quantum mechanical description might be more appropriate in this case as it respects the
finite number of electrons and the wave nature of the electrons.
Nevertheless, both descriptions predict a potential energy in the form of
E_{disp} = - \frac{3}{2} \frac{I_A I_B}{I_A + I_B} \frac{\alpha_A \alpha_B}{R^6}

Here, I indicates the first ionization potentials of the atoms and \alpha their dipole polarizabilities. Note that this interaction is attractive due to the minus sign and that the spatial dependence goes as R^{-6}.
Dispersion interactions are the reason why non-polar molecules stick together even though separated states would be favored entropically. Hence, the assumption that dispersion interactions play an important role for biomolecules seems valid.
² Considering the finite number of electrons around a nucleus, even higher order multipoles can easily be created, but higher order multipoles are typically weaker than dipoles so that they are neglected in this case.
[Figure omitted: three panels A, B, C as described in the caption.]

Figure 3: Dispersion Interaction: Depicted is a classical picture of the dispersion interaction between two atoms. Nuclei are shown in red while electron clouds are shown in blue. Usually, charge centers coincide (A), but due to fluctuations charge centers might deviate from one another, creating a dipole (B). While these fluctuations are usually short lived for isolated atoms, the just formed dipole can induce the creation of another dipole in another atom, as positive and negative charges react differently to an electric field (C). A system of two induced dipoles has a much longer lifetime than a dipole induced by fluctuations alone.
2.1.5 Pauli repulsions
Taking into account the quantum mechanical nature of electrons and nuclei, another interaction arises from a very fundamental quantum mechanics principle, the Pauli exclusion principle. This principle states that two identical fermions cannot under any circumstances occupy the same quantum state.
Electrons are particles with spin 1/2 and thus fermions. The quantum state of an electron in an atom is defined by the orbital the electron occupies and its spin. Due to the Pauli principle, electrons can only occupy the same orbital if they have different spins. Since the only spin states that are allowed for the electron are +1/2 and −1/2, there can only be two electrons per orbital.
One important consequence of this principle is that atoms cannot come arbitrarily close to
one another. Since orbitals occupy a certain space around the nuclei the orbitals of two atoms
start to overlap if they get closer and closer. However, electrons with the same spin cannot
occupy the same orbital and thus be in the same space region. This results in an effective
repulsion.
There is no closed expression for this type of repulsion. However, some characteristics of this repulsion are known. The repulsion appears on a small length scale but is very strong. One popular mathematical expression used to model this phenomenon is an energy penalty term of the following form

E_{excl} = \frac{K}{r^{12}}
where K is some arbitrary constant. Other methods model the Pauli repulsion with exponential functions or just a hard wall.
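The r⁻¹² repulsion is typically combined with the r⁻⁶ dispersion attraction from section 2.1.4 into the Lennard-Jones potential, anticipated here as a short Python sketch; the standard 4ε parametrization in reduced units is assumed:

```python
def lj_potential(r, epsilon, sigma):
    """Lennard-Jones potential: r^-12 Pauli repulsion plus r^-6
    dispersion attraction, with well depth epsilon and size sigma."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

# The minimum sits at r = 2^(1/6) * sigma with depth -epsilon.
r_min = 2 ** (1 / 6)
print(f"V(r_min) = {lj_potential(r_min, 1.0, 1.0):.3f}")   # -> -1.000
```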
2.1.6 Chemical bonds
In biomolecules, many chemical bonds are present. As biomolecules mostly consist of carbon, oxygen, nitrogen, hydrogen, phosphorus and sulphur, the chemical bonds are mostly either
covalent or non-covalent bonds. These types of bonds are characterized by particular energy landscapes that can only be calculated quantum mechanically. Analytical solutions to these problems can only be found for the H₂⁺ molecular ion; all other bonds can only be calculated approximately by very sophisticated quantum mechanical techniques.
However, all energy landscapes describing chemical bonds between particular atom types have in common that, close to the minimum, they are well approximated by a parabola. This idea arises from the Taylor expansion. Considering any energy landscape E(x) along a reaction coordinate x and Taylor expanding it around a local minimum x_0 yields
E(x_0 + \Delta x) = E(x_0) + \Delta x \underbrace{\frac{dE(x_0)}{dx}}_{=0} + \frac{1}{2} (\Delta x)^2 \frac{d^2 E(x_0)}{dx^2} + \ldots
Carrying out the quantum mechanical calculations and computing the energy landscapes
suggests that the thermal energy present in biological systems is not sufficient to highly
excite chemical bonds. Thus, expected deviations from the local minimum are rather small,
indicating that the performed Taylor expansion is a good approximation in the considered
regime. Chemical bonds are therefore modelled with a harmonic potential, which allows for
the analogy of connecting the atoms via springs.
In addition to the harmonic spring between two adjacent atoms, also higher order interactions need to be modelled. Since orbitals are not necessarily symmetrically distributed around the nucleus, spatial orientations have to be taken into account. This results in preferred geometrical angles in three atom interactions as well as in preferred dihedral angles in four atom interactions, which influence the resulting geometry of the biomolecule enormously. The basic idea behind these quantum mechanical effects is depicted in figure 4.
Similar to the energy term for the bond length between two atoms there is also a particular
[Figure omitted.]

Figure 4: Orbital Geometry: When two atoms approach each other, their orbitals (blue) can fuse to form bonding and antibonding orbitals. Orbitals in general occupy certain space regions as indicated here. Due to the particular geometric distribution of the orbitals, certain conformations are highly preferred by the molecule compared to others. It can be shown that there is an optimal bond length r₀ as well as an optimal angle θ₀ for two and three particle systems. Not shown are dihedral angles, which are defined in four particle interactions.
energy landscape for the angle between three atoms. If calculating the optimal bond length is already quite challenging, calculating the optimal angle is even harder. Nevertheless, it can be assumed that there exists one optimal angle θ₀ and that thermal excitations at temperatures
of 300 K do not change this angle drastically. Taylor expansion of the (unknown) energy
landscape yields
E(\theta_0 + \Delta\theta) = E(\theta_0) + \Delta\theta \underbrace{\frac{dE(\theta_0)}{d\theta}}_{=0} + \frac{1}{2} (\Delta\theta)^2 \frac{d^2 E(\theta_0)}{d\theta^2} + \ldots
Truncating the sum again allows for modelling the angular potential by a harmonic spring.
In addition, due to the three dimensional nature of orbitals, there is also another term that needs to be added for four particle interactions, which accounts for the dihedral angle between the four particles. As this term arises from the repulsion of orbitals, there is not only one local energy minimum but many of them. The dihedral angle energy is most commonly modelled via

E(\phi) = \sum_n k_n \left( 1 + \cos(n \phi - \phi_0) \right)
For the non-bonded pair interactions, the dispersion attraction and the Pauli repulsion are combined into the Lennard-Jones potential

E_{LJ}(r) = 4 \varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]

where r is the distance between the particles and σ and ε are parameters defining the position and the depth of the minimum.
Combining all relevant interactions into one potential energy term V leads to

V = \sum_{bonds} k_{bond} (r - r_0)^2 + \sum_{angles} k_{angle} (\theta - \theta_0)^2 + \sum_{dihedrals} k_{dihedral} \left( 1 + \cos(n \phi - \phi_0) \right)
  + \sum_{charges} \frac{q_i q_j}{4 \pi \varepsilon_0 r} + \sum_{particles} 4 \varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]
This expression captures the most basic features of Molecular Dynamics, even though there are already many parameters included. The parameters can either be obtained directly from experiments or by fitting thermodynamic predictions of Molecular Dynamics simulations to experimental results and iteratively correcting the parameters of the MD model.
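The individual terms of V can be sketched directly in Python; all numerical parameter values below are made-up illustration numbers, not taken from any actual force field:

```python
import math

def bond_energy(r, k_bond, r0):
    """Harmonic bond term k_bond * (r - r0)^2."""
    return k_bond * (r - r0) ** 2

def angle_energy(theta, k_angle, theta0):
    """Harmonic angle term k_angle * (theta - theta0)^2."""
    return k_angle * (theta - theta0) ** 2

def dihedral_energy(phi, k_dihedral, n, phi0):
    """Periodic dihedral term k_dihedral * (1 + cos(n*phi - phi0))."""
    return k_dihedral * (1 + math.cos(n * phi - phi0))

def coulomb_energy(qi, qj, r, eps=1.0):
    """Coulomb pair term; eps is the permittivity prefactor in reduced units."""
    return qi * qj / (4 * math.pi * eps * r)

def lj_energy(r, epsilon, sigma):
    """Lennard-Jones pair term."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

# One term of each kind, with illustrative parameters.
V = (bond_energy(1.02, k_bond=100.0, r0=1.0)
     + angle_energy(1.95, k_angle=50.0, theta0=1.91)
     + dihedral_energy(math.pi / 3, k_dihedral=2.0, n=3, phi0=0.0)
     + coulomb_energy(1.0, -1.0, 2.0)
     + lj_energy(1.5, epsilon=0.5, sigma=1.0))
print(f"V = {V:.4f}")
```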
Additionally, there are endeavors to improve force fields by taking into account more com-
[Figure omitted: panel (A) Lennard-Jones potential V in units of ε versus r in units of σ; panel (B) Coulomb potential versus r.]

Figure 6: Non-specific interactions: Depicted are the general shapes of the Lennard-Jones potential (A) and the Coulomb potential (B) in natural units. While the Lennard-Jones potential quickly decays for long distances, the Coulomb potential decays much slower, making it necessary to introduce larger cut-offs for the Coulomb potential or treat coulombic interactions with the Particle Mesh Ewald method.
than van der Waals cut-offs.
But there is a method that allows for a more accurate computation of the Coulomb interactions. This method is called Particle Mesh Ewald (PME) and works particularly well together with periodic boundary conditions. The general idea of this method is to split potential contributions into short range and long range contributions. While short range contributions quickly converge in real space, long range contributions quickly converge in Fourier space. By Fourier transforming the system⁵, the long range contribution can be calculated efficiently. Nevertheless, there are currently ongoing debates whether or not PME artificially constrains the system to rather crystal-like conformations, as due to the finite simulation box size there is some (small) area around the origin in Fourier space that does not correspond to the simulated system.
In addition to these techniques for the treatment of potential computations, simulations can further be sped up by introducing the concept of neighbor lists. To reduce the error of the integration algorithm, typical time steps chosen for the algorithm are in the order of 2 fs. Given this small time step, atoms will not move far within a small number of integration steps. Having identified the particles which only weakly interact with the considered particle and thus were neglected, it can be assumed that those particles can also be neglected for the following integration steps. The list of the particles which need to be considered for the evaluation of the potential is called a neighbor list. Depending on the particles simulated, neighbor lists are typically updated every 10 to 100 steps. Note that this parameter should be defined with care.
⁵ Note that there are Fast Fourier Transform algorithms which scale as N log N.
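A naive neighbor list construction can be sketched as follows; the O(N²) distance loop, the cut-off, and the skin width correspond to the description above, while the particle coordinates are arbitrary illustration values:

```python
def build_neighbor_list(positions, r_cut, r_skin):
    """Naive O(N^2) neighbor list: for each particle, store all others
    within r_cut + r_skin.  The skin holds particles that may enter the
    cut-off zone before the next rebuild of the list."""
    r_list = r_cut + r_skin
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j])) ** 0.5
            if dist < r_list:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

# Three particles on a line, 1.0 apart; with r_cut + r_skin = 1.5 only
# adjacent particles end up on each other's lists.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(build_neighbor_list(pos, r_cut=1.2, r_skin=0.3))   # -> [[1], [0, 2], [1]]
```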
Figure 7: Neighbor Lists: Whenever the force on one particular particle (red) needs to be calculated, certain cut-offs can be applied to reduce the cost of the force evaluation. The contribution of particles within a specified cut-off (green) is taken into account while all others are ignored for the force evaluation. However, calculating distances between particles is costly and scales badly (∝ N²), so that a neighbor list is defined containing all particles that are within the force evaluation cut-off (green) and particles which might enter this zone as long as the neighbor list is not updated (yellow). Applying this method, distances between all atoms only need to be calculated for every update of the neighbor list.
3 Environmental Controls
Having a symplectic integration algorithm and a force field describing the potential energy of the system still does not allow for correct modelling of the dynamics of the biomolecule. First of all, biomolecules experience constant temperature and pressure as they are coupled to the environment and can exchange energy.
Additionally, in most applications biomolecules exist in aqueous solution. Water is a rather strong dipole, and various examples have been found where certain processes only occur due to the presence of a single water molecule. Aside from the water molecules, also ions can have a huge impact on the behavior of the system. For example, in the case of DNA many negative charges due to the backbone phosphates exist in the system. However, diffusion in real systems allows for local neutralization, so that both water molecules and ions have to be added to the system to simulate biologically relevant dynamics. Coupling the system to temperature and pressure controls, however, is slightly more challenging than just adding more molecules to the system.
Furthermore, in order to simulate the system at either constant volume or constant pressure, boundaries need to be defined. Introducing walls to keep the particles within a specific volume causes undesired artefacts: as particles will be reflected by the wall, the density of particles of the same species will be smaller close to the wall compared to the bulk.
How to find solutions to all of the mentioned problems will be discussed in the following chapter after a short introduction into thermodynamic ensembles.
This thermodynamic ensemble is called the canonical ensemble, with the Helmholtz free energy A as the appropriate thermodynamic potential. The change in free energy is given by

dA = -S\,dT - p\,dV + \mu\,dN

which can be derived by Legendre transforming the total energy, replacing the entropy by the temperature as the natural variable.
Since in this ensemble volume, temperature and number of particles do not change, the Helmholtz free energy fully characterizes one particular state and is therefore well suited to distinguish different states. For this example it can be relatively easily shown that the states are no longer equally distributed but rather follow a probability distribution \rho_{NVT}(r^N, p^N) which weighs the energies in a so-called Boltzmann factor
\rho_{NVT}(r^N, p^N) = \frac{1}{N!\,h^{3N}} \frac{e^{-\beta H(r^N, p^N)}}{\int dr^N\, dp^N\, e^{-\beta H(r^N, p^N)}}, \qquad \beta = \frac{1}{k_B T}
Here, the factor N!\,h^{3N} acts as a normalization factor. While N! can be understood from the indistinguishability of the particles, h^{3N} is a normalization factor for the volume of phase space. Quantum statistical analyses are required to show that this normalization factor is identical to Planck's constant.
Aside from temperature also pressure is often controlled. In order to also impose a pressure
control to the system the pressure is measured during the simulation and the volume adapted
accordingly. The corresponding ensemble is the isothermal-isobaric ensemble and is well described by the Gibbs free energy G

dG = -S\,dT + V\,dp + \mu\,dN
which is derived from the Helmholtz free energy through a Legendre transform with respect to
the volume. As temperature, pressure and number of particles are constant in this ensemble,
the Gibbs free energy can be used as a measure to distinguish states.
For reflecting the particle, the wall exerts twice the momentum the particle carries perpendicular to the wall plane onto the particle.
Figure 8: Periodic boundary conditions: Depicted is the set-up of a small DNA strand
in aqueous solution with periodic boundary conditions. Due to the boundary conditions, particles at the periphery of the simulation box are not only affected by
potentials caused by close-by atoms but also by potentials caused by periodically
close atoms. Hence, the effective potential a particle feels is the potential caused by
the system and a mirror image of the system. Effectively, the system is replicated
throughout space as depicted.
which repels particles not only at the exact position of the walls but also in close proximity.
In order to make sure that this potential does not affect the molecules of interest a sufficiently
thick layer of solvent molecules is needed that separates the molecule of interest from the wall.
Diffusion processes of the molecule of interest complicate the simulation procedure.
These undesired effects can be avoided by imposing periodic boundary conditions. When using periodic boundary conditions, the simulated system is virtually replicated beyond the walls. Molecules that would leave the simulation box enter at the other side. Strictly speaking, there is no box but only a huge space with the same system over and over again (compare figure 8). Even though periodic boundary conditions do not introduce virtual potentials to the system, there are some complications. The molecule of interest might interact with its mirror image. Hence, a sufficiently thick (but much thinner compared to rigid walls) layer of solvent molecules is needed to separate the molecule from its replica.
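The two basic operations of periodic boundary conditions, wrapping a particle back into the box and measuring distances to the nearest periodic image (the minimum image convention), can be sketched per coordinate component as:

```python
def wrap_position(x, box):
    """Wrap a coordinate into [0, box): a particle leaving the box on
    one side re-enters on the other."""
    return x % box

def minimum_image(dx, box):
    """Replace a distance component by the one to the nearest periodic
    replica, so interactions are computed with the closest image."""
    return dx - box * round(dx / box)

box = 10.0
print(wrap_position(10.3, box))    # re-enters near 0.3
print(minimum_image(9.0, box))     # -> -1.0, the nearest image is 1 away
```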
\frac{f}{2} k_B T = \sum_i \frac{1}{2} m_i v_i^2 \qquad \Rightarrow \qquad T = \frac{2}{f k_B} \sum_i \frac{1}{2} m_i v_i^2
Note that v_i is a one dimensional velocity component. Using this expression and knowing the number of degrees of freedom f, the temperature can be measured numerically during a simulation.
Based on this measurement, methods can be applied which correct the temperature and drive it towards a desired reference temperature.
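The temperature measurement can be written down directly; the argon-like mass below is just an illustrative choice:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant in J/K

def instantaneous_temperature(masses, velocities, f):
    """T = 2/(f k_B) * sum_i (1/2) m_i v_i^2 with one-dimensional
    velocity components v_i and f degrees of freedom."""
    kinetic = sum(0.5 * m * v ** 2 for m, v in zip(masses, velocities))
    return 2.0 * kinetic / (f * k_B)

# One atom (mass roughly that of argon) with each of its three velocity
# components set to the thermal value for T0 = 300 K.
m, T0 = 6.63e-26, 300.0
v = math.sqrt(k_B * T0 / m)
T = instantaneous_temperature([m] * 3, [v] * 3, f=3)
print(f"measured temperature: {T:.1f} K")   # -> 300.0 K
```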
3.3.1 Andersen thermostat
The Andersen thermostat follows the idea that the velocities of an idealized gas follow a
Maxwell-Boltzmann distribution. This distribution is given by
\[
f(v) = \left( \frac{m}{2 \pi k_B T} \right)^{3/2} 4 \pi v^2 \, e^{-\frac{m v^2}{2 k_B T}}
\]
and can be derived from statistical mechanics. When simulating a many-particle system controlled by the Andersen thermostat, the particles are propagated in time unperturbed according to Newton's laws. However, collisions occur at uncorrelated, Poissonian-distributed times t, where
\[
P(t) = \nu \exp(-\nu t)
\]
and \nu is the collision rate. Whenever a collision occurs, the velocity of the colliding particle is randomly drawn from the Maxwell-Boltzmann distribution at the desired reference temperature.
Since collisions occur randomly and uncorrelated, and velocities are drawn from the correct statistical-mechanical distribution, the Andersen thermostat produces a correct canonical ensemble. However, as velocities are randomly redrawn after every collision, the Andersen thermostat is not suited for investigating dynamic processes such as diffusion.
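One Andersen-thermostat update can be sketched as follows. The sketch assumes reduced units (kB = 1 by default) and uses the fact that each Cartesian velocity component of the Maxwell-Boltzmann distribution is Gaussian with variance kB T/m; the function name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def andersen_collisions(velocities, masses, T_ref, nu, dt, kB=1.0):
    """One Andersen-thermostat step: each particle suffers a collision with
    probability nu*dt during the time step; its velocity is then redrawn from
    the Maxwell-Boltzmann distribution at the reference temperature T_ref."""
    v = velocities.copy()
    hit = rng.random(len(masses)) < nu * dt          # which particles collide
    sigma = np.sqrt(kB * T_ref / masses[hit])        # per-component std dev
    v[hit] = rng.normal(0.0, sigma[:, None], size=(hit.sum(), v.shape[1]))
    return v
```

Setting nu*dt = 1 redraws every velocity, so the mean squared component velocity of many particles approaches kB T_ref/m, as the equipartition theorem requires.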
3.3.2 Berendsen thermostat
The Berendsen thermostat attempts to control temperature by rescaling velocities at every
time step according to
\[
v_{new} = v_{old} \left[ 1 + \frac{\Delta t}{\tau} \left( \frac{T_0}{T} - 1 \right) \right]^{1/2}
\]
It is apparent that this method drives the system towards the desired reference temperature with a coupling time constant \tau. Nevertheless, this method is thermodynamically less well founded: it can be shown that the Berendsen thermostat does not necessarily reproduce a canonical
distribution.
Despite this significant disadvantage, the Berendsen thermostat is widely applied for equilibration purposes, as it adjusts the temperature relatively quickly. Furthermore, the Berendsen
thermostat perturbs the system only weakly.
3.3.3 Extended Nose-Hoover thermostat
A more sophisticated thermostat based on extended Lagrangians is the Nose-Hoover thermostat. For a simulation of a system with this thermostat the Lagrangian of the system contains
additional, artificial coordinates which are used for the temperature control. Considering the
usual Lagrangian L(r, v) of classical mechanics,
\[
L(r, v) = T(v) - V(r)
\]
with T (v) being the kinetic energy and V (r) being the potential energy. The momenta
associated with r can be obtained via
\[
p = \frac{\partial L(r, v)}{\partial v}
\]
Nose extended this Lagrangian by an additional coordinate s and its velocity \dot s:
\[
L_{Nose} = \sum_i \frac{m_i}{2} s^2 v_i^2 - V(r) + \frac{Q}{2} \dot s^2 - g k_B T \ln s
\]
The parameter Q acts as an effective mass for the associated coordinate s. The corresponding
momenta are
\[
p_i = \frac{\partial L_{Nose}}{\partial v_i} = m_i s^2 v_i,
\qquad
p_s = \frac{\partial L_{Nose}}{\partial \dot s} = Q \dot s
\]
A Legendre transformation then yields the Hamiltonian of the extended system,
\[
H_{Nose} = \sum_i \frac{p_i^2}{2 m_i s^2} + V(r) + \frac{p_s^2}{2Q} + g k_B T \ln s
\]
The partition function of the extended system reads
\[
Z_{Nose} = \frac{1}{N!} \int dp_s \, ds \, dr \, dp \; \delta(H_{Nose} - E)
\]
\[
= \frac{1}{N!} \int dp_s \, ds \, dr \, dp' \; s^{3N} \, \delta\!\left[ \sum_i \frac{{p'_i}^2}{2 m_i} + V(r) + \frac{p_s^2}{2Q} + g k_B T \ln s - E \right]
\]
where the scaled momenta p' = p/s have been introduced.
In addition we have to use the following result from distribution theory,
\[
\delta[h(s)] = \frac{\delta(s - s_0)}{|h'(s_0)|}
\]
where s_0 is the zero of h(s). Identifying H(p', r) = \sum_i \frac{{p'_i}^2}{2 m_i} + V(r) allows us to carry out the integration over s, which yields
\[
Z_{Nose} = \frac{C}{N!} \int dr \, dp' \; \exp\!\left[ -\frac{3N + 1}{g} \, \frac{H(p', r)}{k_B T} \right]
\]
where
\[
C = \frac{\exp[E(3N + 1)/(g k_B T)]}{g k_B T} \int dp_s \, \exp\!\left[ -\frac{3N + 1}{g} \, \frac{p_s^2}{2 Q k_B T} \right]
\]
Choosing g = 3N + 1 thus reproduces the canonical distribution \exp[-H(p', r)/(k_B T)] in the physical variables (p', r).
Using the resulting equations of motion, the system can be simulated at a desired reference temperature. Even though this thermostat works well in theory, it can be shown that the method remains non-ergodic for insufficiently chaotic systems such as the harmonic oscillator.
\[
-\frac{1}{3} \left\langle \sum_i r_i \cdot \nabla_{r_i} U \right\rangle
= \frac{1}{3} \left\langle \sum_i r_i \cdot f_i \right\rangle
= -N k_B T
\]
Here, the force f i represents both the internal forces and the external forces which are related
to the external pressure. The effect of the container walls on the system is given by
\[
\frac{1}{3} \left\langle \sum_i r_i \cdot f_i^{ext} \right\rangle = -pV
\]
while the internal contribution defines the virial W,
\[
-\frac{1}{3} \left\langle \sum_i r_i \cdot \nabla_{r_i} U \right\rangle
= \frac{1}{3} \left\langle \sum_i r_i \cdot f_i^{inter} \right\rangle
= W
\]
Combining these expressions gives the virial route to the pressure, p V = N k_B T + W.
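From the virial expressions above, the instantaneous pressure follows as p = (N kB T + W)/V with W = (1/3) sum_i r_i . f_i^inter. A minimal sketch in reduced units (kB = 1 by default); the function name is illustrative:

```python
import numpy as np

def pressure(N, T, V, positions, forces, kB=1.0):
    """Instantaneous pressure from the virial route: p = (N kB T + W) / V,
    with W = (1/3) sum_i r_i . f_i computed from the internal forces only."""
    W = np.sum(positions * forces) / 3.0   # internal virial
    return (N * kB * T + W) / V
```

For an ideal gas the internal forces vanish, W = 0, and the expression reduces to p = N kB T / V.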
The box volume and the particle coordinates are rescaled at every time step by a factor
\[
\mu = \left[ 1 - \frac{\Delta t}{\tau_p} (p_0 - p) \right]^{1/3}
\]
which drives the measured pressure p towards the reference pressure p_0 with a coupling time constant \tau_p.
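A sketch of one Berendsen-barostat update, assuming the standard scaling factor mu = [1 - (dt/tau_p)(p_ref - p)]^(1/3) applied to both the coordinates and the (cubic) box length:

```python
import numpy as np

def berendsen_barostat(positions, box, p_current, p_ref, dt, tau_p):
    """Berendsen pressure coupling: scale box length and coordinates by
    mu = (1 - dt/tau_p * (p_ref - p_current))**(1/3)."""
    mu = (1.0 - (dt / tau_p) * (p_ref - p_current)) ** (1.0 / 3.0)
    return mu * positions, mu * box
```

If the measured pressure already equals the reference pressure, mu = 1 and the system is left untouched; like the Berendsen thermostat, this scheme relaxes the pressure exponentially but does not sample a rigorous ensemble.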
The equations of motion can then be readily obtained from the Lagrangian of the system
\[
L_V = T + T_V - U - U_V
\]
Note that these methods are designed to simulate a system in the isobaric-isoenthalpic ensemble. Only the additional coupling to a heat bath provides sampling in the desired isobaric-isothermal ensemble.
In addition to the presented environment-controlling schemes there are also other, more sophisticated methods that carry out the basic ideas presented here more carefully and thus provide better results. Nevertheless, the coupling schemes presented here were the historical starting point for correct thermodynamics in many-particle simulations.
4 Principles of diffusion
Suppose we have a molecular system of N particles with intermolecular interactions present, such that the i-th particle experiences an effective force due to the other N - 1 particles. Initially each particle has a coordinate \vec r_i and velocity \vec v_i, for i = 1, . . . , N. If we follow the time evolution of the system, we will encounter it in configurations different from the initial one, because the system evolves towards equilibrium. One of the processes present in this evolution between equilibrium states is the net movement of particles known as diffusion.
Diffusion refers to the random walk of the center of mass of a tagged particle (the tracer particle) caused by collisions (hard-sphere-like) with the surrounding particles. The most
important quantity that characterizes the translational diffusion of the center of mass of a
particle is the so-called mean square displacement (MSD) W (t), which is defined as
\[
W(t) = \frac{1}{2d} \left\langle |\vec r(t) - \vec r(0)|^2 \right\rangle
\]
where \vec r(t) is the position vector of the center of mass of the tracer particle at time t, and hence \Delta\vec r(t) = \vec r(t) - \vec r(0) is the particle displacement during a time interval t. A factor 1/2d has been included in the definition of the MSD for convenience, where d denotes the system dimension. The reference time t = 0 is taken as the initial experimental time, and the brackets \langle \cdot \rangle denote, in general, an ensemble average.
Suppose that at time t = 0 the tracer particle has a velocity \vec v_0. For very short times, when the molecule's velocity has hardly changed under the impact of the surroundings, \vec r(t) - \vec r(0) \approx \vec v_0 t and hence
\[
W(t) \propto t^2
\]
For long times, when the particle has experienced many collisions with the surroundings, the MSD becomes a linear function of time, i.e.,
\[
W(t) = D t
\]
where D is referred to as the single-particle diffusion coefficient. We want to measure this quantity to characterize the freedom of movement of the molecules in the system.
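The MSD and the long-time diffusion coefficient can be estimated directly from a stored trajectory. The sketch below assumes unwrapped positions in an array of shape (n_frames, N, d) and averages only over particles, not over time origins:

```python
import numpy as np

def msd(trajectory, d=3):
    """W(t) = <|r(t) - r(0)|^2> / (2d), averaged over all N particles.

    trajectory: (n_frames, N, d) array of unwrapped positions."""
    disp = trajectory - trajectory[0]                     # r(t) - r(0)
    return np.mean(np.sum(disp**2, axis=2), axis=1) / (2 * d)

def diffusion_coefficient(times, W, skip=0):
    """Fit W(t) = D t in the long-time (linear) regime; `skip` discards
    the short-time ballistic part before fitting."""
    D, _ = np.polyfit(times[skip:], W[skip:], 1)
    return D
```

Averaging over multiple time origins would improve the statistics; the simple t = 0 reference used here matches the definition in the text.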
5 Bash commands
Below is a list of commands that might become useful during the course.
cd directory           change directory
cp file                copy file
grep expression file   find expression in file
htop                   show running processes (more detailed)
ls                     list all files and directories located in the current directory
mkdir directory        create directory
more file              show file page by page
mv file                move or rename file
rm file/directory      delete file (use carefully, as files cannot be recovered)
scp file               copy file to or from another computer
ssh user@IP            connect as user to the machine at IP
top                    show running processes
vi file                edit file
vim file               edit file (improved vi)

Table 1: List of Bash commands
Below is a list of gromacs executables that might become useful during the course.

editconf   edit or convert structure files and set up the simulation box
genbox     solvate the system (fill the box with solvent molecules)
grompp     preprocess structure, topology and run parameters into a run input file
mdrun      perform the actual simulation
pdb2gmx    generate a gromacs topology from a PDB structure
trjconv    convert and post-process trajectory files
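For orientation, these tools are typically chained into a workflow like the sketch below. The file names (protein.pdb, md.mdp, etc.) are placeholders, and the flags are those of the classic pre-5.0 gromacs versions that still ship genbox; consult each tool's `-h` output before use:

```shell
pdb2gmx -f protein.pdb -o processed.gro -p topol.top   # build topology from a PDB file
editconf -f processed.gro -o boxed.gro -c -d 1.0       # center molecule, 1 nm to each wall
genbox -cp boxed.gro -cs spc216.gro -o solvated.gro -p topol.top  # add solvent
grompp -f md.mdp -c solvated.gro -p topol.top -o run.tpr          # assemble run input
mdrun -deffnm run                                      # run the simulation
trjconv -f run.trr -s run.tpr -o run_pbc.xtc -pbc mol  # fix molecules broken across the box
```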