You are on page 1of 46

VIRTUAL SCREENING by X-ray crystallography [1,2] or nuclear mag-

netic resonance spectroscopy (NMR) [3]. How-


ever, only very few drugs resulted from those
INGO MUEGGE
early efforts. Examples include captopril as an
SCOTT OLOFF
Boehringer Ingelheim angiotensin-converting enzyme inhibitor [4]
Pharmaceuticals, and metothrexate as a dihydrofolate reductase
Inc., Ridgefield, CT inhibitor [5]. The reasons for this somewhat
disappointing drug yield lie in the low resolu-
tion of the protein structures as well as limita-
1. INTRODUCTION tions in compute power and methods. Re-
searchers have often tried to de novo design
Since the 6th edition of Burgers Medicinal the final drug candidate on a computer screen.
Chemistry and Drug Discovery went to print The compounds suggested have often been
six years ago, the field of virtual screening difficult to synthesize; initial failure in exhibit-
(VS) has expanded dramatically in scope as ing potency has often resulted in the termina-
well as in breadth of application. Figure 1 tion of structure-based projects. At the end of
attests eloquently to the increased use of VS the 1980s, rational drug design techniques
in medicinal chemistry-related research and became somewhat discredited due to the high
drug discovery. In 2008 alone, there are more failure rate in drug discovery projects.
than 404 articles in journals published by the In the 1990s, drastic changes occurred in
American Chemical Society that contain the the way drugs are discovered in the pharma-
phrase virtual screening; 50 of them main- ceutical industry. High-throughput synth-
tain this phrase in its title (compared to only 6 esis [6,7] and screening techniques [8] chan-
in 2002). These numbers amount to a >2.5- ged the lead identification process that is now
fold increase in yearly publications related to governed not only by large numbers of com-
VS over these 6 years. New areas of interest pounds processed but also by fast prosecution
surfaced during the past few years including a of many putative drug targets in parallel. The
resurgence of fragment-based screening tech- characterization of the human genome has
niques often guided by VS, the virtual screen- resulted in a large number of novel putative
ing of trillions of combinatorial compounds, a drug targets. Improved screening techniques
growing interest in the in silico screening of make it also possible to look at the entire gene
synthetically accessible compounds (also often families, at orphan targets, or at otherwise
know as de novo design), the increased use of uncharacterized putative drug targets. In this
machine learning techniques, the emphasis on environment of data explosion, rational de-
scaffold hopping, and combining ligand-based sign techniques experienced a comeback [9].
and structure-based approaches in synergy to While the growing number of solved protein
name a few. In addition to highlighting some structures at high resolution makes it possible
of these new developments in VS, we have to embark on structure-based design for many
revised the topics presented in the last edition drug targets, virtual screeningthe computa-
and include more recent examples for VS tional counterpart to high-throughput screen-
successes. inghas become a particularly successful
Virtual screening, sometimes also called in computational tool for lead finding in drug
silico screening, is an established branch of discovery.
medicinal chemistry that represents a fast Proprietary screening collections typically
and cost-effective tool for computationally hold approximately 106 compounds. This is
screening virtual compound databases in only a tiny fraction of the conceivable drug-
search for novel drug leads. The roots for like chemical space that is estimated to hold
virtual screening go back to structure-based more than 1060 compounds [10,11]. Even if
drug design and molecular modeling. In the this space is reduced to compounds that are
1970s, researchers hoped to find novel drugs comparably easy to access synthetically it is
designed rationally using a fast growing num- still an astronomical number. Virtual screen-
ber of diverse protein structures being solved ing attempts to suggest which compounds
1

Burgers Medicinal Chemistry, Drug Discovery, and Development, Seventh Edition,


edited by Donald J. Abraham and David P. Rotella
Copyright  2010 John Wiley & Sons, Inc.
2 VIRTUAL SCREENING

Figure 1. Number of ACS publications with the phrase virtual screening in the text (gray patterned bars)
or in the title (black bars).

should actually be synthesized or tested using ligand information only without includ-
against a drug target of interest. Large virtual ing any information of the target structure.
libraries of up to 1012 individual compounds or LBVS ranges from compound similarity meth-
1014 combinatorial chemistry compounds or ods for which only a single template molecule
1018de novo compounds reassembled from needs to be known to very sophisticated phar-
fragments [12] can be screened today using macophore elucidation and machine learning
a cascade of various screening tools to reduce methods using a multitude of information on
the chemical space. multiple compounds. The interested reader is
This chapter describes the different con- referred to recent reviews on the topic of simi-
cepts and tools used today for virtual screen- larity searching [21,26], pharmacophore eluci-
ing. They reach from the assessment of the dation methods [27,28], and machine learn-
overall drug-likeness of a small organic mo- ing [29,30] for virtual screening.
lecule to its ability to specifically bind to a
given drug target. Ligand-based as well as 2.1. Compound Similarity
structure-based approaches are described and
Compound similarity as basis for virtual
illustrated with specific examples. The ability
screening relies on the assumption that similar
of virtual screening to assist in scaffold hop-
compounds exhibit similar biological activ-
ping (identifying isofunctional molecular
ities. The degree to which this is true varies
structures with significantly different mole-
substantially, however. Martin et al. showed
cular backbones [13]) is also discussed. Addi-
from the analysis of a series of HTS experi-
tional topics include the triaging of VS hit sets,
ments at Abbott Laboratories that only ap-
the use of machine learning techniques, and
proximately 30% of compounds with Daylight
finding synergies between different VS ap-
fingerprint Tanimoto similarities >0.85 show
proaches. For additional information, the in-
activity against the same drug target [31].
terested reader is referred to a selection of
Compound similarity is generally measured
recent books and reviews on the subject of
in either structural terms or property terms.
virtual screening [1425].
Topological fingerprints such as Daylight fin-
gerprints [32] or atom pair descriptors [33],
2. LIGAND-BASED VIRTUAL SCREENING circular fingerprints such as SciTegic finger-
prints [34], and structural keys such as ISIS
Casting a wide net, ligand-based virtual keys [35] encode structural features of the
screening (LBVS) encompasses all techniques molecules. MolconnZ descriptors [36] present
to computationally screen for novel compounds a mixture of molecular connectivity and prop-
LIGAND-BASED VIRTUAL SCREENING 3

erty description, graph theoretical description, libraries [52]. The usefulness of 3D pharmaco-
and topological description. Properties such as phore fingerprints in finding new chemotypes
lipophilicity (calculated log P ), molecular through virtual screening has recently been
weight, the number of rotatable bonds, hydro- demonstrated [54]. The use of fingerprints to
gen-bond acceptors and donors, pKa values etc. describe chemotypes and biological effects are
form a group of property descriptors that can described in Chapter 12 of Volume 1.
link compounds that are structurally quite
different. Using such molecular descriptors is
quite popular for virtual screening purposes. 2.1.1. Example 1: Virtual Screening for 15-
Especially 2D fingerprints have performed Lipoxygenase Inhibitors Weinstein et al. re-
very well in virtual screening [3741]. How- ported the identification of a new 15-lipoxy-
ever, 3D descriptors have gained in popular- genase inhibitor from the virtual screen of a
ity [4244]. Jenkins et al. showed that 3D corporate database of compounds [55]. Start-
pharmacophore feature descriptors called FE- ing from the literature compound PD-146176
POPS can outperform standard 2D descriptors as a template, 4-point pharmacophore finger-
in a virtual screening protocol [43]. Similarly, prints [48] were generated for the template
Good et al. have demonstrated applying a compound as well as for all database com-
variety of 2D and 3D descriptors that 3D phar- pounds and then compared. Figure 2 shows
macophore fingerprints are better suited than the template compound as well as the result-
2D descriptors for scaffold hopping [45]. 3D ing virtual screening hit. The figure illustrates
descriptors used in the context of similarity how structurally different the virtual screen-
searching often capture pharmacophore fea- ing hit is from the template molecule. Some
tures such as hydrophobic or aromatic moi- resemblance of the indole moiety is noticeable.
eties, hydrogen-bond (HB) acceptors or donors, This example illustrates the scaffold hopping
and both negative and positive ionizable abilities of 3D pharmacophore fingerprints.
groups. 3D pharmacophore fingerprints store
the information of pairs, triplets, or quartets of
2.2. Pharmacophore Screening
these pharmacophore features for multiple
conformations in the form of binned distance Pharmacophore models are built more often
ranges [4653]. 3D pharmacophore finger- based on small-molecule information. They
prints are used more often now as descriptors are a preferred filter tool for virtual screen-
for QSAR as well as for the design of compound ing [56]. Feature-based pharmacophore eluci-

Figure 2. Discovery of a new class of 15-lipoxygenase inhibitors through virtual screening using a 3D
pharmacophore fingerprint similarity technique.
4 VIRTUAL SCREENING

dation algorithms such as HipHop [57] and definition was slightly modified by Peter Gund
other automated pharmacophore query to a set of structural features in a molecule
builders [58] allow for the fast generation of that is recognized at a receptor site and is
pharmacophore hypotheses. Today sophisti- responsible for that molecules biological
cated tool such as LigandScout [59] are avail- activity [69]. An example is shown in Fig. 3.
able that extract pharmacophore models from An X-ray structure of CDK2 complexed with
one or more proteinligand crystal structures the adenine-derived inhibitor H717 [7072]
making pharmacophore modeling a hybrid was solved. Interactions that are essential to
between ligand-based and structure-based substrate and inhibitor binding to the enzyme
modelings. Traditionally, however, pharma- will form the pharmacophore that should be
cophore modeling has been most often applied captured by inhibitors binding the same way
to the target class of GPCRs [60] where until H717 does. As shown in Fig. 3, the inhibitor
very recently structural information [6165] binds to the hinge region (Phe82 and Leu83)
of the target receptors was unavailable. Phar- through two hydrogen bonds, to a hydrophobic
macophore filters are generally much faster region through the cyclopentyl group, and to
than molecular docking approaches and can, Asp145 and Asn132 through hydrogen bonds.
therefore, greatly reduce the number of com- The pharmacophore that reflects these inter-
pounds screened further using more expen- actions has a hydrogen-bond donor and a hy-
sive docking applications. Today, pharmaco- drogen-bond acceptor pair that ensures bind-
phore features are guiding most 3D virtual ing to the hinge region, a hydrophobic group
screening approaches including docking in the that corresponds to the cyclopentyl binding
form of constraints or prefilters. Most time site, and a hydrogen-bond donor that ensures
consuming is still the generation of 3D con- binding to Asp145 and/or Asn132. Pharmaco-
formations for small moleculesa task that phore hypotheses can be generated using
can only be solved for a small number of structural information from ligands or from
molecules when precalculation and storage is the protein active site itself. Pharmacophore-
necessary. based drug design is further elaborated in
An interesting aspect of pharmacophores Chapter 11 of Volume 1.
in virtual screening is 3D pharmacophore di-
versity. While the diversity concept for virtual 2.2.2. Databases of Organic Compounds It is
compounds in general is not applicable be- much more practical to virtually screen data-
cause of the enormity of the chemical space, bases of compounds that have already been
diversity in pharmacophore space is a feasible synthesized than to operate in a completely
concept. Virtual libraries can therefore be op- virtual chemical space. If no information
timized to cover a wide pharmacophore about possible lead structures is available, if
space [66]. in-house efforts such as HTS were not success-
ful, and if cost and time are an issue, virtual
2.2.1. Introduction to Pharmacophores In screening of vendor databases and the subse-
1894, Emil Fischer proposed the lock and quent purchase of hits is the method of choice
key hypothesis to characterize the binding of for the generation of novel lead chemical
compounds to proteins [67]. This can be con- matter.
sidered the first attempt to explain binding of There is a wealth of databases of available
small molecules to a biological target. Proteins chemical matter including academic sources
recognize substrates through specific interac- such as the ZINC database [73] and the Na-
tions, forming the pharmacophore. Inhibitors tional Cancer Institute Database [74,75] as
intended to block substrate binding to the well as commercial vendor databases such as
protein, should also capture these interac- eMolecules [76] that holds approximately 10
tions. The first definition of the pharmaco- million commercially available compounds,
phore formulated by Paul Ehrlich was a mo- Chemspider [77] that is publicly available
lecular framework that carries (phoros) the holding approximately 20 million compounds,
essential features responsible for a drugs and the Available Chemicals Directory screen-
(pharmacon) biological activity [68]. This ing collection [78]. Some vendors maintain
LIGAND-BASED VIRTUAL SCREENING 5

Figure 3. Pharmacophore derived based on the interactions between human cyclin-dependent kinase 2 and
the adenine-derived inhibitor H717 as observed in the X-ray structure of the complex (PDB ID 1g5s). Dashed
lines highlight hydrogen-bonding interactions. HBD: hydrogen-bond donor, HBA: hydrogen-bond acceptor,
and the hinge region is linking the N- and C-terminal domains of a kinase.

databases of virtual compounds that can be appropriate assignment of charges, chirality,


custom synthesized. Enamines REAL data- and tautomerization states, filtering for dupli-
base of close to 10 million compounds is an cates, and unwanted structures.
example [79]. In addition, many vendors of The primary source of 3D experimental
chemicals also provide searchable databases structures of organic molecules is the Cam-
with 2D-structure and property information of bridge Structural Database [82]. Alterna-
their compounds. Sometimes, compounds are tively, 2D databases of organic compounds can
coded in linear representations such as the be converted into 3D databases using several
SMILES [80,81] notation. The SMILES codes software programs [83]. Each program starts
obtained using CACTVS and Daylight pro- with generating a crude structure that is sub-
grams for 4-benzyl pyridine and R-cocaine are sequently optimized using a force field. To
shown in Fig. 4. mention the most commonly used programs,
Managing compound collections from differ- CONCORD [84] applies rules derived from
ent sources is not trivial. Chemical databases experimental structures and a univariate
frequently contain incorrect structures. Care- strain function for building an initial structure;
ful treatment and curation of compound collec- CORINA [85] generates an initial structure
tions is therefore important to assure optimal using a standard set of bond lengths, angles,
virtual screening outcomes. This includes the and dihedrals and rules for cyclic systems.
6 VIRTUAL SCREENING

a pharmacophore capturing these common


features should be able to identify from a
database novel compounds that bind to the
same site of the protein as the known com-
pounds do. The process of deriving a pharma-
cophore, called pharmacophore mapping, con-
sists of three steps: (1) identifying common
binding elements that are responsible for bio-
logical activity, (2) generating potential con-
formations that active compounds may adopt,
and (3) measuring distances between binding
pharmacophore elements in each conforma-
tion. To build a pharmacophore based on a set
of active compounds, two methods are usually
applied. One method generates a set of mini-
mum energy conformations for each ligand
and searches for common structural features.
Another method considers all possible confor-
mations of each ligand to evaluate shared
Figure 4. Examples of SMILES notations for two orientations of common functional groups.
compounds obtained using CACTVS and Daylight. Analyzing many low energy conformers of
active compounds can suggest a range of dis-
tances between key groups that will take into
2.2.3. 2D Pharmacophore Searching Differ- account the flexibility of the ligands and of the
ent strategies are pursued to search a 2D protein. This task can be performed manually
database. An exact structure search is applied or automatically.
to find out if a compound is present in the Pharmacophore generation through con-
database. Substructure searches are used to formational analysis and manual alignment
find larger molecules with the query im- is a time-consuming task especially when the
bedded. Superstructure searches are used to list of active ligands is large and the elements
find smaller molecules that are embedded in of the pharmacophore model are not obvious.
the query. Similarity searches are used to find There are several software products and algo-
compounds that are structurally related to the rithms such as HipHop [86], HypoGen [87],
query. Some searching software combines si- PHASE [88], MOE [89], and older tools such as
milarity with substructure or super structure Disco [90], Gasp [91], Flo [92], APEX [93] that
searches. Flexible match searches are used for can automatically generate potential pharma-
identifying compounds that differ from the cophores from a list of known inhibitors. A
query structure in user specified ways. In collection of views on pharmacophore elucida-
addition, isomer, tautomer, parent molecule tion can be found in Chapter 11 of Volume 1 as
searches may be done to find in a database well as in the book by G uner [94]. The pro-
isomers, tautomers or parent molecules of the grams use algorithms that identify common
query. pharmacophore features in the training set
molecules that are ranked with a scoring func-
2.2.4. Ligand-Based 3D Pharmacophore tion. Common pharmacophore features in-
Generation Ligand-based pharmacophores clude: hydrogen-bond donors, acceptors, nega-
are typically used when the crystallographic, tive and positive charges or ionizable centers,
solution structure, or modeled structure of a and surface accessible hydrophobic regions
protein cannot be obtained. When a set of that can be aliphatic, aromatic, or nonspecific.
active compounds is known and it is hypothe- Most of the programs incorporate ligand flex-
sized that all compounds bind in a similar way ibility when generating pharmacophores
to the protein, then common groups should since compounds may not bind to the protein
interact with the same protein residues. Thus, in the minimum energy conformation.
LIGAND-BASED VIRTUAL SCREENING 7

2.2.5. Example 2: Virtual Screening for bonyl oxygen, and an aromatic ring that can be
Dopamine Transporter Inhibitors The dopa- defined by the position of its center. A systema-
mine transporter (DAT) is a 12 transmem- tic conformational search followed by confor-
brane helix protein that plays a critical role in mer clustering was performed to obtain all
terminating dopamine neurotransmission by possible conformations these compounds could
taking up dopamine released into the synapse. assume when bound to DAT. The resulting 3D
DAT is involved in several diseases such as pharmacophore model (Fig. 5) was used to
drug addiction and attention deficit disor- search the NCI 3D-database [75] of 206,876
der [95]. DAT inhibitors share one or more compounds using the program Chem-X [99]. A
common 3D pharmacophore models [9698]. total of 4094 compounds, 2% of the database,
A pharmacophore model was derived based on were identified as hits. After further reduction
two known potent DAT inhibitors R-cocaine using molecular weight, structural novelty,
and WIN-35065-2 (Fig. 5) [98]. The common simplicity, diversity, and hydrogen-bond ac-
binding elements of these compounds are a ceptor nitrogen filters, 70 compounds were
ring nitrogen that may be substituted, a car- selected for testing in biochemical assays.

Figure 5. Pharmacophore proposed for identifying DAT inhibitors. The pharmacophore was obtained based
on two known DAT inhibitors, R-cocaine and WIN-35065-2. Distance ranges between pharmacophore points
were obtained through systematic search of all possible conformations that the two compounds may adopt
when bound to DAT.
8 VIRTUAL SCREENING

Forty-four compounds displayed more than ligand costructure, biochemical data can be
20% inhibition at 10 mM in a [3H]mazindol used for identifying key residues that are im-
binding assay. Figure 6 shows selected DAT portant for substrate and/or inhibitor binding.
inhibitors found in the virtual screen. Most ligands bind to proteins through non-
bonded interactions such as hydrogen-bond
2.2.6. Receptor-Based 3D Pharmacophore and hydrophobic interactions. Programs such
Generation If the 3D structure of a receptor as LUDI [100] or GRID [101] can use the
is known a pharmacophore model can be de- structure of the protein to generate interac-
rived directly from the receptor active site. tion sites or grids to characterize favorable
The presence of a cocrystallized ligand in the positions that ligand atoms should occupy.
active site can greatly help with the identifica- Since proteins are not rigid, Carlson et al. [102]
tion of the pharmacophore. The program Li- proposed using molecular dynamics simula-
gandScout generates pharmacophore hypoth- tion for generating a set of diverse protein
eses in a fully automated way from protein conformations to include protein flexibility in
ligand complexes [59]. In the absence of a the pharmacophore development. In this case,

Figure 6. Selected DAT inhibitors identified from the NCI database through virtual screening.
LIGAND-BASED VIRTUAL SCREENING 9

distance ranges between pharmacophores are 2.2.8. 3D Pharmacophore-Based Virtual


obtained by examining several conformations Screening Techniques Pharmacophore-based
of the protein. This technique is similar to the virtual screening is the process of matching
one used for the generation of flexible phar- atoms and/or functional groups and the geo-
macophores based on active compounds when metric relations between them to the pharma-
several conformations of compound(s) are con- cophoric query. Once a pharmacophore model
sidered for pharmacophore mapping. is generated, virtual screening of a database
against such a query model becomes straight-
2.2.7. Example 3: Virtual Screening for Novel forward. Programs that perform pharmaco-
PPAR Ligands To illustrate how receptor- phore-based searches include Catalyst [104],
derived pharmacophore models are used in UNITY [105], ROCS [106], PHASE, and
virtual screening we discuss here a recent ex- MOE. There are also some Web-based appli-
ample of identifying novel PPAR ligands [103]. cations that can perform pharmacophore
A series of LigandScout models were generated searches [107,108]. Usually pharmacophore-
based on agonist-PPAR complex structures based searches are done in two steps. (1) The
from the PDB. In addition, ligand-based Hi- software checks if the compound contains the
pHop models were generated. The pharmaco- atom types and/or functional groups pre-
phore models were evaluated using 357 PPAR scribed by the pharmacophore. (2) It checks
ligands and 12,775 PPAR decoys. The most if the spatial arrangement of these elements
selective PPARa agonist model was obtained matches the query. Since most small mole-
from LigandScout applied to the PPARa com- cules are flexible, multiple conformations
plex structure with PDB ID 1k7l. The best have to be checked. These conformations can
PPARd agonist model was generated likewise be either precomputed and stored (such as in
using the PDB complex 1gwx. The best PPARg MOE and Catalyst) or calculated on the fly.
agonist model was obtained from a HipHop Precalculating conformations will speed up
model. More than one million compounds were the virtual screen tremendously. Therefore,
screened in Catalyst [104] using all three this is the method of choice for databases of
PPAR-agonist models. A total of 14,311 virtual existing compounds that are screened repeat-
screening hits were obtained. Filtering by phy- edly. On the other hand, for larger virtual
sicochemical properties as found in 321 PPAR libraries the data handling requirements be-
agonists retained 5898 hits. 3D shape screen- come too complex. Therefore, generating con-
ing, electrostatic similarity to known PPAR formations on the fly and discarding them
agonists, diversity clustering, visual inspec- after the pharmacophore query has been
tion, and elimination of compounds with re- checked remains an important option for
ported PPAR activity led to 21 compounds of pharmacophore screening programs. Gener-
which 10 were purchased and tested. Several ating a representative ensemble of conforma-
novel PPAR agonists with varying PPAR se- tions is essential for the success of a virtual
lectivity were discovered (Fig. 7). screen using a 3D pharmacophore query.

Figure 7. Novel PPAR agonists identified through structure-based pharmacophore screening.


10 VIRTUAL SCREENING

Many ligands do not adopt a minimum energy tive modeling; they are based on linear as well
conformation as the binding pose [109111]. as nonlinear methods. In all approaches, de-
Therefore, continued research is necessary to scriptors serve as independent variables, and
represent a conformational ensemble of a li- biological activities as dependent variables.
gand including higher energy conformations To establish the ability of a QSAR model to
with a limited number of conformations. predict biological activities the model needs to
be thoroughly validated. This is arguably the
2.3. Application of Machine Learning for most important part of QSAR model develop-
Virtual Screening ment [119,120]. Most QSAR modeling meth-
ods implement a leave-one-out (LOO) (or
2.3.1. QSAR Model Generation and Validation
leave-some-out) cross-validation procedure.
Machine learning is a form of ligand-based VS
The outcome of this procedure is a cross-vali-
that builds predictive quantitative struc-
dated R2 (q2), which does not always guaran-
tureactivity relationship (QSAR) models
tee the predictive ability of the model [119,121]
that are based on available assay data. Com-
illustrating the necessity for thorough model
pounds are described with various molecular
validation. A widely used approach to estab-
descriptors to provide numerical representa-
lish the models robustness is called y-rando-
tions of a compounds properties. There are
mization (randomization of the dependent
numerous types of 2D and 3D descriptors that
variable, that is, biological activities). Y-ran-
are commonly used but will not be discussed in
domization consists of repeating the calcula-
detail here [112117]. Particular descriptors
tion procedure with randomized activities and
and machine learning methods are chosen
subsequent probability assessment of the re-
because they are believed to be linked or di-
sults. Often, it is used along with the LOO
rectly correlated to the property being mod-
cross-validation to ensure the model is not
eled. The process for model generation, vali-
based on a chance correlation.
dation, and application to virtual screening is
shown in Fig. 8 (figure adapted from a recent
2.3.2. Machine Learning Algorithms Commonly
review by Tropsha and Golbraikh [118]).
Used in QSAR Models Collections of vali-
Many different algorithms and computer soft-
dated QSAR models can be useful tools for VS
ware are available for the purpose of predic-

Figure 8. Workflow for generating and validating QSAR models for virtual screening
LIGAND-BASED VIRTUAL SCREENING 11

with the aid of a model applicability domain. logical property class from another. While the
Here, we include a short synopsis of the most Bayesian classifier algorithm has not per-
prominent machine learning techniques in formed as well as more sophisticated ap-
addition to highlighting their applications in proaches [124], it does handle large training
VS. Several recent success stories have been sets very well.
reported in the literature and a few examples Decision trees (or forests) incorporate the
of popular machine learning techniques are simplest form of variable selection and can be
listed in Table 1. Each method has its own considered as a set of Boolean functions. De-
advantages and disadvantages that should be scriptors that capture molecular features of
understood to select the best approach for a the training compounds are systematically
particular virtual screen. added to a decision tree model one at a time
The first approach listed in the table, a self- until training compounds with different bio-
organizing map (SOM), is quite simple and logical properties are adequately separated.
makes it easy to visualize groups of com- This approach allows for easy determination
pounds with the same biological properties. of the chemical features most relevant to the
A data set of compounds is mapped on to a 2D target biological property. The resulting rules
grid such that most similar compounds are including specific property ranges from the
grouped together. Compounds in the vicinity data partitioning can then be used to design
of compounds with a desired biological prop- new molecules. Comparison studies have
erty are considered potential hits in a virtual shown that decision tree methods slightly out-
screen. As a disadvantage the SOM approach perform the Bayesian classifier; however,
has a high false-positive rate. SOMs have been other more advanced approaches show even
used recently to identify several purinergic higher enrichment rates in a virtual
receptor agonists [122]. screen [124].
Binary QSAR uses all compounds in the Mathematical functions are often used to
training set to predict the biological property correlate descriptor values with a biological
of test compounds in a virtual screen rather property. Multiple linear regression estab-
than the most similar training compounds lishes a linear correlation between dependent
only. While not providing an image of the and independent variables. A popular exten-
training data, the binary QSAR approach is sion, called partial least squares (PLS), helps
fast and like SOMs it works well if the training simplify the model optimization to allow for
data are highly similar to the test compounds large training sets. Variable selection techni-
being screened. Compounds that are signifi- ques are commonly added on top of the PLS
cantly different from the training set are not algorithm to optimize the choice of descrip-
expected to be predicted accurately and are tors. This approach has shown success in en-
commonly missed in a virtual screen [123]. riching a virtual database for various GPCR
While SOMs and binary QSAR techniques ligands [125]. The obvious drawback is that
do not require a prebuilt model to perform a there is not always a linear correlation be-
virtual screen most other machine learning tween the property modeled and the descrip-
methods do. These techniques form correla- tors describing the data set. The artificial
tions between training data and descriptors neural network (ANN) and support vector
that describe training compounds to predict a machine (SVM) approaches try to overcome
biological property for a virtual set of com- this problem by building nonlinear models.
pounds. The Bayesian classifier requires a Therefore, ANN and SVM have become pop-
prebuilt model. It distinguishes itself from ular tools for model building and virtual
binary QSAR through identifying specific de- screening. In a side by side comparison, both
scriptors that best separate compounds with a ANN and SVM showed similar enrichment
desired biological property from others. The rates for several virtual screens [124].
search for pertinent descriptors is called vari- The k-nearest neighbors (kNN) approach
able selection and helps the model eliminate does not require the use of a mathematical
descriptors that are not relevant to the current function to split one property class from an-
problem and cloud the separation of one bio- other. This behavior can be very useful if the
Table 1. Overview of Popular Regression and Learning Algorithms Used for Virtual Screening
Classification Regression Variable Selection Explanatory Virtual Screening
SOM Yes, useful for No Various techniques Yes, when the Identified purinergic
visualizing global show success pertinent dependent receptor antagonists from
data trends extracting pertinent variables are optimized a virtual combinatorial
dependent variables by library [122]
grouping compounds in
the same target class
together
Binary QSAR Yes No No No Showed superior
enrichment rates when
compared to Bayesian
Classifiers and PLS [123]
Bayesian classifier Yes Yes Descriptors are Yes if the significance Performed poorly com-
weighted based on how of each descriptor can pared to SVM, kNN, ANN,
well each divides the be extracted and Decision Trees [124]
training data
Decision trees Yes No Descriptors that best Variables used in the Slightly outperformed a

12
divide one class from tree(s) suggest activity Bayesian classifier in a
another are used to dependency comparison study [124]
separate the data
PLS variants Yes Yes Variable selection Yes, when a variable Ligands for various GPCR
techniques are com- selection technique is targets were successfully
monly added above incorporated enriched from a test
PLS model building database [125]
ANN Yes Yes Performed internally No Comparable enrichment
rates in a direct
comparison to SVM and
kNN [124]
SVM Yes Yes Performed internally Yes, if the weights of Identified previously
each descriptor are ex- characterized Dopamine
plicitly solved. D1 Inhibitors and
suggested new hits [127]
kNN Yes Yes Commonly a genetic Descriptors selected by Identified several
algorithm or simulated multiple models imply Anticonvulsant
annealing is used relevance to the target compounds that were
property experimentally
confirmed [126]
LIGAND-BASED VIRTUAL SCREENING 13

problem is complex. Compounds in a virtual and cannot account for unpopulated descrip-
screen are predicted based on known activities tor regions within the convex hull. Probability
of the most similar training compounds. Simi- density approaches are gaining in popularity
larity between compounds is calculated using because, in theory, they can account for empty
only a small set of descriptors that are opti- descriptor space within a training set and are
mized during the model building (this optimi- not dependent upon any one training set re-
zation can be quite slow for large data sets). ference point.
When used in a virtual screen, kNN shows Thus far, the most successful applicability
database enrichment rates similar to those domain approaches measure the similarity to
obtained with both ANN and SVM [124]. the training set via distance and/or the num-
For an in-depth view on QSAR the interested ber of training set nearest neighbors within a
reader is encouraged to review Chapter 13 in predefined cutoff. Merck researchers found
Volume 1. these approaches to be independent of the
machine learning approach being used [128].
2.3.3. The Applicability Domain Concept For this reason, we will use the remainder of
With all predictive models, the question of this section do discuss a traditional distance
their predictive reach is central to their suc- based applicability domain.
cess. Therefore, the concept of quantitatively To measure distance similarity, each com-
assessing the applicability of a predictive mod- pound is represented by a point in the M-
el needs to be established. Applicability do- dimensional descriptor space (where M is the
main techniques have been applied success- total number of descriptors in the descriptor
fully for instance for kNN [126] and SVM [127]. pharmacophore) with the coordinates Xi1, Xi2,
However, limitations lie in the fact that a very . . ., XiM, where Xis are the values of individual
narrow applicability domain only identifies descriptors. The molecular similarity between
potential compound hits that are highly simi- any two molecules is characterized by a Eu-
lar to the training set compounds. Such hits clidean distance (sometimes weighted) be-
could have possibly been found using simple tween their representative points. In the
similarity approaches as well. weighted case, the contribution of each de-
Technically, a QSAR model can predict the scriptor to the overall similarity is weighted
target property for any compound for which based upon correlations found within the
chemical descriptors have been calculated. training set model. The contribution of each
Since the training set models are developed descriptor to the pairwise distance is weighted
by interpolating activities, an applicability by a descriptors relevance to the model, wk .
domain (i.e., similarity threshold) is typically This provides a more accurate representation
introduced to avoid making predictions for of what compounds in the training set the
compounds that differ substantially from the model believes to be most similar. The Eucli-
training set molecules [120]. In general, there dean distance di,j between two points i and j
are four different approaches used to deter- (which correspond to compounds i and j) in M-
mine whether a test compound is similar to dimensional space can be calculated as fol-
the training set within a multivariate descrip- lows:
tor space: range, distance, geometrical shapes,
and probability density. Range approaches
use the min/max descriptor values within the v
uM
training set to define an M-dimensional rec- uX
tangle surrounding the training set. This ap- dij t wk  Xik Xjk 2 1
k1
proach works well only if the training set is
evenly distributed. Distance approaches de-
fine a distance cutoff to the training set as a Compounds with the smallest distance be-
boundary often resulting in an M-dimensional tween one another are considered to have the
sphere that surrounds the training set. Geo- highest similarity. In parallel, the unweighted
metrical shapes such as a convex hull are Euclidean or Tanimoto distances can be used
difficult to compute in large descriptor spaces as a measure of overall similarity. The dis-
14 VIRTUAL SCREENING

tances (similarity) of compounds in the train- float activity prediction to the HTS com-
ing set are then compiled to produce an applic- pounds corresponding to their likelihood of
ability domain threshold, DThreshold, calcu- being active. These predictions were averaged
lated as follows: for each database compound and sorted based
on the predicted likelihood an HTS compound
DThreshold Z  DTrain 2 is active. This ranking allows for the selection
of the top X% of a database to enrich a large
Here, DThershold is a distance threshold for a percentage of active compounds within a
test compound, DTrain is the minimum dis- smaller compound collection.
tance required to predict all compounds in the In Fig. 9, the database enrichment plot for
training set in a leave-one-out fashion, and Z kScore models developed with Ghose and
is an arbitrary parameter to control the radii Crippen descriptors [130] is shown with and
of the applicability domain. The default value without the use of an applicability domain.
of this parameter is 1.0 that would classify The figure demonstrates that the addition of
each training compound within the applicabil- an applicability domain (AD) can significantly
ity domain if it was individually removed and improve the results of a virtual screen.
placed into a test set. Thus, if the pairwise
distance exceeds DThreshold for an external
3. STRUCTURE-BASED VIRTUAL
compound to all training compounds, the pre-
SCREENING
diction is considered unreliable. This assumes
the compounds are evenly distributed in the
In direct analogy to high-throughput screen-
training set descriptor space. A modified vec-
ing, docking, and scoring techniques are ap-
tor Z0 can also be used to account for changes in
plied to computationally screen a database of a
the data density. For example, one may wish
million compounds and sometimes more
to decrease the applicability domain in por-
against a specific target protein. Computa-
tions of the descriptor space where training
tional methods that predict the three-dimen-
compounds are very sparse. More insight can
sional structure of a proteinligand complex
be gained from Chapter 13 of Volume 1.
are often referred to as molecular docking
approaches [131]. Protein structures can be
2.3.4. Example 4: Enrichment of an HTS
employed to dock ligands into the binding site
Collection for Kinase Inhibitors When large
of the protein and to study their interac-
databases of compounds are screened, often
tions [132]. For virtual screening, the crucial
only small portions of the hit set can be or-
task is the fast and reliable ranking of a da-
dered for testing. Therefore, hits need to be
tabase of putative proteinligand complexes
prioritized. This provides a means to select the
according to their binding affinities. Docking
top X% of hits chosen for testing or additional
programs used today (Table 2) can facilitate
filtering. In this example, the kScore algo-
this task in less than 1 min per processor and
rithm, a hybrid of kNN and SVM, was used
molecule. Energy functions that evaluate the
to develop several QSAR models of kinase
binding free energy between protein and li-
inhibition to enrich a large HTS collec-
gand sometimes employ rather heuristic
tion [129]. The training set contained approxi-
terms. Therefore, those functions are more
mately 10,200 compounds of which approxi-
broadly referred to as scoring functions.
mately 17% inhibited a specific kinase. The
remaining training set compounds are known
3.1. Protein Structures and Protein
inhibitors of other kinases, but showed no
Flexibility
activity against the kinase in question.
The validated models were generated and A 3D-protein structure of the receptor at atom-
used to predict approximately 775,000 HTS ic resolution is necessary to start a protein
compounds of which approximately 3700 were ligand docking experiment. Homology mod-
found to be experimentally active kinase in- els [133] and pseudo-receptor models [134,135]
hibitors corresponding to an approximately represent alternatives. The chances of a suc-
0.5% hit rate. Each kScore model assigned a cessful virtual screen very much depend on the
STRUCTURE-BASED VIRTUAL SCREENING 15

Figure 9. Database enrichment plot of kinase inhibitors rank-ordered by kScore models using Ghose and
Crippen descriptors.

Table 2. Selection of Available ProteinLigand Docking Software for Structure-Based Virtual


Screening.
Docking Program Docking/Sampling Method Scoring Method
Glide (www.schrodinger.com) Rigid protein; multiple Empirical scoring including
conformation rigid docking; penalty term for unformed
grid based energy evaluation hydrogen bonds; force field
scoring
DOCK (www.cmpharm.ucsf. Rigid protein; flexible ligand Force field scoring; chemical
edu/kuntz/dock.html) docking (incremental scoring, contact scoring
construction)
FlexX (www.biosolveit.com) Rigid protein; flexible ligand Empirical scoring intertwined
docking (incremental with sampling
construction)
DockVision (www.dockvision. Monte Carlo, genetic Various force fields
com) algorithm
DockIT (www.daylight.com/ Ligand conformations PLP, PMF
meetings/emug00/Dixon) generated inside binding
site spheres using distance
geometry
FRED (www.eyesopen.com/ Exhaustive sampling; rigid ChemScore, PLP, Screen-
fred.html) protein, multiple conforma- Score, and Gaussian shape
tion rigid docking scoring
LigandFit (www.accelrys. Monte Carlo LIGSCORE, PLP, PMF,
com) LUDI
eHITS (www.simbiosys.com) Exhaustive sampling of all Customizable
possible binding modes
Gold (www.ccdc.cam.ac.uk/ Genetic algorithm Soft core vdW potential and
prods/gold/) hydrogen bond potentials

For a Comprehensive List, Please See the Review of Moitessier et al. [171]
16 VIRTUAL SCREENING

quality of the available structure as well as the therefore, focuses on terms deemed relevant
flexibility and adaptability of the ligand bind- by the particular author. The result is a wealth
ing site [136138]. Optimally, the crystal of different scoring functions being promoted
structure

resolution should be at least by different researchers focusing the demon-
2.5 A [139]. Small changes in structure can stration of usefulness almost exclusively on
drastically alter the outcome of a computa- empirical data comparisons rather than the
tional docking experiment [140]. conceptual improvement of the underlying
To account for induced fitting effects upon methods. This fact has led to an explosion of
proteinligand binding in molecular docking literature that compares the merits of differ-
experiments different strategies have been ent scoring functions in the context of virtual
employed. Explicit side-chain flexibility can screening; [157172] the most recent and com-
be invoked in AutoDock [141,142] and GOLD prehensive review was provided by Moitessier
for a limited number of residues [143]. FlexE et al. [171]. Before we go into discussing the
allows for the use of alternative composite various contributions to scoring function eva-
protein structures derived from an ensemble luations, we start with some general remarks
of protein structures [144]. A series of promis- about scoring functions.
ing approaches have been explored, but have Scoring functions need to be able to enrich
not found their way into widely used commer- active compounds in the upper ranks of a
cial applications yet. They include among virtual screening hit list. The question is what
others the use of normal modes [138], elastic is a reasonable enrichment? The answer ob-
network theory [145], constrained geome- viously depends on the number of actives and
try [146], constructing receptor ensembles inactives in the database, as well as the num-
using rotamer libraries [147], multiple crystal ber of compounds that can ultimately be tested
structures [148], alanine scanning [149,150], in a biological assay. The number of com-
and cross-docking [151]. pounds that can be tested is typically in the
While taking induced fitting into account is area of 102103 compounds depending on the
undoubtedly useful in exploring the relevant assay throughput and compound availability
binding modes of small molecules in protein (if a virtual screen is performed on vendor
binding sites it is less obvious how beneficial it compounds, only a limited number of com-
actually is for virtual screening applications. pounds can be purchased because of budget
While the chances increase to dock the desired constraints). If one considers a typical virtual
active ligands, increased protein flexibility screening setup with 1 million individual com-
also increases the potential for false positives. pounds being screened in silico and one as-
Therefore, induced fitting plays a limited role sumes to have 100 actives among them there is
in todays structure-based virtual screening only an approximately 4.7% chance to find at
setups. Protein flexibility is also the subject of least one active compound among 500 com-
Chapter 19 in Volume 2. pounds tested. If a scoring function is able to
choose an active compound with a 10-fold
higher chance than a false positive (inactive
3.2. Scoring for Virtual Screening
compound with high score) the chances of
Structure-based virtual screening methods finding at least one active among the top-
such as high-throughput ligand docking into ranked 500 virtual screening hits is still below
a receptor site rely on fast scoring functions for 40%. It takes an enrichment of 100-fold to
ranking large databases of compounds accord- almost guarantee (99%) at least one hit under
ing to their predicted complementarity to a the given hypothetical scenario. The situation
target protein of interest [132,152155]. No is even worse considering that more and more
scoring function attempts to rigorously cap- often screening (including virtual screening)
ture all relevant contributions to a ligand collections are loaded with combinatorial com-
protein binding free energy (even sophisti- pounds that are highly structurally corre-
cated and time-consuming methods such as lated. If a combinatorial library is enriched
free energy perturbation have difficulties with actives, one may experience a higher
doing so [156]). Each scoring function design, confirmed hit rate of a virtual screen; however,
STRUCTURE-BASED VIRTUAL SCREENING 17

all hits will be of the same chemotype and, inspection along with diversity considerations
therefore, the gain in information may be should be the final selector for compounds to
limited. On the other hand, in the more likely be tested.
case that the library does not contain actives, Following these introductory remarks on
it dilutes the database that contains actives scoring functions, we will review the current
and most likely contributes overproportion- state of scoring functions as well as efforts to
ally to the number of false positives thereby increase the effectiveness of scoring through
making it harder to find true actives in a consensus combination of these functions.
virtual screen. This means that in the imper-
fect world of scoring functions not only the 3.2.1. Force Field Scoring Using nonbonded
target and the actual choice of the scoring terms of force fields has been the first choice
function make a significant difference in the for scoring functions in early docking pro-
performance of the virtual screen but also the grams such as DOCK [179]. When the concept
composition of the database of compounds of proteinligand docking was born, nobody
that is screened. There are indeed quite dras- really thought of using this tool in the context
tic differences observed in enrichment rates of of virtual screening. The goal was rather to
the same actives in virtual screening hit lists establish binding hypotheses for known or
depending on the decoys being used as demon- putative ligands and learn from the resulting
strated for instance for CDK2 and SRC kinase models. Force field-based methods with their
virtual screening studies involving decoys emphasis on eletrostatics and sterics are a
from different databases [173]. reasonable choice for finding good binding
With few exceptions scoring functions modes because they will favor the formation
rarely exhibit enrichment rates high enough of hydrogen bonds and salt bridges and avoid
to be reliable primary decision makers for van der Waals clashes. However, for binding
selecting virtual screening hits. Scoring func- free energies, hydrogen bonds may not add
tions exhibit various scoring artifacts that can much to the overall binding energy because of
be studied in detail and used to improve scor- competing desolvation of the ligand. Also, in
ing function design [174], but nevertheless some cases solvation effects alone are the
limit their robustness. Therefore, it is vital to major driver of ligand potency [180]. There-
use all other information one can assemble fore, force field-based methods have had less
about the target protein or the known active importance in recent developments of scoring
ligands, especially the interaction to known functions aiming at improving the ranking of a
ligands to filter virtual screening hits sug- large ligand database according to their puta-
gested by scoring functions. Especially, infor- tive binding free energy against a target pro-
mation about specific pharmacophore interac- tein of interest.
tions or the preference for certain binding Notwithstanding the above remarks, force
subpockets [175] needs to be translated into field-based scoring functions can be quite suc-
relevant filters or constraints for virtual cessful in virtual screening. Especially, if soft
screening/scoring runs. For instance, prefer- Lennard-Jones potentials are used attenuat-
ences for hinge binders in kinases [173] or ing repulsive terms (as for instance in the
metal ion interactions in metalloprotei- commercial software package GOLD) [139]
nases [159] can be defined. The point is illu- the hit rates of virtual screens can improve
strated in a recent report in the literature that as shown for virtual screens against T4 lyso-
the application of the same scoring function zyme and aldose reductase [181]. To kineti-
with pharmacophore-enhanced docking/scor- cally access the binding pocket, Vieth et al. has
ing increases the enrichment of actives in also concluded in an analysis of several para-
virtual screening hits for BACE1 [176] when meters to be optimized for force field scoring
applying FlexX [177] and FlexX-Pharm in that a soft vdW potential is needed [182]. Also,
comparison [177,178]. Also, one should keep the use of containment potentials is useful as
in mind, that the experienced chemist is most additional constraints forcing a ligand to dock
often the best scoring function. Once a hit list in a certain region [183]. In addition to includ-
is reduced to less than 1000 compounds, visual ing soft vdW potentials, considerable effort
18 VIRTUAL SCREENING

has been put into including solvation effects Ionic and nonionic hydrogen bonds are
into docking [184188]. For virtual screening combined into one term. The deviation from
purposes, however, these methods have been the ideal hydrogen-bond length of 1.85 A
less used so far partly because they tend to be (HO/N) is given by Dr. The deviation from
slower. the ideal angle
P of 180  is given by
P Da. The
lipophilic iL f riL and metal aM f raM
3.2.2. Empirical Scoring Regression methods terms are calculated as simple contact terms.
have been in vogue for almost as long as com- The flexibility penalty of ligands for frozen
putational chemistry has been around [189]. rotatable bonds is calculated as
As a natural extension in the area of structure-
 
based methods, regression-based functions 11 X Pnl r Pnl
0
r
Hrot 1 6
have been derived using properties of protein- Nrot r 2
ligand interactions such as hydrogen bonds,
buried surface area, metal contacts, and lipo- where Nrot is the number of frozen rotatable
philic contacts to name a few. These functions bonds and Pn1(r) and Pnl 0
r are the percen-
are then trained based on known binding affi- tages of nonlipophilic heavy atoms on either
nities of ligands for which the 3D structures of side of the rotatable bond. ChemScore has
proteinligand complexes are known at atomic been trained on 82 proteinligand complexes
resolution [190194]. The popular empirical yielding DG0 5.48 kJ/mol, DGhbond 3.34
scoring function ChemScore has been designed kJ/mol, DGmetal 6.03 kJ/mol, DGlipo
by Eldridge et al. [192,195]. To illustrate the 0.117 kJ/mol, DGrot 2.56 kJ/mol with a stan-
concept of empirical scoring, the ChemScore dard error of 8.68 kJ/mol. As with all regres-
function is outlined here in some detail. sion-type functions, the predictability of
X ChemScore becomes less reliable as the func-
DGbinding DG0 DGhbond g1 Drg2 Da tion extrapolates further from proteinligand
iI
X complexes in the training set. Lead Finder is
DGmetal f raM the the most recent example of an improved
aM empirical scoring function [196]. Lead Finder
X
DGlipo f riL DGrot Hrot trained the same function differently for the
iL three purposes of optimized docking pose pre-
3 diction, free energy estimation, and virtual
screening.
where the various DG coefficients are opti-
mized
P using multiple linear regression. 3.2.3. Knowledge-Based Scoring Statistical
iI g 1 g 2 is a hydrogen-bond term calculated preferences for proteinligand atom pair in-
for all ligand atoms i and protein atoms I. teractions can be derived from large databases
8 of proteinligand complexes assuming that

>
> 1 if Dr  0:25A the deposited structures represent assemblies
>
< 1Dr0:25   of high affinity and structural accuracy. For
g1 Dr if 0:25A < Dr  0:65A
> 0:4 virtual screening purposes a series of knowl-
>
>
:  edge-based scoring functions were developed
0 if Dr > 0:65A
including SmoG [197], PMF scoring [198],
4 DurgScore [199], and BLEEP [200]. Knowl-
edge-based scoring functions have been exten-
8 
sively reviewed by Gohlke and Klebe [201].
>
> 1 if Da  30 Therefore, we focus here on explaining the
>
< 1Da30 
g2 Da if 30 < Da  80 concept of knowledge-based screening using
> 50
>
>
PMF scoring as example.
: 
0 if Da > 80 The PMF scoring function [198,202204]
5 has recently been updated exploiting the ex-
STRUCTURE-BASED VIRTUAL SCREENING 19

ponential increase in publicly available crys- PMF99. Figure 11 shows for a set of 170
tal structures of proteinligand complexes. proteinligand complexes assembled by Wang
The new version of PMF, called PMF04, has et al. [208] how PMF99 and PMF04 compare.
been derived from 7152 proteinligand com- While there is no substantial overall improve-
plexes from the PDB. Compared to the original ment in correlating PMF scores to experimen-
PMF99 score (697 complexes) this is approxi- tal binding affinities the figure shows how
mately a 10-fold increase. This increase in the some outliers have been fixed in PMF04. As
statistical knowledge based for PMF scoring an example, elastase complexed with FTAP
has resulted in new potentials being intro- has been highlighted (7est). Because of the
duced for metal ions. Also, while similarly to statistically insufficient potentials for fluorine
SmoG halogen-containing potentials have atom types one of trifluoromethyl groups of
mostly been ignored in PMF99, for many halo- FTAP has scored very badly in PMF99. The
gen atom-containing proteinligand atom combined score of the three fluoro atoms
pairs PMF04 now contains statistically sig- has been 45.0 (Fig. 12). In PMF04 this problem
nificant potentials. has been rectified because of the better poten-
Here is a brief outline of the PMF scoring tials for halogens. The combined score of the
function. The PMF score is defined as a sum three fluoro atoms is now -0.1. The same im-
over all proteinligand atom pair interaction provement in scoring can be observed for me-
free energies Aij(r) at distance r, tal binding proteinligand complexes (data
not shown). For other, statistically more es-
tablished atom types, the differences in
X
PMF score Aij r 7 PMF99 and PMF04 scores are minimal
(Fig. 12).
kl
r < rijcutoff
3.2.4. Hybrid and Alternative Scoring Methods
where kl is a ligandprotein atom pair of type There is a wealth of other scoring functions
ij. rijcutoff is the distance at which atom pair being introduced ranging from steric fit,
interactions are truncated. The Aij(r) are cal- surface, and pharmacophore complementar-
culated as ity [209,210] to empirically motivated scores
such as PLP (piecewise linear poten-
tial) [211,212]. An interesting example of a
" #
rijseg r hybrid scoring method has been introduced
j
Aij r kB T ln fVol corr r 8 by Feher et al. called BHB [213,214]. BHB
rijbulk stands for buriedness, hydrogen bonding, and
binding energy. The first term assesses how
where kB is the Boltzmann factor, T is the well a ligand occupies a binding pocket. Inter-
j
absolute temperature, and fVol corr r is a li- estingly, the authors focus on the buriedness
gand volume correction factor. rijseg ris the of the receptor site rather than that of the
number density of atom pairs of type ij at a ligand as suggested by others [213,215]. The
certain atom pair distance r. rijbulk is the num- hydrogen-bonding term along with a term for
ber density of a ligandprotein atom pair of steric fits based on vdW potentials has been
type ij in a reference sphere with a radius of taken from the GOLD docking func-
12 A. For docking purposes, the PMF score tion [139,216]. A binding energy is elabo-
adds a vdW term to account for short-ranged rately calculated using a thermodynamic cy-
interaction [205]. Knowledge-based potentials cle approach subtracting the energies of the
have been proven useful in virtual screening unbound rigid solvated receptor and ligand
scenarios against FK506 binding pro- from that of the complex structure. Using a
tein [205], aldose reductase [206], and a series series of different proteinligand complexes
of other targets [207]. including androgen receptor, estrogen recep-
Figure 10 shows some of the newly derived tor, neuraminidase, thymidine kinase, inosin
PMF04 potentials and how they compare to monophosphate dehydrogenase, CDK3, and
20 VIRTUAL SCREENING

3.0 3.0 3.0


2.0 2.0 2.0
MEOC MEOA MEOD
PMF score

1.0 1.0 1.0


0.0 0.0 0.0
1.0 1.0 1.0
2163 1647 4906
2.0 2.0 2.0
3.0 3.0 3.0
0.0 2.0 4.0 6.0 8.010.0 12.0 0.0 2.0 4.0 6.0 8.010.0 12.0 0.0 2.0 4.0 6.0 8.010.0 12.0
3.0 3.0 3.0
2.0 2.0 2.0
MENC MENR MEOS
PMF score

1.0 1.0 1.0


0.0 0.0 0.0
1.0 1.0 1.0
2.0 1049 2.0
2132 2.0
10273
3.0 3.0 3.0
0.0 2.0 4.0 6.0 8.010.0 12.0 0.0 2.0 4.0 6.0 8.010.0 12.0 0.0 2.0 4.0 6.0 8.010.0 12.0
3.0 3.0 3.0
2.0 2.0 2.0
PMF score

1.0 cFCL 1.0 CFF 1.0 CPBr


0.0 0.0 0.0
1.0 1.0 1.0
2.0
1075 2.0
36913 2.0
7184
3.0 3.0 3.0
0.0 2.0 4.0 6.0 8.010.0 12.0 0.0 2.0 4.0 6.0 8.010.0 12.0 0.0 2.0 4.0 6.0 8.010.0 12.0
Atom pair distance () Atom pair distance () Atom pair distance ()

Figure 10. PMF04 (bold) and PMF99 potentials for a selected set of proteinligand atom pairs [213]. The
four-letter code refers to the atom pair types, where the first two letters indicate the protein atom types and the
last one or two letters indicate the ligand atom type (ME: metal, OC: negatively charged oxygen, OA: oxygen as
HB acceptor, OD: oxygen as HB donor, NC: positively charged nitrogen, NR: ring nitrogen, OS: other oxygen
(e.g., phosphate), CF: aliphatic carbon, cF: aromatic carbon, CL: chlorine, F: fluorine, and Br: bromine). The
number of atom pair occurrences in the set of proteinligand complexes used to derive the potentials is given
for each pair potential exemplified in the figure. A complete list of protein and ligand atom types for PMF99
and PMF04 as well as complete access to the potentials has been given in a recent publication [213].

p38 MAP kinase Feher et al. [214] derive the and hydrogen bonding as defined in GOLD.
following generalized function: In a virtual screening setting involving the
Glide decoys as negative controls [217], the
 
Fscore 0:05BE 20 RB RB3 authors report drastically improved enrich-
 X ments of actives in the virtual screening hit
10 n
HBi 0:5Hext vdWext lists for all the targets mentioned.
n i
9 3.2.5. Some Comments on Using Enrichment
where BE is the calculated binding energy, Factors Before assessing the performance of
RB is the receptor buriedness, HBi is an different scoring functions it is useful to add
indicator for the ith specific hydrogen bond some remarks on how they are compared. The
(1 if exist, 0 otherwise), and n is the total performance of virtual screening methods/
number of hydrogen-bonding residues con- scoring functions is often measured in enrich-
sidered for the receptor. Hext and vdWext are ment factors. A typical enrichment factor mea-
terms describing van der Waals interaction sure is defined as
STRUCTURE-BASED VIRTUAL SCREENING 21

0.0

7est

50.0

PMF score 100.0

150.0

200.0
15.0 10.0 5.0 0.0
log Ki

Figure 11. PMF04 (filled circles) and PMF99 (open circles) scores compared to the experimental binding
affinities for a set of 170 proteinligand complexes taken from the training set of the Score function [208].

of the target. However, even for the same


%actives retrieved target, if literature data are compared employ-
EF 10
%compounds sampled ing different numbers of actives and inactives
comparing enrichment factors becomes cum-
A random enrichment would result in EF 1. bersome even if the same number of com-
The higher the enrichment factor the better pounds is being sampled. To illustrate this
the scoring function performs. This enrich- point let us consider Case 1 with 1000 com-
ment factor is ideal for comparing different pounds and 10 actives compared to Case 2
scoring functions or virtual screening scenar- with 1000 compounds and 100 actives among
ios for the same set (or at least the same them. Let us assume for Case 1, 50% of the
number) of actives and inactives independent actives (5/10) have been sampled among the

Figure 12. Trifluoroacetyl-L-leucyl-L-alanyl-P-trifluoromethylphenylanilide (TFAP) ligand of the serine


protease elastase. The PMF99/PMF04 contributions of selected atoms are shown as calculated from the
elastase/TFAP complex taken from the PDB ID 7est.
22 VIRTUAL SCREENING

top 2% of the compounds and 100% (20/100) 3.2.6. Comparing, Consolidating, and Finding
have been sampled in Case 2. The enrichment Consensus After most scoring functions have
factor for Case 1 would be better (EF 25) conceptually been developed in the 1990s,
than that of Case 2 (EF 10). The reason is recently more effort has been put on validating
that the maximum enrichment that can be various scoring functions in the context of
achieved depends on the number of actives docking and scoring tools. One of the early
and inactives. While the maximum enrich- comparative studies of docking/scoring proto-
ment in Case 1 is 50, the maximum enrich- cols has interestingly been done on the more
ment in Case 2 is only 10. One way of avoiding challenging task of fragment docking using
the problems described above is to define an Abbotts SAR from an NMR fragment library.
enrichment factor as Comparing DOCK/force field scoring and
DOCK/PMF scoring 3200 fragments (Ki va-
%actives retrieved lues between 0.06 and 2 mM) were virtually
EF 11
%inactives retrieved screened against the KF605 binding protein
reaching an enrichment of up to EF0 3 (Eq.
In this case, the enrichment factor can become 12) [205]. While an enrichment factor of three
infinite. would be considered very low for fully enum-
The term %actives_retrieved is often erated compounds, it is still typical for weakly
called recovery. A relevant question for virtual binding fragments.
screening purposes is often how much of a An important observation made by several
database needs to be screened to retain a authors is that different targets require dif-
certain recovery. An additional problem with ferent scoring schemes. Comparing DOCK,
Equation 10 for calculating enrichment fac- FlexX [177], and GOLD docking programs
tors at a certain recovery percentage is that along with ChemScore, DOCK, FlexX, Fres-
this measure only captures the performance of no [219,220], GOLD, PMF, and Score [220]
the worst active within the desired recovery. It scoring functions, Bissantz et al. [221] evalu-
would make no difference if ligands scored 1st, ate virtual screens against thymidine kinase
2nd, and 50th, or if they scored 48th, 49th, and (TK) and estrogen receptor (ER). Mixing 10
50th. However, it would be much more desir- ligands with 990 decoys, the authors found
able to have higher ranking actives among the that FlexX docking and PMF scores correctly
recovered as in the first case. Therefore, for identify the 10 TK ligands but fail for the
full recovery an alternative enrichment func- estrogen case. Similarly, Stahl and Rarey re-
tion has been suggested [205] as port varying performances of docking/scoring
combinations for different targets [207]. For
instance, PMF performed well for neuramini-
50% dase but poorly for gelatinase A and thrombin.
EF0 12 Yet another study by Kontoyianni et al. [222]
%APRactives retrieved
further supports the finding of varying perfor-
where %APR(actives_retrieved) is the aver- mances for four docking methods, 10 scoring
age percentile rank of the retrieved actives. functions, and 6 drug targets with a few front
Equation 12 can be extended to observation runners being combinations between Ligand-
points of less than full recovery as [218] Fit, FlexX, and DOCK as docking engines
combined with Ligscore, FlexX, GOLD, and
00 50% PMF as scoring functions. Without trying to be
EF %actives retrieved exhaustive, a few other studies by Perola et
%APRactives retrieved
al. [163], Schulz-Gasch and Stahl [164],
13
Muegge and Enyedy [173], and Warren et
al. [167] are mentioned here solidifying
Enrichment factors EF0 and EF00 consider the further the general observation of greatly
rank of each retrieved hit. As a consequence, varying performances of docking/scoring ex-
enrichment factors can be larger than those periments. There are two major consequences
obtained from Equation 10. from these observations:
STRUCTURE-BASED VIRTUAL SCREENING 23

(i) For optimum virtual screening pur- coming not only from different scoring func-
poses a validation phase needs to be tions applied to the same proteinligand com-
introduced using known active and plex but also from a series of different protein
inactive ligands to fine-tune a virtual targets a given ligand is exposed to [230].
screening protocol before it is applied Using a set of arbitrary active sites, a refer-
in a production run. ence score for a given ligand is calculated
(ii) Consensus approaches need to be con- leading to a simple statistical correction for
sidered to average out the volatile a calculated ligand score against a target of
performances of single scoring interest. This correction has been called
functions. MASC (multiple active site correction) and is
calculated as:
As pointed out above, because of the vary-  
Sij mi
ing performance of scoring functions, methods Sij0 14
have been deployed that allow focusing on si
scoring functions that are better (or worse)
than others by consensus. In general, consen- where Sij0 is the modified score for compound i
sus scoring is rarely able to improve scoring in the active site j and Sij0 is the original score.
beyond the performance of the single best mi and si are mean and standard deviations of
scoring function used [223]. However, because the scores for compound i across all active sites
one does not know up-front which of the scor- j. If a statistically significant higher score is
ing function performs best for a given target, obtained for the active site of interest, the
consensus scoring allows for an averaging compound is more likely to be a true active.
effect that often (but not always) is only mar- The MASC procedure requires a substantial
ginally worse in performance than the best initial investment in CPU time due to the need
individual scoring function. The first compre- to dock the ligands of interest against not only
hensive consensus scoring study has been pre- 1 but 10 or more targets to calculate a refer-
sented by Charifson et al. [224]. Using 11 ence score. However, especially for databases
scoring functions (LUDI [100], ChemScore, used repeatedly such as an in-house screening
Score, PLP, Merck force field [225], DOCK collection this up-front expenditure is reason-
energy score, DOCK chemical score, able. Vigers and Rizzi report significant im-
Flog [226], strain energy, Poisson Boltz- provements in virtual screening of p38 MAP
mann [227], buried lipophilic surface kinase and Ptp-1b using their MASC scoring
area [228], DOCK contact score, and volume method in the context of FlexX and Gold
overlap [210]), two docking programs (DOCK, scores.
Gambler), and three protein targets (p38 MAP
kinase, inosine monophosphate dehydrogen- 3.2.7. Example 5: Identification of a Novel Liver
ase, and HIV-1 protease) the authors showed a X Receptor (LXR) Modulator LXRa and LXRb
significant reduction of false positives in the are oxysterol-activated nuclear receptors that
virtual screening hit sets. Other authors re- are involved in controlling the expression of
port success with consensus scoring also sug- various genes associated with cholesterol, glu-
gesting general hybrid scores applicable to cose, and fatty acid metabolism. LXR agonists
any target such as ScreenScore [207]. Consen- are therefore of potential pharmacological in-
sus scoring tools are available in commercial terest as antiatherosclerotic, anti-inflamma-
software packages such as Sybyl [229] and tory, and antidiabetic agents [231]. Combin-
Fred [190]. ing Glide virtual screening with gene profiling
Often scoring functions exhibit artifactual using an ArrayPlate mRNA assay resulted in
dependencies on ligand properties such as the identification of a novel LXR modulator
compound size, prevalence of hydrogen-bond- (Fig. 13) [232]. To validate the Glide docking
ing features, or flexibility. To normalize bind- method, 127 representative LXR agonists
ing scores of compounds, Vigers and Rizzi from 6 different chemical classes were selected
have suggested a form of consensus scoring from the literature. These LXR ligands were
24 VIRTUAL SCREENING

Figure 13. Structure-based virtual screening flowchart used by Cheng et al. [232] to identify a novel LXR
modulator.

docked against seven LXR crystal structures more than 20% were removed. This reduced
from the PDB. The PDB structure 1P8D re- the hit rate to 3263. Compounds were prior-
covered the most LXR ligands (27 of 127) itized further by requiring a close contact to
through Glide docking. To increase the recov- His435 that was believed to be a required
ery rate further, the hydrophobic pocket of the interaction for LXR agonists. A total of 647
ligand binding site was widened through ro- compounds were thus selected. Compounds
tating side chains of Phe340 and Ile253. Mini- prioritized by the PLP1 score [233] were
mization of such an altered protein structure added. In addition, 247 compounds with halo-
in presence of the LXR agonist GW3965 gen atoms interacting with His435 such as in
(Fig. 14) resulted in the final protein structure GW3965 were added to the hit list. A total of
used in the virtual screen of 135,000 com- 1308 compounds were ultimately selected for
pounds from an internal collection. Applying high-throughput gene expression profiling.
a cutoff of 9.0 using the GlideSP scoring Gene profiles of one virtual screening hit
function 10,407 compounds were retrieved. (Fig. 14) were found to be extremely similar
Ligands clashing with the protein matrix to to those of other LXR agonists such as
PRE- AND POSTPROCESSING OF VIRTUAL SCREENING DATA 25

Figure 14. LXR ligands.

GW3965. The hit was subsequently confirmed A second example illustrates again the ne-
as full LXR agonist by a cofactor recruitment cessity to guide fragment based in silico
assay and relevant functional assays. screening through specific constraints to be
successful. Indolin-2-one analogs were discov-
3.3. Fragment-Based Virtual Screening ered by in silico fragment-based virtual
screening of DNA gyrase [238]. DNA gyrase
Fragment-based drug discovery methods
is a bacterial enzyme targeted for discovering
have experienced a renaissance in recent
antibacterial agents. A new core fragment to
years [234]. Fragments are mostly identified
sit in the DNA gyrase active-site pocket was
using high-concentration screening methods
desired. The crystal structure with PDB ID
with NMR follow-up and crystallographic
1ei1 of the N-terminal fragment of DNA gyr-
binding mode elucidation. However, virtual
ase B was used. LUDI identified key interac-
screening is sometimes used to narrow the set
tions in the active site. The MDL-ACD and
of fragments chosen for fragment screening.
MDL-SCD databases of commercially avail-
Although fragment virtual screening methods
able compounds were searched to find a novel
have historically produced low enrichment
fragment with MW < 180 Da to fill the active
rates [205], there are some success stories
site and engage it in all relevant interactions.
reported in the literature.
LUDI was used in combination with com-
A virtual screen for S1-binding fragments
pound similarity to known DNA gyrase bin-
of dipeptidyl peptidase IV was reported by
ders to subsequently identify indolin-2-ones
Rummey et al. [235]. DPP IV is an important
with up to 60 mM activity.
diabetes target. DPP IV inhibitors prevent
The interested reader can learn more on
the rapid degradation of GLP-1 that has
the topic of fragment-based drug design from
beneficial effects on glucose homeostasis.
Chapter 10 in Volume 2.
Using a pharmacophore-constrained docking
approach, the S1 pocket of DPP IV was tar-
geted. Ten thousand primary aliphatic 4. PRE- AND POSTPROCESSING OF
amines (positively charged group in the S2 VIRTUAL SCREENING DATA
pocket was required) were docked using
FlexX [236]. The formation of at least two The preprocessing of databases subjected to
hydrogen bonds was required for a hit in the virtual screens as well as the postprocessing of
S1 pocket. With such specific constraints hits needs some attention. The choice of com-
fragment docking identified a series of known pound databases to screen has been discussed
and unknown anchor fragments in the S1 elsewhere in different contexts already.
pocket. Similar examples of DPP IV inhibi- Hence, only a few words about preparing a
tors obtained through fragment-based database for a virtual screen. Generating 3D
screening were described by Makara [237]. coordinates if necessary and enumerating
26 VIRTUAL SCREENING

stereocenters of small molecules can be done lights and oral PhysChem scores to triage hit
in a straightforward fashion by programs such lists. The traffic lights center around solubi-
as CORINA [85]. Generating the appropriate lity, octanol-water partitioning, polar surface
ionization states of titrable groups is often area, rotatable bonds, and molecular weight
more challenging. Commercial software to descriptors. The interested reader may con-
predict ionization states at different pH va- sult Chapter 2 in Volume 2 for further reading
lues with varying degree of accuracy is avail- on the topic of in silico ADMET prediction.
able from software vendors such as Schrodin-
ger, Chemaxon, and ACS Labs. Tautomer 4.1.2. Ligand Efficiency When evaluating
enumeration is often ignored due to the ab- hits identified in a screen, they must be judged
sence of commercial software to reliably de- not only for their experimental potency but
termine the tautomer with the highest popu- also for the potential to develop the hit into a
lation among many possible tautomers. As drug. Often this lead optimization process
mentioned in Section 2, sometimes ligand con- results in an increased molecular weight to
formations need to be precalculated. Also, improve potency and target selectiv-
various fingerprint representations can be ity [245,246]. Unfortunately, increased mole-
precalculated and stored for databases that cular weight often compromises critical drug
are recurrently screened. properties such as solubility or permeability
that are necessary for bioavailabil-
4.1. Hit Triaging ity [247,248]. Therefore, it is better to optimize
a small, less potent molecule than a much
Smart hit triaging is an essential step in a
larger molecule that already contains liabil-
virtual screen. Unfortunately, there is no sui-
ities that are difficult to remove. To relate
table commercial software available that can
potency to compound size, Andrews [249] and
powerfully assist in hit triaging. Virtual
then later Kuntz [250] defined a measure of
screening filter cascades rely on the power of
binding energy and demonstrated there is a
multiple exclusion criteria to reduce the num-
correlation between potency and the number
ber of virtual screening hits to a manageable
of nonhydrogen atoms in a ligand. This con-
size. If target-specific criteria such as phar-
cept was used to define the term ligand effi-
macophore matches, shape complementarity,
ciency by Hopkins et al. [251] whereby the
and similarity to known hits are not discrimi-
binding energy is divided by the number of
native enough, target-independent criteria
nonhydrogen atoms, shown in Equation 15.
are typically used. These include properties
such as general drug-likeness [239], perceived
RT ln Kd
selectivity, ligand efficiency [234,240,241], Ligand efficiency
and diversity [242]. Nonhydrogen atoms
15
4.1.1. Traffic Lights Virtual screening often
produces as many hits or more than high- Ligand efficiency has gained wide use in
throughput screening does. If we set aside for the pharmaceutical industry for HTS hit se-
a moment the fact that for virtual screening lection [252]. Increased ligand efficiency in-
we are not sure of the true activity due to false creases the likelihood that a hit can be devel-
positives, both approaches face a similar pro- oped into a clinical candidate [253].
blem of hit triaging. An example of how hits The minimum ligand efficiency viewed as
are triaged using traffic lights has been given acceptable for a drug-like compound is ap-
by Lobell at al. [243]. The authors highlight proximately 0.3. This number is calculated
the fact that leads are often very similar to from the hypothetical case of a 10 nM opti-
hits [244] that ADME-related liabilities are mized compound with approximately 36 hea-
hard to address during lead optimization and vy atoms (the maximum number of atoms to
that more hit classes are generated. They able to achieve a MW < 500) [254]. The max-
propose to use ADMET (absorption, distribu- imum ligand efficiency appears to plateau as
tion, metabolism, excretion, toxicology) traffic the number of heavy atoms increases [255].
SCAFFOLD HOPPING 27

This is thought to be the consequence probable liabilities that would have to be over-
of enthalpic effects from structural changes come if the compound was pursued [273]. There
in the target protein that ultimately occur are several databases available to serve this
as a larger molecule binds. The number of purpose including PubChem [244,274,275],
contact points with the target protein in- ChemIDplus [276], and DiscoveryGate [277].
creases with the ligand size making it These resources can be used to identify an exact
challenging to optimize each contact simul- chemical match to HTS or virtual screening
taneously [256]. Entropic effects should also hits or highly similar compounds that highlight
play a significant role in ligand efficiency potential activities.
but as of yet no technique has shown an
accurate estimation of ligand efficiency loss
5. SCAFFOLD HOPPING
due to entropy.
The ultimate goal of a virtual screening ex-
4.1.3. HTS Triaging High-throughput screen-
periment is to identify promising hits of new
ing (HTS) is arguably the major technique used
chemical matter. It is often irrelevant what
today for the discovery of new chemical mat-
the actual hit rate of a virtual screen is. Rather
ter [257,258]. Often the lead-like matter
the novelty, patentability, drug-likeness, syn-
identified by an HTS is clouded with nonspe-
thetic accessibility, and optimizability of a hit
cific inhibitors that do not directly bind to the
class are of importance. Therefore, it is essen-
target protein in a one-to-one fashion. In-
tial to understand the ability of a virtual
stead, compounds may absorb and emit light
screening approach to identify new chemo-
at the wavelengths tested [259] or may ag-
types. For structure-based virtual screening
gregate to form a colloidal particle that di-
methods the ability to scaffold hop is usually
rectly interacts with the target pro-
taken for granted because there is no ligand
tein [260,261]. These phenomena have led to
(other than the one being docked) involved in
the creation of structural filters [262] and
docking experiments. While the assumption of
experimental techniques [263265] that
scaffold-independent docking preferences
identify such potential aggregators. Like-
should be debated at some point, it is not the
wise, there are also nonspecific activators
central subject here and is therefore not
characterized that may elicit activity
further discussed. We would rather like to say
through a similar mechanism [266]. These
some words about ligand-based scaffold hop-
filters derived from HTS experiences can be
ping opportunities in the context of compound
applied directly as promiscuity filters for vir-
similarity and virtual screening. In particular,
tual screening hit lists.
we would like to illustrate a few simple con-
Within the class of HTS hits that direct
cepts such as exhaustive scaffold enumeration
binding to the target protein in a one-to-one
and scaffold hopping using similarity descrip-
fashion, there are structural elements that
tors. The more wide-ranging concept of ex-
allow binding to a large range of targets. These
tending virtual screening to virtually accessi-
compounds are commonly referred to as
ble chemicals, an approach often referred to as
promiscuous binders or privileged struc-
de novo design [278], will not be discussed
tures [267271]. They contain specific chemi-
here.
cal groups that interact with a portion of the
target protein that is conserved throughout a
5.1. Example 6: Scaffold Enumeration for
protein family. There can also be strong cross-
PDE5 Inhibitors
reactivity between completely unrelated pro-
tein families if the pharmacophoric elements For any given scaffold, there are a number of
necessary for binding are similar [272]. These analogs that may produce a similar clinical
classes of compounds commonly must be fil- endpoint. In particular, for a specific ring
tered out if the main contributor to binding is system the necessary topology of the ring can
also the cause of the selectivity liabilities. To be maintained by modifying the heteroatoms
identify these compounds, the known assay and bond characteristics to preserve the
data are associated with each HTS hit to flag shape, critical H-bonding characteristics, or
28 VIRTUAL SCREENING

Figure 15. Example analogs of sildenafil and vardenafil using a scaffold enumeration approach that
maintains the critical pharmacophores circled on both drugs.

attachment points [279]. An instructive exam- sildenafil. In Fig. 15, the atoms circled in the
ple of modifying the shape, attachment, and core scaffold are critical pharmacophoric ele-
pharmacophoric properties in a molecule yet ments for potency. By holding these atoms
scaffold hop to a new chemotype with similar constant and maintaining the 3 attachment
biological activity is illustrated in Fig. 15. The points, there are only 12 scaffolds that would
discovery of PDE5 inhibitor vardenafil ap- likely produce a similar efficacy.
pears to be a direct result of a scaffold hop To calculate the possible analogs for a spe-
starting from sildenafil. Vardenafil was de- cific core the algorithm is quite simple. First,
signed by simply shifting the heteroatom pla- one must define the atom types to be consid-
cement in the core. Such subtle modifications ered; an example list is shown in Fig. 16.
can sometime even improve the bioavailabil- Second, one must define a SMILES string that
ity, solubility, selectivity, etc. without drama- represents a core to start from. Then one
tically modifying the clinical efficacy. In the simply loops through every atom in the core
case of vardenafil, the shift in heteroatoms and sequentially replaces each atom with an
improved the selectivity over PDE6, which atom from the atom type list. This procedure
reduced the risk of altered color perception, produces a large number of SMILES codes
a rare side effect that sometimes occurs with that may have valence problems, may be

Figure 16. Pseudo-code for an atom enumeration algorithm.


VIRTUAL SCREENING OF COMBINATORIAL LIBRARIES 29

unstable, or are duplicates. A valence filter SciTegic fingerprints) [18,54]. Not surpris-
check or a substructure filter eliminates ingly, it was concluded that 3D descriptors as
SMILES strings with known stability pro- well as circular fingerprints were found to
blems. Other problematic SMILES strings are facilitate scaffold hopping in more cases than
removed through postfiltering. If there are topological fingerprints did (Daylight finger-
specific atom restrictions, such as those circled prints). What was more surprising was the
in Fig. 15, then an additional substructure finding that ligand similarity was better able
filter can identify the relevant scaffolds. A to identify new chemotypes than structure-
pseudo-code of the algorithm is shown in based virtual screening was, a phenomenon
Fig. 16. that was observed before [41] but hitherto not
The scaffold enumeration approach works studied systematically.
well not only for rings but also for complete
molecules. A critical detail is to restrict the
6. VIRTUAL SCREENING OF
search by requiring specific pharmacophores,
COMBINATORIAL LIBRARIES
atoms, or groups at different locations. In the
example of sildenafil, without restricting the
Virtual screening success obviously relies on
three atoms necessary for potency and stabi-
accessing a hit molecule synthetically for bio-
lity filters, this search would have yielded over
logical testing. The number of available com-
a thousand possibilities. An example set of
pounds (either in-house or commercially) is
stability filters that can be applied is shown
limited to approximately 10 million. On the
in Fig. 17.
other hand, synthesizing individual hit mole-
cules (for instance as a result of a de novo
5.2. Scaffold Hopping Through Similarity
design campaign) is typically limited to few
Descriptors
analogs and may involve significant chemistry
As described in Section 2, compound similar- efforts. Finding an alternative between the
ity descriptors have different abilities to iden- practically boundless in silico world of mole-
tify compounds with new scaffolds yet similar cules and the quite limited synthetic options is
biological activity. Molecular descriptors such therefore crucial for realizing the full potential
as atom pairs [41], 3D pharmacophore finger- of virtual screening.
prints [54], molecular field descriptors [280], The concept of extending virtual screens to
and FeatureTrees [281] have been shown to the area of chemically accessible combinator-
facilitate scaffold hopping. Recently, a sys- ial libraries has several advantages.
tematic study of the scaffold hopping ability
of different compound similarity descriptors 1. Combinatorial libraries of previously
was conducted that involved seven drug tar- established chemistry can be synthe-
gets of different classes (p38 kinase, CDK2, sized rather quickly.
thrombin, neuraminidase, estrogen receptor,
2. Combinatorial libraries compensate
HIV-1 protease, COX2) as well as several
for the limited accuracy in many vir-
popular descriptors (atom pairs, Daylight fin-
tual screens by the power of large num-
gerprints, 3D pharmacophore fingerprints,
bers of analogs thereby introducing an

Figure 17. An example of compound stability filters that can be used to filter the compound lists produced by
the atom enumeration algorithm.
30 VIRTUAL SCREENING

element of serendipity into the drug nodes represent small building blocks of the
discovery process. compound. These nodes are either ring sys-
3. Virtual screening methods applied tems or acyclic atoms that are connected ac-
to combinatorial libraries allow for cording to the molecular graph (Fig. 18). The
dimensionality reduction techniques to nodes are labeled with information related to
be deployed that render the storage the properties of the building blocks they re-
and processing of individual com- present. Shape and size of a building block as
pounds unnecessary. well as chemical descriptors such as pharma-
cophore character profiles are linearly com-
bined to characterize each node. The Feature-
Large combinatorial library collections for Trees-FS extension of the Ftree approach re-
the purpose of virtual screening have been lies on dynamic programming to perform a
described in the literature by Nikitin et similarity search of combinatorial libraries of
al. [282] based on published combinatorial up to 1018 compounds in a few minutes search-
chemistry collected from approximately 200 ing within the entire combinatorial space with-
individual publications. Boehm et al. de- out enumeration.
scribed the use of large sets of combinatorial Combining reaction rules applicable to
libraries for applications within the pharma- combinatorial chemistry is a concept that can
ceutical industry [283]. Typically, combinator- be deployed in de novo design programs. Re-
ial library collections include a few hundred cent attempts in retrosynthetic de novo design
libraries with up to 10 trillion compounds. A programming go beyond the use of simple
few 3D methods have been described to search RECAP [285] rules for combining fragments
in such a large molecular spaces. Fragmenta- to fully enumerated molecules. Recently, Sim-
tion of the molecules is used to reduce the BioSys announced plans to combine their ret-
dimensionality of the search space. Topomer rosynthtic chemistry package ARChem with
descriptors are an example of 3D compound their eHITS technology to build a new retro-
fragmentation used for this purpose [42]. synthetic de novo design tool in 2009 [286].
Nikitin et al. used their collection of reactions
to feed into a 3D de novo design program. For 6.1. Privileged Structures
more insight into the design of combinatorial
libraries, the reader may consult Chapter 8 in Privileged structures are structural types of
Volume 1. small molecules that are able to bind with high
Very fast searching of large combinatorial affinity to multiple classes of receptors [287].
libraries can only be done using 2D similarity An enrichment of libraries with privileged
descriptors. FeatureTrees-FS [12] an exten- structures may increase the chance of finding
sion of the FeatureTrees [284] algorithm has active compounds. Examples of privileged
been applied by Boehm et al. to search through structures include benzazepine analogs found
358 libraries containing >1012 compounds to be effective ligands for an enzyme that
[283]. The FeatureTree approach is a 2D com- cleaves the peptide angiotensin I while others
pound similarity method that represents a are effective CCK-A receptor ligands. Cypro-
small molecule by a tree structure where the heptadine derivatives were found to have per-

Figure 18. FeatureTree representation of a small molecule.


VIRTUAL SCREENING OF COMBINATORIAL LIBRARIES 31

ipheral anticholinergic, antiserotonin, anti- 6.2. Targeting Protein Families


histaminic, and orexigenic activity. Hydroxa-
As alluded to above already, it is as important
mate and benzamidine derivatives have been
to screen the most appropriate compound col-
shown to be privileged structures for metallo-
lection, as it is to choose the best virtual
proteases and serine proteases, respectively.
screening method. In addition to enriching
For the class of 7-transmembrane G-protein-
screening collections with privileged struc-
coupled receptors, a large number of privi-
tures, the choice of databases shaped for spe-
leged structures has been found including, for
cific protein classes is also very useful. It can
example, diphenylmethane, diazepine, benza-
be expected that the number of hits in a tar-
zepine, biphenyltetrazole, spiropiperidine, in-
geted compound collection is higher than that
dole, and benzylpiperidine [48]. Some ubiqui-
of a general compound collection. At the same
tously privileged structures have been identi-
time, the number of false positives will also be
fied [288]. They include carboxylic acids, bi-
possibly higher. It is therefore observed that
phenyls, diphenylmethane, and to a lesser
virtual screening enrichment rates for tar-
extent, naphthyl, phenyl, cyclohexyl, biben-
geted libraries and general drug-like libraries
zyl, benzimidazole, and quinoline.

Figure 19. Virtual screening workflow to identify novel FFAR1 ligands [290].
Table 3. Recently Pursued Targets Using Synergistic Virtual Screening Strategies
Target Class Target VS Approaches Used OUTCOME Reference
Kinases YpkA Machine learning and multi- First reported single digit mi- Hu et al., 2007 [294]
ple conformational virtual cromolar inhibitors
screening
TMPKmt Pharmacophore searching, Three hits showed MIC of 3.12 mg/ Kumar et al., 2008 [295]
docking, structure interaction mL
fingerprints
JAK3 Docking, similarity searching Three hits with nanomolar Chen et al., 2008 [296]
activity
Phosphatases CDC25 Several docking programs 13 mM hit Montes et al., 2008 [297]
(FRED, Surflex, LigandFit)
G-protein-coupled receptors CCR2 pharmacophore fingerprints, Two putative CCR2 ligands Nair and Sobhia, 2008 [298]
HSQSAR, docking
FFAR1 2D similarity, 3D pharmaco- 15 agonists, partial agonists, and Tikhonova et al., 2008 [290]
phore searching, docking antagonists
CCR5 2D pharmacophore similar- Agonist found that promoted re- Kellenberger et al., 2007 [299]
ity, docking, clustering ceptor internalization
Proteases CatepsinS Ligand-based and structure- 7 hits in low micromolar range Markt et al., 2008 [300]

32
based pharmacophore
searching
SARS CoV protease Docking, 3D-QSAR, 25 inhibitors >3 mM Tsai et al., 2006 [301]
pharmacophore
Ion channels Kv1.5 Protein-derived pharmaco- 5 inhibitor classes <10 mM Pirard et al., 2005 [302]
phores, FeatureTrees, 2D-
similarity
Nuclear receptors PPAR Ligand-based and structure- Novel PPAR agonists with vary- Markt et al., 2008 [103]
based pharmacophore ing PPAR selectivity
searching
Other enzymes Chorismate mutase Ligand-based pharmacophore 6 mM inhibitors Agrawal et al., 2007 [303]
models, docking
DPPIV Pharmacophore, docking 51 inhibitors found Ward et al., 2005 [304]
COX-2 machine learning, pharmaco- new inhibitors found Franke et al., 2005 [305]
phore descriptor
protein-RNA interactions HIV-1 reverse transcriptase 3D-QSAR, docking 2 potent inhibitors from SPECS Zhang et al., 2006 [306]
transporters SHGB 2D-QSAR, docking ligands found Cherkasov et al., 2005 [307]
others HRV coat protein structure-based pharmaco- 6 structures with antirhinoviral Steindl et al., 2005 [308]
phore model, docking, PCA activity
spermidine synthase 3D pharmacophore filtering, seven novel binders Jacobsson et al., 2008 [309]
docking
SYNERGIES 33

Figure 20. FFAR1 ligands. GW9508 and the literature agonist 2 are full agonists [293]. The other three
compounds depict representative virtual screening hits [290].

is often similar [289]. However, targeted li- 7.1. Example 7: Free Fatty Acid Receptor 1
braries have the potential of offering structu- Ligands
rally richer hit sets.
A particularly instructive example of using
virtual screening synergies has been pre-
7. SYNERGIES sented recently by Tikhonova et al. [290]. Ago-
nists, partial agonists, and an antagonist of
Early on virtual screening has been consid- free fatty acid receptor (FFAR) 1 were identi-
ered a cascade of filters involving many dif- fied through a combination of 2D similarity
ferent components considering 1D to 3D searching of the ZINC database [291], BCUT
properties according to computational ex- diversity analysis, 3D pharmacophore search-
pense [11]. Nonetheless, in practical applica- ing, receptor modeling involving receptor mu-
tions one primary filter often prevailed. For tagenesis data [292], molecular docking, and
instance, if target structures at atomic reso- pharmacological testing. Due to its perceived
lution were available structure-based virtual role in regulating insulin and glucagon secre-
screening was often the primary virtual tion FFAR1 is considered a promising target
screening tool [22]. Alternatively, for target for treating type 2 diabetes. Figure 19 illus-
classes such as GPCRs mostly ligand-based trates the workflow of the virtual screening
methods were preferred. After some debate procedure. Figure 20 shows the compounds
about the strengths of ligand-based and used for the ligand-based part of the virtual
structure-based virtual screening ap- screen as well as the hits identified. It is
proaches [41,54], it is now common practice particularly interesting to note that the dock-
to combine different virtual screening ap- ing into the FFAR1 homology model that was
proaches in synergy. Table 3 illustrates re- generated using a ground-state rhodopsin
cent virtual screening campaigns involving structure in presence of GW9508 only yielded
different drug targets and synergetic virtual agonist and partial agonist hits. The only
screening protocols. antagonist hit was obtained through the
34 VIRTUAL SCREENING

pharmacophore search. This example shows mental-design for high throughput screening.
not only how different virtual screening meth- Drug Discov Today 1996;1:277286.
ods can work together to discover structurally 9. Bohm HJ, Klebe G. What can we learn from
novel hits but also how the different strengths molecular recognition in proteinligand com-
contribute to an overall increased richness of plexes for the design of new drugs. Angew
the hits properties. Chem Int Ed Engl 1996;35:25892614.
10. Martin YC. Challenges and prospects for com-
putational aids to molecular diversity. Per-
ACKNOWLEDGMENT spect Drug Discov Des 1997;7/8:159172.
11. Walters WP, Stahl MT, Murcko MA. Virtual
I.M. would like to thank Dr. Istvan Enyedy for screening: an overview. Drug Discov Today
his work on the previous version of this book 1998;3:160178.
article that appeared in the 6th edition, Vol. 1 12. Rarey M, Stahl M. Similarity searching in
of Burgers Medicinal Chemistry and Drug large combinatorial chemistry spaces. J Com-
Discovery. put Aided Mol Des 2001;15:497520.
13. Schneider G, Neidhart W, Giller T, Schmid G.
Scaffold-hopping by topological pharmaco-
REFERENCES phore search: a contribution to virtual screen-
ing. Angew Chem Int Ed Engl 1999;38:
1. Abraham DJ. The potential role of single crys- 28942896.
tal X-ray diffraction in medicinal chemistry.
14. Klebe G. Virtual ligand screening: strategies,
Intra-Science Chem Rep 1974;8:19.
perspectives and limitations. Drug Discov To-
2. Perutz M. Protein Structure. New Approaches day 2006;11:580594.
to Disease and Therapy. New York: W. H.
15. Lyne PD. Structure-based virtual screening: an
Freeman and Company; 1992.
overview. Drug Discov Today 2002;7: 10471055.
3. Fesik SW. NMR Studies of molecular com-
16. Muegge I, Enyedy I, Virtual screening. In:
plexes as a tool in drug design. J Med Chem
Abraham D, editors. Burgers Medicinal
1991;34:29372945.
Chemistry. 6th ed. New York: Wiley; 2003. p
4. Cushman DW, Cheung HS, Sabo EF, Ondetti 243280.
MA. Design of potent competitive inhibitors of
17. Muegge I, Oloff S. Advances in virtual screen-
agiotensin-converting enzyme. Carboxyalka-
ing. Drug Discov Today Technol 2006;3:
noyl and mercaptoalkanoyl amino acids. Bio-
405411.
chemistry 1977;16:54845491.
18. Muegge I. Synergies of virtual screening ap-
5. Kuyper LF, Roth B, Baccanari DP, Ferone R,
proaches. Mini Rev Med Chem 2008;8:927933.
Bedell CR, CHampness JN, Stammers DK,
Dann JG, Norrington FE, Blaker DJ, Goodford 19. Oprea TI, Matter H. Integrating virtual
PJ. Receptor-based design of dihydrofolate re- screening in lead discovery. Curr Opin Chem
ductase inhibitors: comparison of crystallogra- Biol 2004;8:349358.
phically determined enzyme binding with 20. Waszkowycz B. Towards improving compound
enzyme affinity in a series of carboxy-substi- selection in structure-based virtual screening.
tuted trimethoprim analogues. J Med Chem Drug Discov Today 2008;13:219226.
1982;25:11201122. 21. Willet P. Similarity-based virtual screening
6. Gallop MA, Barrett RW, Dower WJ, Fodor using 2D fingerprints. Drug Discov Today
SPA, Gordon AM. Applications of combinator- 2006;11:10461053.
ial technologies to drug discovery. 1. Back- 22. Good A. Structure-based virtual screening pro-
ground and peptide combinatorial libraries. tocols. Curr Opin Drug Discov Dev 2001;4:
J Med Chem 1994;37:12331251. 301307.
7. Gordon EM, Barrett RW, Dower WJ, Fodor 23. Good AC, Mason JS, Three dimensional struc-
SPA, Gallop MA. Applications of combinatorial ture database searches. In: Lipkowitz KB,
technologies to drug discovery 2. Combinator- Boyd DB, editors. Reviews in Computational
ial organic synthesis, library screening strate- Chemistry. Vol.7. New York: VCH; 1995. p
gies, and future directions. J Med Chem 67117.
1994;37:13851401. 24. Klebe G. Virtual Screening: An Alternative or
8. Lutz MW, Menius JA, Choi TD, Laskody RG, Complement to High Throughput Screening?
Domanico PL, Goetz AS, Saussy DL. Experi- Leiden: Kluwer/Escom; 2000.
REFERENCES 35

25. Varnek A, Tropsha A, Chemoinformatics 40. Matter H, Schwab W, Barbier D, Billen G,


Approaches to Virtual Screening. Cambridge, Haase B, Neises B, Schudok M, Thorwart W,
UK: RSC Publishing; 2008. Schreuder H, Brachvogel V, Lonze P, Weith-
26. Eckert H, Bajorath J. Molecular similarity mann KU. Quantitative structureactivity re-
analysis in virtual screening: foundataions, lationship of human neutrophil collagenase
limitations and novel approaches. Drug Discov (MMP-8) inhibitors using comparative mole-
Today 2007;12:225233. cular field analysis and X-ray structure ana-
lysis. J Med Chem 1999;42:19081920.
27. Mason JS, Good AC, Martin EJ. 3-D pharma-
cophores in drug discovery. Curr Pharm Des 41. Sheridan RP, Kearsley SK. Why do we need so
2001;7:567597. many chemical similarity search methods?
28. Sun H. Pharmacophore-based virtual screen- Drug Discov Today 2002;7:903911.
ing. Curr Med Chem 2008;15:10181024. 42. Andrews KM, Cramer RD. Toward general
methods of targeted library design: topomer
29. Vert JP, Jacob L. Machine learning for in silico
virtual screening and chemical genomics: new shape similarity searching with diverse struc-
strategies. Comb Chem High Throughput tures as queries. J Med Chem 2000;43:
17231740.
Screen 2008;11:677685.
43. Jenkins JL, Glick M, Davies JW. A 3D simi-
30. Li H, Yap CW, Ung CY, Xue Y, Li ZR, Han LY,
Lin HH, Chen YZ. Machine learning ap- larity method for scaffold hopping from known
proaches for predicting compounds that inter- drugs or natural ligands to new chemotypes.
act with therapeutic and ADMET related pro- J Med Chem 2004;47:61446159.
teins. J Pharm Sci 2007;96:28382860. 44. Makara GM. Measuring molecular similarity
31. Martin YC, Kofron JL, Traphagen LM. Do and diversity: total pharmacophore diversity.
structurally similar molecules have similar J Med Chem 2001;44:35633571.
biological activity? J Med Chem 2002;45: 45. Good AC, Hermsmeier MA, Hindle SA. Mea-
43504358. suring CAMD technique performance: a vir-
32. Daylight Chemical Information Systems, Inc. tual screening case study in the design of va-
Mission Viejo, CA; www.daylight.com. lidation experiments. J Comput Aided Mol Des
2004;18:529536.
33. Carhart RE, Smith DH, Venkataraghavan R.
Atom pairs as molecular features in struc- 46. Good AC, Cho SJ, Mason JS. Descriptors you
can count on? Normalized and filtered phar-
tureactivity studies: definition and applica-
tion. J Chem Inf Comput Sci 1985;25:6473. macophore descriptors for virtual screening.
J Comput Aided Mol Des 2004;18:523527.
34. Pipeline Pilot Basic Chemistry Component
Collection, SciTegic Inc., San Diego, CA. 2008. 47. Mason JS, Cheney DL. Library design and
virtual screening using multiple 4-point phar-
35. MDL Information Systems, Inc., San Leandro, macophore fingerprints. Pac Symp Biocomput
CA.
2000;5:576587.
36. Kier LB, Hall LH, Molecular Connectivity in
48. Mason JS, Morize I, Menard PR, Cheney DL,
Chemistry and Drug Research. New York: Hulme C, Labaudiniere RF. New 4-point phar-
Academic Press; 1976. macophore method for molecular similarity
37. Hert J, Willett P, Wilton DJ, Acklin P, Azzaoui and diversity applications: overview of the
K, Jacoby E, Schuffenhauer A. Comparison of method and applications, including a novel
fingerprint-based methods for virtual screen- approach to the design of combinatorial li-
ing using multiple bioactive reference struc- braries containing privileged substructures.
tures. J Chem Inf Comput Sci 2004;44: J Med Chem 1999;42:32513264.
11771185.
49. Mason JS, Cheney DL. Ligandreceptor 3-D
38. Hert J, Willett P, Wilton DJ, Acklin P, Azzaoui similarity studies using multiple 4-point phar-
K, Jacoby E, Schuffenhauer A. Comparison of macophores. Pac Symp Biocomput 1999;4:
topological descriptors for similarity-based vir- 456467.
tual screening using multiple bioactive refer-
50. Deng Z, Chuaqui C, Singh J. Structural inter-
ence structures. Org Biomol Chem 2004;2: action fingerprint (SIFt): a novel method for
32563266. analyzing three-dimensional proteinligand
39. Matter H, Potter T. Comparing 3D pharmaco- binding interactions. J Med Chem 2004;47:
phore triplets and 2D fingerprints for selecting 337344.
diverse compound subsets. J Chem Inf Comput 51. McGregor MJ, Muskal SM. Pharmacophore
Sci 1999; 12111225. fingerprinting. 1. Application to QSAR and
36 VIRTUAL SCREENING

focused library design. J Chem Inf Comput Sci Crystal structure of opsin in its G-protein-inter-
1999;39:569574. acting conformation. Nature 2008;455: 497502.
52. McGregor MJ, Muskal SM. Pharmacophore 63. Warne T, Serrano-Vega MJ, Baker JG, Mou-
fingerprinting. 2. Application to primary li- khametzianov R, Edwards PC, Henderson R,
brary design. J Chem Inf Comput Sci 2000; Leslie AG, Tate CG, Schertler GF. Structure of
40:117125. a beta1-adrenergic G-protein-coupled recep-
53. Abrahamian E, Fox PC, Naerum L, Christen- tor. Nature 2008;454:486491.
sen IT, Thogersen H, Clark RD. Efficient gen- 64. Rasmussen SG, Choi HJ, Rosenbaum DM,
eration, storage, and manipulation of fully Kobilka TS, Thian FS, Edwards PC, Burgham-
flexible pharmacophore multiplets and their mer M, Ratnala VR, Sanishvili R, Fischetti RF,
use in 3-D similarity searching. J Chem Inf Schertler GF, Weis WI, Kobilka BK. Crystal
Comput Sci 2005;43:458468. structure of the human beta2 adrenergic G-
54. Zhang Q, Muegge I. Scaffold hopping through protein-coupled receptor. Nature 2007;450:
virtual screening using 2D and 3D similarity 383387.
descriptors: ranking, voting, and consensus 65. Palczewski K, Kumasaka T, Hori T, Behnke
scoring. J Med Chem 2006;49:15361548. CA, Motoshima H, Fox BA, Le TI, Teller DC,
55. Weinstein DS, Liu W, Gu Z, Langevin C, Ngu Okada T, Stenkamp RE, Yamamoto M, Miyano
K, Fadnis L, Combs DW, Sitkoff D, Ahmad S, M. Crystal structure of rhodopsin: a G protein-
Zhuang S, Chen X, Wang F-L, Loughny DA, coupled receptor. Science 2000;289:739745.
Atwal KS, Zahler R, Macor JE, Madsen CS, 66. Leach AR, Green DV, Hann MM, Judd DB,
Murugesan N. Tryptamine and homotrypta- Good AC. Where are the GaPs? A rational
mine-based sulfonamides as potent and selec- approach to monomer acquisition and selec-
tive inhibitors of 15-lipoxygenase. Bioorg Med tion. J Chem Inf Comput Sci 2000;40:
Chem Lett 2005;15:14351440. 12621269.
56. Van Drie JH, Weininger D, Martin YC. Aladdin: 67. Fischer E. Einfluss der Konfiguration auf die
an integrated tool for computer-assisted mole- Wirkung der Enzyme. Ber Dtsch Chem Ges
cular design and pharmacophore recognition 1894;27:29852993.
from geometric, steric and substructure search- 68. Ehrlich P. Uber den jetzigen stand der che-
ingofthree-dimensional molecular structures.J motherapie. Ber Dtsch Chem Ges 1909;42:
Comput Aided Mol Des 1989;3:225251. 1747.
57. Clement OO, Mehl AT, HipHop: pharmaco- 69. Gund P. Three-dimensional pharmacophoric
phores based on multiple common-feature pattern searching. Prog Mol Subcell Biol
alignments. In: G
uner OF, editors. Pharmaco- 1977;5:117143.
phore Perception, Development, and Use in 70. Berman HM, Westbrook J, Feng Z, Gilliland G,
Drug Design. La Jolla CA: International Uni- Baht TN, Weissig H, Shindyalov IN, Bourne
versity Line; 2000. p 7184. PE. The Protein Data Bank. Nucleic Acids Res
58. Jia L, Zou J, So SS, Sun H. Automated phar- 2000;28:235242.
macophore query optimization with genetic
71. Bernstein FC, Koetzle TF, Williams G, Meyer
algorithms: a case study using the MC4R sys- EF Jr, Brice MD, Rodgers JR, Kennard O, Shi-
tem. J Chem Inf Model 2007;47:15451552. manouchi T, Tasumi M. The Protein Data Bank:
59. Wolber G, Langer T. LigandScout: 3-D phar- a computer based archival file for macromole-
macophores derived from protein-bound li- cular structures. J Mol Biol 1977;112: 535542.
gands and their use as virtual screening filters. 72. Dreyer MK, Borcherding DR, Dumont JA, Peet
J Chem Inf Model 2005;45:160169. NP, Tsay JT, Wright PS, Bitonti AJ, Shen J,
60. Jacob L, Hoffmann B, Stoven V, Vert JP. Vir- Kim S-H. Crystal structure of human cyclin-
tual screening of GPCRs: an in silico chemo- dependent kinase 2 in complex with the ade-
genomics approach. BMC Bioinformatics nine-derived inhibitor H717. J Med Chem
2008;9:363 2001;44:524530.
61. Jaakola VP, Griffith MT, Hanson MA, Cher- 73. Irwin JJ, Shoichet BK. ZINC: a free database of
ezov V, Chien EY, Lane JR, Ijzerman AP, commercially available compounds for virtual

Stevens RC. The 2.6 A crystal structure of a screening. J Chem Inf Model 2005;45:177182.
human A2A adenosine receptor bound to an 74. Milne GWA, Miller JA. The NCI Drug Infor-
antagonist. Science 2008;322:12111217. mation System Parts 1-6. J Chem Inf Comput
62. Scheerer P, Park JH, Hildebrand PW, Kim YJ, Sci 1986;26:154159.
Krauss N, Choe HW, Hofmann KP, Ernst OP.
REFERENCES 37

75. Milne GWA, Nicklaus MC, Driscoll JS, Wang 89. CCG; http://www.chemcomp.com/journal/ph4.
S, Zaharevitz DW. The NCI Drug Information htm. 2008.
System 3D Database. J Chem Inf Comput Sci 90. Martin YC, Bures MG, Danaher EA, DeLazzer
1994;34:12191224. J, Lico I, Pavlik PA. A fast new approach to
76. eMolecules; www.emolecules.com. 2008. pharmacophore mapping and its application to
77. ChemSpider; www.chemspider.com. 2009. dopaminergic and benzodiazepine agonists. J
78. Available Chemicals Directory is available Comput Aided Mol Des 1993;7:83102.
from MDL Information Systems, Inc., San 91. Beusen D,Alignment of angiotensin II receptor
Leandro, CA, and contains specialty bulk che- antagonists using GASP, Tripos Technical
micals from commercial sources; www.mdli. Notes 1996:1.(4).
com. 92. Flo96, Thistlesoft, Colebrook, CT. 1996.
79. Enamine; www.enamine.net. 2009. 93. Golender V, Vesterman B, Eliyahu O, Kardash
80. Weininger D. SMILES, a chemical language A, Kletzin M, Shokhen M, Vorpagel E, 1995.
and information system. 1. Introduction to Knowledge-engineering approach to drug de-
methodology and encoding rules. J Chem Inf sign and its implementation in the Apex-3D
Comput Sci 1988;28:3136. expert system. In: Sanz F, Giraldo J, Manaut
F, editors. QSAR and Molecular Modeling:
81. Weininger D, Weininger A, Weininger JL. Al-
gorithm for generation of unique SMILES no- Concepts, Computational Tools and Biological
tation. J Chem Inf Comput Sci 1989;29: Applications: Proceedings of the 10th Eur-
97101. opean Symposium on Structure-Activity Rela-
tionships, QSAR and Molecular Modeling:
82. Allen FH, Bellard S, Brice MD, Cartwright BA, 1994 September 49; Barcelona, Spain; J. R.
Doubleday A, Higgs H, Hummelink-Peters T, Prous Science Publishers; 1995. p 246251.
Kenard O, Motherwell WDS, Rodgers JR, Wat-
son DG. The Cambridge Crystallographic Data 94. OF.Pharmacophore Perception, Development,
and use in Drug Design. La Jolla, CA: Inter-
Center: computer-based search, retrieval, ana-
lysis and display of information. Acta Cryst national University Line; 2000.
1979;B35:23312339. 95. Millichap JG. Drugs in management of mini-
83. DesJarlais RL, Generation and use of three- mal brain dysfunction. Ann NY Acad Sci
dimensional databases for drug-discovery. In: 1973;205:321334.
Charifson PS, editors. Practical Application of 96. Enyedy IJ, Zaman WA, Sakamuri S, Kozikows-
Computer-Aided Drug Design. New York: ki AP, Johnson KM, Wang S. Pharmacophore-
Marcel Dekker, Inc; 1997. p 73104. based discovery of 3, 4-disubstituted pyrroli-
84. Rusinko A III, Skell JM, Balducci R, McGarity dines as a novel class of monoamine transpor-
ter inhibitors that function as weak cocaine
CM, Pearlman RS,CONCORD, University of
Texas, Austin, TX and Tripos Associates, St. antagonists. Bioorg Med Chem Lett 2001;
Louis, MO. 1988. 11:11131118.
85. Gasteiger J, Rudolph C, Sadowski J. Auto- 97. Froimowitz M, Patrick KS, Cody V. Conforma-
tional analysis of methylphenidate and its
matic generation of 3D-atomic coordinates for
organic molecules. Tetrahedron Comput Meth- structural relationship to other dopamine re-
odol 1990;3:537547. uptake blockers such as CFT. Pharm Res
1995;12:14301434.
86. Barnum D, Greene J, Smellie A, Sprague P.
98. Wang S, Sakamuri S, Enyedy IJ, Kozikowski
Identification of common functional configura-
tions among molecules. J Chem Inf Comput Sci AP, Deschaux O, Bandyopadhyay BC, Tella
1996;36:563571. SR, Zaman WA, Johnson KM. Discovery of a
novel dopamine transporter inhibitor, 4-hy-
87. Sprague PW. Automated chemical hypothesis droxy-1-methyl-4-(4-methylphenyl)-3-piperi-
generation and database searching with Cat- dyl 4-methylphenyl ketone, as a potential co-
alyst. Perspectives in Drug Discovery and De- caine antagonist through 3D-database phar-
sign 1995;3:2133.
macophore searching. Molecular modeling,
88. Dixon SL, Smondyrev AM, Knoll EH, Rao SN, structureactivity relationships, and beha-
Shaw DE, Friesner RA. PHASE: a new engine vioral pharmacological studies. J Med Chem
for pharmacophore perception, 3D QSAR mod- 2000;43:351360.
el development, and 3D database screening: 1.
99. Chem-X, version 96; Oxford Molecular Group,
Methodology and preliminary results. J Com- Inc., Hunt Valley, MD (company exists no long-
put Aided Mol Des 2006;20:647671. er). 2001.
38 VIRTUAL SCREENING

100. Bohm H-J. LUDI: Rule-based automatic design 114. Carhart RE, Smith DH, Venkataraghavan R.
of new substituents for enzyme inhibitor leads. Atom pairs as molecular features in struc-
J Comput Aided Mol Des 1992;6: 593606. tureactivity studies: definition and applica-
101. Goodford PJ. A computational procedure for tions. J Chem Inf Comput Sci 1985;25:6473.
determining energetically favorable binding 115. Cruciani G, Crivori P, carrupt PA, Testa B.
sites on biologically important macromole- Molecular fields in quantitative structure
cules. J Med Chem 1985;28:849857. permeation relationships: the VolSurf ap-
102. Carlson HA, Masukawa KM, Rubins K, Bush- proach. J Mol Struct 2000;503:1730.
man FD, Jorgensen WL, Lins RD, Briggs JM, 116. Todeschini R, Consonni V, Handbook of Mole-
McCammon JA. Developing a dynamic phar- cular Descriptors. Weinheim: Wiley-VCH;
macophore model for HIV-1 integrase. J Med 2000.
Chem 2000;43:21002114. 117. Golbraikh A, Bonchev D, Tropsha A. Novel
103. Markt P, Petersen RK, Flindt EN, Kristiansen chirality descriptors derived from molecular
K, Kirchmair J, Spitzer G, Distinto S, Schuster topology. J Chem Inf Comput Sci 2001;41:
D, Wolber G, Laggner C, Langer T. Discovery 147158.
of novel PPAR ligands by a virtual screening 118. Tropsha A, Golbraikh A. Predictive QSAR
approach based on pharmacophore modeling, modeling workflow, model applicability do-
3D shape, and electrostatic similarity screen- mains, and virtual screening. Curr Pharm Des
ing. J Med Chem 2008;51:63036317. 2007;13:34943504.
104. Catalyst, version 4.11; Accelrys, Inc., San Die- 119. Golbraikh A, Tropsha A. Beware of q2! J Mol
go, CA. 2006. Graph Model 2002;20:269276.
105. UNITY, version 6.6; Tripos Associates, St. 120. Golbraikh A, Gramatica P, Gombar VK. The
Louis, MO; www.tripos.com. importance of being earnest: Validation is the
106. Rush TS III, Grant JA, Mosyak L, Nicholls A. A absolute essential for successful application
shape-based 3-D scaffold hopping method and and interpretation of QSPR models. QSAR
its application to a bacterial proteinprotein Comb Sci 2003;22:6977.
interaction. J Med Chem 2005;48:14891495. 121. Kubinyi H, Hamprecht FA, Mietzner T. Three-
107. Fang X, Wang S. A web-based 3D-database dimensional quantitative similarityactivity
pharmacophore searching tool for drug discov- relationships (3D QSiAR) from SEAL similar-
ery. J Chem Inf Comput Sci 2002;42:192198. ity matrices. J Med Chem 1998;41:25532564.
108. Ihlenfeldt W-D, Voigt JH, Bienfait B, Oellien 122. Schneider G, Nettekoven M. Ligand-based
F, Nicklaus MC. Enhanced CACTVS browser combinatorial design of selective purinergic
of the open NCI database. J Chem Inf Comput receptor (A2A) antagonists using self-organiz-
Sci 2002;42:4657. ing maps. J Comb Chem 2003;5:233237.
109. Nicklaus MC, Wang S, Driscoll JS, Milne 123. Prathipati P, Saxena AK. Evaluation of binary
GWA. Conformational changes of small mole- QSAR models derived from LUDI and MOE
cules binding to proteins. Bioorg Med Chem scoring functions for structure based virtual
1995;3:411428. screening. J Chem Inf Model 2006;46:3951.
110. Perola E, Charifson PS. Conformational ana- 124. Plewczynski D, Spieser SA, Koch U. Assessing
lysis of drug-like molecules bound to proteins: different classification methods for virtual
an extensive study of ligand reorganization screening. J Chem Inf Model 2006;46:
upon binding. J Med Chem 2004;47: 10981106.
24992510. 125. Evers A, Hessler G, Matter H, Klabunde T.
111. Hao MH, Haq O, Muegge I. Torsion angle Virtual screening of biogenic amine-binding G-
preference and energetics of small-molecule protein coupled receptors: comparative eva-
ligands bound to proteins. J Chem Inf Model luation of protein- and ligand-based virtual
2007;47:22422252. screening protocols. J Med Chem 2005;
112. EduSoft. Molconn-Z, version 4.09 is available 48:54485465.
from EduSoft; http://www.edusoft-lc.com/mol- 126. Shen M, Beguin C, Golbraikh A, Stables JP,
conn/. 2006. Kohn H, Tropsha A. Application of predictive
113. Cruciani G, Pastor M, Guba W. VolSurf: a new QSAR models to database mining: identifica-
tool for the pharmacokinetic optimization of tion and experimental validation of novel an-
lead compounds. Eur J Pharm Sci 2000;11 ticonvulsant compounds. J Med Chem 2004;
(Suppl 2): S29S39 47:23562364.
REFERENCES 39

127. Oloff S, Mailman RB, Tropsha A. Application for flexible docking. J Mol Biol 1997;267:
of validated QSAR models of D1 dopaminergic 727748.
antagonists for database mining. J Med Chem 140. Muegge I. The effect of small changes in pro-
2005;48:73227332. tein structure on predicted binding modes of
128. Sheridan RP, Feuston BP, Maiorov VN, Kears- known inhibitors of influenza virus neurami-
ley SK. Similarity to molecules in the training nidase: PMF-scoring in DOCK4. Med Chem
set is a good discriminator for prediction accu- Res 1999;9:490500.
racy in QSAR. J Chem Inf Comput Sci 2004; 141. Goodsell DS, Olson AJ. Automated Docking of
44:19121928. Substrates to proteins by Simulated Annea-
129. Oloff S, Muegge I. kScore: a novel machine liong. Proteins 1990;8:195202.
learning approach that is not dependent on 142. Huey R, Morris GM, Olson AJ, Goodsell DS. A
the data structure of the training set. J Com- semiempirical free energy force field with
put Aided Mol Des 2007;21:8795. charge-based desolvation. J Comput Chem
130. Ghose AK, Crippen GM. Atomic physicochem- 2007;28:11451152.
ical parameters for three-dimensional-struc- 143. Verdonk ML, Cole JC, Hartshorn MJ, Murray
ture-directed quantitative structureactivity CW, Taylor RD. Improved proteinligand
relationships. 2. Modeling dispersive and hy- docking using GOLD. Proteins 2003;52:
drophobic interactions. J Chem Inf Comput Sci 609623.
1987;27:2135. 144. Claussen H, Buning C, Rarey M, Lengauer T.
131. Blaney JM, Dixon JS. A good ligand is hard to FlexE: efficient molecular docking considering
find: automatic docking methods. Perspect protein structure variations. J Mol Biol 2001;
Drug Disc Des 1993;15:301319. 308:377395.
132. Muegge I, Rarey M. Small Molecule Docking 145. Ahmed A, Gohlke H. Multiscale modeling of
and Scoring. In: Boyd DB, Lipkowitz KB, macromolecular conformational changes
editors. Reviews in Computational Chemistry. combining concepts from rigidity and elastic
Vol. 17. New York: Wiley-VCH; 2001. p 160. network theory. Proteins 2006;63:
133. Blundell TL, Sibanda BL, Sternberg MJE, 10381051.
Thornton JM. Knowledge-based prediction of 146. Zavodszky MI, Lei M, Thorpe MF, Day AR,
protein structures and the design of novel Kuhn LA. Modeling correlated main-chain mo-
molecules. Nature 1987;326:347352. tions in proteins for flexible molecular recogni-
134. Vedani A, Zbinden P, Snyder JP, Greenidge tion. Proteins 2004;57:243261.
PA. Pseudoreceptor modeling: the construc- 147. Nabuurs SB, Wagener M, de VJ. A flexible
tion of three-dimensional receptor surrogates. approach to induced fit docking. J Med Chem
J Am Chem Soc 1995;117:49874994. 2007;50:65076518.
135. Tanrikulu Y, Schneider G. Pseudoreceptor 148. Bottegoni G, Kufareva I, Totrov M, Abagyan R.
models in drug design: bridging ligand- and Four-dimensional docking: a fast and accurate
receptor-based virtual screening. Nat Rev account of discrete receptor flexibility in ligand
Drug Discov 2008;7:667677. docking. J Med Chem 2009;52:397406.
136. Teague SJ. Implications of protein flexibility 149. Totrov M, Abagyan R. Flexible ligand docking
for drug discovery. Nat Rev Drug Discov to multiple receptor conformations: a practical
2003;2:527541. alternative. Curr Opin Struct Biol 2008;18:
137. Cozzini P, Kellogg GE, Spyrakis F, Abraham 178184.
DJ, Costantino G, Emerson A, Fanelli F, 150. Bottegoni G, Kufareva I, Totrov M, Abagyan R.
Gohlke H, Kuhn LA, Morris GM, Orozco M, A new method for ligand docking to flexible
Pertinhez TA, Rizzi M, Sotriffer CA. Target receptors by dual alanine scanning and refine-
flexibility: an emerging consideration in drug ment (SCARE). J Comput Aided Mol Des
discovery and design. J Med Chem 2008;51: 2008;22:311325.
62376255. 151. Zentgraf M, Fokkens J, Sotriffer CA. Addres-
138. Cavasotto CN, Kovacs JA, Abagyan RA. Re- sing protein flexibility and ligand selectivity by
presenting receptor flexibility in ligand dock- in situ cross-docking. ChemMedChem
ing through relevant normal modes. J Am 2006;1:13551359.
Chem Soc 2005;127:96329640. 152. Tame JRH. Scoring functions: a view from the
139. Jones G, Willett P, Glen RC, Leach AR. Devel- bench. J Comput Aided Mol Des 1999;13:
opment and validation of a genetic algorithm 99108.
40 VIRTUAL SCREENING

153. Bohm HJ, Stahl M. Rapid empirical scoring Woolven JM, Peishoff CE, Head MS. A critical
functions in virtual screening applications. assessment of docking programs and scoring
Med Chem Res 1999;9:445462. functions. J Med Chem 2006;49: 59125931.
154. Good A. Structure-based virtual screening pro- 168. Wang R, Lu Y, Wang S. Comparative evalua-
tocols. Curr Opin Drug Discov Devel 2001;4: tion of 11 scoring functions for molecular dock-
301307. ing. J Med Chem 2003;46:22872303.
155. Bohm H.-J. Stahl M. The use of scoring func- 169. Kontoyianni M, McClellan LM, Sokol GS. Eva-
tions in drug discovery applicationg. In: Lip- luation of docking performance: comparative
kowitz KB, Boyd DB, editors. Reviews in Com- data on docking algorithms. J Med Chem 2004;
putational Chemistry. Vol.18.Hoboken, NJ: 47:558565.
Wiley-VCH; 2002. p 4187 170. Liebeschuetz JW. Evaluating docking pro-
156. Kollman P. Free energy calculations-applica- grams: keeping the playing field level. J Com-
tions to chemical and biological phenomena. put Aided Mol Des 2008;22:229238.
Chem Rev 1993;7:23952417. 171. Moitessier N, Englebienne P, Lee D, Lawandi
157. Bissantz C, Folkers G, Rognan D. Protein- J, Corbeil CR. Towards the development of
based virtual screening of chemical databases. universal, fast and highly accurate docking/
1. Evaluation of different docking/scoring com- scoring methods: a long way to go. Br J Phar-
binations. J Med Chem 2000;43:47594767. macol 2008;153(Suppl 1): S7S26
158. Ferrara P, Gohlke H, Price DJ, Klebe G, 172. Holloway MK, McGaughey GB, Coburn CA,
Brooks CL III. Assessing scoring functions for Stachel SJ, Jones KG, Stanton EL, Gregro AR,
proteinligand interactions. J Med Chem Lai MT, Crouthamel MC, Pietrak BL, Munshi
2004;47:30323047. SK. Evaluating scoring functions for docking
159. Ha S, Andreani R, Robbins A, Muegge I. Eva- and designing beta-secretase inhibitors.
luation of docking/scoring approaches: a com- Bioorg Med Chem Lett 2007;17:823827.
parative study based on MMP-3 inhibitors. J 173. Muegge I, Enyedy I. Virtual screening of kinase
Comput Aided Mol Des 2000;14:435448. targets. Curr Med Chem 2004;11: 691705.
160. Kellenberger E, Rodrigo J, Muller P, Rognan 174. Graves AP, Brenk R, Shoichet BK. Decoys for
D. Comparative evaluation of eight docking docking. J Med Chem 2005;48:37143728.
tools for docking and virtual screening accu- 175. Jansen JM, Martin EJ. Target-biased scoring
racy. Proteins 2004;57:225242. approaches and expert systems in structure-
161. Knegtel RMA, Wagener M. Efficacy and selec- based virtual screening. Curr Opin Chem Biol
tivity in flexible database docking. Proteins 2004;8:359364.
1999;37:334345. 176. Polgar T, Keseru GM. Virtual screening for
162. Kontoyianni M, Sokol GS, McClellan LM. Eva- beta-secretase (BACE1) inhibitors reveals the
luation of library ranking efficacy in virtual importance of protonation states at Asp32 and
screening. J Comput Chem 2005;26:1122. Asp228. J Med Chem 2005;48:37493755.
163. Perola E, Walters WP, Charifson PS. A de- 177. Rarey M, Kramer B, Lengauer T, Klebe G. A fast
tailed comparison of current docking and scor- flexible docking method using an incremental
ing methods on systems of pharmaceutical construction algorithm. J Mol Biol 1996;261:
relevance. Proteins 2004;56:235249. 470489.
164. Schulz-Gasch T, Stahl M. Binding site char- 178. Hindle SA, Rarey M, Buning C, Lengaue T.
acteristics in structure-based virtual screen- Flexible docking under pharmacophore type
ing: evaluation of current docking tools. J Mol constraints. J Comput Aided Mol Des 2002;
Model 2003;9:4757. 16:129149.
165. Stahl M, Rarey M. Detailed analysis of scoring 179. Blaney JD, Dixon JS. Distance geometry in
functions for virtual screening. J Med Chem molecular modeling. In: Lipkowitz KB, Boyd
2001;44:10351042. DB, editors. Reviews in Computational Chem-
166. Kontoyianni M, Madhav P, Suchanek E, Seibel istry. New York: Wiley-VCH; 1994. p 299335
W. Theoretical and practical considerations in 180. Nauchatel V, Villaverde MC, Sussman F. Sol-
virtual screening: a beaten field?. Curr Med vent accessibility as a predictive tool for the
Chem 2008;15:107116. free energy of inhibitor binding to the HIV-1
167. Warren GL, Andrews CW, Capelli AM, Clarke protease. Protein Sci 1995;4:13561364.
B, LaLonde J, Lambert MH, Lindvall M, Nevins 181. Ferrari AM, Wei BQ, Costantino L, Shoichet
N, Semus SF, Senger S, Tedesco G, Wall ID, BK. Soft docking and multiple receptor con-
REFERENCES 41

formations in virtual screening. J Med Chem 195. Murray CW, Auton TR, Eldridge MD. Empiri-
2004;47:50765084. cal scoring functions. II. The testing of an
182. Vieth M, Hirst JD, Kolinski A, Brooks CL III. empirical scoring function for the prediction
Assessing energy functions for flexible dock- of ligandreceptor binding affinities and the
ing. J Comput Chem 1998;14:16121622. use of Bayesian regression to improve the
183. Liu M, Wang SM. MCDOCK: a Monte Carlo sim- quality of the model. J Comput Aided Mol Des
ulation approach to the molecular docking pro- 1998;12:503519.
blem. J Comput Aided Mol Des 1999;13:435451. 196. Stroganov OV, Novikov FN, Stroylov VS,
184. Majeux N, Scarsi M, Apostolakis J, Ehrhardt Kulkov V, Chilov GG. Lead finder: an ap-
C, Caflisch A. Exhaustive docking of molecular proach to improve accuracy of proteinligand
fragments with electrostatic solvation. Pro- docking, binding energy estimation, and vir-
tual screening. J Chem Inf Model 2008;48
teins 1999;37:88105.
(12): 23712385.
185. Qui D, Shenkin PS, Hollinger EP, Still WC.
The GB/SA contunuum model for salvation. A 197. DeWitte RS, Shakhnovich EI. SMoG: de novo
fast analytical method for the calculation of design method based on simple, fast, and ac-
curate free energy estimates. 1. Methodology
approximate Born radii. J Phys Chem 1997;
101:30053014. and supporting evidence. J Am Chem Soc
1996;118:1173311744.
186. Shoichet BK, Leach AR, Kuntz ID. Ligand
solvation in molecular docking. Proteins 198. Muegge I, Martin YC. A general and fast scor-
1999;34:416. ing function for proteinligand interactions: a
simplified potential approach. J Med Chem
187. Wang J, Wang W, Huo S, Lee M, Kollman PA. 1999;42:791804.
Solvation model based on weighted solvent
accessible surface area. J Phys Chem B 199. Gohlke H, Hendlich M, Klebe G. Knowledge-
2001;105:50555067. based scoring function to predict proteinli-
gand interactions. J Mol Biol 2000;295:
188. Zou X, Sun Y, Kuntz ID. Inclusion of solvation 337356.
in ligand binding free energy calculations
using the generalized-Born model. J Am Chem 200. Mitchell JBO, Laskowski RA, Alex A, Thorn-
Soc 1999;121:80338043. ton JM. BLEEP-potential of mean force de-
scribing proteinligand interactions. I. Gener-
189. Hansch C, Anderson SM. The structureactiv- ating potential. J Comput Chem 1999;20:
ity relationship in barbiturates and its simi- 11651176.
larity to that in other narcotics. J Med Chem
1967;10:745753. 201. Gohlke H, Klebe G. Statistical potentials and
scoring functions applied to proteinligand
190. OpenEye; http://www.eyesopen.com/fred.html. binding. Curr Opin Struct Biol 2001;11:
191. Bohm H-J. The development of a simple 231235.
empirical scoring function to estimate
202. Muegge I. A knowledge-based scoring function
the binding constant for a proteinligand
for proteinligand interactions: probing the
complex of known three-dimensional struc- reference state. Perspect Drug Disc Des
ture. J Comput Aided Mol Design 1994;8: 2000;20:99114.
243256.
203. Muegge I. Effect of ligand volume correction on
192. Eldridge MD, Murray CW, Auton TR, Paolini
PMF-scoring. J Comput Chem 2001;22:
GV, Mee RP. Empirical scoring functions. I. 418425.
The development of a fast empirical scoring
function to estimate the binding affinity of 204. Muegge I. PMF scoring revisited. J Med Chem
ligands in receptor complexes. J Comput Aided 2006;49:58955902.
Mol Des 1997;11:425445. 205. Muegge I, Martin YC, Hajduk PJ, Fesik SW.
193. Head RD, Smythe ML, Oprea TL, Waller CL, Evaluation of PMF scoring in docking weak
ligands to the FK506 binding protein. J Med
Green SM, Marshall GM. VALIDATE: a new
method for the receptor-based prediction of Chem 1999;42:24982503.
binding affinites of novel ligands. J Am Chem 206. Kraemer O, Hazemann I, Podjarny AD, Klebe
Soc 1996;118:39593969. G. Virtual screening for inhibitors of human
aldose reductase. Proteins 2004;55:814823.
194. Jain AN. Scoring noncovalent proteinligand
interactions: a continuous differentiable func- 207. Stahl M, Rarey M. Detailed analysis of scoring
tion tuned to compute binding affinities. J functions for virtual screening. J Med Chem
Comput Aided Mol Des 1996;10:427440. 2001;44:10351042.
42 VIRTUAL SCREENING

208. Wang R, Liu L, Lai L, Tang Y. SCORE: a new ing affinity of a proteinligand complex. J Mol
empirical method for estimating the binding Model 1998;4:379394.
affinity of a proteinligand complex. J Mol 221. Bissantz C, Folkers G, Rognan D. Protein-
Model 1998;4:379394. based virtual screening of chemical databases,
209. Bohacek RS, McMartin C. Definition and dis- 1. Evaluation of different docking/scoring com-
play of steric, hydrophobic, and hydrogen-bond- binations. J Med Chem 2000;43:47594767.
ing properties of ligand binding sites in proteins 222. Kontoyianni M, Sokol GS, McClellan LM. Eva-
using Lee and Richards accessible surface: va- luation of library ranking efficacy in virtual
lidation of a high-resolution graphical tool for screening. J Comput Chem 2005;26:1122.
drug design. J Med Chem 1992; 35:16711684. 223. Verdonk ML, Berdini V, Hartshorn MJ, Mooij
210. Stouch TR, Jurs PC. A simple method for the WT, Murray CW, Taylor RD, Watson P. Vir-
representation, quantification, and comparison tual screening using proteinligand docking:
of the volumes and shapes of chemical com- avoiding artificial enrichment. J Chem Inf
pounds. J Chem Inf Comput Sci 1986;26: 412. Comput Sci 2004;44:793806.
211. Gehlhaar DK, Verkhivker GM, Rejto PA, Sher- 224. Charifson PS, Corkery JJ, Murcko MA, Wal-
man CJ, Fogel DB, Fogel LJ, Freer ST. Mole- ters WP. Consensus scoring: a method for ob-
cular recognition of the inhibitor AG-1343 by taining improved hit rates from docking data-
HIV-1 protease: conformationally flexible bases of three-dimensional structures into pro-
docking by evolutionary programming. Chem teins. J Med Chem 1999;42:51005109.
Biol 1995;2:317324. 225. Halgren TA. Merck molecular force field. II.
212. Gehlhaar DK, Bouzida D, Rejto PA. Reduced MMFF94 van der Waals and electrostatic
dimensionality in ligandprotein structure parameters for intermolecular interactions. J
prediction: covalent inhibitors of serine pro- Comput Chem 1996;17:520552.
teases and design of site-directed combinator- 226. Miller MD, Kearsley SK, Underwood DJ, Sher-
ial libraries. In: Parrill L. Reddy RM. editors.
idan RP. FLOG: a system to select quasi-
ACS Symposium Series 719. Washington, DC: flexible ligands complementary to a receptor
American Chemical Society; 1999. p 292311 of known three-dimensional structure. J Comp
213. Muegge I. PMF scoring revisited. J Med Chem Aided Mol Des 1994;8:153174.
2006;49:58955902.
227. Honig B, Nicholls A. Classical electrostatics in
214. Feher M, Deretey E, Roy S. BHB: a simple biology and chemistry. Science 1995;268:
knowledge-based scoring function to improve 11441149.
the efficiency of database screening. J Chem 228. Flower DR. SERF: a program for accessible
Inf Comput Sci 2003;43:13161327.
surface area calculations. J Mol Graph Model
215. Stahl M, Bohm HJ. Development of filter func- 1998;15:238244.
tions for proteinligand docking. J Mol Graph 229. Tripos; www.tripos.com.
Model 1998;16:121132.
230. Vigers GP, Rizzi JP. Multiple active site cor-
216. Jones G, Willett P. Docking small-molecule rections for docking and virtual screening. J
ligands into active sites. Curr Opin Biotechnol
Med Chem 2004;47:8089.
1995;6:652656.
231. Bradley MN, Hong C, Chen M, Joseph SB,
217. Halgren TA, Murphy RB, Friesner RA, Beard Wilpitz DC, Wang X, Lusis AJ, Collins A,
HS, Frye LL, Pollard WT, Banks JL. Glide: a Hseuh WA, Collins JL, Tangirala RK, Tonto-
new approach for rapid, accurate docking and
noz P. Ligand activation of LXR beta reverses
scoring, 2. Enrichment factors in database atherosclerosis and cellular cholesterol over-
screening. J Med Chem 2004;47:17501759. load in mice lacking LXR alpha and apoE. J
218. Schrodinger, LLC; technical documentation; Clin Invest 2007;117:23372346.
http://www.schrodinger.com/. 2005. 232. Cheng JF, Zapf J, Takedomi K, Fukushima C,
219. Rognan D, Lauemoller SL, Holm A, Buus S, Ogiku T, Zhang SH, Yang G, Sakurai N,
Tschinke V. Predicting binding affinities of Barbosa M, Jack R, Xu K. Combination of
protein ligands from three-dimensional mod- virtual screening and high throughput gene
els: application to peptide binding to class I profiling for identification of novel liver X
major histocompatibility proteins. J Med Chem receptor modulators. J Med Chem 2008;51:
1999;42:46504658. 20572061.
220. Wang RX, Liu L, Lai LH, Tang YQ. SCORE: a 233. Gehlhaar DK, Verkhivker GM, Rejto PA, Fo-
new empirical method for estimating the bind- gel DB, Fogel LJ, Freer ST. Docking confor-
REFERENCES 43

mationally flexible small molecules into ity. J Pharmacol Toxicol Methods 2000;44:
a protein binding site through evolutionary 235249.
programming. In: McDonnell JR, Reynolds 248. Lipinski A. Lead- and drug-like compounds:
RG, Fogel DB, editors. Proceedings of the rule-of-five revolution. Drug Discov Today
the Fourth Annual Conference on Evolution- Technol 2004;1:337341.
ary Programming. MIT PRESS; 1995. p 249. Andrews PR, Craik DJ, Martin JL. Func-
615627 tional group contributions to drugreceptor
234. Carr RA, Congreve M, Murray CW, Rees DC. interactions. J Med Chem 1984;27:
Fragment-based lead discovery: leads by de- 16481657.
sign. Drug Discov Today 2005;10:987992. 250. Kuntz ID, Chen K, Sharp KA, Kollman PA. The
235. Rummey C, Nordhoff S, Thiemann M, Metz G. maximal affinity of ligands. Proc Natl Acad Sci
In silico fragment-based discovery of DPP-IV U S A 8-31- 1999;96:999710002.
S1 pocket binders. Bioorg Med Chem Lett
251. Hopkins AL, Groom CR, Alex A. Ligand effi-
2006;16:14051409. ciency: a useful metric for lead selection. Drug
236. Rarey M, Kramer B, Lengauer T, Klebe G. A Discov Today 2004;9:430431.
fast flexible docking method using an incre- 252. Bleicher KH, Bohm HJ, Muller K, Alanine AI.
mental construction algorithm. J Mol Biol
Hit and lead generation: beyond high-through-
1996;261:470489. put screening. Nat Rev Drug Discov 2003;2:
237. Makara GM. On sampling of fragment space. J 369378.
Med Chem 2007;50:32143221. 253. Wunberg T, Hendrix M, Hillisch A, Lobell M,
238. Oblak M, Grdadolnik SG, Kotnik M, Jerala R, Meier H, Schmeck C, Wild H, Hinzen B.
Filipic M, Solmajer T. In silico fragment-based Improving the hit-to-lead process: data-dri-
discovery of indolin-2-one analogues as potent ven assessment of drug-like and lead-like
DNA gyrase inhibitors. Bioorg Med Chem Lett screening hits. Drug Discov Today 2006;11:
2005;15:52075210. 175180.
239. Muegge I. Selection criteria for drug-like com- 254. Carr RA, Congreve M, Murray CW, Rees
pounds. Med Res Rev 2003;23:302321. DC. Fragment-based lead discovery: leads
240. Cele AZ, Metz JT. Ligand efficiency indices as by design. Drug Discov Today 2005;10:
guideposts for drug discovery. Drug Discov 987992.
Today 2005;10:464469. 255. Reynolds CH, Bembenek SD, Tounge BA. The
241. Hopkins AL, Groom CR, Alex A. Ligand effi- role of molecular size in ligand efficiency.
ciency: a useful metric for lead selection. Drug Bioorg Med Chem Lett 2007;17:42584261.
Discov Today 2004;9:430431. 256. Reynolds CH, Tounge BA, Bembenek SD. Li-
242. Martin YC. Diverse Viewpoints on computa- gand binding efficiency: trends, physical basis,
tional aspects of molecular diversity. J Comb and implications. J Med Chem 2008;51:
Chem 2001;3:120. 24322438.
243. Lobell M, Hendrix M, Hinzen B, Keldenich J, 257. Posner BA. High-throughput screening-driven
Meier H, Schmeck C, Schohe-Loop R, Wunberg lead discovery: meeting the challenges of find-
T, Hillisch A. In silico ADMET traffic lights as ing new therapeutics. Curr Opin Drug Discov
a tool for the prioritization of HTS hits. Chem- Devel 2005;8:487494.
MedChem 2006;1:12291236. 258. Pereira DA, Williams JA. Origin and evolution
244. Proudfoot JR. Drugs, leads, and drug-likeness: of high throughput screening. Br J Pharmacol
an analysis of some recently launched drugs. 2007;152:5361.
Bioorg Med Chem Lett 2002;12:16471650. 259. Walters WP, Namchuk M. Designing screens:
245. Oprea TI, Davis AM, Teague SJ, Leeson PD. Is how to make your hits a hit. Nat Rev Drug
there a difference between leads and drugs? A Discov 2003;2:259266.
historical perspective. J Chem Inf Comput Sci 260. McGovern SL, Caselli E, Grigorieff N, Shoichet
2001;41:13081315. BK. A common mechanism underlying promis-
246. Teague SJ, Davis AM, Leeson PD, Oprea T. cuous inhibitors from virtual and high-
The design of leadlike combinatorial li- throughput screening. J Med Chem 2002;45:
braries. Angew Chem Int Ed Engl 1999;38: 17121722.
37433748. 261. McGovern SL, Helfand BT, Feng B, Shoichet
247. Lipinski CA. Drug-like properties and the BK. A specific mechanism of nonspecific inhi-
causes of poor solubility and poor permeabil- bition. J Med Chem 2003;46:42654272.
44 VIRTUAL SCREENING

262. Seidler J, McGovern SL, Doman TN, Shoi- Center for Biotechnology Information. Nucleic
chet BK. Identification and prediction of pro- Acids Res 2008;36:D13D21
miscuous aggregating inhibitors among 275. PubChem; http://pubchem.ncbi.nlm.nih.gov/.
known drugs. J Med Chem 2003;46:
276. ChemIDplus; http://chem.sis.nlm.nih.gov/che-
44774486. midplus/.
263. Feng BY, Shoichet BK. A detergent-based as- 277. DiscoveryGate; www.discoverygate.com.
say for the detection of promiscuous inhibitors.
Nat Protoc 2006;1:550553. 278. Schneider G, Hartenfeller M, Reutlinger M,
Tanrikulu Y, Proschak E, Schneider P.
264. Feng BY, Shelat A, Doman TN, Guy RK, Shoi-
Voyages to the (un)known: adaptive design of
chet BK. High-throughput assays for promiscu- bioactive compounds. Trends Biotechnol 2009;
ous inhibitors. Nat Chem Biol 2005;1: 146148. 27:1826.
265. Ryan AJ, Gray NM, Lowe PN, Chung CW. 279. Ertl P, Jelfs S, Muhlbacher J, Schuffenhauer
Effect of detergent on promiscuous inhibi-
A, Selzer P. Quest for the rings. In silico ex-
tors. J Med Chem 2003;46:34483451. ploration of ring universe to identify novel
266. Goode DR, Totten RK, Heeres JT, Hergenr- bioactive heteroaromatic scaffolds. J Med
other PJ. Identification of promiscuous small Chem 2006;49:45684573.
molecule activators in high-throughput en-
280. Cheeseright TJ, Mackey MD, Melville JL,
zyme activation screens. J Med Chem 2008; Vinter JG. FieldScreen: virtual screening
51:23462349. using molecular fields. Application to the
267. Kubinyi H. Chemogenomics in drug discovery. DUD data set. J Chem Inf Model 2008;48
Ernst Schering Res Found Workshop 2006;58: (11): 21082117.
119.
281. Evers A, Klebe G. Successful virtual screening
268. Bywater RP. Privileged structures in GPCRs. for a submicromolar antagonist of the neuro-
Ernst Schering Found Symp Proc Vol. 2. 2006; kinin-1 receptor based on a ligand-supported
7591. homology model. J Med Chem 2004;
269. Costantino L, Barlocco D. Privileged struc- 47:53815392.
tures as leads in medicinal chemistry. Curr 282. Nikitin S, Zaitseva N, Demina O, Solovieva V,
Med Chem 2006;13:6585. Mazin E, Mikhalev S, Smolov M, Rubinov A,
270. DeSimone RW, Currie KS, Mitchell SA, Dar- Vlasov P, Lepikhin D, Khachko D, Fokin V,
row JW, Pippin DA. Privileged structures: Queen C, Zosimov V. A very large diversity
applications in drug discovery. Comb Chem space of synthetically accessible compounds
High Throughput Screen 2004;7:473494. for use with drug design programs. J Comput
271. Muller G. Medicinal chemistry of target Aided Mol Des 2005;19:4763.
family-directed masterkeys. Drug Discov 283. Boehm M, Wu TY, Claussen H, Lemmen C.
Today 2003;8:681691. Similarity searching and scaffold hopping
272. McGovern SL, Shoichet BK. Kinase inhibitors: in synthetically accessible combinatorial
not just for kinases anymore. J Med Chem chemistry spaces. J Med Chem 2008;51:
2003;46:14781483. 24682480.
273. Zhou Y, Zhou B, Chen K, Yan SF, King FJ, 284. Rarey M, Dixon JS. Feature trees: a new mo-
Jiang S, Winzeler EA. Large-scale annota- lecular similarity measure based on tree
tion of small-molecule libraries using public matching. J Comput Aided Mol Des 1998;12:
databases. J Chem Inf Model 2007;47: 471490.
13861394. 285. Lewell XQ, Judd DB, Watson SP, Hann MM.
274. Wheeler DL, Barrett T, Benson DA, Bryant RECAP-retrosynthetic combinatorial analy-
SH, Canese K, Chetvernin V, Church DM, sis procedure: a powerful new technique for
Dicuccio M, Edgar R, Federhen S, Feolo M, identifying privileged molecular fragments
Geer LY, Helmberg W, Kapustin Y, Khovayko with useful applications in combinatorial
O, Landsman D, Lipman DJ, Madden TL, chemistry. J Chem Inf Comput Sci 1998;38:
Maglott DR, Miller V, Ostell J, Pruitt KD, 511522.
Schuler GD, Shumway M, Sequeira E, Sherry 286. www.simbiosys.com. 2008.
ST, Sirotkin K, Souvorov A, Starchenko G, 287. Evans BE, Rittle KE, Bock MG, DiPardo RM,
Tatusov RL, Tatusova TA, Wagner L, Yaschen- Freidinger RM, Whitter WL, Lundell GF, Ve-
ko E. Database resources of the National ber DF, Anderson PS, Chang RSL, Lotti VJ,
REFERENCES 45

Cerino DJ, Chen TB, Kling PJ, Kunkel KA, 298. Nair PC, Sobhia ME. Fingerprint directed
Springer JP, Hirshfield J. Methods for drug scaffold hopping for identification of CCR2
discovery: development of potent, selective, antagonists. J Chem Inf Model 2008;48:
orally effective cholecystokinin antagonists. 18911902.
J Med Chem 1988;31:22352246. 299. Kellenberger E, Springael JY, Parmentier M,
288. Hajduk PJ, Bures M, Praestgaard J, Fesik SW. Hachet-Haas M, Galzi JL, Rognan D. Identi-
Privileged molecules for protein binding iden- fication of nonpeptide CCR5 receptor agonists
tified from NMR-based screening. J Med Chem by structure-based virtual screening. J Med
2000;43:34433447. Chem 2007;50:12941303.
289. Muegge I, Enyedy IJ. Virtual screening for 300. Markt P, McGoohan C, Walker B, Kirchmair J,
kinase targets. Curr Med Chem 2004;11: Feldmann C, De MG, Spitzer G, Distinto S,
693707. Schuster D, Wolber G, Laggner C, Langer T.
290. Tikhonova IG, Sum CS, Neumann S, Engel S, Discovery of novel cathepsin S inhibitors by
Raaka BM, Costanzi S, Gershengorn MC. Dis- pharmacophore-based virtual high-through-
covery of novel agonists and antagonists of the put screening. J Chem Inf Model 2008;48:
free fatty acid receptor 1 (FFAR1) using virtual 16931705.
screening. J Med Chem 2008;51:625633. 301. Tsai KC, Chen SY, Liang PH, Lu IL, Mahin-
291. Irwin JJ, Shoichet BK. ZINC: a free database droo N, Hsieh HP, Chao YS, Liu L, Liu D, Lien
of commercially available compounds for vir- W, Lin TH, Wu SY. Discovery of a novel family
tual screening. J Chem Inf Model 2005;45: of SARS-CoV protease inhibitors by virtual
177182. screening and 3D-QSAR studies. J Med Chem
292. Tikhonova IG, Sum CS, Neumann S, Thomas 2006;49:34853495.
CJ, Raaka BM, Costanzi S, Gershengorn MC. 302. Pirard B, Brendel J, Peukert S. The discovery
Bidirectional, iterative approach to the struc- of Kv1, 5 blockers as a case study for the
tural delineation of the functional chemoprint application of virtual screening approaches.
in GPR40 for agonist recognition. J Med Chem J Chem Inf Model 2005;45:477485.
2007;50:29812989. 303. Agrawal H, Kumar A, Bal NC, Siddiqi MI,
293. Garrido DM, Corbett DF, Dwornik KA, Goetz Arora A. Ligand based virtual screening and
AS, Littleton TR, McKeown SC, Mills WY, biological evaluation of inhibitors of choris-
Smalley TL Jr, Briscoe CP, Peat AJ. Synthesis mate mutase (Rv1885c) from Mycobacterium
and activity of small molecule GPR40 agonists. tuberculosis H37Rv. Bioorg Med Chem Lett
Bioorg Med Chem Lett 2006;16:18401845. 2007;17:30533058.
294. Hu X, Prehna G, Stebbins CE. Targeting pla- 304. Ward RA, Perkins TD, Stafford J. Structure-
gue virulence factors: a combined machine based virtual screening for low molecular
learning method and multiple conformational weight chemical starting points for dipeptidyl
virtual screening for the discovery of Yersinia peptidase IV inhibitors. J Med Chem 2005;48:
protein kinase A inhibitors. J Med Chem 69916996.
2007;50:39803983. 305. Franke L, Byvatov E, Werz O, Steinhilber D,
295. Kumar A, Chaturvedi V, Bhatnagar S, Sinha Schneider P, Schneider G. Extraction and vi-
S, Siddiqi MI. Knowledge based identification sualization of potential pharmacophore points
of potent antitubercular compounds using using support vector machines: application to
structure based virtual screening and struc- ligand-based virtual screening for COX-2 in-
ture interaction fingerprints. J Chem Inf hibitors. J Med Chem 2005;48:69977004.
Model 2009;49:3542. 306. Zhang Z, Zheng M, Du L, Shen J, Luo X, Zhu W,
296. Chen X, Wilson LJ, Malaviya R, Argentieri RL, Jiang H. Towards discovering dual functional
Yang SM. Virtual screening to successfully inhibitors against both wild type and K103N
identify novel Janus kinase 3 inhibitors: a mutant HIV-1 reverse transcriptases: molecu-
sequential focused screening approach. J Med lar docking and QSAR studies on 4, 1-benzox-
Chem 2008;51:70157019. azepinone analogues. J Comput Aided Mol Des
297. Montes M, Braud E, Miteva MA, Goddard ML, 2006;20:281293.
Mondesert O, Kolb S, Brun MP, Ducommun B, 307. Cherkasov A, Shi Z, Fallahi M, Hammond G.
Garbay C, Villoutreix BO. Receptor-based vir- Successful in silico discovery of novel non-ster-
tual ligand screening for the identification of oidal ligands for human sex hormone binding
novel CDC25 phosphatase inhibitors. J Chem globulin (SHBG). J Med Chem 2005;48:
Inf Model 2008;48:157165. 32033213.
46 VIRTUAL SCREENING

308. Steindl T, Crump CE, Hayden FG, Langer T. 309. Jacobsson M, Garedal M, Schultz J, Karlen A.
Pharmacophore modeling, docking, and prin- Identification of Plasmodium falciparum sper-
cipal component analysis based clustering: midine synthase active site binders through
combined computer-assisted approaches to structure-based virtual screening. J Med
identify new inhibitors of the human thino- Chem 2008;51:27772786.
virus coat protein. J Med Chem 2005;48:
62506260.

You might also like