
Indiana University School of

David Wild Research Overview July 2006. Page 1


Introducing Chemoinformatics
Gary Wiggins, David Wild
Indiana University School of Informatics
BCCE Chemoinformatics Workshop July 2006
Chemoinformatics is
• Also known as cheminformatics or chemical informatics
• Very differently defined, reflecting its cross-disciplinary nature
  - Librarian
  - Chemist (synthetic, medicinal, theoretical)
  - Biologist / Bioinformatician
  - Molecular modeler
  - Pharmaceutical or Chemical Engineer
  - Computer Scientist / Informatician
A working definition of chemoinformatics
Chemoinformatics (a.k.a. chemical informatics) is the branch of informatics dealing with all aspects of the representation and use of chemical structures, proteins, and related information, on computer.
... It is an interdisciplinary field that regularly pushes the boundaries of computer science, statistics, visualization methods, computing power and scientific technique. The subject covers a wide variety of applications and specialties, particularly in the pharmaceutical industry, where the rapid increase in new technologies in drug discovery puts chemoinformatics at the forefront of drug design. It is foundational to such diverse applications as 3D molecular modeling, artificial-intelligence methods for predicting biological activity, patent and chemical database searching, and high-throughput screening data analysis.
More definitions
• Computational Chemistry - The application of mathematical and computational methods, particularly to theoretical chemistry
• Molecular Modeling - Using 3D graphics and optimization techniques to help understand the nature and action of compounds and proteins
• Computer-Aided Drug Design - The discipline of using computational techniques (including chemical informatics) to assist in the discovery and design of drugs.
Chemoinformatics hits on Google
Number of word occurrences on Google, taken from http://www.molinspiration.com/chemoinformatics.html
July 2000: 723
April 2005: 125,600
Dec 2005: 348,100
Hits on Chemoinf.com, August ! " #$, #%%! (sitemeter.com)
Traditional areas of application
• Pharmaceutical & life science industry
  - particularly in early-stage drug design
• Databases of available chemicals
• Electronic publishing
  - including searchable chemical structure information in journals, etc.
• Government and patent databases
The theory so far (1!"#s to present)
• How do you represent 2D and 3D chemical structures?
  - Not just a pretty picture
• How do you search databases of chemical structures?
  - Google doesn't help (much, but it might do soon...)
• How do you organize large amounts of chemical information?
• How do you visualize chemical structures & proteins?
• Can computers predict how chemicals are going to behave
  - ... in the test tube?
  - ... in the body?
Current trends & hot topics
• The move of chemical informatics into the public domain (PubChem, MLI, eScience, open source)
• Service-oriented architectures
• Packaging & processing large volumes of complex information for human consumption
• Integration with other -ics (bioinformatics, genomics, proteomics, systems biology)
What does it mean for the bench chemist?
• An increasing number of web tools and databases available which can aid in compound acquisition, synthesis, and biological profiling
• A trend towards more (and more effective) use of computers in the lab - not just for email
• A need for most synthetic chemists (and all medicinal chemists) to be aware of computational techniques and how they can assist in the compound synthesis and drug discovery processes
• An opportunity to combine an interest in chemistry with an interest in computers
Chemical Informatics Programs at IU
• Graduate Certificate in Chemical Informatics
  - I571 Chemical Information Technology
  - I572 Computational Chemistry & Molecular Modeling
  - I573 Programming for Chemical and Life Science Informatics
  - Independent Study in Chemical Informatics
• M.Sc. in Chemical Informatics
• Ph.D. in Informatics (Chemical Informatics Track)
Chemoinformatics software vendors
Accelrys - Large chemoinformatics company
ACD/Labs - analytical informatics & predictions
Digital Chemistry - 2D fingerprinting, clustering toolkits & software
CambridgeSoft - 2D drawing tools & E-notebooks
CAS - produce SciFinder Scholar searching software
ChemAxon - Java based toolkits and software
Daylight - 2D representation & searching software
Leadscope - 2D structure and property tools
Lion Bioscience - produce LeadNavigator
MDL - Large chemoinformatics company
Mesa Analytics and Computing - Educational & Statistical tools
OpenEye - Fast 3D docking, structure generation, toolkits
Quantum Pharmaceuticals - prediction, docking, screening
Sage Informatics - ChemTK 2D analysis software
Tripos - Large chemoinformatics company
Main academic sites
• "Pure" Chemoinformatics
  - University of Sheffield, UK (Willett / Gillet)
    http://www.shef.ac.uk/uni/academic/I-M/is/research/cirg.html
  - Erlangen, Germany (Gasteiger)
    http://www2.chemie.uni-erlangen.de/
  - Cambridge Unilever Center
    http://www-ucc.ch.cam.ac.uk/
  - Indiana University School of Informatics
    http://www.informatics.indiana.edu/
• Related (computational chemistry, etc.)
  - UCSF (Kuntz)
    http://mdi.ucsf.edu/
  - University of Texas (Pearlman)
    http://www.utexas.edu/pharmacy/divisions/pharmaceutics/faculty/pearlman.html
  - Yale (Jorgensen)
    http://zarbi.chem.yale.edu/
  - University of Michigan (Crippen)
    http://www.umich.edu/~pharmacy/MedChem/faculty/crippen/
"Traditional" Journals
• Journal of Chemical Information & Modeling (formerly JCICS)
  - http://pubs.acs.org/journals/jcisd8/index.html
• Journal of Computer-Aided Molecular Design
  - http://www.kluweronline.com/issn/0920-654X
• Journal of Molecular Graphics and Modeling
  - http://www.elsevier.com/inca/publications/store/5/2/5/0/1/2/
• Journal of Computational Chemistry
  - http://www3.interscience.wiley.com/cgi-bin/jhome/33822
• Journal of Chemical Theory and Computation
  - http://pubs.acs.org/journals/jctcce/
• Journal of Medicinal Chemistry
  - http://pubs.acs.org/journals/jmcmar/
"Informal" publications
• Network Science (online)
  - http://www.netsci.org/Science/index.html
• Chemical & Engineering News
  - http://pubs.acs.org/cen/
• Drug Discovery Today
  - http://www.drugdiscoverytoday.com/
• Scientific Computing World
  - http://www.scientific-computing.com/
• Bio-IT World
  - http://www.bio-itworld.com/
CHMINF-L Distribution List
• Chemical Information Sources Discussion List
• Created by Gary Wiggins at IUB
• http://listserv.indiana.edu/archives/chminf-l.html
Yahoo! Chemoinformatics Discussion List
• For
  - Job postings
  - Ideas exchange
  - Questions
  - Industry - Student connections
To join, go to http://groups.yahoo.com/group/chemoinf
Or send an email to chemoinf-subscribe@yahoogroups.com
Impacting Industry
Example 1: High-Throughput Screening
Testing perhaps millions of compounds in a corporate collection to see if any show activity against a certain disease protein
High-Throughput Screening
• Traditionally, small numbers of compounds were tested for a particular project or therapeutic area
• About 10 years ago, technology developed that enabled large numbers of compounds to be assayed quickly
• High-throughput screening can now test 100,000 compounds a day for activity against a protein target
• Maybe tens of thousands of these compounds will show some activity for the protein
• The chemist needs to intelligently select the 2 - 3 classes of compounds that show the most promise for being drugs to follow up
Informatics Implications
• Need to be able to store chemical structure and biological data for millions of data points
  - Computational representation of 2D structure
• Need to be able to organize thousands of active compounds into meaningful groups
  - Group similar structures together and relate to activity
• Need to learn as much information as possible (data mining)
  - Apply statistical methods to the structures and related information
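
One common way to "group similar structures together" is to compare structural fingerprints with the Tanimoto coefficient. The sketch below is illustrative only: the fingerprints are invented sets of "on" bit positions, the compound names are hypothetical, and the greedy grouping is a deliberately simple stand-in for the clustering methods used in practice.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def group_by_similarity(fingerprints: dict, threshold: float = 0.5) -> list:
    """Greedy grouping: a compound joins the first group whose leader is similar enough."""
    groups = []  # each group is a list of names; the first name is the group leader
    for name, fp in fingerprints.items():
        for group in groups:
            if tanimoto(fp, fingerprints[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no similar leader found, so start a new group
    return groups


# Hypothetical fingerprints (invented bit positions, for illustration only)
fingerprints = {
    "cmpd_1": {1, 4, 7, 9, 12},
    "cmpd_2": {1, 4, 7, 9, 13},
    "cmpd_3": {2, 3, 20, 21},
    "cmpd_4": {2, 3, 20, 22},
}
groups = group_by_similarity(fingerprints)
print(groups)
```

Once compounds are grouped this way, assay results can be summarized per group to relate structure to activity.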
Tools for mining the data
Tripos Benchware HTS Dataminer (formerly SAR Navigator), www.tripos.com
Example 2: 3D Visualization & Docking
3D Visualization of interactions between compounds and proteins
Docking compounds into proteins computationally
3D Visualization
• X-ray crystallography and NMR Spectroscopy can reveal 3D structure of protein and bound compounds
• Visualization of these "complexes" of proteins and potential drugs can help scientists understand the mechanism of action of the drug and to improve the design of a drug
• Visualization uses computational "ball and stick" model of atoms and bonds, as well as surfaces
• Stereoscopic visualization available
Accelrys Discovery Studio
Docking algorithms
• Require 3D atomic structure for protein, and 3D structure for compound ("ligand")
• May require initial rough positioning for the ligand
• Will use an optimization method to try and find the best rotation and translation of the ligand in the protein, for optimal binding affinity
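
The idea of searching over rotations and translations can be illustrated with a toy 2D problem. Everything here is an assumption made for illustration: the "ligand" and "site" are a few invented points with a known pairing, the score is a sum of squared distances rather than a binding-affinity estimate, and the search is an exhaustive grid rather than the optimizers real docking programs use.

```python
import math


def transform(points, angle, dx, dy):
    """Rotate 2D points about the origin by angle (radians), then translate by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def score(ligand, site):
    """Toy score: sum of squared distances between paired points (lower is better)."""
    return sum((lx - sx) ** 2 + (ly - sy) ** 2
               for (lx, ly), (sx, sy) in zip(ligand, site))


def grid_search(ligand, site, angle_steps=72, shifts=range(-5, 6)):
    """Try every rotation/translation on a coarse grid and keep the best-scoring pose."""
    best = (float("inf"), 0.0, 0, 0)
    for i in range(angle_steps):
        angle = 2 * math.pi * i / angle_steps
        for dx in shifts:
            for dy in shifts:
                s = score(transform(ligand, angle, dx, dy), site)
                if s < best[0]:
                    best = (s, angle, dx, dy)
    return best


# Invented coordinates: the "site" is the ligand in a known pose, so the answer is known.
ligand = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
site = transform(ligand, math.pi / 2, 3, -2)
best_score, best_angle, best_dx, best_dy = grid_search(ligand, site)
print(best_score, best_angle, best_dx, best_dy)
```

A real docking run works in 3D, usually lets the ligand flex, and has no known pairing between ligand atoms and site points, which is why smarter optimization methods are needed.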
Genetic Algorithms
• Create a "population" of possible solutions, encoded as "chromosomes"
• Use "fitness function" to score solutions
• Good solutions are combined together ("crossover") and altered ("mutation") to provide new solutions
• The process repeats until the population "converges" on a solution
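
The steps above can be sketched with a minimal genetic algorithm. This is a toy under stated assumptions, not how any particular docking program works: chromosomes are bit lists, the fitness function simply counts 1 bits (a stand-in for a docking score), and the loop runs for a fixed number of generations instead of testing for convergence.

```python
import random


def fitness(chromosome):
    """Toy fitness function: the number of 1 bits in the chromosome."""
    return sum(chromosome)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent joined to the tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.02):
    """Flip each bit independently with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def run_ga(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # keep the better half as parents
        children = [population[0]]              # elitism: best solution survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = run_ga()
print(fitness(best), history)
```

Because the best chromosome is always carried forward, the best fitness can never get worse from one generation to the next; in a docking setting the chromosome would instead encode a ligand pose.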
Sample GOLD output
GMP into RNaseT1
Something fun
Screensaver that docks molecules while your computer is idle at
http://www.grid.org/projects/cancer/
Representing 2D structures with SMILES
Historical ways of representing chemicals
• Trivial name, e.g. Baking Soda, Aspirin, Citric Acid, etc. Identifies the compound, but gives no (or little) information about what it consists of
• Chemical formula, e.g. C6H12O6. Specifies the type and quantity of the atoms in the compound, but not its structure (i.e. how the atoms are connected by bonds)
• Systematic name, e.g. 1,2-dibromo-3-chloropropane. Identifies the atoms present and how they are connected by bonds.
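
To make the point about chemical formulas concrete, here is a small sketch that reads a formula into atom counts and sums a molecular mass. The function names and the short table of rounded atomic masses are assumptions for illustration, and formulas with parentheses are not handled.

```python
import re
from collections import Counter

# Rounded standard atomic masses for a few elements (illustrative subset)
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def molecular_mass(formula: str) -> float:
    """Sum atomic masses weighted by the atom counts in the formula."""
    return sum(ATOMIC_MASS[symbol] * n for symbol, n in parse_formula(formula).items())


print(parse_formula("C6H12O6"))
print(round(molecular_mass("C6H12O6"), 3))
```

The formula carries type and quantity only: glucose and fructose are both C6H12O6, so nothing computed here can tell them apart.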
Trivial and Systematic Names
Trivial name:
  - tyrosine
Systematic names:
  - β-(p-hydroxyphenyl)alanine
  - α-amino-p-hydroxyhydrocinnamic acid
[Structure diagram: tyrosine]
Historical ways of representing chemicals
2D structure diagram shows atoms present and how they are connected by bonds
3D structure diagram shows how atoms are related to each other in 3D space. Can take a variety of forms. Accurate models only really possible since X-ray crystallography and computers, but ball and stick models have been around a long time!
Early computer representations
• How do we communicate structural information between humans and the computer?
  - Line notations, e.g. Wiswesser Line Notation (and later SMILES)
• How do we represent the atoms and bonds in a molecule internally in a computer?
  - Atom lookup and connection tables
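
As a rough illustration of an atom lookup and connection table, the sketch below stores 1-propanol as a list of heavy atoms plus a list of bonds. The variable names, the table layout and the small valence table are assumptions chosen for clarity, not a standard file format.

```python
# Atom lookup table: list index -> element symbol (heavy atoms only)
atoms = ["C", "C", "C", "O"]                      # 1-propanol, SMILES CCCO

# Connection table: (first atom index, second atom index, bond order)
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]

# Usual valences for neutral atoms (illustrative subset)
NORMAL_VALENCE = {"C": 4, "N": 3, "O": 2}


def neighbors(index, bonds):
    """Indices of the atoms bonded to the atom at the given index."""
    return [b if a == index else a for a, b, _ in bonds if index in (a, b)]


def implicit_hydrogens(atoms, bonds):
    """Hydrogens needed on each atom to fill its usual valence."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [NORMAL_VALENCE[element] - u for element, u in zip(atoms, used)]


h_counts = implicit_hydrogens(atoms, bonds)
print(neighbors(1, bonds), h_counts, sum(h_counts))
```

The eight implied hydrogens recover the formula C3H8O, and unlike the formula the table also records which atom is attached to which.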
Linear notations
• Represent the atoms, bonds and connectivity of a molecule in a linear text string
• Concise representation
• Originally designed for manual command line entry into text-only systems
• Now an excellent format for file and database storage (e.g. can be held in a spreadsheet cell, on one line of a text file, or in an Oracle database text field)
Wiswesser Line Notation (obsolete)
• WLN for this structure is QVYZ1R DQ
• Uses text symbolic representation of functional groups, e.g.:
    Q = OH, V = -CO-, Z = -NH2, R = benzene
• Other symbols represent branching, e.g. Y
[Structure diagram: tyrosine]
SMILES
• (one possible) SMILES for this structure is OC(=O)C(N)CC1=CC=C(O)C=C1
• Can identify any chemical structure
• There can be several ways of writing the same structure in SMILES (although a system of generating canonical SMILES exists)
[Structure diagram: tyrosine]
Dave Weininger, Daylight
www.daylight.com
SMILES - Atoms & Bonds
• Atoms represented by their chemical symbol (C, N, S, O, Br, etc). Uppercase for aliphatic, lowercase for aromatic
• Adjacent atoms implicitly single bonded, or = for double bond, or # for triple bond
• Hydrogens usually implicit
Propane
CCC
1-Propanol
CCCO
(or OCCC)
Propene
C=CC
(or CC=C)
SMILES - Branching & Rings
• Parentheses represent branching
• Ring closures represented by using numbers to signify attachment points
2-Propanol
CC(O)C
Cyclohexane
C1CCCCC1
Benzene
c1ccccc1
Bromobenzene
c1cc(Br)ccc1
SMILES - Acetaminophen (Tylenol)
Acetaminophen
c1c(O)ccc(NC(=O)C)c1
SMILES - multiple ring structure
Indole
c1ccc2[nH]ccc2c1
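
The rules for atoms, bonds, branches and ring closures can be turned into a small parser that converts a SMILES string into atom and bond lists. This is a sketch of a subset only, with names of my own choosing: it handles the examples in these slides but not stereochemistry, two-digit ring closures or disconnected fragments, and it records a bond between two aromatic atoms as order 1.5 purely as a labeling convention.

```python
import re

# Tokens: bracket atom, two-letter halogens, organic-subset atoms, bonds, branches, ring digits
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def is_aromatic(atom):
    """Lowercase element symbols (also inside brackets, e.g. [nH]) mark aromatic atoms."""
    return atom.strip("[]")[0].islower()


def parse_smiles(smiles):
    """Return (atoms, bonds); bonds are (index, index, order) tuples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES syntax in {smiles!r}")
    atoms, bonds = [], []
    previous = None     # index of the atom the next atom will bond to
    pending = None      # explicit bond order waiting for its second atom
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring digit -> (atom index, explicit bond order or None)

    def add_bond(a, b, order):
        if order is None:
            order = 1.5 if is_aromatic(atoms[a]) and is_aromatic(atoms[b]) else 1
        bonds.append((a, b, order))

    for tok in tokens:
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
        elif tok == "(":
            branch_stack.append(previous)
        elif tok == ")":
            previous = branch_stack.pop()
        elif tok.isdigit():
            if previous is None:
                raise ValueError("ring-closure digit before any atom")
            if tok in open_rings:       # second occurrence closes the ring
                start, order = open_rings.pop(tok)
                add_bond(start, previous, pending if pending is not None else order)
            else:                       # first occurrence opens the ring
                open_rings[tok] = (previous, pending)
            pending = None
        else:                           # an atom token
            atoms.append(tok)
            if previous is not None:
                add_bond(previous, len(atoms) - 1, pending)
            previous, pending = len(atoms) - 1, None
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds


for example in ["CCC", "C=CC", "CC(O)C", "C1CCCCC1", "c1ccc2[nH]ccc2c1"]:
    print(example, parse_smiles(example))
```

A production toolkit does much more (valence checks, aromaticity perception, canonical ordering), so this should be read as a way to see the notation's structure rather than as a replacement for one.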
Other SMILES notes
• All hydrogen atoms are implicit unless declared otherwise
• Non-organic atoms (i.e. not C, N, S, O, Cl, Br), hydrogens and modified atoms need to be placed in square brackets, e.g. [Pb], [Xe]
• Charged species indicated by a + or - (and square brackets), e.g. [Na+], [N+], [O-], [Ca++]
• Unknown atoms can be represented by a * (but watch out for confusion with SMARTS!)
• Stereochemistry can be indicated using @@
• "Canonical SMILES" can be created
SMILES Homepage
http://www.daylight.com/smiles/
• Official Syntax Guide
• Tutorial
• Examples
• Resources
Other Line Notations
• ROSDAL - Beilstein
  Representation Of Structure Diagram Arranged Linearly
  1O-2=3O,2-4-5N,4-6-7=-12-7,10-13O
• Sybyl Line Notation (SLN) - Tripos
  OHC(=O)CH(NH2)CH2C[1]=CHCH=C(OH)CH=CH@1
[Structure diagram: tyrosine, with atoms numbered]
Example free online web resources
For more links, see http://www.chemoinf.com/
PubChem
http://pubchem.ncbi.nlm.nih.gov/
MolInspiration Property Calculations
http://www.molinspiration.com/cgi-bin/properties
