
Certificate Course in

Bioinformatics

Bioinformatics
An overview

Soumitra Nath
mail: nath.soumitra1@gmail.com
What is Bioinformatics?
“The field of science in which biology, computer science,
and information technology merge to form a single discipline”



Ultimate goal: to enable the discovery of new biological
insights as well as to create a global perspective from which
unifying principles in biology can be discerned.

Bioinformatics = Biological Data + Computer Calculations
Our Objectives

• To introduce the bioinformatics discipline
• To familiarize you with the major biological
questions that can be addressed by
bioinformatics tools
• To introduce the major tools used for sequence
and structure analysis and explain in general
how they work (including their limitations)


Central Paradigm in Molecular Biology

Gene (DNA) → mRNA → Protein

21st century

Genome → Transcriptome → Proteome
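
The DNA to mRNA to protein flow can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is the coding strand of a gene and using the standard genetic code; the function names `transcribe` and `translate` and the toy sequence are made up for this example and are not part of the course material.

```python
# Standard genetic code, built compactly: codons are enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCAAGTGA"  # hypothetical toy sequence
    mrna = transcribe(gene)
    print(mrna)             # mRNA
    print(translate(mrna))  # protein, one-letter amino acid codes
```

Real transcripts also involve untranslated regions and splicing, which this sketch deliberately ignores.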


Genome

• Chromosomal DNA of an organism

• Coding and non-coding DNA



• Genome size and number of genes do not
necessarily determine organism complexity
Transcriptome

• Complete collection of all possible mRNAs
(including splice variants) of an organism.

• Regions of an organism’s genome that get
transcribed into messenger RNA.

• Transcriptome can be extended to include all
transcribed elements, including non-coding
RNAs used for structural and regulatory
purposes.

Proteome

• The complete collection of proteins that can be
produced by an organism.

• Can be studied either as a static entity (sum of all
proteins possible) or as a dynamic entity (all proteins
found at a specific time point)

From DNA to Genome
[Timeline figure, axis 1955 to 2000, milestones in order:]
• Watson and Crick DNA model
• First protein sequence
• First protein structure
• First bacterial genome (Haemophilus influenzae, 1995)
• Yeast genome
• First human genome draft (around 2000)
The Human Genome Project
• Initiated in 1986; completed in 2003

• Project goals were to
– identify all the genes in human DNA,
– determine the sequences of the 3 billion chemical base pairs
that make up human DNA,
– store this information in databases,
– improve tools for data analysis and develop new tools,
– address the ethical, legal, and social issues that may arise
from the project.

What makes us human?

CHIMP GENOME
Chimpanzees are similar to humans in so many
ways: they are socially complex, sensitive and
communicative, and yet indisputably on the animal
side of the man/beast divide. Scientists have now
sequenced the genetic code of our closest living
relative, showing the striking concordances and
divergences between the two species, and perhaps
holding up a mirror to our own humanity.
How human are chimps?

Perhaps not surprising!

Comparison between the full drafts of the human and chimp genomes revealed
that they differ by only 1.23%.
Open reading frames

Functional sites
Annotation
Structure, function
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA
TAT GGA CAA TTG GTT TCT TCT CTG AAT ......
.............. TGAAAAACGTA
[Figure: the sequence above annotated with a promoter, a TF binding site,
the transcription start site, the ribosome binding site, and the ORF
(shown as spaced codons)]

ORF = Open Reading Frame
CDS = Coding Sequence
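
Finding open reading frames in a raw sequence can be sketched as follows. This is a simplified illustration, assuming an ORF is any stretch that starts at ATG and runs in frame to the first stop codon (TAA, TAG or TGA), and scanning only the forward strand; real gene finders also check the reverse complement and use statistical models. The function name `find_orfs`, the `min_codons` parameter and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    start and end are 0-based, end-exclusive positions in dna.
    min_codons is the minimum number of codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # hypothetical toy sequence
    for start, end, seq in find_orfs(toy):
        print(start, end, seq)
```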
DNA SEQUENCING

• Refers to methods for determining the order of
the nucleotide bases in DNA
• i.e., adenine, guanine, cytosine and thymine

• 2-D chromatography
– First DNA sequencing
– Obtained by academic researchers in the 1970s

• Chemical Procedure for Sequencing
– Developed by Allan Maxam and Walter Gilbert
• Enzymatic Procedure
– Developed by Frederick Sanger (1977)
– Also called the dideoxy chain-termination method

• Automated DNA Sequencer

• Pyrosequencing
– Currently the method of choice for most researchers


Sequence Comparison

• DNA is the blueprint for living organisms
⇒ Evolution is related to changes in DNA
⇒ By comparing DNA sequences we can
infer evolutionary relationships between
the sequences without knowledge of the
evolutionary events themselves
• Foundation for inferring function, active
sites, and key mutations
Sequence Alignment

• Key aspect of sequence comparison is
sequence alignment
• A sequence alignment maximizes the number
of positions that are in agreement in two
sequences

[Figure: alignment of Sequence U and Sequence V, with match, mismatch,
and indel positions labeled]

Copyright © 2004 Limsoon Wong
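
One common way to compute such an alignment is Needleman-Wunsch global alignment by dynamic programming, which the slides do not name; it is used here only as an illustration. The sketch below assumes a simple scoring scheme (match +1, mismatch -1, gap -1), so it maximizes a score that rewards positions in agreement and penalizes mismatches and indels. The function name `global_align` and the scoring values are assumptions for this example.

```python
def global_align(u, v, match=1, mismatch=-1, gap=-1):
    """Globally align u and v; return (aligned_u, aligned_v, score)."""
    n, m = len(u), len(v)

    def sub(i, j):
        # Score for aligning u[i-1] with v[j-1].
        return match if u[i - 1] == v[j - 1] else mismatch

    # score[i][j] = best score for aligning u[:i] with v[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # indel: gap in v
                score[i][j - 1] + gap,            # indel: gap in u
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    aligned_u, aligned_v = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_u.append(u[i - 1])
            aligned_v.append(v[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_u.append(u[i - 1])
            aligned_v.append("-")
            i -= 1
        else:
            aligned_u.append("-")
            aligned_v.append(v[j - 1])
            j -= 1
    return "".join(reversed(aligned_u)), "".join(reversed(aligned_v)), score[n][m]


if __name__ == "__main__":
    a, b, s = global_align("ACGT", "AGT")
    print(a)
    print(b)
    print("score:", s)
```

Production tools use substitution matrices and affine gap penalties rather than these flat scores.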


Multiple Alignment: An Example
• Multiple sequence alignment maximizes the number of
positions in agreement across several sequences
• Sequences belonging to the same “family” usually have
more conserved positions in a multiple sequence
alignment

[Figure: multiple sequence alignment with conserved sites marked]
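
Given an alignment that has already been computed, conserved positions can be located with a short script. This is a minimal sketch, assuming the alignment is a list of equal-length strings with "-" for gaps and treating a column as conserved only when every sequence carries the same residue; the function names and the toy alignment are hypothetical.

```python
def conserved_columns(alignment):
    """Return indices of columns where all sequences share the same non-gap residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i
        for i in range(length)
        if alignment[0][i] != "-" and all(seq[i] == alignment[0][i] for seq in alignment)
    ]


def conservation_line(alignment):
    """Return a marker line with '*' under each conserved column."""
    conserved = set(conserved_columns(alignment))
    width = len(alignment[0]) if alignment else 0
    return "".join("*" if i in conserved else " " for i in range(width))


if __name__ == "__main__":
    toy = ["ACG-TA", "ACGATA", "TCG-TA"]  # hypothetical toy alignment
    for seq in toy:
        print(seq)
    print(conservation_line(toy))
```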
Phylogeny: An Example
• By looking at the extent of conserved positions in the
multiple sequence alignment of different groups of
sequences, we can infer when they last shared an
ancestor
⇒ Construct a “family tree” or phylogeny


Benefits of Bioinformatics

To the patient:
Better drugs, better treatment
To pharma:
Save time, save cost, make more $

To the scientist:
Better science
