
BLAST: A Heuristic Algorithm
Anjali Tiwari
Pannaben Patel
Pushkala Venkataraman

BLAST: Basic Local Alignment Search Tool

Rapid searching of protein & nucleotide DBs

Database

Seeking similar sequences

nr = non-redundant
Sources feeding nr: GenBank, SwissProt, PIR, PDB, PRF

Program   Query        Database     Search Level
Blastp    Amino acid   Amino acid   Amino acid
Blastn    Nucleotide   Nucleotide   Nucleotide
Blastx    Nucleotide   Amino acid   Amino acid
Tblastn   Amino acid   Nucleotide   Amino acid
Tblastx   Nucleotide   Nucleotide   Amino acid

BLAST 3 STEP ALGORITHM
1. Compile Words
2. Scan DB
3. Extend

Some definitions

Alignment: Process of lining up 2 or more sequences to assess similarity

BLOSUM62: A 20 x 20 substitution matrix for amino acids

Gap: Space introduced into an alignment to compensate for insertions/deletions in 1 sequence relative to another

Similarity Measures

Similarity Matrix - BLOSUM
Local Search Algorithms

Identities & Conservative Replacements = +ve score
Unlikely Replacements = -ve score

General Concept of How BLAST Works

Query Input -> 1000s of sequences -> Calculate HSP -> Calculate MSP -> Display output

HSP: High Scoring Pair
MSP: Maximal Segment Pair

Key Idea BLAST1

Step 1: Compile a list of high scoring words of length w from the query (w=3 for proteins, 11 for nucleic acids)
Step 2: Scan for word hits in the database with score greater than threshold, T
Step 3: Extend each word hit in both directions to find High Scoring Pairs with scores greater than S

Example
Step -1
Query QQGPHUIQEGQQGKEEDPP
Words of length 3 w = QQG, QGP, GPH, PHU, HUI, ...
Take first triple QQG
Make neighborhood words w' = QQG, QEG, GQG
Find high scoring triples Blosum(w, w') > T where T = Threshold parameter
Suppose Blosum(QQG, QEG) = 18
Blosum(QQG, GQG) = 12
Blosum(QQG, QQG) = 16
T = 13
Choose QQG and QEG since their Blosum values > T
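
Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's implementation: the function names are hypothetical, and the scores are the values supposed on the slide rather than values computed from a real BLOSUM matrix.

```python
def compile_words(query, w=3):
    """Return every overlapping word of length w in the query, in order."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def high_scoring_neighbors(word, candidates, score, threshold):
    """Keep the candidate words whose score against `word` exceeds threshold T."""
    return [c for c in candidates if score(word, c) > threshold]


# Scores supposed on the slide (illustrative, not computed from BLOSUM62).
SLIDE_SCORES = {("QQG", "QQG"): 16, ("QQG", "QEG"): 18, ("QQG", "GQG"): 12}

query = "QQGPHUIQEGQQGKEEDPP"
words = compile_words(query)
kept = high_scoring_neighbors(
    "QQG",
    ["QQG", "QEG", "GQG"],
    score=lambda a, b: SLIDE_SCORES[(a, b)],
    threshold=13,
)
print(words[:5])  # ['QQG', 'QGP', 'GPH', 'PHU', 'HUI']
print(kept)       # ['QQG', 'QEG']
```

Passing the scoring function in as a parameter keeps the sketch independent of any particular substitution matrix.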

Step -2
Suppose Database Sequence = PKLMMQQGKQEGM

Matching Word Pairs in DB sequence: QQG and QEG
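
The scan in Step 2 amounts to sliding a window of length w over the database sequence and checking each window against the word list. The sketch below assumes a plain set lookup and a hypothetical function name; it is only meant to show where the two hits fall in this example.

```python
def find_word_hits(db_sequence, words):
    """Return (offset, word) for each occurrence of a listed word in the DB sequence."""
    w = len(words[0])
    wanted = set(words)
    return [
        (i, db_sequence[i:i + w])
        for i in range(len(db_sequence) - w + 1)
        if db_sequence[i:i + w] in wanted
    ]


db_sequence = "PKLMMQQGKQEGM"
hits = find_word_hits(db_sequence, ["QQG", "QEG"])
print(hits)  # [(5, 'QQG'), (9, 'QEG')]  (0-based offsets)
```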

Step -3
Query:       QQGPHUIQEGQQGKEEDPP
DB Sequence: PKLMMQQGKQEGM

Extending the QQG hit to the right, one residue at a time:
Blosum(QQG, QQG) = 16
Blosum(QQGK, QQGK) = 21
Blosum(QQGKE, QQGKQ) = 23
Blosum(QQGKEE, QQGKQE) = 28
Blosum(QQGKEED, QQGKQEG) = 27

Extension to the right stops here because the BLOSUM value is beginning to decrease
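
The rightward extension can be reproduced with a short Python sketch. It makes three assumptions. First, `PAIR_SCORES` lists only the residue pairs needed here, with values chosen to match the running totals on the slide. Second, the stopping rule is the slide's simple one (stop as soon as the score would decrease), whereas a production search would normally tolerate a limited drop below the best score. Third, only the right-hand direction is shown.

```python
# Pair scores for just the residues in this example; they reproduce the slide's totals.
PAIR_SCORES = {
    ("Q", "Q"): 5, ("G", "G"): 6, ("K", "K"): 5,
    ("E", "Q"): 2, ("E", "E"): 5, ("D", "G"): -1,
}


def extend_right(query, db, q_start, d_start, w=3):
    """Extend a word hit to the right, stopping as soon as the total would decrease."""
    score = sum(PAIR_SCORES[(query[q_start + k], db[d_start + k])] for k in range(w))
    length = w
    while q_start + length < len(query) and d_start + length < len(db):
        step = PAIR_SCORES[(query[q_start + length], db[d_start + length])]
        if step < 0:  # the next pair would lower the score, so stop here
            break
        score += step
        length += 1
    return query[q_start:q_start + length], db[d_start:d_start + length], score


# The second QQG in the query starts at offset 10; the DB hit starts at offset 5.
result = extend_right("QQGPHUIQEGQQGKEEDPP", "PKLMMQQGKQEGM", 10, 5)
print(result)  # ('QQGKEE', 'QQGKQE', 28)
```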

ADVANTAGES

Faster than Dynamic Programming
Removes low complexity regions
Spends less time on uninteresting sequence
Statistical significance of results can be obtained & these are very good

DISADVANTAGES

Finds & reports only local alignments
Finds too many word hits per search, thus reducing speed
Does not allow for gaps in sequences

*** New Models to combat disadvantages ***
BLAST2, PSI Blast

BLAST2: Combination of 2 Hit & Gapped

2 Hit Method - 3 Step method
Step 1 and Step 2 as in BLAST 1
Step 3 is where they differ: BLAST now looks for 2 words in a sequence instead of 1 while aligning. The 2 words are at a distance < A and are not overlapping. Typically A=40
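
A rough Python sketch of the two-hit test follows. It assumes hits are given as (query offset, DB offset) pairs and that the two words must lie on the same diagonal (equal DB offset minus query offset), as described in the 1997 Gapped BLAST paper. The function name and the sample hits are hypothetical, and only consecutive hits on a diagonal are paired.

```python
from collections import defaultdict


def two_hit_pairs(hits, w=3, A=40):
    """Pair non-overlapping word hits on the same diagonal that lie less than A apart.

    `hits` is a list of (query_offset, db_offset) tuples.
    """
    by_diagonal = defaultdict(list)
    for q, d in hits:
        by_diagonal[d - q].append((q, d))

    pairs = []
    for diagonal_hits in by_diagonal.values():
        diagonal_hits.sort()
        for first, second in zip(diagonal_hits, diagonal_hits[1:]):
            distance = second[0] - first[0]
            if w <= distance < A:  # non-overlapping and within the window A
                pairs.append((first, second))
    return pairs


# Hypothetical hits: three on diagonal 5, one on diagonal 2.
sample_hits = [(0, 5), (10, 15), (7, 9), (60, 65)]
pairs = two_hit_pairs(sample_hits)
print(pairs)  # [((0, 5), (10, 15))]
```

Only the first two hits qualify: the hit at (60, 65) is on the same diagonal but 50 positions from its predecessor, and the hit at (7, 9) has no partner on its diagonal.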

Gapped Blast

Gapped alignment is introduced to get an optimal alignment
Two sequences:
Seq A = ACGTA
Seq B = ACATA

Normal alignment is
ACGTA
ACATA

But if the penalty for a mismatch is larger than the combined penalty of the two gaps that replace it, then the optimal alignment is one of the following:
AC-GTA
ACA-TA

or

ACG-TA
AC-ATA
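
The trade-off between a mismatch and a pair of gaps can be checked with a small dynamic-programming sketch. It computes a global (Needleman-Wunsch style) alignment score with a linear gap cost, which is a simplification chosen for illustration and not the procedure Gapped BLAST itself uses. The scoring values are assumptions picked to show both outcomes.

```python
def best_alignment_score(a, b, match=1, mismatch=-3, gap=-1):
    """Score of the best global alignment of a and b under linear gap costs."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Mismatch (-3) costs more than two gaps (-2): the gapped alignment wins, 4 - 2 = 2.
gapped = best_alignment_score("ACGTA", "ACATA")
# Mismatch (-1) costs less than two gaps (-4): the ungapped alignment wins, 4 - 1 = 3.
ungapped = best_alignment_score("ACGTA", "ACATA", mismatch=-1, gap=-2)
print(gapped, ungapped)  # 2 3
```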

Gapped BLAST - Allows gaps to be introduced while aligning
Query ATTGTCAAAGACTTGAGCTGATGCAT
DB    GGCAGACATGACTGACAAGGGTATCG

[Figure: the query aligned against the DB sequence, which is split into GGCAGACATGA and CTGACAAGGGTATCG by a gap; mismatches and the gap are marked]

PSI BLAST - Position specific iterated BLAST. Used for multiple alignments

1. Query Sequence
2. BLAST search of DB
3. Sequences with high scores collected
4. Multiple alignment & profile made
5. DB searched with profile
6. New sequences added & process iterated
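
To make the "profile" step more concrete, here is a simplified Python sketch that turns an ungapped multiple alignment into per-column log-odds scores and scores a segment against them. The uniform background, the fixed pseudocount, the reduced alphabet, and the sample sequences are all assumptions made for illustration. PSI-BLAST's actual profile construction is more elaborate.

```python
import math
from collections import Counter


def build_profile(aligned, alphabet, pseudocount=1.0):
    """Per-column log-odds scores against a uniform background (an assumption)."""
    background = 1.0 / len(alphabet)
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            r: math.log2(((counts[r] + pseudocount) / total) / background)
            for r in alphabet
        })
    return profile


def profile_score(profile, segment):
    """Score an ungapped segment of the same length as the profile."""
    return sum(col[r] for col, r in zip(profile, segment))


aligned = ["QQGKE", "QQGKQ", "QEGKE"]  # hypothetical high-scoring hits from one iteration
profile = build_profile(aligned, alphabet="DEGKQ")
good = profile_score(profile, "QQGKE")
poor = profile_score(profile, "DDDDD")
print(good > poor)  # True
```

Residues seen often in a column score above zero and unseen residues score below zero, so the profile rewards sequences that resemble the collected hits position by position.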

References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." Journal of Molecular Biology 215:403-410.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Research 25:3389-3402.
http://www.ncbi.nlm.nih.gov/
http://bioinf.man.ac.uk/ember/prototype/

References (Continued)

http://www.psc.edu/biomed/training/tutorials/sequence/db/index.html
http://aracyc.stanford.edu/~jshrager/jeff/mbcs/match.html
http://www.ime.usp.br/~durham/cursos/ibi5032/pub/doc/allignmentTutorial.pdf
http://ibivu.cs.vu.nl/teaching/masters/seq_analysis/sa_lecture3.pdf
