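A connected directed graph has an Eulerian path if and only if it contains at most two semi-balanced vertices (vertices whose in-degree and out-degree differ by exactly one) and all other vertices are balanced.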
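The degree condition above can be checked mechanically. The following is a minimal sketch, assuming the graph is directed, given as a list of (from, to) edges, and already known to be connected (connectivity is not tested here). The function name has_eulerian_path_degrees is made up for illustration.

```python
from collections import defaultdict


def has_eulerian_path_degrees(edges):
    """Check the degree condition for an Eulerian path in a directed graph.

    Assumption: the graph is connected; only vertex degrees are examined.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1

    semibalanced = 0
    for node in set(indeg) | set(outdeg):
        diff = abs(indeg[node] - outdeg[node])
        if diff > 1:
            # Neither balanced nor semi-balanced: no Eulerian path.
            return False
        if diff == 1:
            semibalanced += 1
    return semibalanced <= 2
```

For example, the chain AT -> TG -> GC passes the check (its two end vertices are semi-balanced), while a vertex with three outgoing edges and no incoming edges fails it.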
Underlying assumptions
Four hidden assumptions that do not hold for next-generation sequencing:
We can generate all k-mers present in the genome
All k-mers are error free
Each k-mer appears at most once in the genome
The genome consists of a single chromosome
The smaller the value of k, the higher the probability that we observe every k-mer in the genome.
k-mer multiplicity.
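To make k-mer multiplicity concrete, here is a minimal sketch that counts how often each k-mer occurs in a sequence and reports those seen more than once. It assumes an error-free sequence held in a plain string. The names kmer_counts and repeated_kmers are illustrative, not taken from any particular assembler.

```python
from collections import Counter


def kmer_counts(genome, k):
    """Count every k-mer occurrence in the sequence (overlapping windows)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def repeated_kmers(genome, k):
    """Return the k-mers whose multiplicity is greater than one."""
    return {kmer: n for kmer, n in kmer_counts(genome, k).items() if n > 1}
```

On the toy sequence "ATGATGC", the 3-mer ATG has multiplicity 2, so the assumption that each k-mer appears at most once fails for k = 3. With k = 5 every k-mer in that sequence is unique.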
How good the assembly is depends principally on four things:
how long the reads are (how much overlap you can get)
how many reads you have (the more you have, the greater the chance of overlap)
the error rate
the nature of your DNA
If the genome contains repeats, this causes problems for the assembly, because the program does not know which copy of the repeat a read belongs to.
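One way to see the ambiguity a repeat creates is to look for branching points in a graph built from k-mers. The sketch below assumes a de Bruijn style construction in which each k-mer contributes an edge from its (k-1)-long prefix to its (k-1)-long suffix. It is an illustration with made-up function names, not the method of any specific assembly program.

```python
from collections import defaultdict


def de_bruijn_edges(sequence, k):
    """Return one (prefix, suffix) edge per k-mer occurrence."""
    return [
        (sequence[i:i + k - 1], sequence[i + 1:i + k])
        for i in range(len(sequence) - k + 1)
    ]


def branching_nodes(sequence, k):
    """Return nodes with more than one distinct successor.

    At such a node the assembler cannot tell from the k-mers alone
    which way to continue.
    """
    successors = defaultdict(set)
    for prefix, suffix in de_bruijn_edges(sequence, k):
        successors[prefix].add(suffix)
    return {
        node: sorted(nxt) for node, nxt in successors.items() if len(nxt) > 1
    }
```

In the toy sequence "ATGCATGT" with k = 4, the repeated ATG is followed by TGC in one place and TGT in another, so the node ATG has two possible continuations.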
Paired-end sequencing allows you to overcome the problem of repeats, because you know which unique sequence each copy of the repeat should be near.