1: Create an account and log in before you start. Galaxy will then save your work, which saves time.
2: Data uploading/fetching from the Internet
i) In the Galaxy tools panel (left), click on Get Data and choose Upload File. Click
Paste/Fetch data and paste the URL below.
https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/VariantDet_BASIC/NA12878.GAIIx.exome_chr22.1E6reads.76bp.fastq
Select Type as fastqsanger (take care, as there is also a fastqcssanger type) and click Start.
Once the upload status turns green, it means the upload is complete. You should now
be able to see the file in the Galaxy history panel (right).
ii) Alternatively, if you have a local file, you can upload it directly by browsing to it
on your computer.
iii) You can also fetch reads from the SRA (Sequence Read Archive) through EMBL-EBI’s ENA,
which is linked with Galaxy (e.g. GSE88943 at GEO).
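Choosing fastqsanger assumes the quality line of each read uses the Sanger (Phred+33) encoding. The sketch below is one rough way to check that before uploading. It assumes plain four-line FASTQ records, and the function names and the two example reads are made up for illustration, not taken from the tutorial data.

```python
from itertools import islice

def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from FASTQ text, four lines per record."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record[0][1:], record[1], record[3]

def is_probably_phred33(records):
    """Heuristic: Phred+64 style encodings never use characters below ';' (ASCII 59)."""
    return any(ord(ch) < 59 for _, _, qual in records for ch in qual)

# Made-up two-record example (hypothetical reads, not the NA12878 data).
example = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIIHHG#\n",
    "@read2\n", "TTGGCCAA\n", "+\n", "IIIIIIII\n",
]
records = list(read_fastq(example))
print(len(records), is_probably_phred33(records))
```

This is only a heuristic: a file whose qualities are all very high cannot be told apart this way, so a False result does not prove the file is not Sanger encoded.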
COMMENTS:
The green area shows reads with good quality and the red area represents bad/low
quality. Low-quality reads should be trimmed or discarded in order to get reliable
results. Only a few of the reads lie in the red area, which indicates few errors and little
bias in the sequencing. The sample is a good representative of the data, with most of the
reads showing high quality.
2. QUALITY PER TILE (the plot is blue, which means read quality is not biased by tile)
3. MEAN SEQUENCE QUALITY:
The average quality score per read mostly lies in the range 35-38.
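As a small illustration of what a per-read average quality means, the sketch below decodes a quality string into Phred scores and averages them. It assumes Sanger (Phred+33) encoding, as selected at upload; the function names are made up for this example, and this is not necessarily how the QC tool computes its plot.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]

def mean_quality(quality, offset=33):
    """Arithmetic mean of the Phred scores of one read."""
    scores = phred_scores(quality, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)

# 'I' is ASCII 73, so it decodes to Phred 40 under the +33 offset.
print(mean_quality("IIIIIIII"))
```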
4. SEQUENCE CONTENT ACROSS ALL BASES:
The first few bases show an uneven distribution of A, T, C and G, while from base 12 onwards the distribution is uniform.
Mean GC content per read lies in the range 53-56%.
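For reference, the GC content of a single read can be computed as the share of G and C among its called bases. This is a minimal sketch; the function name is made up, and ignoring N bases is an assumption of this example rather than something the tutorial specifies.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the called (A, C, G, T) bases of a read."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0  # read consists only of N or is empty
    return sum(base in "GC" for base in called) / len(called)

# Made-up read: four of its eight called bases are G or C.
print(gc_fraction("GGCCAATT"))
```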
6. N CONTENT ACROSS ALL BASES
No adaptors found.
Not Found.
To examine the output sorted BAM file, we need to first convert it into readable
SAM format. From the Galaxy tools panel, select
NGS: SAM Tools > BAM-to-SAM
From the options:
BAM File to Convert: set to the output of the sorted BAM file
Keep other options as default and click execute
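Once the file is in SAM format, each alignment line is tab-delimited text with eleven mandatory fields. The sketch below splits one such line and reads two of the standard flag bits (0x4 for unmapped, 0x10 for reverse strand). The helper names and the example line are made up for illustration.

```python
from collections import namedtuple

SamRecord = namedtuple(
    "SamRecord", "qname flag rname pos mapq cigar rnext pnext tlen seq qual"
)

def parse_sam_line(line):
    """Parse the eleven mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("@") or len(fields) < 11:
        raise ValueError("not a SAM alignment line")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)

def is_unmapped(record):
    return bool(record.flag & 0x4)

def is_reverse(record):
    return bool(record.flag & 0x10)

# Made-up alignment line (hypothetical read, not from the tutorial output).
line = "read1\t16\tchr22\t31854409\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(line)
print(record.rname, record.pos, is_unmapped(record), is_reverse(record))
```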
7: Study Mapping Statistics
We can generate some mapping statistics from the BAM file to assess the quality of our
alignment.
Run IdxStats
NGS: SAM Tools > IdxStats
From the options:
The BAM: select the sorted BAM file
Keep other options as default and click execute
Output: A tab-delimited output with four columns. Each line consists of a reference
sequence name (e.g. a chromosome), reference sequence length, number of mapped reads and
number of placed but unmapped reads.
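The four-column output described above is easy to summarize after downloading it. The sketch below parses the table and computes the overall fraction of mapped reads; the function names and the read counts in the example are made up for illustration.

```python
def parse_idxstats(text):
    """Parse IdxStats output into (name, length, mapped, unmapped) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows

def mapped_fraction(rows):
    """Mapped reads divided by all reads (mapped plus unmapped)."""
    mapped = sum(row[2] for row in rows)
    total = mapped + sum(row[3] for row in rows)
    return mapped / total if total else 0.0

# Made-up counts; the '*' line collects reads with no reference assigned.
example = "chr22\t51304566\t900\t10\n*\t0\t0\t90\n"
rows = parse_idxstats(example)
print(rows[0][0], mapped_fraction(rows))
```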
PileUP format:
The pileup file we generated has 10 columns:
1. chromosome
2. position
3. current reference base
4. consensus base from the mapped reads
5. consensus quality
6. SNV quality
7. maximum mapping quality
8. coverage
9. bases within reads
10. quality values
Further information on (9):
Each character represents one of the following (the longer this string, the higher the
coverage):
. = match on forward strand for that base
, = match on reverse strand
ACGTN = mismatch on forward
acgtn = mismatch on reverse
+[0-9]+[ACGTNacgtn]+ = insertion between this reference position and the next
-[0-9]+[ACGTNacgtn]+ = deletion between this reference position and the next
^ = start of read
$ = end of read
BaseQualities = one character per base in ReadBases, ASCII encoded Phred scores
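To make the read-bases encoding concrete, the sketch below walks through one read-bases string and counts each kind of event listed above. One detail it relies on, which is part of the samtools pileup format, is that the ^ marker is followed by a single character giving the mapping quality of the read, so that character is skipped. The function name and the example string are made up for illustration.

```python
import re

def summarize_read_bases(bases):
    """Count matches, mismatches and indels in a pileup read-bases string."""
    counts = {"match_fwd": 0, "match_rev": 0, "mismatch_fwd": 0,
              "mismatch_rev": 0, "insertions": 0, "deletions": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":            # start of read, next character is its mapping quality
            i += 2
            continue
        if ch in "+-":           # indel: sign, length, then that many bases
            length = re.match(r"\d+", bases[i + 1:])
            if length is None:
                raise ValueError(f"indel without a length at offset {i}")
            counts["insertions" if ch == "+" else "deletions"] += 1
            i += 1 + len(length.group()) + int(length.group())
            continue
        if ch == ".":
            counts["match_fwd"] += 1
        elif ch == ",":
            counts["match_rev"] += 1
        elif ch in "ACGTN":
            counts["mismatch_fwd"] += 1
        elif ch in "acgtn":
            counts["mismatch_rev"] += 1
        # '$' (end of read) and any other symbol are skipped without counting
        i += 1
    return counts

# Made-up read-bases string for illustration.
print(summarize_read_bases("..,^I.T+2AG,$a"))
```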
Convert to pileup file: The output file above is in tabular format. For the steps
below, we need to convert it to pileup format. To do so, click on the pencil icon
(Edit attributes) for the pileup file and change the data type attribute to pileup.
The next step operates on this converted file.
SNV Filtering
NGS: SAM Tools > Filter Pileup
From the options:
which contains = Pileup with ten columns (with consensus)
Do not report positions with coverage lower than = 10
Convert coordinates to intervals = Yes
Keep other options as default and click execute
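The effect of these options can be sketched in a few lines: drop positions whose coverage (the eighth column) is below the threshold, and turn the single position into a start/end pair. The interval convention used here (start = position - 1, end = position) is an assumption of this sketch and may differ from what the Galaxy tool emits; the function name and example lines are made up.

```python
def filter_pileup(lines, min_coverage=10):
    """Keep ten-column pileup rows with enough coverage, as interval rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue                      # not a ten-column pileup row
        if int(fields[7]) < min_coverage:
            continue                      # coverage is the eighth column
        position = int(fields[1])
        # Assumed convention: 0-based start, 1-based end for a single base.
        yield [fields[0], str(position - 1), str(position)] + fields[2:]

# Made-up pileup rows: coverage 12 (kept) and coverage 4 (dropped).
example = [
    "chr22\t100\tA\tG\t30\t30\t60\t12\tgGg.,gGg.,gG\tIIIIIIIIIIII\n",
    "chr22\t200\tC\tT\t20\t20\t60\t4\tT.t,\tIIII\n",
]
kept = list(filter_pileup(example))
print(len(kept), kept[0][:3])
```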
To visualize these indels, we need to convert from tabular to BED. This is a two-step
process. First, click the pencil icon; under the Datatype tab, choose Interval and save;
under the Attributes tab, make sure End column = 2.
Next, we can convert the Interval file to BED format. Click the pencil icon; under the
Convert Format tab, choose Convert Genomic Interval to BED. Rename the result to
indels.filtered
Download the BED file and open it in the IGV genome browser.
Try looking at region chr22:31,854,409-31,854,460
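If you want to check outside IGV whether a feature from the BED file falls in a region like this, the sketch below parses the region string and tests for overlap. It treats the region as 1-based and inclusive and BED features as 0-based and half-open, which is the usual convention for the two formats; the function names and the example feature are made up for illustration.

```python
def parse_region(region):
    """Parse 'chr:start-end' (commas allowed) into (chrom, start, end), 1-based inclusive."""
    chrom, _, span = region.partition(":")
    start_text, _, end_text = span.replace(",", "").partition("-")
    start, end = int(start_text), int(end_text)
    if not chrom or start < 1 or end < start:
        raise ValueError(f"invalid region: {region!r}")
    return chrom, start, end

def bed_overlaps_region(bed_chrom, bed_start, bed_end, region):
    """True if a 0-based, half-open BED feature overlaps the given region."""
    chrom, start, end = parse_region(region)
    return bed_chrom == chrom and bed_start < end and bed_end > start - 1

region = "chr22:31,854,409-31,854,460"
print(parse_region(region))
# Hypothetical BED feature, not taken from the tutorial results.
print(bed_overlaps_region("chr22", 31854420, 31854421, region))
```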