R GWAS Packages

R Packages for Genome-Wide Association Studies
Qunyuan Zhang, Division of Statistical Genomics
Statistical Genetics Forum, March 10, 2008
What is R?
R is a free software environment for statistical computing and graphics.
Runs on a wide variety of UNIX platforms, Windows and MacOS (interactive or batch mode)
Free and open source; can be downloaded from cran.r-project.org
Wide range of packages (base & contributed); novel methods available
Concise grammar & good structure (function, data object, methods and class)
Help from manuals and email group
Slow, time and memory consuming (can be overcome by parallel computation and/or integration with C)
Popular, used by 70~80% of statisticians
R Task Views
http://cran.r-project.org/web/views/
Statistical Genetics Packages in R

http://cran.r-project.org/web/views/Genetics.html
Population Genetics: genetics (basic), Geneland (spatial structures of genetic data), rmetasim (population genetics simulations), hapsim (simulation), popgen (clustering SNP genotype data and SNP simulation), hierfstat (hierarchical F-statistics of genetic data), hwde (modeling genotypic disequilibria), Biodem (biodemographical analysis), kinship (pedigree analysis), adegenet (population structure), ape & apTreeshape (phylogenetic and evolution analyses), ouch (Ornstein-Uhlenbeck models), PHYLOGR (simulation and GLS model), stepwise (recombination breakpoints)
Linkage and Association: gap (both population and family data, sample size calculations, probability of familial disease aggregation, kinship calculation, linkage and association analyses, haplotype frequencies), tdthap (haplotype transmission/disequilibrium tests), powerpkg (power analyses for the affected sib pair and the TDT design), hapassoc (likelihood inference of trait associations with haplotypes in GLMs), haplo.ccs (haplotype and covariate relative risks in case-control data by weighted logistic regression), haplo.stats (haplotype analysis for unrelated subjects), ldDesign (experiment design for association and LD studies), LDheatmap (heatmap of pairwise LD), mapLD (LD and haplotype blocks), pbatR (R version of PBAT), GenABEL & SNPassoc (GWAS)
QTL mapping for data from experimental crosses: bqtl (inbred crosses and recombinant inbred lines), qtl (genome-wide scans), qtlDesign (designing QTL experiments & power computations), qtlbim (Bayesian interval QTL mapping)
Sequence & Array Data Processing: seqinr, BioConductor packages
GenABEL
Aulchenko Y.S., Ripke S., Isaacs A., van Duijn C.M. GenABEL: an R package for genome-wide association analysis. Bioinformatics. 2007, 23(10):1294-6.
GenABEL: genome-wide SNP association analysis
A package for genome-wide association analysis between quantitative or binary traits and single-nucleotide polymorphisms (SNPs).
Version: 1.3-5
Depends: R (≥ 2.4.0), methods, genetics, haplo.stats, qvalue, MASS
Date: 2008-02-17
Author: Yurii Aulchenko, with contributions from Maksim Struchalin, Stephan Ripke and Toby Johnson
Maintainer: Yurii Aulchenko <i.aoultchenko at erasmusmc.nl>
License: GPL (≥ 2)
In views: Genetics
CRAN checks: GenABEL results
GenABEL: Data Objects

gwaa.data-class
  phdata: phenotypic data (data frame)
  gtdata: genotypic data (snp.data-class; see snp.data())
    nbytes: number of bytes used to store data on a SNP
    nids: number of people
    male: male code
    idnames: ID names
    nsnps: number of SNPs
    snpnames: list of SNP names
    chromosome: list of chromosomes corresponding to SNPs
    coding: list of nucleotide coding for SNP names
    strand: strands of the SNPs
    map: list of SNP positions
    gtps: genotypes (snp.mx-class)
2-bit storage of genotypes: 0 = 00, 1 = 01, 2 = 10, 3 = 11 (saves 75%)
Loading data: load.gwaa.data(phenofile = "pheno.dat", genofile = "geno.raw")
Converting genotype files:
convert.snp.text(): from text file (GenABEL default format)
convert.snp.ped(): from Linkage, Merlin, Mach, and similar files
convert.snp.mach(): from Mach format
convert.snp.tped(): from PLINK TPED format
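The 2-bit storage idea can be illustrated with a small base-R sketch. It is not GenABEL's internal code: the helper names pack_genotypes() and unpack_genotypes() are hypothetical, and the order in which genotypes are placed within a byte is an assumption made for illustration. It only shows why four genotype codes (0 to 3) fit in one byte, which is where the 75% saving over one byte per genotype comes from.

```r
# Illustrative only: pack four genotype codes (each 0-3) into one byte value,
# two bits per genotype. Helper names and bit order are assumptions.
pack_genotypes <- function(g) {
  stopifnot(length(g) == 4, all(g %in% 0:3))
  sum(g * 4^(3:0))            # first genotype occupies the two highest bits
}

# Recover the four genotype codes from a packed byte value.
unpack_genotypes <- function(byte) {
  (byte %/% 4^(3:0)) %% 4
}

g <- c(1, 3, 0, 2)            # bit pattern 01 11 00 10
byte <- pack_genotypes(g)
restored <- unpack_genotypes(byte)

# Bytes needed per SNP with 2-bit storage versus one byte per genotype
n_people <- 2000
bytes_packed <- ceiling(n_people / 4)
saving <- 1 - bytes_packed / n_people
```

With 2000 people, one SNP needs 500 bytes instead of 2000 under this scheme.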
GenABEL: Data Manipulation

snp.subset(): subset data by SNP names or by QC criteria
add.phdata(): merge extra phenotypic data into the gwaa.data-class object
ztransform(): standard normalization of phenotypes
rntransform(): rank-normalization of phenotypes
npsubtreated(): non-parametric adjustment of phenotypes for medicated subjects
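The two phenotype transformations can be sketched in base R as follows. This is a minimal illustration of the general techniques, not the GenABEL implementation: the function names are hypothetical, and the (rank - 0.5) / n offset is one common convention that may differ from what rntransform() uses.

```r
# Standard (z) normalization: centre to mean 0 and scale to SD 1.
z_transform <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Rank-based normalization: replace each value by the normal quantile of its
# rank. The 0.5 offset is an assumed convention, chosen for illustration.
rank_normal_transform <- function(x) {
  n <- sum(!is.na(x))
  r <- rank(x, na.last = "keep")
  qnorm((r - 0.5) / n)
}

set.seed(1)
trait <- rexp(200)                      # simulated, strongly skewed trait
z  <- z_transform(trait)                # still skewed, but mean 0 and SD 1
rn <- rank_normal_transform(trait)      # approximately normal, order preserved
```

The z transform keeps the shape of the distribution, while the rank-based version forces it towards normality at the cost of discarding the original spacing between values.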
GenABEL: QC & Summarization

summary.snp.data(): summary of SNP data (number of observed genotypes, call rate, allelic frequency, genotypic distribution, P-value of HWE test)
check.trait(): summary of phenotypic data and outlier check based on a specified p/FDR cut-off
check.marker(): SNP selection based on call rate, allele frequency and deviation from HWE
HWE.show(): showing HWE tables, Chi2 and exact HWE P-values
perid.summary(): call rate and heterozygosity per person
ibs(): matrix of average IBS for a group of people & a given set of SNPs
hom(): average homozygosity (inbreeding) for a set of people, across multiple markers
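Two of the per-SNP QC quantities listed above, call rate and the chi-square test of Hardy-Weinberg equilibrium, can be computed by hand in base R. The sketch below is only meant to show what these summaries measure; the function names and the example counts are hypothetical, and GenABEL's own routines (which also offer an exact test) are not reproduced here.

```r
# Call rate: proportion of non-missing genotypes for one SNP.
call_rate <- function(genotypes) mean(!is.na(genotypes))

# 1-d.f. chi-square test of HWE from genotype counts (AA, AB, BB).
hwe_chisq <- function(n_aa, n_ab, n_bb) {
  n <- n_aa + n_ab + n_bb
  p <- (2 * n_aa + n_ab) / (2 * n)            # frequency of allele A
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)
  observed <- c(n_aa, n_ab, n_bb)
  stat <- sum((observed - expected)^2 / expected)
  c(chisq = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Hypothetical SNP: genotype coded as allele count, NA = not called
snp <- c(0, 1, NA, 2, 1, 1, 0, NA)
cr <- call_rate(snp)

# Hypothetical genotype counts for 1000 people
hwe <- hwe_chisq(298, 489, 213)
```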
GenABEL: SNP Association Scans

scan.glm(): SNP association test using GLM in R library
  scan.glm("y~x1+x2+...+CRSNP", family = gaussian(), data, snpsubset, idsubset)
  scan.glm("y~x1+x2+...+CRSNP", family = binomial(), data, snpsubset, idsubset)
scan.glm.2D(): 2-SNP interaction scan
Fast Scan (calls C code)
ccfast(): case-control association analysis by computing chi-square test from 2x2 (allelic) or 2x3 (genotypic) tables
emp.ccfast(): genome-wide significance (permutation) for ccfast() scan
qtscore(): association test (GLM) for a trait (quantitative or categorical)
emp.qtscore(): genome-wide significance (permutation) for qtscore() scan
mmscore(): score test for association between a trait and genetic polymorphism, in samples of related individuals (needs stratification variable; scores are computed within strata and then added up)
egscore(): association test, adjusted for possible stratification by principal components of genomic kinship matrix (SNP correlation matrix)
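The chi-square tests that ccfast() is described as computing can be reproduced for a single SNP with chisq.test() from base R. This sketch is not the package's C implementation: the helper allelic_counts() and the genotype counts are hypothetical, and it covers one SNP only, whereas the fast scan loops over the whole genome.

```r
# Hypothetical genotype counts (AA, AB, BB) for one SNP
case_genotypes    <- c(AA = 30, AB = 50, BB = 20)
control_genotypes <- c(AA = 20, AB = 50, BB = 30)

# Collapse genotype counts to allele counts: each person carries two alleles.
allelic_counts <- function(g) {
  c(A = 2 * g[["AA"]] + g[["AB"]], B = g[["AB"]] + 2 * g[["BB"]])
}

# 2x2 allelic table: 1-d.f. test, without continuity correction
allelic_table <- rbind(case    = allelic_counts(case_genotypes),
                       control = allelic_counts(control_genotypes))
allelic_test <- chisq.test(allelic_table, correct = FALSE)

# 2x3 genotypic table: 2-d.f. test
genotypic_table <- rbind(case = case_genotypes, control = control_genotypes)
genotypic_test <- chisq.test(genotypic_table)
```

The allelic test treats alleles as independent observations, which is why it is usually paired with an HWE check on the same markers.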
GenABEL: Haplotype Association Scans

scan.haplo(): haplotype association test using GLM in R library
scan.haplo.2D(): 2-haplotype interaction scan
(haplo.stats package required)
Sliding window strategy
Posterior probabilities of haplotypes via EM algorithm
GLM-based score test for haplotype-trait association (Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. 2002. Score tests for association of traits with haplotypes when linkage phase is ambiguous. Am J Hum Genet 70: 425-434.)
GenABEL: GWAS results

scan.gwaa-class: returned by scan.glm, scan.haplo, ccfast, qtscore, emp.ccfast, emp.qtscore
Names:
snpnames: list of names of SNPs tested
P1df: p-values of 1-d.f. (additive or allelic) test for association
P2df: p-values of 2-d.f. (genotypic) test for association
Pc1df: p-values from the 1-d.f. test for association between SNP and trait; the statistic is corrected for possible inflation
effB: effect of the B allele in allelic test
effAB: effect of the AB genotype in genotypic test
effBB: effect of the BB genotype in genotypic test
Map: list of map positions of the SNPs
Chromosome: list of chromosomes the SNPs belong to
Idnames: list of subjects used in analysis
Lambda: inflation factor estimate, as computed using lower portion (say, 90%) of the distribution, and standard error of the estimate
Formula: formula/function used to compute p-values
Family: family of the link function / nature of the test
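The relationship between Lambda and the corrected p-values (Pc1df) can be sketched in base R. Note the substitution: the slide describes an estimate based on the lower portion of the distribution, while the sketch below uses the simpler median-based estimator, a common alternative, so its numbers need not match GenABEL's. The function name and the simulated statistics are hypothetical.

```r
# Median-based genomic inflation factor for 1-d.f. chi-square statistics.
# This is NOT the lower-portion estimator described on the slide.
estimate_lambda <- function(chisq_1df) {
  lambda <- median(chisq_1df, na.rm = TRUE) / qchisq(0.5, df = 1)
  max(lambda, 1)                       # do not "correct" deflated statistics
}

# Hypothetical test statistics: null chi-square quantiles inflated by 20%
chisq_1df <- 1.2 * qchisq(ppoints(1000), df = 1)

lambda <- estimate_lambda(chisq_1df)

# Genomic-control style correction: divide statistics by lambda, then
# recompute the 1-d.f. p-values (the analogue of Pc1df).
p_1df  <- pchisq(chisq_1df, df = 1, lower.tail = FALSE)
p_c1df <- pchisq(chisq_1df / lambda, df = 1, lower.tail = FALSE)
```

Because dividing by a lambda above 1 shrinks every statistic, each corrected p-value is at least as large as its uncorrected counterpart.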
GenABEL: Table & Graphic Functions
descriptives.marker(): table of marker info
descriptives.trait(): table of trait info
descriptives.scan(): table of scan results
plot.scan.gwaa(): plot of scan results
plot.check.marker(): plot of marker data (QC etc.)
GenABEL: Computer Efficiency

2000 subjects x 500K chip
Memory: ~3.2 G
Loading time: ~4 min.
SNP summary: ~1 min.
Call ccfast: ~0.5 min.
Call qtscore: ~2 min.
Total: < 10 min.
Permutation test (N = 10,000): 73~120 hrs (3~5 days)
Test machine: Intel Xeon 2.8GHz processor, SuSE Linux 9.2, R
SNPassoc
Juan R. González, et al. SNPassoc: an R package to perform whole genome association studies. Bioinformatics, 2007, 23(5):654-655.
SNPassoc: SNPs-based whole genome association studies
This package carries out the most common analyses when performing whole genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation test and related tests (sum statistic and truncated product) are also implemented.
Version: 1.4-9
Depends: R (≥ 2.4.0), haplo.stats, survival, mvtnorm
Date: 2007-Oct-16
Author: Juan R González, Lluís Armengol, Elisabet Guinó, Xavier Solé, and Víctor Moreno
Maintainer: Juan R González <jrgonzalez at imim.es>
License: GPL version 2 or newer
URL: http://www.r-project.org and http://davinci.crg.es/estivill_lab/snpassoc
In views: Genetics
CRAN checks: SNPassoc results
SNPassoc: Data & Summary

setupSNP(data = snp-pheno.table, info = map.table, colSNPs = , sep = "/", ...)
summary(): allele frequencies, percentage of missing values, HWE test
SNPassoc: Association Tests

WGassociation(y~x1+x2, data = , model = , quantitative = , level = 0.95)
  model: codominant, dominant, recessive, overdominant, log-additive or all
scanWGassociation(): only p-values
association(): only for selected SNPs; can do stratified and GxE interaction analyses
Results
summary(): a summary table by genes/chromosomes
WGstats(): detailed output (case-control numbers, percentages, odds ratios / mean differences, 95% confidence intervals, P-value for the likelihood ratio test of association, AIC, etc.)
pvalues(): a table of p-values for each genetic model for each SNP
plot(): p-values on the -log scale, via plot.WGassociation()
labels(): returns the names of the SNPs analyzed
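The genetic models named in the model argument differ only in how the three genotypes are coded before entering the GLM. The base-R sketch below shows the usual codings; it is an illustration under assumptions rather than SNPassoc code. The function code_genotype() is hypothetical, and genotypes are assumed to be given as the number of copies (0, 1, 2) of the variant allele B.

```r
# Code a genotype vector (copies of allele B: 0, 1, 2) under a genetic model.
code_genotype <- function(g, model) {
  switch(model,
    codominant     = factor(g, levels = 0:2, labels = c("AA", "AB", "BB")),
    dominant       = as.integer(g >= 1),   # AB and BB vs AA
    recessive      = as.integer(g == 2),   # BB vs AA and AB
    overdominant   = as.integer(g == 1),   # AB vs AA and BB
    "log-additive" = as.integer(g),        # per-allele trend
    stop("unknown model: ", model)
  )
}

g <- c(0, 1, 2)   # AA, AB, BB

codings <- data.frame(
  genotype     = code_genotype(g, "codominant"),
  dominant     = code_genotype(g, "dominant"),
  recessive    = code_genotype(g, "recessive"),
  overdominant = code_genotype(g, "overdominant"),
  log_additive = code_genotype(g, "log-additive")
)
```

The codominant model keeps the genotype as a 3-level factor (2 d.f.), while the other four reduce it to a single numeric covariate (1 d.f.).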
SNPassoc: Multiple-SNP Analysis

SNP-SNP Interaction
interactionPval(): epistasis analysis between all pairs of SNPs (and covariates)
Haplotype Analysis
haplo.glm(): using the R package haplo.stats; association analysis of haplotypes with a response via GLM
haplo.interaction(): interactions between haplotypes (and covariates)
SNPassoc: Computer Efficiency

1000 subjects x 3000 SNPs
Import data: 5 min.
setupSNP(): 40 min.
scanWGassociation(), only p-values (including permutation test): 30 min.
Memory usage: 750 MB
