MetaGWAS: Combining summary-level results from two or more GWA studies into a single estimate.

This function combine K sets of GWAS association statistics on same (or at least similar) phenotype. This function employs PLINK's (Purcell et al. 2007) inverse variance-based analysis to run a number of models, including a) Fixed-effect model and b) Random-effect model, assuming there may be variation between the genuine underlying effects, i.e., effect size beta. 'This function also calculates weighted Z-score-based p-values after METAL (Willer et al. 2010) . For more information about the algorithms, please see the associated paper.

Usage

MetaGWAS(
  DataDir,
  SummData = c(""),
  ResultDir = tempdir(),
  SNPfile = NULL,
  useSNPposition = TRUE,
  UseA1 = FALSE,
  GCse = TRUE,
  plotname = "Meta_Analysis.plot",
  pval_filter = "R",
  top_snp_pval = 1e-08,
  max_top_snps = 6,
  chosen_snps_file = NULL,
  byCHR = FALSE,
  pval_threshold_manplot = 1e-05
)

Arguments

DataDir: A character string for the file path of the input files needed for SummData and SNPfile arguments.
SummData: Vector value containing the name(s) of the .Rda file(s) with GWAS summary statistics, with ‘SNP’ (i.e., SNP identifier), ‘BETA’ (i.e., effect-size or logarithm of odds ratio), ‘SE’ (i.e., standard error of BETA), ‘P’ (i.e., p-values), 'NMISS' (i.e., effective sample size), 'L95' (i.e., lower limit of 95% confidence interval) and 'U95' (i.e., upper limit of 95% confidence interval) are in mandatory column headers. These files needed to be in DataDir. If the numbers of cases and controls are unequal, effective sample size should be \(4 / (1/<# of cases> + 1/<# of controls>)\). A smaller "effective" sample size may be used for samples that include related individuals, however simulations indicate that small changes in the effective sample size have relatively little effect on the final p-value (Willer et al. 2010) . Columns, such as, CHR (Chromosome code), BP (Basepair position), A1 (First allele code), A2 (Second allele code) columns are optional. If these are present, setting useSNPposition to FALSE, causes CHR, BP and A1 to be ignored and setting UseA1 to be FALSE causes A1 to be ignored. If, both these arguments are TRUE, this function takes care of A1/A2 allele flips properly. Otherwise, A1 mismatches are thrown out. Values of CHR/BP are allowed to vary.
ResultDir: A character string for the file path where all output files will be stored. The default is tempdir().
SNPfile: Character string specifying the name of the plain-text file with a column of SNP names. These could be LD clumped SNPs or any other list of chosen SNPs for Meta analysis. This file needs to be in DataDir.
useSNPposition: Boolean value, TRUE or FALSE for using CHR, BP, and A1 or not. The default is FALSE. Note: if this is FALSE then there will be no Manhattan and QQ plot will be generated.
UseA1: Boolean value, TRUE or FALSE for A1 to be used or not. The default is FALSE.
GCse: Boolean value, TRUE or FALSE for applying study specific genomic control to adjust each study for potential population structure for all the SNPs. The default is TRUE. If users would want to apply genomic control separately for directly genotyped and imputed SNPs prior using the function, set this parameter as FALSE.
plotname: Character string, specifying the plot name of the file containing forest plots for the SNPs. The default is “Meta_Analysis.plot”.
pval_filter: Character value as "R","F" or "W", specifying whether p-value threshold should be chosen based on “Random”, “Fixed” or “Weighted” effect model for the SNPs to be included in the forest plots.
top_snp_pval: Numeric value, specifying the threshold to be used to filter the SNPs for the forest plots. The default is 1e-08.
max_top_snps: Integer value, specifying the maximum number of top SNPs (SNPs with the lowest p-values) to be ploted in the forest plot file. The default is 6.
chosen_snps_file: Character string specifying the name of the plain-text file with a column of SNP names for the forest plots. The default is NULL.
byCHR: Boolean value, TRUE or FALSE, specifying whether the meta analysis will be performed chromosome wise or not. The default is FALSE.
pval_threshold_manplot: Numeric value, specifying the p-value threshold for plotting Manhattan plots.

Value

A list object containing five dataframes and a list of forest plots. The first three dataframes, such as Mfixed, Mrandom and Mweighted contain results for fixed effect, random effect and weighted model. Each of these dataframes can have maximum 12 columns, such as:

CHR (Chromosome code)
BP (Basepair position)
SNP (SNP identifier)
A1 (First allele code)
A2 (Second allele code)
Q (p-value for Cochrane's Q statistic)
I (I^2 heterogeneity index (0-100))
P (P-value from mata analysis)
ES (Effect-size estimate from mata analysis)
SE (Standard Error from mata analysis)
CI_L (Lower limit of confidence interval)
CI_U (Uper limit of confidence interval)

The fourth dataframe contains the same columns CHR, BP, SNP, A1, A2, Q, I", with column N' ( Number of valid studies for this SNP), P (Fixed-effects meta-analysis p-value), and other columns as Fx... (Study x (0-based input file indices) effect estimate, Examples: F0, F1 etc.).

The fifth dataframe, ProblemSNP has three columns, such as

File (file name of input data),
SNP (Problematic SNPs that are thrown)
Problem (Problem code)

Problem codes are:

BAD_CHR (Invalid chromosome code)
BAD_BP (Invalid base-position code), BAD_ES (Invalid effect-size)
BAD_SE (Invalid standard error), MISSING_A1 (Missing allele 1 label)
MISSING_A2 (Missing allele 2 label)
ALLELE_MISMATCH (Mismatching allele codes across files)

A .pdf file comprising the forest plots of the SNPs is produced in the ResultDir with Plotname as prefix. If useSNPposition is set TRUE, a .jpeg file with Manhattan Plot and Q-Q plot will be in the ResultDir with Plotname as prefix.

References

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, others (2007). “PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses.” The American Journal of Human Genetics, 81(3), 559–575. doi:10.1086/519795 .

Willer CJ, Li Y, Abecasis GR (2010). “METAL: fast and efficient meta-analysis of genomewide association scans.” Bioinformatics, 26(17), 2190–2191. doi:10.1093/bioinformatics/btq340 , http://www.ncbi.nlm.nih.gov/pubmed/20616382.

Examples

data("Summary_Stat_Ex1", package = "GXwasR")
data("Summary_Stat_Ex2", package = "GXwasR")
DataDir <- GXwasR:::GXwasR_data()
ResultDir <- tempdir()
SummData <- list(Summary_Stat_Ex1, Summary_Stat_Ex2)
SNPfile <- "UniqueLoci"
useSNPposition <- FALSE
UseA1 <- TRUE
GCse <- TRUE
byCHR <- FALSE
pval_filter <- "R"
top_snp_pval <- 1e-08
max_top_snps <- 10
chosen_snps_file <- NULL
pval_threshold_manplot <- 1e-05
plotname <- "Meta_Analysis.plot"
x <- MetaGWAS(
    DataDir = DataDir, SummData = SummData, ResultDir = ResultDir,
    SNPfile = NULL, useSNPposition = TRUE, UseA1 = UseA1, GCse = GCse,
    plotname = "Meta_Analysis.plot", pval_filter, top_snp_pval, max_top_snps,
    chosen_snps_file = NULL, byCHR, pval_threshold_manplot
)
#> ℹ Processing file number 1
#> ℹ Processing file number 2
#> ℹ Applying study-specific genomic control.
#> ℹ Applying study-specific genomic control.
#> Processing chromosome 
#> ✔ Forest plots for SNPS have compiled and saved as Meta_Analysis.plot.pdf
#> ℹ You can find them in the directory: /var/folders/d6/gtwl3_017sj4pp14fbfcbqjh0000gp/T//Rtmp0uYjFT
#> ✔ Manhattan and QQ plots for SNPS have been compiled and saved as Meta_Analysis.plot.jpeg
#> ℹ You can find them in the directory: /var/folders/d6/gtwl3_017sj4pp14fbfcbqjh0000gp/T//Rtmp0uYjFT