GXwasR Description:

This package implements statistical genetics models for genome-wide association (GWA) and X-chromosome-wide association (XWA) analyses, in either a sex-combined or a sex-stratified way, while accounting for X-chromosome inactivation (XCI) patterns. Beyond association analysis, the package supports testing for sex differences in genetic effects, including specific models for these tests and best-practice quality control (QC) of the genetic data they require. The package includes thirty-three functions in six categories (A-F), which together provide a comprehensive pipeline for sex-aware genetic association analysis of common variants in unrelated individuals.

Basics

Install GXwasR

R is an open-source statistical environment whose functionality can be extended through packages. GXwasR is an R package available via the Bioconductor repository. R can be installed on any operating system from CRAN, after which you can install GXwasR by running the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

## Check that you have a valid Bioconductor installation
BiocManager::valid()

## Install GXwasR
BiocManager::install("GXwasR")

## Load GXwasR
library(GXwasR)

External Dependencies

This package requires PLINK and GCTA, two widely used command-line tools for genetic data analysis:

  • PLINK: A toolset for genome association and linkage analysis.
  • GCTA: Genome-wide Complex Trait Analysis, used for estimating genetic relationships and variance components.

Please follow the instructions below to ensure both tools are installed and available to your system before use.

PLINK

This package depends on the PLINK command-line tool (version 1.9). PLINK must be installed separately and made available on your system.

PLINK is not bundled with this package and must either:

  • (preferred) be specified via the PLINK_PATH environment variable, or
  • be on your system PATH.

Binaries for all major platforms can be downloaded from the PLINK website.

Detailed, platform-specific setup instructions can be found in the INSTALL file included with this package.

This package will attempt to locate PLINK using:

  • The PLINK_PATH environment variable, if set.
  • The system path, via Sys.which("plink").

If PLINK is not found, an error will be raised with guidance on how to resolve it.

You can manually set the path in your R session:

Sys.setenv(PLINK_PATH = "/path/to/plink")

For a persistent configuration, you can add this line to your .Renviron file:

PLINK_PATH=/path/to/plink

To verify that PLINK is discoverable:

plink_path <- Sys.getenv("PLINK_PATH", unset = Sys.which("plink"))
if (!file.exists(plink_path) || !nzchar(plink_path)) {
  stop("PLINK binary not found. Please install PLINK and/or set the PLINK_PATH environment variable.")
}
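
Because the package expects PLINK 1.9, it can help to confirm which version the discovered binary reports. The sketch below assumes the binary accepts the `--version` flag and prints a banner of the form `PLINK v1.90... 64-bit (...)`; `parse_plink_version()` and `check_plink_version()` are hypothetical helpers written for illustration and are not part of GXwasR.

```r
# Hypothetical helper: extract the version token from a PLINK banner line.
# Assumes a banner such as "PLINK v1.90b6.21 64-bit (19 Oct 2020)".
parse_plink_version <- function(banner) {
  m <- regmatches(banner, regexpr("v[0-9]+\\.[0-9]+[[:alnum:].]*", banner))
  if (length(m) == 0) NA_character_ else sub("^v", "", m[[1]])
}

# Hypothetical helper: run the binary and report whether it looks like a 1.9x build.
check_plink_version <- function(plink_path) {
  banner <- system2(plink_path, "--version", stdout = TRUE, stderr = TRUE)
  version <- if (length(banner) > 0) parse_plink_version(banner[[1]]) else NA_character_
  list(version = version, is_v1_9 = !is.na(version) && startsWith(version, "1.9"))
}

# Usage (requires a real PLINK binary at plink_path):
# check_plink_version(plink_path)
```

The parsing step is kept separate from the system call so that it can be checked without PLINK installed.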

GCTA

This package also utilizes the GCTA command-line tool (Genome-wide Complex Trait Analysis). GCTA must be installed separately and made available on your system.

GCTA is not bundled with this package and must either:

  • (preferred) be specified via the GCTA_PATH environment variable, or
  • be on your system PATH.

🔧 GCTA Installation Instructions

Binaries for all major platforms can be downloaded from the GCTA website.

Detailed, platform-specific setup instructions can be found in the INSTALL file included with this package.

🧭 Configuring the GCTA Path

This package will attempt to locate GCTA using:

  • The GCTA_PATH environment variable, if set.
  • The system path, via Sys.which("gcta64").

If GCTA is not found, an error will be raised with guidance on how to resolve it.

You can manually set the path in your R session:

Sys.setenv(GCTA_PATH = "/path/to/gcta64")

For a persistent configuration, you can add this line to your .Renviron file:

GCTA_PATH=/path/to/gcta64
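
Entries in .Renviron are read when R starts, so a newly added line is not visible to a session that is already running. As a minimal sketch, base R's `readRenviron()` can reload the file in place; the location `~/.Renviron` is an assumption about where your user-level file lives.

```r
# Reload the user-level .Renviron without restarting R.
# "~/.Renviron" is the usual per-user location; adjust if yours differs.
renviron <- path.expand("~/.Renviron")
if (file.exists(renviron)) {
  readRenviron(renviron)
}

# Inspect what the current session sees (an empty string means "not set").
Sys.getenv(c("PLINK_PATH", "GCTA_PATH"))
```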

To verify that GCTA is discoverable:

gcta_path <- Sys.getenv("GCTA_PATH", unset = Sys.which("gcta64"))
if (!file.exists(gcta_path) || !nzchar(gcta_path)) {
  stop("GCTA binary not found. Please install GCTA and/or set the GCTA_PATH environment variable.")
}
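
The lookup order described for both tools (environment variable first, then the system PATH) can be captured in one small function. This is a sketch of that logic under stated assumptions, not the package's internal implementation; `locate_tool()` is a hypothetical name.

```r
# Hypothetical helper mirroring the documented lookup order:
#   1. the environment variable (e.g. PLINK_PATH), if set and pointing at an existing file
#   2. the system PATH, via Sys.which()
locate_tool <- function(env_var, binary) {
  from_env <- Sys.getenv(env_var, unset = "")
  if (nzchar(from_env) && file.exists(from_env)) {
    return(from_env)
  }
  from_path <- unname(Sys.which(binary))
  if (nzchar(from_path)) {
    return(from_path)
  }
  stop(sprintf(
    "%s not found. Install it and/or set the %s environment variable.",
    binary, env_var
  ))
}

# Usage: returns the resolved path or stops with an informative error.
# plink_path <- locate_tool("PLINK_PATH", "plink")
# gcta_path  <- locate_tool("GCTA_PATH", "gcta64")
```

One design choice to note: if the environment variable is set but points at a missing file, this sketch falls back to the system PATH instead of stopping immediately.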
⚠️ macOS Security Warning

macOS may block these applications from launching because they were downloaded from the internet and aren’t explicitly approved by Apple. If you see a warning like:

"“(PLINK/GCTA)” can’t be opened because Apple cannot check it for malicious software."

You can still run the app by following these steps:

  1. Open System Settings (or System Preferences on older macOS versions).
  2. Go to Privacy & Security.
  3. Scroll down to the Security section.
  4. You should see a message about the blocked app; click “Open Anyway”.
  5. Confirm when prompted.

For more details, see Apple’s official guide: https://support.apple.com/en-us/102445

Functions Overview

This document provides an overview of all the functions in the GXwasR package. The package offers thirty-three distinct functions, organized into six main categories:

A) Pre-imputation QC
B) Post-imputation QC
C) Sex-combined and sex-stratified GWAS with specialized analysis for XWAS
D) Sex-differential test
E) High level analysis 
F) Utility Functions

These categories and their respective functions are detailed in the table below.

Table of Functions

| Function | Description | Category |
|---|---|---|
| QCsnp() | Performs quality control for SNPs in PLINK binary files. | Pre-imputation QC, Post-imputation QC |
| QCsample() | Identifies outlier individuals based on heterozygosity and missing genotype rates. | Pre-imputation QC, Post-imputation QC |
| AncestryCheck() | Evaluates samples’ ancestry and flags outliers using PCA. | Pre-imputation QC |
| SexCheck() | Compares sex assignments with predictions from X chromosome inbreeding coefficients. | Pre-imputation QC |
| Xhwe() | Filters X-chromosome variants violating Hardy-Weinberg Equilibrium in females. | Post-imputation QC |
| MAFdifSexControl() | Tests for significant MAF differences between sexes in control samples. | Post-imputation QC |
| FilterRegion() | Filters out specific chromosomal regions from input PLINK files. | Post-imputation QC, Utility Functions |
| GXwas() | Runs GWAS models in autosomes with XWAS models like “FMcomb01”, “FMcomb02”, and “FMstratified” for binary and quantitative traits, and “GWAScxci” for binary traits. Focuses on additive SNP effects and multi-collinearity issues, and includes multiple covariates and their interactions. | Sex-combined and sex-stratified GWAS with XWAS |
| PvalComb() | Combines p-values from separate GWAS using various statistical methods and corrects summary p-values. | Sex-combined and sex-stratified GWAS with XWAS |
| SexDiff() | Evaluates sex differences in genetic effect size for each SNP. | Sex-differential test |
| SexDiffZscore() | Presumably analyzes sex differences using Z-score methodology, comparing genetic effect sizes between males and females. | Sex-differential test |
| DiffZeroOne() | Assesses Z-scores for deviation from one and zero for statistics like genetic correlation. | Sex-differential test |
| TestXGene() | Performs gene-based association tests using GWAS/XWAS summary statistics. | High level analysis |
| MetaGWAS() | Combines summary-level GWAS results using fixed-effect and random-effect models. | High level analysis |
| ComputePGS() | Calculates polygenic scores from GWAS summary statistics. | High level analysis |
| GeneticCorrBT() | Computes genetic correlation between two traits. | High level analysis |
| EstimateHerit() | Computes SNP heritability using GREML or LDSC models. | High level analysis |
| SexRegress() | Not previously described. Presumably involves regression analyses specific to sex-stratified data. | High level analysis |
| FilterPlinkSample() | Prepares PLINK binary files with desired samples based on specified criteria. | Utility Functions |
| ComputeGeneticPC() | Computes principal components from a genetic relationship matrix for population stratification correction. | Utility Functions |
| ClumpLD() | Performs linkage disequilibrium clumping of SNPs. | Utility Functions |
| GetMFPlink() | Prepares separate male and female PLINK binary files from combined files. | Utility Functions |
| plinkVCF() | Converts VCF files to PLINK binary formats and vice versa, including creation of dummy FAM files. | Utility Functions |
| MergeRegion() | Combines two genotype datasets based on common or all SNPs. | Utility Functions |
| FilterAllele() | Filters out multi-allelic variants from the genetic dataset, essential for maintaining dataset integrity and simplifying genetic analyses. | Utility Functions |
| PlinkSummary() | Provides a summary of genotype datasets in PLINK format. | Utility Functions |
| FilterSNP() | Filters out specific SNPs from the dataset based on user-defined criteria. | Utility Functions |
| DummyCovar() | Recodes a categorical covariate into binary dummy variables for statistical analysis in GXwasR. | Utility Functions |
| GXWASmiami() | Generates Miami plots for GWAS and XWAS. | Utility Functions |
| SumstatGenCorr() | Calculates genetic correlation from GWAS summary statistics. | High level analysis |
| LDPrune() | Performs LD pruning on SNP data. | Utility Functions |
| executePlinkMAF() | Calculates minor allele frequencies. | Utility Functions |
| ComputeLD() | Calculates an LD matrix. | Utility Functions |
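
To check which of the functions listed above are exported by the version you have installed, you can compare names against the package namespace. The sketch below uses base R's `getNamespaceExports()`; `missing_exports()` is a hypothetical helper for illustration, and the names in `expected` are a small sample copied from the table.

```r
# Hypothetical helper: which of the expected names does a package NOT export?
missing_exports <- function(pkg, expected) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is not installed.", pkg))
  }
  setdiff(expected, getNamespaceExports(pkg))
}

# A few function names taken from the table above.
expected <- c("QCsnp", "QCsample", "GXwas", "SexDiff", "MetaGWAS", "ComputePGS")

# Returns character(0) when every listed function is exported.
# missing_exports("GXwasR", expected)
```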

Tutorials

Please follow these tutorials to learn more about the functionality of the package GXwasR.

Tutorial for performing post-imputation QC followed by sex-aware association tests: (vignette:Use of GXwasR)

Citing GXwasR

We hope that GXwasR will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("GXwasR")
#> To cite package 'GXwasR' in publications use:
#> 
#>   Bose B, Blostein F, Kim J, Winters J, Actkins KV, Mayer D, Congivaram
#>   H, Niarchou M, Edwards DV, Davis LK, Stranger BE (2025). "GXwasR: A
#>   Toolkit for Investigating Sex-Differentiated Genetic Effects on
#>   Complex Traits." _medRxiv 2025.06.10.25329327_.
#>   doi:10.1101/2025.06.10.25329327
#>   <https://doi.org/10.1101/2025.06.10.25329327>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {GXwasR: A Toolkit for Investigating Sex-Differentiated Genetic Effects on Complex Traits},
#>     author = {Banabithi Bose and Freida Blostein and Jeewoo Kim and Jessica Winters and Ky’Era V. Actkins and David Mayer and Harrsha Congivaram and Maria Niarchou and Digna Velez Edwards and Lea K. Davis and Barbara E. Stranger},
#>     journal = {medRxiv 2025.06.10.25329327},
#>     year = {2025},
#>     doi = {10.1101/2025.06.10.25329327},
#>   }

Reproducibility

The GXwasR package (Bose, Blostein, Kim, Winters, Actkins, Mayer, Congivaram, Niarchou, Edwards, Davis, and Stranger, 2025) was made possible thanks to:

  • R (R Core Team, 2025)
  • BiocStyle (Oleś, 2025)
  • knitr (Xie, 2025)
  • RefManageR (McLean, 2017)
  • rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2025)
  • sessioninfo (Wickham, Chang, Flight, Müller, and Hester, 2025)
  • testthat (Wickham, 2011)

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.1 (2025-06-13)
#>  os       macOS Sequoia 15.7.1
#>  system   aarch64, darwin24.4.0
#>  ui       unknown
#>  language en-US
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-10-16
#>  pandoc   3.6.3 @ /Applications/Positron.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>  quarto   1.8.25 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  backports     1.5.0   2024-05-23 [1] CRAN (R 4.5.1)
#>  bibtex        0.5.1   2023-01-26 [1] CRAN (R 4.5.0)
#>  BiocManager   1.30.26 2025-06-05 [1] CRAN (R 4.5.0)
#>  BiocStyle   * 2.36.0  2025-04-15 [1] Bioconduc~
#>  bookdown      0.45    2025-10-03 [1] CRAN (R 4.5.1)
#>  bslib         0.9.0   2025-01-30 [1] CRAN (R 4.5.0)
#>  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.5.0)
#>  cli           3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
#>  desc          1.4.3   2023-12-10 [1] CRAN (R 4.5.0)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.5.0)
#>  evaluate      1.0.5   2025-08-27 [1] CRAN (R 4.5.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
#>  fs            1.6.6   2025-04-12 [1] CRAN (R 4.5.0)
#>  generics      0.1.4   2025-05-09 [1] CRAN (R 4.5.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.5.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
#>  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.5.0)
#>  httr          1.4.7   2023-08-15 [1] CRAN (R 4.5.0)
#>  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.5.0)
#>  jsonlite      2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
#>  knitr         1.50    2025-03-16 [1] CRAN (R 4.5.0)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.5.0)
#>  lubridate     1.9.4   2024-12-08 [1] CRAN (R 4.5.1)
#>  magrittr      2.0.4   2025-09-12 [1] CRAN (R 4.5.1)
#>  pkgdown       2.1.3   2025-05-25 [2] CRAN (R 4.5.0)
#>  plyr          1.8.9   2023-10-02 [1] CRAN (R 4.5.1)
#>  R6            2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
#>  ragg          1.5.0   2025-09-02 [2] CRAN (R 4.5.1)
#>  Rcpp          1.1.0   2025-07-02 [1] CRAN (R 4.5.1)
#>  RefManageR  * 1.4.0   2022-09-30 [1] CRAN (R 4.5.1)
#>  rlang         1.1.6   2025-04-11 [1] CRAN (R 4.5.0)
#>  rmarkdown     2.30    2025-09-28 [1] CRAN (R 4.5.1)
#>  sass          0.4.10  2025-04-11 [1] CRAN (R 4.5.0)
#>  sessioninfo * 1.2.3   2025-02-05 [1] CRAN (R 4.5.1)
#>  stringi       1.8.7   2025-03-27 [1] CRAN (R 4.5.0)
#>  stringr       1.5.2   2025-09-08 [1] CRAN (R 4.5.1)
#>  systemfonts   1.3.1   2025-10-01 [1] CRAN (R 4.5.1)
#>  textshaping   1.0.3   2025-09-02 [1] CRAN (R 4.5.1)
#>  timechange    0.3.0   2024-01-18 [1] CRAN (R 4.5.1)
#>  xfun          0.53    2025-08-19 [1] CRAN (R 4.5.1)
#>  xml2          1.4.0   2025-08-20 [1] CRAN (R 4.5.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.5.0)
#> 
#>  [1] /Users/mayerdav/Library/R/arm64/4.5/library
#>  [2] /opt/homebrew/lib/R/4.5/site-library
#>  [3] /opt/homebrew/Cellar/r/4.5.1/lib/R/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Bibliography

This vignette was generated using BiocStyle (Oleś, 2025) with knitr (Xie, 2025) and rmarkdown (Allaire, Xie, Dervieux et al., 2025) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.30. 2025. URL: https://github.com/rstudio/rmarkdown.

[2] B. Bose, F. Blostein, J. Kim, et al. “GXwasR: A Toolkit for Investigating Sex-Differentiated Genetic Effects on Complex Traits”. In: medRxiv 2025.06.10.25329327 (2025). DOI: 10.1101/2025.06.10.25329327.

[3] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[4] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.36.0. 2025. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.

[5] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2025. URL: https://www.R-project.org/.

[6] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[7] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.3. 2025. DOI: 10.32614/CRAN.package.sessioninfo. URL: https://CRAN.R-project.org/package=sessioninfo.

[8] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. 2025. URL: https://yihui.org/knitr/.