GXwasR Overview
Banabithi Bose
University of Colorado Anschutz Medical; Northwestern University
banabithi.bose@gmail.com
16 October 2025
GXwasR Description
This package implements various statistical genetics models for Genome-Wide Association (GWA) and X-Chromosome-Wide Association (XWA) analyses. These analyses can be run in a sex-combined or sex-stratified manner that accounts for X-chromosome inactivation (XCI) patterns. Beyond association analysis, the package supports testing for sex differences in genetic effects. It implements dedicated models for these tests and applies best practices for the additional quality control (QC) of genetic data that they require. The package includes thirty-three functions in six categories (A-F), which together form a comprehensive pipeline for sex-aware genetic association analysis of common variants in unrelated individuals.
Basics
Install GXwasR
R is an open-source statistical environment whose functionality can easily be extended via packages. GXwasR is an R package available from the Bioconductor repository. First install R, which is available from CRAN for any operating system. Then install GXwasR with the following commands in your R session:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
## Check that you have a valid Bioconductor installation
BiocManager::valid()
## Install GXwasR
BiocManager::install("GXwasR")
## Load GXwasR
library(GXwasR)
External Dependencies
This package requires PLINK and GCTA, two widely used command-line tools for genetic data analysis:
- PLINK: A toolset for genome association and linkage analysis.
- GCTA: Genome-wide Complex Trait Analysis, used for estimating genetic relationships and variance components.
Please follow the instructions below to ensure both tools are installed and available to your system before use.
PLINK
This package depends on the PLINK command-line tool (version 1.9). PLINK must be installed separately and made available on your system.
PLINK is not bundled with this package and must either:
- (preferred) be specified via the PLINK_PATH environment variable, or
- be on your system PATH.
🔧 PLINK Installation Instructions
Binaries for all major platforms can be downloaded from the PLINK website.
Detailed, platform-specific setup instructions can be found in the INSTALL file included with this package.
🔧 Configuring the PLINK Path
This package will attempt to locate PLINK using:
- The PLINK_PATH environment variable, if set.
- The system path, via Sys.which("plink").
If PLINK is not found, an error will be raised with guidance on how to resolve it.
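The lookup order described above can be sketched in base R as follows. This is a minimal illustration, not GXwasR's internal code. The helper name `locate_tool()` is hypothetical. As a design choice, it treats an empty `PLINK_PATH` the same as an unset one and falls back to the system PATH.

```r
# Hypothetical helper (not part of GXwasR): resolve a command-line tool
# by checking an environment variable first, then the system PATH.
locate_tool <- function(env_var, binary) {
  # 1. Environment variable, if set to a non-empty value
  path <- Sys.getenv(env_var, unset = "")
  if (nzchar(path)) {
    if (!file.exists(path)) {
      stop(env_var, " is set to '", path, "', but no file exists there.")
    }
    return(path)
  }
  # 2. Fall back to the system PATH; Sys.which() returns "" when not found
  path <- unname(Sys.which(binary))
  if (!nzchar(path)) {
    stop("Could not find '", binary, "'. Install it and/or set ", env_var, ".")
  }
  path
}

# Example usage:
# plink_path <- locate_tool("PLINK_PATH", "plink")
```

The same helper would apply to GCTA by calling `locate_tool("GCTA_PATH", "gcta64")`.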
You can manually set the path in your R session:
Sys.setenv(PLINK_PATH = "/path/to/plink")
For a persistent configuration, you can add this line to your .Renviron file:
PLINK_PATH=/path/to/plink
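You can also write the persistent setting from within R. The minimal sketch below rewrites an `.Renviron` file so that it contains the new `KEY=VALUE` line, replacing any earlier line for the same key. It then reloads the file with base R's `readRenviron()`, so the current session sees the value without a restart.

The helper name `persist_env_var()` is hypothetical. By default it targets the user-level `~/.Renviron`, which R reads at startup unless a project-level `.Renviron` in the working directory takes precedence. It assumes the path needs no quoting (for example, it contains no spaces).

```r
# Hypothetical helper (not part of GXwasR): store KEY=VALUE in an
# .Renviron file and reload it into the current session.
persist_env_var <- function(key, value, file = "~/.Renviron") {
  file <- path.expand(file)
  existing <- if (file.exists(file)) readLines(file, warn = FALSE) else character()
  # Drop any earlier assignment of the same key, then append the new one
  existing <- existing[!startsWith(existing, paste0(key, "="))]
  writeLines(c(existing, paste0(key, "=", value)), file)
  # Reload so Sys.getenv() sees the value without restarting R
  readRenviron(file)
  invisible(file)
}

# Example usage:
# persist_env_var("PLINK_PATH", "/path/to/plink")
```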
To verify that PLINK is discoverable:
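# Prefer PLINK_PATH; otherwise use the first "plink" found on the system PATH
plink_path <- Sys.getenv("PLINK_PATH", unset = Sys.which("plink"))
# Check for an empty result first, then that the file actually exists
if (!nzchar(plink_path) || !file.exists(plink_path)) {
  stop("PLINK binary not found. Please install PLINK and/or set the PLINK_PATH environment variable.")
}
GCTA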
This package also utilizes the GCTA command-line tool (Genome-wide Complex Trait Analysis). GCTA must be installed separately and made available on your system.
GCTA is not bundled with this package and must either:
- (preferred) be specified via the GCTA_PATH environment variable, or
- be on your system PATH.
🔧 GCTA Installation Instructions
Binaries for all major platforms can be downloaded from the GCTA website.
Detailed, platform-specific setup instructions can be found in the INSTALL file included with this package.
🔧 Configuring the GCTA Path
This package will attempt to locate GCTA using:
- The GCTA_PATH environment variable, if set.
- The system path, via Sys.which("gcta64").
If GCTA is not found, an error will be raised with guidance on how to resolve it.
You can manually set the path in your R session:
Sys.setenv(GCTA_PATH = "/path/to/gcta64")
For a persistent configuration, you can add this line to your .Renviron file:
GCTA_PATH=/path/to/gcta64
To verify that GCTA is discoverable:
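# Prefer GCTA_PATH; otherwise use the first "gcta64" found on the system PATH
gcta_path <- Sys.getenv("GCTA_PATH", unset = Sys.which("gcta64"))
# Check for an empty result first, then that the file actually exists
if (!nzchar(gcta_path) || !file.exists(gcta_path)) {
  stop("GCTA binary not found. Please install GCTA and/or set the GCTA_PATH environment variable.")
}
⚠️ macOS Security Warning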
macOS may block these applications from launching because they were downloaded from the internet and aren't explicitly approved by Apple. If you see a warning like:
"“PLINK/GCTA” can't be opened because Apple cannot check it for malicious software."
You can still run the app by following these steps:
- Open System Settings (or System Preferences on older macOS versions).
- Go to Privacy & Security.
- Scroll down to the Security section.
- You should see a message about the blocked app; click “Open Anyway”.
- Confirm when prompted.
For more details, see Apple's official guide: https://support.apple.com/en-us/102445
Functions Overview
This section provides an overview of all functions in the GXwasR package. The package offers thirty-three distinct functions, organized into six main categories:
A) Pre-imputation QC
B) Post-imputation QC
C) Sex-combined and sex-stratified GWAS with specialized analysis for XWAS
D) Sex-differential test
E) High level analysis
F) Utility Functions
These categories and their respective functions are detailed in the table below.
Table of Functions
| Function | Description | Category |
|---|---|---|
| QCsnp() | Performs quality control for SNPs in PLINK binary files. | Pre-imputation QC, Post-imputation QC |
| QCsample() | Identifies outlier individuals based on heterozygosity and missing genotype rates. | Pre-imputation QC, Post-imputation QC |
| AncestryCheck() | Evaluates samples' ancestry and flags outliers using PCA. | Pre-imputation QC |
| SexCheck() | Compares recorded sex with sex predicted from X-chromosome inbreeding coefficients. | Pre-imputation QC |
| Xhwe() | Filters out X-chromosome variants that violate Hardy-Weinberg equilibrium in females. | Post-imputation QC |
| MAFdifSexControl() | Tests for significant minor allele frequency (MAF) differences between males and females in control samples. | Post-imputation QC |
| FilterRegion() | Filters out specific chromosomal regions from input PLINK files. | Post-imputation QC, Utility Functions |
| GXwas() | Runs GWAS models on autosomes together with XWAS models. These include "FMcomb01", "FMcomb02" and "FMstratified" for binary and quantitative traits, and "GWAScxci" for binary traits. Models additive SNP effects, addresses multicollinearity issues, and supports multiple covariates and their interactions. | Sex-combined and sex-stratified GWAS with XWAS |
| PvalComb() | Combines p-values from separate GWAS using various statistical methods and corrects the summary p-values. | Sex-combined and sex-stratified GWAS with XWAS |
| SexDiff() | Evaluates sex differences in genetic effect size for each SNP. | Sex-differential test |
| SexDiffZscore() | Tests for sex differences using Z-scores that compare male and female estimates (see ?SexDiffZscore for details). | Sex-differential test |
| DiffZeroOne() | Uses Z-scores to test whether a statistic, such as genetic correlation, deviates from zero and from one. | Sex-differential test |
| TestXGene() | Performs gene-based association tests using GWAS/XWAS summary statistics. | High level analysis |
| MetaGWAS() | Combines summary-level GWAS results using fixed-effect and random-effect models. | High level analysis |
| ComputePGS() | Calculates polygenic scores from GWAS summary statistics. | High level analysis |
| GeneticCorrBT() | Computes the genetic correlation between two traits. | High level analysis |
| EstimateHerit() | Computes SNP heritability using GREML or LDSC models. | High level analysis |
| SexRegress() | Performs sex-related regression analyses (see ?SexRegress for details). | High level analysis |
| FilterPlinkSample() | Prepares PLINK binary files containing the desired samples, selected by user-specified criteria. | Utility Functions |
| ComputeGeneticPC() | Computes principal components from a genetic relationship matrix for population stratification correction. | Utility Functions |
| ClumpLD() | Performs linkage disequilibrium (LD) clumping of SNPs. | Utility Functions |
| GetMFPlink() | Prepares separate male and female PLINK binary files from combined files. | Utility Functions |
| plinkVCF() | Converts VCF files to PLINK binary format and vice versa, including creation of dummy FAM files. | Utility Functions |
| MergeRegion() | Combines two genotype datasets based on common or all SNPs. | Utility Functions |
| FilterAllele() | Filters out multi-allelic variants from the dataset, preserving dataset integrity and simplifying downstream genetic analyses. | Utility Functions |
| PlinkSummary() | Provides a summary of genotype datasets in PLINK format. | Utility Functions |
| FilterSNP() | Filters out specific SNPs from the dataset based on user-defined criteria. | Utility Functions |
| DummyCovar() | Recodes a categorical covariate into binary dummy variables for statistical analysis in GXwasR. | Utility Functions |
| GXWASmiami() | Generates Miami plots for GWAS and XWAS results. | Utility Functions |
| SumstatGenCorr() | Computes genetic correlation from GWAS summary statistics. | High level analysis |
| LDPrune() | Performs LD pruning on SNP data. | Utility Functions |
| executePlinkMAF() | Calculates minor allele frequencies. | Utility Functions |
| ComputeLD() | Calculates an LD matrix. | Utility Functions |
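A per-SNP sex-differential test such as SexDiff() compares male- and female-specific effect estimates. One widely used formulation uses the Z statistic Z = (b_m - b_f) / sqrt(se_m^2 + se_f^2). It assumes the two estimates are independent, for example because they come from non-overlapping male and female samples.

The base-R sketch below implements that formula. It illustrates the general technique and is not necessarily the exact model used by SexDiff(), whose options are documented in its help page. The function name `sex_diff_z()` is hypothetical.

```r
# Hypothetical illustration (not GXwasR's SexDiff()): per-SNP test of
# whether male and female effect sizes differ, assuming independent estimates.
sex_diff_z <- function(beta_m, se_m, beta_f, se_f) {
  z <- (beta_m - beta_f) / sqrt(se_m^2 + se_f^2)
  p <- 2 * pnorm(-abs(z))  # two-sided p-value from the standard normal
  data.frame(Z = z, P = p)
}

# Example with two SNPs
sex_diff_z(beta_m = c(0.5, 0.1), se_m = c(0.3, 0.05),
           beta_f = c(0.2, 0.1), se_f = c(0.4, 0.05))
```

Because the inputs are vectors, a single call scores every SNP in a summary-statistics table.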
Tutorials
Please follow these tutorials to learn more about the functionality of the package GXwasR.
- Tutorial for performing post-imputation QC followed by sex-aware association tests: (vignette: Use of GXwasR)
- Tutorial for performing pre-imputation QC using GXwasR: https://boseb.github.io/GXwasR/articles/preimputationQC.html
- Tutorial for performing post-imputation QC using GXwasR: https://boseb.github.io/GXwasR/articles/postimputationQC.html
- Tutorial for running GWAS, XWAS and Sex-differential Tests using GXwasR: https://boseb.github.io/GXwasR/articles/gwas_models.html
- Tutorial for computing Polygenic Score using GXwasR: https://boseb.github.io/GXwasR/articles/GXwasR_PGS.html
- Tutorial for ancestry estimation: https://boseb.github.io/GXwasR/articles/decoding_ancestry.html
- Tutorial for heritability estimation: https://boseb.github.io/GXwasR/articles/GXwasR_heritability.html
- Tutorial for meta-analysis using GXwasR: https://boseb.github.io/GXwasR/articles/meta_analysis.html
- Tutorial for estimating genetic correlation from GWAS summary statistics using GXwasR: https://boseb.github.io/GXwasR/articles/genetic_correlation_sumstat.html
Citing GXwasR
We hope that GXwasR will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
## Citation info
citation("GXwasR")
#> To cite package 'GXwasR' in publications use:
#>
#> Bose B, Blostein F, Kim J, Winters J, Actkins KV, Mayer D, Congivaram
#> H, Niarchou M, Edwards DV, Davis LK, Stranger BE (2025). "GXwasR: A
#> Toolkit for Investigating Sex-Differentiated Genetic Effects on
#> Complex Traits." _medRxiv 2025.06.10.25329327_.
#> doi:10.1101/2025.06.10.25329327
#> <https://doi.org/10.1101/2025.06.10.25329327>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Article{,
#> title = {GXwasR: A Toolkit for Investigating Sex-Differentiated Genetic Effects on Complex Traits},
#> author = {Banabithi Bose and Freida Blostein and Jeewoo Kim and Jessica Winters and Ky’Era V. Actkins and David Mayer and Harrsha Congivaram and Maria Niarchou and Digna Velez Edwards and Lea K. Davis and Barbara E. Stranger},
#> journal = {medRxiv 2025.06.10.25329327},
#> year = {2025},
#> doi = {10.1101/2025.06.10.25329327},
#> }
Reproducibility
The GXwasR package (Bose, Blostein, Kim, Winters, Actkins, Mayer, Congivaram, Niarchou, Edwards, Davis, and Stranger, 2025) was made possible thanks to:
- R (R Core Team, 2025)
- BiocStyle (Oleś, 2025)
- knitr (Xie, 2025)
- RefManageR (McLean, 2017)
- rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2025)
- sessioninfo (Wickham, Chang, Flight, Müller, and Hester, 2025)
- testthat (Wickham, 2011)
R session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.5.1 (2025-06-13)
#> os macOS Sequoia 15.7.1
#> system aarch64, darwin24.4.0
#> ui unknown
#> language en-US
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2025-10-16
#> pandoc 3.6.3 @ /Applications/Positron.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#> quarto 1.8.25 @ /usr/local/bin/quarto
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.5.0 2024-05-23 [1] CRAN (R 4.5.1)
#> bibtex 0.5.1 2023-01-26 [1] CRAN (R 4.5.0)
#> BiocManager 1.30.26 2025-06-05 [1] CRAN (R 4.5.0)
#> BiocStyle * 2.36.0 2025-04-15 [1] Bioconduc~
#> bookdown 0.45 2025-10-03 [1] CRAN (R 4.5.1)
#> bslib 0.9.0 2025-01-30 [1] CRAN (R 4.5.0)
#> cachem 1.1.0 2024-05-16 [1] CRAN (R 4.5.0)
#> cli 3.6.5 2025-04-23 [1] CRAN (R 4.5.0)
#> desc 1.4.3 2023-12-10 [1] CRAN (R 4.5.0)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.5.0)
#> evaluate 1.0.5 2025-08-27 [1] CRAN (R 4.5.1)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.5.0)
#> fs 1.6.6 2025-04-12 [1] CRAN (R 4.5.0)
#> generics 0.1.4 2025-05-09 [1] CRAN (R 4.5.0)
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.5.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
#> htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.5.0)
#> httr 1.4.7 2023-08-15 [1] CRAN (R 4.5.0)
#> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.5.0)
#> jsonlite 2.0.0 2025-03-27 [1] CRAN (R 4.5.0)
#> knitr 1.50 2025-03-16 [1] CRAN (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.5.0)
#> lubridate 1.9.4 2024-12-08 [1] CRAN (R 4.5.1)
#> magrittr 2.0.4 2025-09-12 [1] CRAN (R 4.5.1)
#> pkgdown 2.1.3 2025-05-25 [2] CRAN (R 4.5.0)
#> plyr 1.8.9 2023-10-02 [1] CRAN (R 4.5.1)
#> R6 2.6.1 2025-02-15 [1] CRAN (R 4.5.0)
#> ragg 1.5.0 2025-09-02 [2] CRAN (R 4.5.1)
#> Rcpp 1.1.0 2025-07-02 [1] CRAN (R 4.5.1)
#> RefManageR * 1.4.0 2022-09-30 [1] CRAN (R 4.5.1)
#> rlang 1.1.6 2025-04-11 [1] CRAN (R 4.5.0)
#> rmarkdown 2.30 2025-09-28 [1] CRAN (R 4.5.1)
#> sass 0.4.10 2025-04-11 [1] CRAN (R 4.5.0)
#> sessioninfo * 1.2.3 2025-02-05 [1] CRAN (R 4.5.1)
#> stringi 1.8.7 2025-03-27 [1] CRAN (R 4.5.0)
#> stringr 1.5.2 2025-09-08 [1] CRAN (R 4.5.1)
#> systemfonts 1.3.1 2025-10-01 [1] CRAN (R 4.5.1)
#> textshaping 1.0.3 2025-09-02 [1] CRAN (R 4.5.1)
#> timechange 0.3.0 2024-01-18 [1] CRAN (R 4.5.1)
#> xfun 0.53 2025-08-19 [1] CRAN (R 4.5.1)
#> xml2 1.4.0 2025-08-20 [1] CRAN (R 4.5.1)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.5.0)
#>
#> [1] /Users/mayerdav/Library/R/arm64/4.5/library
#> [2] /opt/homebrew/lib/R/4.5/site-library
#> [3] /opt/homebrew/Cellar/r/4.5.1/lib/R/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Bibliography
This vignette was generated using BiocStyle (Oleś, 2025) with knitr (Xie, 2025) and rmarkdown (Allaire, Xie, Dervieux et al., 2025) running behind the scenes.
Citations made with RefManageR (McLean, 2017).
[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.30. 2025. URL: https://github.com/rstudio/rmarkdown.
[2] B. Bose, F. Blostein, J. Kim, et al. “GXwasR: A Toolkit for Investigating Sex-Differentiated Genetic Effects on Complex Traits”. In: medRxiv 2025.06.10.25329327 (2025). DOI: 10.1101/2025.06.10.25329327.
[3] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.
[4] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.36.0. 2025. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.
[5] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2025. URL: https://www.R-project.org/.
[6] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[7] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.3. 2025. DOI: 10.32614/CRAN.package.sessioninfo. URL: https://CRAN.R-project.org/package=sessioninfo.
[8] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. 2025. URL: https://yihui.org/knitr/.