ComputeGeneticPC: Computing principal components from a genetic relationship matrix
Source: R/GXwasR_main_functions.R
ComputeGeneticPC.Rd
This function performs principal components analysis (PCA) based on the variance-standardized relationship matrix (Purcell et al. 2007).
The top principal components are generally used as covariates in association analysis regressions to help correct for population stratification.
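As a rough illustration of the computation described above, the sketch below builds a variance-standardized relationship matrix from a simulated 0/1/2 genotype matrix and takes its leading eigenvectors as PCs. It is plain base R, not GXwasR or PLINK code. The simulated data and the names geno, grm, and count_pc are assumptions for the example, and PLINK's allele-frequency estimation, missing-data handling, and eigenvector scaling may differ.

```r
# Sketch only (not GXwasR/PLINK code): PCA on a variance-standardized
# relationship matrix built from simulated genotypes coded 0/1/2.
set.seed(1)
n_ind <- 50    # individuals (rows)
n_snp <- 200   # variants (columns)
p_true <- runif(n_snp, 0.1, 0.5)
geno <- sapply(p_true, function(p) rbinom(n_ind, 2, p))  # n_ind x n_snp matrix

# Estimate allele frequencies and drop monomorphic variants
p_hat <- colMeans(geno) / 2
keep <- p_hat > 0 & p_hat < 1
geno <- geno[, keep, drop = FALSE]
p_hat <- p_hat[keep]

# Variance-standardize each variant: (x - 2p) / sqrt(2p(1 - p))
Z <- sweep(geno, 2, 2 * p_hat, "-")
Z <- sweep(Z, 2, sqrt(2 * p_hat * (1 - p_hat)), "/")

# Relationship matrix: standardized cross-products averaged over variants
grm <- tcrossprod(Z) / ncol(Z)

# Eigendecomposition; eigen() returns eigenvalues in decreasing order,
# so the first columns of the eigenvector matrix are the top PCs
eig <- eigen(grm, symmetric = TRUE)
count_pc <- 10
pcs <- eig$vectors[, seq_len(count_pc)]
colnames(pcs) <- paste0("PC", seq_len(count_pc))
head(pcs[, 1:2])
```

Each row of pcs corresponds to one individual, which is why the PCs can be joined to per-individual data and used as regression covariates.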
Usage
ComputeGeneticPC(
DataDir,
ResultDir = tempdir(),
finput,
countPC = 10,
plotPC = TRUE,
highLD_regions = NULL,
ld_prunning = TRUE,
window_size = 50,
step_size = 5,
r2_threshold = 0.02
)
Arguments
- DataDir
A character string giving the path to the directory containing the input PLINK binary files.
- ResultDir
A character string giving the path to the directory where all output files will be stored. The default is tempdir().
- finput
A character string specifying the prefix of the input PLINK binary files (.bed, .bim, .fam). These files must be in DataDir.
- countPC
Integer value specifying the number of principal components to compute. The default is 10.
- plotPC
Boolean value, TRUE or FALSE, specifying whether to plot the first two PCs. The default is TRUE.
- highLD_regions
An R data frame of genomic regions with high LD, used when identifying the pruned SNPs for the plots. The default is NULL.
- ld_prunning
Boolean value, TRUE or FALSE, specifying whether to perform LD-based pruning (controlled by window_size, step_size, and r2_threshold). The default is TRUE.
- window_size
Integer value specifying the window size, in variant count or kilobases, for LD-based filtering. The default is 50.
- step_size
Integer value specifying the number of variants to shift the window at the end of each step of LD-based filtering for pruned SNPs in the plots. The default is 5.
- r2_threshold
Numeric value between 0 and 1 specifying the pairwise \(r^2\) threshold for LD-based filtering for pruned SNPs in the plots. The default is 0.02.
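The window_size, step_size, and r2_threshold arguments describe sliding-window LD pruning of the kind PLINK's --indep-pairwise performs. The sketch below is a simplified, hypothetical base R version; prune_ld_sketch is not a GXwasR function. It counts windows in variants only (no kilobase mode), and when a pair exceeds the \(r^2\) threshold it simply drops the later variant, whereas PLINK applies its own rule for choosing which variant to drop.

```r
# Hypothetical helper (not part of GXwasR): greedy sliding-window LD pruning.
# Simplifications: window counted in variants only; of two variants whose
# r^2 exceeds the threshold, the later one is always dropped.
prune_ld_sketch <- function(geno, window_size = 50, step_size = 5,
                            r2_threshold = 0.02) {
  n_snp <- ncol(geno)
  removed <- rep(FALSE, n_snp)
  start <- 1
  while (start <= n_snp) {
    idx <- start:min(start + window_size - 1, n_snp)  # variants in this window
    for (i in idx) {
      if (removed[i]) next
      for (j in idx[idx > i]) {
        if (removed[j]) next
        # cor() returns NA (with a warning) for constant columns; skip those
        r <- suppressWarnings(cor(geno[, i], geno[, j]))
        if (!is.na(r) && r^2 > r2_threshold) removed[j] <- TRUE
      }
    }
    if (start + window_size - 1 >= n_snp) break  # last window reached the end
    start <- start + step_size                   # shift window by step_size
  }
  which(!removed)  # column indices of the retained variants
}

# Usage on simulated data: columns 21-25 duplicate columns 1-5 (r^2 = 1)
set.seed(2)
indep <- matrix(rbinom(100 * 20, 2, 0.3), nrow = 100)
geno <- cbind(indep, indep[, 1:5])
kept <- prune_ld_sketch(geno, window_size = 50, step_size = 5,
                        r2_threshold = 0.2)
kept
```

The duplicated columns are always pruned, because whichever variant removes (or is) their original also exceeds the threshold with the copy.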
Value
A data frame of genetic principal components. The first two columns are IID (Individual ID) and FID (Family ID); the remaining columns are the PCs.
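Since the returned data frame is meant to supply covariates, a typical next step is to join it to phenotype data by FID and IID and include the top PCs in a regression. The sketch below uses simulated data: GP is only a stand-in for the return value, and the column names PC1 and PC2, as well as the phenotype variables snp and y, are assumptions rather than the documented output format.

```r
# Sketch with simulated data: GP stands in for the data frame returned by
# ComputeGeneticPC(); the PC column names "PC1"/"PC2" are assumptions.
set.seed(3)
n <- 200
GP <- data.frame(IID = paste0("id", 1:n), FID = paste0("fam", 1:n),
                 PC1 = rnorm(n), PC2 = rnorm(n))

# Hypothetical phenotype data whose outcome depends partly on PC1
pheno <- data.frame(FID = GP$FID, IID = GP$IID, snp = rbinom(n, 2, 0.3))
pheno$y <- 0.5 * pheno$snp + 2 * GP$PC1 + rnorm(n)

# Join on both identifiers, then adjust the SNP effect for the top PCs
dat <- merge(pheno, GP, by = c("FID", "IID"))
fit <- lm(y ~ snp + PC1 + PC2, data = dat)
coef(summary(fit))["snp", ]
```

Joining on both FID and IID avoids mismatches when individual IDs are only unique within families.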
References
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, others (2007). “PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses.” The American Journal of Human Genetics, 81(3), 559–575. doi:10.1086/519795.
Examples
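library(GXwasR)

data("highLD_hg19", package = "GXwasR")
DataDir <- GXwasR:::GXwasR_data()  # directory holding the example PLINK files
ResultDir <- tempdir()
finput <- "GXwasR_example"         # prefix of the .bed/.bim/.fam files
highLD_regions <- highLD_hg19
ld_prunning <- TRUE                # logical, not the string "TRUE"
window_size <- 50
step_size <- 5
r2_threshold <- 0.02
countPC <- 20

## Genetic PC
GP <- ComputeGeneticPC(
  DataDir = DataDir, ResultDir = ResultDir, finput = finput,
  countPC = countPC, plotPC = TRUE, highLD_regions = highLD_regions,
  ld_prunning = ld_prunning, window_size = window_size,
  step_size = step_size, r2_threshold = r2_threshold
)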