
This function performs principal components analysis (PCA) based on the variance-standardized relationship matrix (Purcell et al. 2007).
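To make the idea concrete, the following is a minimal base-R sketch of PCA on a variance-standardized relationship matrix. It is an illustration on simulated genotypes, not the package's implementation: the genotype matrix `G`, the allele frequencies, and all variable names are assumptions made up for this example.

```r
# Sketch only: simulated data, not the code path used by ComputeGeneticPC().
set.seed(42)
n <- 50    # individuals (hypothetical)
m <- 500   # SNPs (hypothetical)
countPC <- 10

# Simulate allele counts (0/1/2): rows are individuals, columns are SNPs.
true_freq <- runif(m, 0.1, 0.9)
G <- sapply(true_freq, function(f) rbinom(n, size = 2, prob = f))

# Estimate allele frequencies and drop monomorphic SNPs (zero variance).
p_hat <- colMeans(G) / 2
keep <- p_hat > 0 & p_hat < 1

# Standardize each SNP: subtract 2p, divide by sqrt(2p(1 - p)).
Z <- scale(G[, keep, drop = FALSE],
           center = 2 * p_hat[keep],
           scale = sqrt(2 * p_hat[keep] * (1 - p_hat[keep])))

# Relationship matrix: average over SNPs of the standardized cross-products.
A <- tcrossprod(Z) / ncol(Z)

# Principal components are the leading eigenvectors of A.
eig <- eigen(A, symmetric = TRUE)
pcs <- eig$vectors[, seq_len(countPC)]
colnames(pcs) <- paste0("PC", seq_len(countPC))
```

Each column of `pcs` holds one component score per individual; the eigenvalues in `eig$values` indicate how much variation each component captures.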

Top principal components are generally used as covariates in association analysis regressions to help correct for population stratification.
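As a rough illustration of that use, the sketch below fits a logistic regression of a binary phenotype on a single SNP with two PCs as covariates. All data are simulated, and the column names (`PC1`, `PC2`, `genotype`, `pheno`) are assumptions for this example; in practice the PC columns would come from the dataframe returned by this function.

```r
# Sketch only: simulated data with hypothetical column names.
set.seed(1)
n <- 200

# Stand-ins for the top two genetic PCs.
pcs <- data.frame(PC1 = rnorm(n), PC2 = rnorm(n))

# One SNP coded as allele counts, and a phenotype that depends on PC1 only
# (i.e., on ancestry rather than on the SNP).
genotype <- rbinom(n, size = 2, prob = 0.3)
pheno <- rbinom(n, size = 1, prob = plogis(0.5 * pcs$PC1))
dat <- cbind(pcs, genotype = genotype, pheno = pheno)

# Including the PCs as covariates adjusts the SNP effect for stratification.
fit <- glm(pheno ~ genotype + PC1 + PC2, data = dat, family = binomial())
snp_effect <- coef(summary(fit))["genotype", ]
print(snp_effect)
```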

Usage

ComputeGeneticPC(
  DataDir,
  ResultDir = tempdir(),
  finput,
  countPC = 10,
  plotPC = TRUE,
  highLD_regions = NULL,
  ld_prunning = TRUE,
  window_size = 50,
  step_size = 5,
  r2_threshold = 0.02
)

Arguments

DataDir

A character string for the path to the directory containing the input PLINK binary files.

ResultDir

A character string for the file path where all output files will be stored. The default is tempdir().

finput

Character string, specifying the prefix of the input PLINK binary files (.bed, .bim, and .fam). These files must be in DataDir.

countPC

Integer value, specifying the number of principal components. The default is 10.

plotPC

Boolean value, TRUE or FALSE, specifying whether to plot the first two PCs. The default is TRUE.

highLD_regions

An R dataframe of genomic regions with high LD, used when finding pruned SNPs for the plots. The default is NULL.

ld_prunning

Boolean value, TRUE or FALSE, specifying whether to apply LD-based filtering (pruning) of SNPs. The default is TRUE.

window_size

Integer value, specifying a window size in variant count or kilobase for LD-based filtering. The default is 50.

step_size

Integer value, specifying a variant count to shift the window at the end of each step for LD filtering for pruned SNPs in the plots. The default is 5.

r2_threshold

Numeric value between 0 and 1, specifying the pairwise \(r^2\) threshold for LD-based filtering for pruned SNPs in the plots. The default is 0.02.
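To show how window_size, step_size, and r2_threshold interact, here is a simplified greedy sliding-window pruning sketch in base R. It is an assumption-laden illustration, not the routine the package runs: the helper `ld_prune_sketch()` is hypothetical, the window is counted in variants only, and the rule for which SNP of a correlated pair to drop is deliberately naive (the later one), so results can differ from PLINK-style pruning.

```r
# Hypothetical helper: greedy windowed pruning on a genotype matrix
# (rows = individuals, columns = SNPs in genomic order).
ld_prune_sketch <- function(G, window_size = 50, step_size = 5,
                            r2_threshold = 0.02) {
  m <- ncol(G)
  keep <- rep(TRUE, m)
  start <- 1
  while (start <= m) {
    idx <- start:min(start + window_size - 1, m)
    idx <- idx[keep[idx]]               # only SNPs still retained
    if (length(idx) > 1) {
      r2 <- cor(G[, idx])^2             # pairwise r^2 within the window
      for (a in seq_along(idx)) {
        if (!keep[idx[a]]) next
        for (b in seq_along(idx)) {
          # Drop the later SNP of any pair exceeding the threshold.
          if (b > a && keep[idx[b]] && isTRUE(r2[a, b] > r2_threshold)) {
            keep[idx[b]] <- FALSE
          }
        }
      }
    }
    start <- start + step_size          # shift the window
  }
  which(keep)
}

# Toy check: SNP 2 is an exact copy of SNP 1, SNP 3 is independent.
set.seed(7)
snp1 <- rbinom(500, size = 2, prob = 0.4)
snp3 <- rbinom(500, size = 2, prob = 0.4)
G_toy <- cbind(snp1, snp1, snp3)
kept <- ld_prune_sketch(G_toy, window_size = 3, step_size = 1,
                        r2_threshold = 0.5)
```

A smaller r2_threshold prunes more aggressively, while a larger window_size compares SNPs that are further apart.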

Value

A dataframe with genetic principal components. The first two columns are IID (i.e., Individual ID) and FID (i.e., Family ID). The other columns are PCs.

References

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. (2007). “PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses.” The American Journal of Human Genetics, 81(3), 559–575. doi:10.1086/519795.

Author

Banabithi Bose

Examples

data("highLD_hg19", package = "GXwasR")
DataDir <- GXwasR:::GXwasR_data()
ResultDir <- tempdir()
finput <- "GXwasR_example"
highLD_regions <- highLD_hg19
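ld_prunning <- TRUE
window_size <- 50
step_size <- 5
r2_threshold <- 0.02
countPC <- 20
## Genetic PC
GP <- ComputeGeneticPC(
    DataDir = DataDir, ResultDir = ResultDir,
    finput = finput, countPC = countPC, plotPC = TRUE,
    highLD_regions = highLD_regions, ld_prunning = ld_prunning,
    window_size = window_size, step_size = step_size,
    r2_threshold = r2_threshold
)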