GeneticCorrBT: Computing genetic correlation between two traits.
Source: R/GXwasR_main_functions.R
GeneticCorrBT.Rd
This function computes the genetic correlation, a quantitative genetic measure that describes the genetic link between two traits and has been proposed to indicate pleiotropic gene action or correlation between causative loci in the two traits. The function performs a bivariate GREML analysis to estimate the genetic correlation between two quantitative traits, between two binary disease traits from case-control studies, or between a quantitative trait and a binary disease trait (Yang et al. 2011; Lee et al. 2012). Optionally, the genetic correlation can be computed chromosome-wise.
Usage
GeneticCorrBT(
DataDir,
ResultDir,
finput,
byCHR = FALSE,
REMLalgo = c(0, 1, 2),
nitr = 100,
phenofile,
cat_covarfile = NULL,
quant_covarfile = NULL,
computeGRM = TRUE,
grmfile_name = NULL,
partGRM = FALSE,
autosome = TRUE,
Xsome = TRUE,
nGRM = 3,
cripticut = 0.025,
minMAF = NULL,
maxMAF = NULL,
excludeResidual = FALSE,
ncores = 2
)
Arguments
- DataDir
A character string for the file path of the directory containing all the input files.
- ResultDir
A character string for the file path where all output files will be stored. The default is tempdir().
- finput
Character string, specifying the prefix of the input PLINK binary files for the genotype data. These files need to be in DataDir.
- byCHR
Boolean value, TRUE or FALSE, specifying whether the analysis will be performed chromosome-wise or not. The default is FALSE.
- REMLalgo
Integer value of 0, 1 or 2, specifying the algorithm used for the REML iterations: 0 for average information (AI), 1 for Fisher scoring and 2 for EM. The default is 0, i.e. AI-REML.
- nitr
Integer value, specifying the number of iterations for performing the REML. The default is 100.
- phenofile
A data frame for the bivariate REML analysis with four columns: family ID, individual ID and two trait columns. For a binary trait, the phenotypic value should be coded as 0 or 1; it will then be recognized as a case-control study (0 for controls and 1 for cases). Missing values should be represented by "-9" or "NA".
- cat_covarfile
A character string, specifying the name of the categorical covariate file, which is a plain text file with no header line; columns are family ID, individual ID and discrete covariates. The default is NULL. This file needs to be in DataDir.
- quant_covarfile
A character string, specifying the name of the quantitative covariate file, which is a plain text file with no header line; columns are family ID, individual ID and continuous covariates. The default is NULL. This file needs to be in DataDir.
- computeGRM
Boolean value, TRUE or FALSE, specifying whether to compute the GRM matrices or not. The default is TRUE.
- grmfile_name
A character string specifying the prefix of the autosomal .grm.bin file. Users need to provide separate GRM files for the autosomes and the X chromosome in ResultDir. The X chromosomal GRM file name should have "x" prepended to the autosomal prefix. For instance, if the autosomal file is "ABC.grm.bin", then the X chromosomal file should be "xABC.grm.bin". If you are providing chromosome-wise GRMs, then "ChrNumber_" should be added at the start of the prefix, e.g. "Chr1_ABC.grm.bin". The default is NULL.
- partGRM
Boolean value, TRUE or FALSE, specifying whether the GRM will be partitioned into n parts (by row) in the GREML model. The default is FALSE.
- autosome
Boolean value, TRUE or FALSE, specifying whether the estimation will be done for the autosomes or not. The default is TRUE.
- Xsome
Boolean value, TRUE or FALSE, specifying whether the estimation will be done for the X chromosome or not. The default is TRUE.
- nGRM
Integer value, specifying the number of parts into which the GRM is partitioned in the GREML model. The default is 3.
- cripticut
Numeric value, specifying the threshold used to create a new GRM of "unrelated" individuals in the GREML model. The default, 0.025, is an arbitrarily chosen value following (Yang et al. 2011).
- minMAF
Positive numeric value (< maxMAF), specifying the minimum MAF threshold for filtering SNPs in the bivariate GREML model. The default is NULL.
- maxMAF
Positive numeric value in the range (minMAF, 1), specifying the maximum MAF threshold for filtering SNPs in the bivariate GREML model. The default is NULL.
- excludeResidual
Boolean value, TRUE or FALSE, specifying whether to drop the residual covariance from the model. Setting this to TRUE is recommended if the traits were measured on different individuals. The default is FALSE.
- ncores
Integer value, specifying the number of cores to be used. The default is 2.
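As a rough illustration of the argument formats described above, the sketch below builds a phenotype data frame in the four-column layout expected by phenofile and derives the GRM file names implied by the grmfile_name convention. The column names (FID, IID, trait1, trait2), the simulated values and the helper grm_file_names() are assumptions made for this example; they are not part of GXwasR.

```r
# Hypothetical phenotype table: family ID, individual ID and two traits.
# Column names are illustrative; the documentation only fixes the column order.
set.seed(1)
n <- 6
phenofile <- data.frame(
    FID = paste0("F", seq_len(n)),
    IID = paste0("I", seq_len(n)),
    trait1 = round(rnorm(n), 2),       # quantitative trait
    trait2 = c(0, 1, 1, 0, NA, 1),     # binary trait: 0 = control, 1 = case, NA = missing
    stringsAsFactors = FALSE
)

# Basic checks mirroring the documented requirements.
stopifnot(ncol(phenofile) == 4)
stopifnot(all(phenofile$trait2 %in% c(0, 1, NA)))

# Hypothetical helper (not a GXwasR function): file names implied by the
# grmfile_name convention, assuming the X-chromosome file also ends in .grm.bin.
grm_file_names <- function(prefix, chr = NULL) {
    if (is.null(chr)) {
        # Genome-wide case: one autosomal file and one "x"-prefixed X file.
        c(autosome = paste0(prefix, ".grm.bin"),
          xchr = paste0("x", prefix, ".grm.bin"))
    } else {
        # Chromosome-wise case: "Chr<number>_" is added in front of the prefix.
        paste0("Chr", chr, "_", prefix, ".grm.bin")
    }
}

grm_file_names("ABC")
grm_file_names("ABC", chr = 1)
```

Checking the layout before calling the function is a cheap way to catch a miscoded binary trait or a misnamed GRM file early.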
Value
A data frame with at least three columns:
"Source" (i.e., source of heritability)
"Variance" (i.e., estimated heritability)
"SE" (i.e., standard error of the estimated heritability)
The Source column will have rows such as V(G)_tr1 (genetic variance for trait 1), V(G)_tr2 (genetic variance for trait 2), C(G)_tr12 (genetic covariance between traits 1 and 2), V(e)_tr1 (residual variance for trait 1), V(e)_tr2 (residual variance for trait 2), C(e)_tr12 (residual covariance between traits 1 and 2), Vp_tr1 (phenotypic variance for trait 1), Vp_tr2 (phenotypic variance for trait 2), V(G)/Vp_tr1 (proportion of variance explained by all SNPs for trait 1), V(G)/Vp_tr2 (proportion of variance explained by all SNPs for trait 2), rG (genetic correlation) and n (sample size). In a chromosome-wise analysis, there will also be a 'chromosome' column containing the chromosome code.
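The rG row can be related to the variance and covariance rows through the standard definition of a genetic correlation: the genetic covariance divided by the square root of the product of the two genetic variances. The sketch below applies that definition to a made-up table in the documented layout. The numbers are invented for illustration, and whether GeneticCorrBT reports exactly this ratio after any internal constraining is an assumption.

```r
# Made-up output table in the documented three-column layout.
res <- data.frame(
    Source = c("V(G)_tr1", "V(G)_tr2", "C(G)_tr12"),
    Variance = c(0.40, 0.90, 0.30),
    SE = c(0.10, 0.20, 0.12),
    stringsAsFactors = FALSE
)

# Look up an estimate by its Source label.
get_est <- function(tab, label) tab$Variance[tab$Source == label]

# Genetic correlation = genetic covariance / sqrt(product of genetic variances).
rG <- get_est(res, "C(G)_tr12") /
    sqrt(get_est(res, "V(G)_tr1") * get_est(res, "V(G)_tr2"))
rG  # approximately 0.5 for these invented values
```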
References
Lee SH, Wray NR, Goddard ME, Visscher PM (2012).
“Estimation of pleiotropy between complex diseases using SNP-derived genomic relationships and restricted maximum likelihood.”
Bioinformatics, 28, 2540–2542.
doi:10.1093/bioinformatics/bts474, http://www.ncbi.nlm.nih.gov/pubmed/22843982.
Yang J, Lee SH, Goddard ME, Visscher PM (2011).
“GCTA: a tool for Genome-wide Complex Trait Analysis.”
American Journal of Human Genetics, 88(1), 76–82.
Examples
data("Example_phenofile", package = "GXwasR")
DataDir <- GXwasR:::GXwasR_data()
ResultDir <- tempdir()
finput <- "GXwasR_example"
byCHR <- TRUE
REMLalgo <- 0
nitr <- 3
ncores <- 3
phenofile <- Example_phenofile # Cannot be NULL
cat_covarfile <- NULL
quant_covarfile <- NULL
partGRM <- FALSE # Whether to partition the GRM into nGRM parts (by row)
autosome <- TRUE
Xsome <- TRUE
cripticut <- 0.025
minMAF <- 0.01 # Only relevant if a MAF filter is applied
maxMAF <- 0.04
excludeResidual <- TRUE
# Note: several arguments below are passed as literal values rather than via
# the variables defined above (e.g. nitr = 10, minMAF = NULL, maxMAF = NULL).
genetic_correlation <- GeneticCorrBT(
    DataDir = DataDir, ResultDir = ResultDir, finput = finput, byCHR = byCHR,
    REMLalgo = 0, nitr = 10, phenofile = phenofile, cat_covarfile = NULL, quant_covarfile = NULL,
    partGRM = FALSE, autosome = TRUE, Xsome = TRUE, nGRM = 3,
    cripticut = 0.025, minMAF = NULL, maxMAF = NULL, excludeResidual = TRUE, ncores = ncores
)
#> Processing chromosome 1
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 2
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 3
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#>
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 4
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 5
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 6
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 7
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 8
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 9
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 10
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
#> Processing chromosome 23
#> ✔ GRM has been saved in the file [/var/folders/d6/gtwl3_017sj4pp14fbfcbqjh0000gp/T//RtmpO7c0S8/xGXwasR.grm.bin]
#> ℹ Number of SNPs in each pair of individuals has been saved in the file [/var/folders/d6/gtwl3_017sj4pp14fbfcbqjh0000gp/T//RtmpO7c0S8/xGXwasR.grm.N.bin]
#>
#> Processing chromosome 24
#> ✖ Error: Log-likelihood not converged (stop after 10 iteractions).
#> ℹ Note: to constrain the correlation being from -1 to 1, a genetic (or residual) variance-covariance matrix is bended to be positive definite. In this case, the SE is unreliable.
#> ✖ Convergence issue occurred, please:
#> - set byCHR = TRUE
#> - set different options
#> - verify SNP partitioning or quality of the data
#> ℹ The result will be provided for the last iteration.
#>
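The log above stops after 10 iterations, which is the nitr value passed in the call. One option consistent with the "set different options" hint is to rerun with a larger iteration limit, such as the documented default of 100. The sketch below changes only nitr relative to the example call. Whether this is enough for the example data to converge has not been verified here, and the call is skipped when GXwasR is not installed.

```r
# Same settings as the example call, except for a larger iteration limit.
rerun_args <- list(
    ResultDir = tempdir(), finput = "GXwasR_example", byCHR = TRUE,
    REMLalgo = 0, nitr = 100, cat_covarfile = NULL, quant_covarfile = NULL,
    partGRM = FALSE, autosome = TRUE, Xsome = TRUE, nGRM = 3,
    cripticut = 0.025, minMAF = NULL, maxMAF = NULL,
    excludeResidual = TRUE, ncores = 2
)

# Only attempt the rerun when the package (and its example data) is available.
if (requireNamespace("GXwasR", quietly = TRUE)) {
    data("Example_phenofile", package = "GXwasR")
    rerun_args$DataDir <- GXwasR:::GXwasR_data()
    rerun_args$phenofile <- Example_phenofile
    genetic_correlation <- do.call(GXwasR::GeneticCorrBT, rerun_args)
}
```

If the estimates still fail to converge, the remaining hints in the log (different options, SNP partitioning, data quality) are the next things to check.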