How to install and run SAIGE and SAIGE-GENE
Install SAIGE/SAIGE-GENE (current version 1.1.9 (updated on May 11 2023))
List of dependencies:
- R >= 3.6.1, gcc >= 5.4.0, cmake 3.14.1, cget, savvy
- R packages: Rcpp (>= 1.0.7), RcppArmadillo, RcppParallel, data.table, SPAtest (== 3.1.2), RcppEigen, Matrix, methods, BH, optparse, SKAT, qlcMatrix, RhpcBLASctl, roxygen2, rversions, devtools, dplyr, dbplyr
- Rscript ./extdata/install_packages.R can be used to install the R packages
- Two libraries for reading VCF files (installed automatically during the SAIGE installation):
- cget, savvy
Logs:
v1.3.0 (Updated on Sept 09, 2023)
Bug fixed:
Fixed the bug that caused overestimation of SE for markers when SPA is applied. This bug was introduced in v1.1.6, when the function R::qnorm was adopted because of the logp argument.
To correct the SE estimates without re-running the association tests with v1.3.0, use the following R expression to re-estimate the SE for markers with SPA applied:
SE = abs(BETA/qnorm(p.value/2))
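For results that are post-processed outside R, the same correction can be sketched in Python. This is a minimal sketch, not part of SAIGE: the function name `reestimate_se` is hypothetical, and it assumes BETA and p.value are read from the single-variant output as plain floating-point numbers. `statistics.NormalDist().inv_cdf` plays the role of R's `qnorm`.

```python
from statistics import NormalDist


def reestimate_se(beta: float, p_value: float) -> float:
    """Python counterpart of the R expression SE = abs(BETA / qnorm(p.value / 2)).

    Hypothetical helper for illustration; not a SAIGE function.
    """
    if not 0.0 < p_value < 1.0:
        # p = 0 has no finite quantile, and p = 1 gives a z-score of 0
        raise ValueError("p_value must lie strictly between 0 and 1")
    # Standard normal quantile of p/2 (negative for p < 1), like qnorm(p/2) in R
    z = NormalDist().inv_cdf(p_value / 2.0)
    return abs(beta / z)


# Example: a marker with BETA = 0.5 and a two-sided p-value of 0.05
se = reestimate_se(0.5, 0.05)
```

P-values that were written as 0, or that are too small to parse as a double, cannot be handled this way and would need the log-scale p-value instead.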
v1.2.0 (Updated on Jun 12, 2023)
Merge pull request #90 from belowlab/vectors-for-safer-memory
Update ComputeExact to use vectors rather than raw memory where easy
When PLINK files are used as input, marker IDs in the group file must match the ID column in PLINK files
Bug fixed:
Fixed the bug that rewrote the list of marker IDs when the first chunk is empty
v1.1.9 (Updated on May 11, 2023)
Merge pull request #90 from belowlab/vectors-for-safer-memory
Update ComputeExact to use vectors rather than raw memory where easy
Bugs fixed:
Fixed the error: Crash in step 2 with: mean(): object has no elements
Imputed ultra rare variants become monomorphic because any dosages <= dosage_zerod_cutoff for genetic variants with MAC <= dosage_zerod_MAC_cutoff are set to be 0
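The mechanism behind this crash can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than SAIGE's actual code: the function name `zero_small_dosages` is hypothetical, MAC is computed here as the smaller of the alternate and reference allele counts, and the cutoff values 0.2 and 10 are example values chosen for the demonstration.

```python
def zero_small_dosages(dosages, dosage_zerod_cutoff, dosage_zerod_MAC_cutoff):
    """Zero out small dosages for a rare variant; return the cleaned dosages
    together with the MAC before and after (hypothetical helper)."""

    def mac(values):
        # Minor allele count for diploid samples: min(alt count, ref count)
        alt_count = sum(values)
        return min(alt_count, 2 * len(values) - alt_count)

    mac_before = mac(dosages)
    if mac_before <= dosage_zerod_MAC_cutoff:
        # Rare variant: dosages at or below the cutoff are set to 0
        cleaned = [0.0 if d <= dosage_zerod_cutoff else d for d in dosages]
    else:
        cleaned = list(dosages)
    return cleaned, mac_before, mac(cleaned)


# An imputed ultra rare variant whose dosages are all small (example values)
dosages = [0.1, 0.15, 0.0, 0.05, 0.0, 0.2]
cleaned, mac_before, mac_after = zero_small_dosages(dosages, 0.2, 10)
# Every dosage is <= 0.2, so all are zeroed and the variant becomes monomorphic
```

Once every dosage is zero there are no carriers left, which is the kind of empty input that a downstream mean() call cannot handle.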
v1.1.8 (Updated on Apr 24, 2023)
use SPA_fast only when sparse GRM is not used
v1.1.7 (Updated on Mar 23, 2023)
Bugs fixed:
Fixed the issue when MAC in a gene or a set is 10
Corrected the help message for --maxMAF_in_groupTest
Replace SPAtest:::Saddle_Prob with SPA_pval
v1.1.6.3 (Updated on Nov 24, 2022)
Bugs fixed:
Fixed the LD mat info output to not include monomorphic markers
Corrected the help message for --maxMAF_in_groupTest
v1.1.6.2 (Updated on Oct 12, 2022)
Bugs fixed:
Fixed Step 2 crash if a region is empty
Improvements:
Simplify output message logic to prevent crash
Fixed indentation / formatting
Makes it much clearer which lines are within if/else blocks
v1.1.6.1 (Updated on Sept 8, 2022)
Bugs fixed:
Fixed the error when variance value is close to 0 in conditional analysis
Improvements:
Pass p-values as strings instead of double values from C++ to R
Clean and fix add_logp
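Why passing p-values as strings helps can be shown with a short Python sketch. It is only an illustration under stated assumptions, not the SAIGE implementation: the function name `format_log_pvalue` and the output format are hypothetical, and the input is assumed to be the natural log of the p-value. The point is that a p-value below the smallest representable double underflows to 0, while its log can still be formatted exactly.

```python
import math


def format_log_pvalue(log_p: float, decimals: int = 3) -> str:
    """Format a p-value given as a natural log in scientific notation,
    without ever forming the p-value itself as a double (hypothetical helper)."""
    log10_p = log_p / math.log(10.0)
    exponent = math.floor(log10_p)
    # Mantissa in [1, 10); computed from the fractional part of log10(p)
    mantissa = round(10.0 ** (log10_p - exponent), decimals)
    if mantissa >= 10.0:
        # Rounding pushed the mantissa up to 10, so renormalise
        mantissa = 1.0
        exponent += 1
    return f"{mantissa:.{decimals}f}E{exponent:d}"


# log of 2.5e-400, a value that cannot be stored as a double
log_p = -400 * math.log(10.0) + math.log(2.5)
as_double = math.exp(log_p)            # underflows to 0.0
as_string = format_log_pvalue(log_p)   # keeps mantissa and exponent
```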
v1.1.6 (Updated on Sep 7, 2022)
Bugs fixed:
Fixed the issue where Is.SPA was output as TRUE when SPA did not converge
Improvements:
Updated savvy version to v2.1.0
Back-calculated SE when the Firth approximation is applied to estimate beta
Added error check when stat is out of range for the double numbers
v1.1.5 (Updated on Aug 17, 2022)
Bugs fixed:
Fixed the sav input option
Improvements:
Fixed typo
Formatted p-values in single-variant assoc test output so very small p-values will not be 0
Added an option --subSampleFile to allow Step 2 to use a subset of the samples used in Step 1
v1.1.4 (Updated on Aug 4, 2022)
Bugs fixed:
Fixed the bug for computing imputation info score with bgen file
Improvements:
ER is applied only when StdStat is greater than m_SPA_Cutoff
Improve the output with isLowMemLOCO=TRUE in Step 1
Use pinv for pseudo-inverse / generalised inverse when conditioning markers are in high LD
v1.1.3 (Updated on July 9, 2022)
Bugs fixed:
Fixed the bug for computing imputation scores in bgen files.
Improvements:
Added an option --isLowMemLOCO=TRUE to output the null model results by chromosome in different files in Step 1 when LOCO=TRUE. This can reduce memory usage in Step 1 and later in Step 2 when LOCO is TRUE.
Implemented efficient resampling (ER) in C++ and added an option --max_MAC_for_ER to specify the maximum MAC for which ER is conducted. p-values of genetic variants with MAC <= max_MAC_for_ER will be calculated via efficient resampling. The default value is 4.
Implemented functions to output “LD matrices” for variant sets. This feature is under evaluation and will be used for set-based meta-analyses.
v1.1.2 (June 23, 2022)
Improvements:
Rewrote the code that obtains the subset of samples present in the sparse GRM, PLINK file, and phenotype file.
Support the sparse matrix with 1s
v1.1.1 (June 8, 2022)
Bugs fixed:
Fixed the redundant output for single-variant conditional analysis for quantitative traits.
v1.0.9 (May 17, 2022)
Bugs fixed:
Fixed missing output when the last group in the group file is empty
Improvements:
Get a more accurate variance estimation when the sparse GRM is used for fitting the null model and used for Step 2
v1.0.7 (May 8, 2022)
Bugs fixed:
Fixed the error when reading in per-marker weights from the group file
Fixed the issue with matching against marker IDs in the group file and --chrom when the CHROM column in the VCF file contains a string, e.g. “chr1”.
v1.0.6 (May 6, 2022)
Bugs fixed:
For VCF input, the largest chromosome position is now 250000000. In the previous version, it was 2000000000, which led to missing Chr 1 and Chr 2 markers
Fixed the issue to read the group file separated by tab. Thanks to yonas-yuh!
When the markers were not ordered by chromosome position in the group file, reading genotypes from VCF files had issues. This has been fixed, and the markers no longer need to be ordered by chromosome position within each set in the group file.
Improvements:
Added an option --is_fastTest for single and set-based tests to further improve the computation efficiency
Allow for different prefixes (--bimFile, --bedFile, --famFile) for PLINK files in Step 0 and Step 1
Further improved the randomness of marker selection for variance ratio estimation
v1.0.5 (April 1, 2022)
Bugs fixed:
Removed the two extra header columns from the header line in the single assoc results when conducting the set-based tests
Improvements:
Updated and cleaned the dockerfile (Ubuntu 20.04 with system R, OpenBLAS) and docker readme (Thanks to Alex Petty!)
v1.0.4 (March 30, 2022)
Bugs fixed:
Fixed the issue when SKAT-O is performed with no weights applied to markers and one or more groups have no markers. error message “Error in if (abs(q - m1)/sqrt(var1) < Cutoff) { :”
v1.0.3 (March 30, 2022)
Bugs fixed:
Fixed the annolist issue.
Improvements:
Added code to avoid using markers selected for variance ratio estimation in the GRM construction.
v1.0.2 (March 25, 2022):
Bugs fixed:
if LOCO = FALSE, --chrom is
- not required for set-based tests for VCF, BGEN or PLINK input
- not required for single-variant assoc tests for BGEN or PLINK input
- required for single-variant assoc tests for VCF input
if LOCO = TRUE, --chrom is always required
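The rules above can be written as a small decision function, for example to validate a job configuration before launching Step 2. This is a sketch only: the function name `chrom_required` and the labels "single" and "set" are hypothetical and not part of the SAIGE command line.

```python
def chrom_required(loco: bool, test_type: str, input_format: str) -> bool:
    """Return True if --chrom must be given, following the rules listed above.

    test_type is "single" or "set"; input_format is "VCF", "BGEN" or "PLINK".
    (Hypothetical helper; the labels are illustrative.)
    """
    test_type = test_type.lower()
    input_format = input_format.upper()
    if test_type not in {"single", "set"}:
        raise ValueError(f"unknown test type: {test_type}")
    if input_format not in {"VCF", "BGEN", "PLINK"}:
        raise ValueError(f"unknown input format: {input_format}")
    if loco:
        # LOCO = TRUE: --chrom is always required
        return True
    # LOCO = FALSE: only single-variant tests on VCF input need --chrom
    return test_type == "single" and input_format == "VCF"


needs_chrom = chrom_required(loco=False, test_type="single", input_format="VCF")
```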
Improvements:
Sample file for bgen input now can either contain one column without header or be in the format of the bgenix (UKBB provided)
v1.0.1 (March 18, 2022):
Bugs fixed:
--AlleleOrder for bgen input
Header of the single-variant assoc test output
Improvements:
change PI to M_PI in CCT.cpp
add flag in Makevar for installation on Centos 8