- If SAIGE has been previously used for single-variants association tests. The null model fitting results from Step 1 can be reused for region/set-based tests
- First, estimate the categorical variance ratios with full and sparse GRMs
- Use –sparseGRMFile and –sparseGRMSampleIDFile for the sparse GRM and –useSparseGRMforVarRatio=TRUE
- Use –isCateVarianceRatio=TRUE to esrimate categotical variance ratios
- Use –skipModelFitting=FASLE to avoid re-fitting the null model. With –outputPrefix=prefix, the prgoram will read in the model from prefix.rda. Using –outputPrefix_varRatio to specify a seperate file prefix for the new variance ratio results if you try to avoid overwrite the previous variance ratio file, otherwise –IsOverwriteVarianceRatioFile=TRUE can be used to overwrite the previous file.
- Note: at least ~200 markers with 10<= MAC < 20 need to be added in the plink file specified with –plinkFile, which will be used for estimating variance ratio for lower MAF categories.
Rscript step1_fitNULLGLMM.R \ --plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly_22chr \ --sparseGRMFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx \ --sparseGRMSampleIDFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \ --phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \ --phenoCol=y_binary \ --covarColList=x1,x2,a9,a10 \ --qCovarColList=a9,a10 \ --sampleIDColinphenoFile=IID \ --traitType=binary \ --outputPrefix=./output/example_binary_fullGRM \ --nThreads=64 \ --useSparseGRMtoFitNULL=FALSE \ --isCateVarianceRatio=TRUE \ --skipModelFitting=TRUE \ --outputPrefix_varRatio=./output/example_binary_cate \ --useSparseGRMforVarRatio=TRUE
- If no previous SAIGE jobs were run, please follow the step 1 and 2 for set-based tests, Use –is_single_in_groupTest=TRUE to also output single-variant assoc tests
a. (fast) Use sparse GRM to fit the null model, estimate the variance ratios, and run a faster version of the associaiton tests
```
Rscript step1_fitNULLGLMM.R \
--sparseGRMFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx \
--sparseGRMSampleIDFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
--plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly_22chr.forCate_vr \
--useSparseGRMtoFitNULL=TRUE \
--phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
--phenoCol=y_binary \
--covarColList=x1,x2 \
--qCovarColList=x2 \
--sampleIDColinphenoFile=IID \
--traitType=binary \
--isCateVarianceRatio=TRUE \
--outputPrefix=./output/example_binary_sparseGRM \
--IsOverwriteVarianceRatioFile=TRUE
#Step 2 for single variant tests only
Rscript step2_SPAtests.R \
--bgenFile=./input/genotype_100markers.bgen \
--bgenFileIndex=./input/genotype_100markers.bgen.bgi \
--sampleFile=./input/samplelist.txt \
--AlleleOrder=ref-first \
--SAIGEOutputFile=./output/genotype_100markers_marker_bgen_Firth.txt \
--minMAF=0 \
--minMAC=20 \
--GMMATmodelFile=./output/example_binary_sparseGRM_vr.rda \
--varianceRatioFile=./output/example_binary_sparseGRM_vr.varianceRatio.txt \
--LOCO=FALSE \
--is_Firth_beta=TRUE \
--pCutoffforFirth=0.05 \
--is_output_moreDetails=TRUE \
--sparseGRMFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx \
--sparseGRMSampleIDFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
--is_fastTest=TRUE
#Step 2 for set-based tests and also output single-variant tests results with –is_single_in_groupTest=TRUE
Rscript step2_SPAtests.R \
--bgenFile=./input/genotype_100markers.bgen \
--bgenFileIndex=./input/genotype_100markers.bgen.bgi \
--SAIGEOutputFile=./output/genotype_100markers_bgen_groupTest_out.txt \
--AlleleOrder=ref-first \
--minMAF=0 \
--minMAC=0.5 \
--sampleFile=./input/samplelist.txt \
--GMMATmodelFile=./output/example_binary_sparseGRM.rda \
--varianceRatioFile=./output/example_binary_sparseGRM_vr.varianceRatio.txt \
--sparseGRMFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx \
--sparseGRMSampleIDFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
--groupFile=./input/group_new_chrposa1a2.txt \
--annotation_in_groupTest="lof,missense:lof,missense:lof:synonymous" \
--maxMAF_in_groupTest=0.0001,0.001,0.01 \
--is_output_markerList_in_groupTest=TRUE \
--is_single_in_groupTest=TRUE \
--is_Firth_beta=TRUE \
--pCutoffforFirth=0.05 \
--LOCO=FALSE \
--is_fastTest=TRUE
```
b. (slower than a) Use full GRM to fit the null model, estimate the variance ratios, and run a faster version of the associaiton tests
```
Rscript step1_fitNULLGLMM.R \
--plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly_22chr \
--sparseGRMFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx \
--sparseGRMSampleIDFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
--phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
--phenoCol=y_binary \
--covarColList=x1,x2,a9,a10 \
--qCovarColList=a9,a10 \
--sampleIDColinphenoFile=IID \
--traitType=binary \
--outputPrefix=./output/example_binary_fullGRM \
--nThreads=64 \
--useSparseGRMtoFitNULL=FALSE \
--isCateVarianceRatio=TRUE \
--useSparseGRMforVarRatio=TRUE \
--IsOverwriteVarianceRatioFile=TRUE
#Step 2 for single variant tests only
Rscript step2_SPAtests.R \
--bgenFile=./input/genotype_100markers.bgen \
--bgenFileIndex=./input/genotype_100markers.bgen.bgi \
--sampleFile=./input/samplelist.txt \
--AlleleOrder=ref-first \
--SAIGEOutputFile=./output/genotype_100markers_marker_bgen_fullGRMforNull_with_vr.txt \
--chrom=1 \
--minMAF=0 \
--minMAC=20 \
--GMMATmodelFile=./output/example_binary_fullGRM.rda \
--varianceRatioFile=./output/example_binary_fullGRM.varianceRatio.txt \
--is_Firth_beta=TRUE \
--pCutoffforFirth=0.05 \
--is_output_moreDetails=TRUE \
--LOCO=TRUE
#Step 2 for set-based tests and also output single-variant tests results with --is_single_in_groupTest=TRUE
Rscript step2_SPAtests.R \
--bgenFile=./input/genotype_100markers.bgen \
--bgenFileIndex=./input/genotype_100markers.bgen.bgi \
--SAIGEOutputFile=./output/genotype_100markers_bgen_groupTest_out.txt \
--chrom=1 \
--LOCO=TRUE \
--AlleleOrder=ref-first \
--minMAF=0 \
--minMAC=0.5 \
--sampleFile=./input/samplelist.txt \
--GMMATmodelFile=./output/example_binary_fullGRM.rda \
--varianceRatioFile=./output/example_binary_fullGRM.varianceRatio.txt \
--sparseGRMFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx \
--sparseGRMSampleIDFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
--groupFile=./input/group_new_chrposa1a2.txt \
--annotation_in_groupTest="lof,missense:lof,missense:lof:synonymous" \
--maxMAF_in_groupTest=0.0001,0.001,0.01 \
--is_output_markerList_in_groupTest=TRUE \
--is_single_in_groupTest=TRUE \
--is_fastTest=TRUE