################################################################
##### Author: Zechuan Lin, Yale School of Medicine #############
##### E-mail: zechuan.lin@yale.edu #############################

These scripts are implementing a two-step approach for detecting eQTL and conditional eQTL analysis for cell type gene expression in single cell RNA seq study.

Relevant R packages: SVA (version 3.52.0, DOI: 10.18129/B9.bioc.sva), MatrixEQTL (version 2.1.0, link: https://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/runit.html)
optparse (version 1.7.5, for passing parameters from command line, link: https://cran.r-project.org/web/packages/optparse/index.html)

These scripts implemented eQTL detection for each cell type by these five steps:

Step 1. Remove of batch effect by using ComBat (pls see: 10.1093/biostatistics/kxj037):

Script run: Rscript control_of_covariances.R gene_expression_matrix covariate_matrix

Known sequencing batch effect will be used as a category variable and batch effect is removed by using ComBat function.

Note that the gene expression matrix should look like following:
	Subject1	Subject2
Gene1	1	1
Gene2	2	2

The covariate matrix should look like:
	Cov1	Cov2
Subject1 A	A
Subject2 B	B

Step 2. Estimate the hidden relatedness among subjects by using Surrogate Variable Analysis (SVA, pls see: 10.1093/bioinformatics/bts034), and remove the hidden relatedness effect + known covariate effects by using fsva.

Script run: Rscript control_of_covariances.R gene_expression_matrix covariate_matrix

This step is implemented in the same script as step2, pls see the input of step2.

Step 3. Detect eQTLs by implementing MatrixEQTL:

Script run: Rscript matrixeQTL_for_eQTL_detection.R -g genotype_input -e expression_matrix -p gene_position -s SNP_position -o output_prefix

All file formats should follow the standard input format of MatrixEQTL!!

The covariates free gene expression data, the genotype data and gene position files were input for detecting eQTLs in each cell type

Step 4. Detect conditional eQTLs by implementing iternative regression using generalized linear model (GLM):

Script run: Rscript conditional_eQTL_analysis_for_each_gene.R input_file. The input file is just like the Demo file: Demo_data.conditional_eQTL.BIN3

For a known eQTL for a eGene, the SNPs within 1Mb of the eGene and the expression of the eGene were input for detecting conditional eQTLs by an iternative approch:
For iternation 1, this script use eGene expression as Y and numeric genotype of each SNPs as x and the eSNPs of all significant eQTLs for the eGene as covariates.
If no additional eQTLs detected after the iternation, the iternation will stop. If additional eQTLs detected (pass FDR <= 0.05), this iternation will continue by adding the eQTL into
model covariates, and the iternation will never stop untill no additional significant eQTL detected. 

