perMarkerQC checks the markers in the PLINK dataset for their missingness
rates across samples, their deviation from Hardy-Weinberg equilibrium (HWE)
and their minor allele frequencies (MAF). By default, it assumes that the IDs
of individuals that failed perIndividualQC have been written
to qcdir/name.fail.IDs and removes these individuals when computing
missingness rates, HWE p-values and MAF. If the qcdir/name.fail.IDs file does
not exist, a message is written to stdout and the analyses continue for
all samples in the name.fam/name.bed/name.bim dataset.
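
The fallback described above can be pictured with a small sketch. The helper `find_fail_ids` is hypothetical and not part of plinkQC; it also assumes that the fail-ID file holds two whitespace-separated columns (FID and IID) without a header, and it uses `message()`, which writes to stderr rather than stdout.

```r
# Hypothetical helper (not a plinkQC function): look for qcdir/name.fail.IDs
# and fall back to "use all samples" when the file is absent.
find_fail_ids <- function(qcdir, name) {
  fail_file <- file.path(qcdir, paste0(name, ".fail.IDs"))
  if (!file.exists(fail_file)) {
    message("No ", fail_file, " found; all samples would be used.")
    return(NULL)
  }
  # Assumed layout: FID and IID columns, whitespace-separated, no header
  read.table(fail_file, header = FALSE, stringsAsFactors = FALSE,
             col.names = c("FID", "IID"))
}

# Toy usage in a fresh temporary directory
qcdir <- tempfile("qc_demo")
dir.create(qcdir)
missing_case <- find_fail_ids(qcdir, "data")   # file absent: NULL plus message
writeLines(c("fam1 id1", "fam2 id2"), file.path(qcdir, "data.fail.IDs"))
fail_ids <- find_fail_ids(qcdir, "data")       # file present: two-row data.frame
```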
It depicts i) SNP missingness rates (stratified by minor allele
frequency) as histograms, ii) p-values of the HWE exact test (shown for all
p-values and for low p-values only) as histograms and iii) the minor allele
frequency distribution as a histogram.
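
To make two of the three per-marker metrics concrete, here is a minimal sketch on an invented genotype matrix. It does not reproduce how PLINK or plinkQC compute these values; the dosage coding (0, 1, 2 copies of one allele, NA for a missing call) and the strict "greater than" comparison against the threshold are assumptions made for illustration only.

```r
# Toy genotype matrix: rows are samples, columns are SNPs (made-up data).
# matrix() fills column by column, so snp1 = 0, 1, 2, NA and so on.
geno <- matrix(c(0, 1, 2, NA,
                 0, 0, 1, 1,
                 NA, 1, 0, 0),
               nrow = 4,
               dimnames = list(paste0("s", 1:4), paste0("snp", 1:3)))

# Missingness rate per SNP: fraction of samples without a call
missingness <- colMeans(is.na(geno))

# Allele frequency from non-missing calls (two alleles per sample),
# folded so that the reported value is the minor allele frequency
af  <- colSums(geno, na.rm = TRUE) / (2 * colSums(!is.na(geno)))
maf <- pmin(af, 1 - af)

# SNPs whose missingness exceeds an illustrative threshold of 0.01
high_missing <- names(missingness)[missingness > 0.01]
```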
    perMarkerQC(indir, qcdir = indir, name,
      do.check_snp_missingness = TRUE, lmissTh = 0.01,
      do.check_hwe = TRUE, hweTh = 1e-05,
      do.check_maf = TRUE, macTh = 20, mafTh = NULL,
      interactive = FALSE, verbose = TRUE,
      path2plink = NULL, showPlinkOutput = TRUE)
Argument | Description |
---|---|
indir | [character] /path/to/directory containing the basic PLINK data files name.bim, name.bed, name.fam. |
qcdir | [character] /path/to/directory where results will be written to. If |
name | [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam. |
do.check_snp_missingness | [logical] If TRUE, run |
lmissTh | [double] Threshold for acceptable variant missing rate across samples. |
do.check_hwe | [logical] If TRUE, run |
hweTh | [double] Significance threshold for deviation from HWE. |
do.check_maf | [logical] If TRUE, run |
macTh | [double] Threshold for minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh\*2\*NrSamples). |
mafTh | [double] Threshold for minor allele frequency cut-off. |
interactive | [logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_marker) via ggplot2::ggsave(plot=p_marker, other_arguments) or pdf(outfile); print(p_marker); dev.off(). |
verbose | [logical] If TRUE, progress info is printed to standard out. |
path2plink | [character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: for Windows, this means path/plink.exe, for Unix platforms this is path/plink. If not provided, it is assumed that the PATH set-up works and PLINK will be found by |
showPlinkOutput | [logical] If TRUE, plink log and error messages are printed to standard out. |
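
The relation between the two frequency-based thresholds, macTh = mafTh\*2\*NrSamples, can be checked with a short sketch. The helper names `maf_to_mac` and `mac_to_maf` are hypothetical and only restate the formula from the macTh description; the sample size of 1000 is an arbitrary example.

```r
# Hypothetical helpers (not plinkQC functions) restating
# macTh = mafTh * 2 * NrSamples from the argument description.
maf_to_mac <- function(mafTh, n_samples) mafTh * 2 * n_samples
mac_to_maf <- function(macTh, n_samples) macTh / (2 * n_samples)

n_samples <- 1000                        # arbitrary example cohort size
mac_equiv <- maf_to_mac(0.01, n_samples) # count threshold matching MAF 0.01
maf_equiv <- mac_to_maf(20, n_samples)   # frequency matching the default macTh
```

With 1000 samples there are 2000 alleles per marker, so the default macTh of 20 corresponds to a frequency cut-off of about 0.01; for other cohort sizes the same count threshold implies a different frequency.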
Named [list] with i) fail_list, a named [list] with 1.
SNP_missingness, containing SNP IDs [vector] failing the missingness
threshold lmissTh, 2. hwe, containing SNP IDs [vector] failing the HWE exact
test threshold hweTh and 3. maf, containing SNP IDs [vector] failing the MAF
threshold mafTh/MAC threshold macTh and ii) p_markerQC, a ggplot2-object
'containing' a sub-paneled plot with the QC-plots of
check_snp_missingness, check_hwe and check_maf, which can be shown by
print(p_markerQC).
List entries contain NULL if that specific check was not chosen.
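
As a rough illustration of working with the returned structure, the sketch below builds a mock object with the documented shape and combines the per-check failures. The SNP IDs are invented, p_markerQC is replaced by a NULL placeholder, and the output file name is an arbitrary choice; nothing here calls plinkQC or PLINK.

```r
# Mock return value with the documented shape (all IDs are made up)
fail_markers <- list(
  fail_list = list(SNP_missingness = c("rs1", "rs2"),
                   hwe = c("rs2", "rs3"),
                   maf = NULL),   # NULL: this check was not chosen
  p_markerQC = NULL               # placeholder for the ggplot2 object
)

# Number of failing SNPs per check; skipped checks count as 0
n_per_check <- lengths(fail_markers$fail_list)

# Unique set of SNP IDs failing at least one check
all_failing <- unique(unlist(fail_markers$fail_list, use.names = FALSE))

# Write one ID per line, e.g. as input for a later exclusion step
out_file <- file.path(tempdir(), "fail_markers.txt")
writeLines(all_failing, out_file)
```

Because NULL entries simply drop out of `unlist`, the same code works regardless of which checks were run.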
perMarkerQC wraps around the marker QC functions check_snp_missingness,
check_hwe and check_maf. For details on the parameters and outputs, check
the documentation of these functions.
    indir <- system.file("extdata", package="plinkQC")
    qcdir <- tempdir()
    name <- "data"
    path2plink <- '/path/to/plink'

    # the following code is not run on package build, as the path2plink on the
    # user system is not known.
    # All quality control checks
    # NOT RUN {
    fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
                                interactive=FALSE, verbose=TRUE,
                                path2plink=path2plink)
    # }