Runs plink --freq and evaluates the results. It calculates the minor allele
frequencies for all variants in the individuals that passed the
perIndividualQC. The minor allele frequency distribution is
plotted as a histogram.
check_maf(indir, name, qcdir = indir, macTh = 20, mafTh = NULL, verbose = FALSE, interactive = FALSE, path2plink = NULL, showPlinkOutput = TRUE)
indir | [character] /path/to/directory containing the basic PLINK data files name.bim, name.bed, name.fam. |
---|---|
name | [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam. |
qcdir | [character] /path/to/directory where results will be written to. If |
macTh | [double] Threshold for minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh\*2\*NrSamples). |
mafTh | [double] Threshold for minor allele frequency cut-off. |
verbose | [logical] If TRUE, progress information is printed to standard out; in particular, the plink log will be displayed. |
interactive | [logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding/a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_maf) via ggplot2::ggsave(plot=p_maf, other_arguments) or pdf(outfile); print(p_maf); dev.off(). |
path2plink | [character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: for Windows, this means path/plink.exe; for Unix platforms, this is path/plink. If not provided, it is assumed that the PATH set-up works and PLINK will be found by |
showPlinkOutput | [logical] If TRUE, plink log and error messages are printed to standard out. |
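
The relationship between the two thresholds given for macTh (macTh = mafTh\*2\*NrSamples) can be worked through in a few lines of R. This is only a minimal sketch: the helper names maf_to_mac and mac_to_maf and the sample count of 200 are hypothetical and not part of plinkQC.

```r
# Hypothetical helpers (not part of plinkQC).
# Convert a minor allele frequency threshold into the equivalent minor
# allele count threshold, following macTh = mafTh * 2 * NrSamples.
maf_to_mac <- function(mafTh, n_samples) {
  mafTh * 2 * n_samples
}

# Inverse conversion: minor allele count threshold to frequency threshold.
mac_to_maf <- function(macTh, n_samples) {
  macTh / (2 * n_samples)
}

n_samples <- 200                      # assumed number of samples
mac <- maf_to_mac(0.05, n_samples)    # count threshold for MAF 0.05
maf <- mac_to_maf(20, n_samples)      # frequency threshold for a count of 20
print(c(macTh = mac, mafTh = maf))
```

Under this assumed sample size, the default macTh = 20 and a mafTh of 0.05 describe the same cut-off; with a different number of samples the two diverge, which is why only one of them is applied.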
Named list with i) fail_maf, a [data.frame] with CHR (Chromosome code), SNP (Variant identifier), A1 (Allele 1; usually minor), A2 (Allele 2; usually major), MAF (Allele 1 frequency), NCHROBS (Number of allele observations) for all SNPs that failed the mafTh/macTh and ii) p_maf, a ggplot2 object containing the MAF distribution histogram, which can be shown with print(p_maf).
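
To illustrate the shape of fail_maf, the following sketch builds a toy data.frame with the same columns and selects the variants below a frequency threshold. All values are made up, and the rule that a variant fails when its MAF is strictly below the threshold is an assumption made for illustration, not a statement about how check_maf applies the cut-off internally.

```r
# Toy data mimicking the columns of fail_maf (all values are made up).
frq <- data.frame(
  CHR = c(1, 1, 2, 2),
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  A1 = c("A", "C", "G", "T"),
  A2 = c("G", "T", "A", "C"),
  MAF = c(0.30, 0.01, 0.004, 0.12),
  NCHROBS = c(400, 400, 398, 400),
  stringsAsFactors = FALSE
)

mafTh <- 0.05  # assumed frequency threshold

# Assumption: a variant fails if its MAF is strictly below the threshold.
fail_maf_toy <- frq[frq$MAF < mafTh, ]

# Approximate minor allele count per failing variant (MAF * observed alleles).
fail_maf_toy$MAC <- round(fail_maf_toy$MAF * fail_maf_toy$NCHROBS)
print(fail_maf_toy)
```

The same subsetting pattern works on the real fail_maf element of the returned list, for example to list the identifiers of failing variants via the SNP column.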
check_maf uses plink --remove name.fail.IDs --freq to calculate the
minor allele frequencies for all variants in the individuals that passed the
perIndividualQC. It does so without generating a new dataset
but simply removes the IDs when calculating the statistics.
For details on the output data.frame fail_maf, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#frq.
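
For orientation, a comparable PLINK call can be assembled by hand in R. This is a sketch under stated assumptions rather than the code check_maf runs: the directory paths are placeholders, the location of name.fail.IDs inside qcdir is assumed, and the use of --bfile and --out alongside the --remove and --freq flags named above is an assumption about how the call would be completed.

```r
# Placeholder inputs; replace with your own directories and file prefix.
indir <- "/path/to/indir"
qcdir <- "/path/to/qcdir"
name <- "data"

prefix <- file.path(indir, name)   # prefix of the .bed/.bim/.fam files
out <- file.path(qcdir, name)      # prefix for output files

# Arguments for a frequency calculation that skips the failed individuals.
# Assumption: name.fail.IDs is located in qcdir.
args <- c("--bfile", prefix,
          "--remove", paste0(out, ".fail.IDs"),
          "--freq",
          "--out", out)

# Show the command; run it only if plink is on the PATH and the data exist.
cat("plink", args, "\n")
plink <- unname(Sys.which("plink"))
if (nzchar(plink) && file.exists(paste0(prefix, ".bed"))) {
  system2(plink, args)
}
```

Because --remove only excludes the listed individuals for this run, no filtered copy of the dataset is written; the only new files are those carrying the --out prefix.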
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
# NOT RUN {
fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, macTh=15,
                      interactive=FALSE, verbose=TRUE, path2plink=path2plink)
# }