R/applyQC.R
cleanData.Rd
Individuals that fail per-individual QC and markers that fail per-marker QC are removed from indir/name.bim/.bed/.fam, and a new dataset with the remaining individuals and markers is created as qcdir/name.clean.bim/.bed/.fam.
cleanData(indir, name, qcdir = indir, filterSex = TRUE,
  filterHeterozygosity = TRUE, filterSampleMissingness = TRUE,
  filterAncestry = TRUE, filterRelated = TRUE,
  filterSNPMissingness = TRUE, lmissTh = 0.01, filterHWE = TRUE,
  hweTh = 1e-05, filterMAF = TRUE, macTh = 20, mafTh = NULL,
  path2plink = NULL, verbose = FALSE, showPlinkOutput = TRUE)
indir | [character] /path/to/directory containing the basic PLINK data files name.bim, name.bed, name.fam. |
---|---|
name | [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam. |
qcdir | [character] /path/to/directory where results will be written to.
If |
filterSex | [logical] Set to exclude samples that failed the sex
check (via |
filterHeterozygosity | [logical] Set to exclude samples that failed
check for outlying heterozygosity rates (via
|
filterSampleMissingness | [logical] Set to exclude samples that failed
check for excessive missing genotype rates (via
|
filterAncestry | [logical] Set to exclude samples that failed ancestry
check (via |
filterRelated | [logical] Set to exclude samples that failed relatedness
check (via |
filterSNPMissingness | [logical] Set to exclude markers that have
excessive missing rates across samples (via
|
lmissTh | [double] Threshold for acceptable variant missing rate across samples. |
filterHWE | [logical] Set to exclude markers that fail HWE exact test
(via |
hweTh | [double] Significance threshold for deviation from HWE. |
filterMAF | [logical] Set to exclude markers that fail minor allele
frequency or minor allele count threshold (via |
macTh | [double] Threshold for minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh\*2\*NrSamples). |
mafTh | [double] Threshold for minor allele frequency cut-off. |
path2plink | [character] Absolute path to PLINK executable
(https://www.cog-genomics.org/plink/1.9/) i.e.
plink should be accessible as path2plink -h. The full name of the executable
should be specified: for Windows, this means path/plink.exe, for Unix
platforms this is path/plink. If not provided, it is assumed that the PATH set-up works
and PLINK will be found by |
verbose | [logical] If TRUE, progress info is printed to standard out. |
showPlinkOutput | [logical] If TRUE, plink log and error messages are printed to standard out. |
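
The relation between the two frequency thresholds given for macTh (macTh = mafTh\*2\*NrSamples) can be sketched as follows. This is only an illustration of the arithmetic: `maf_to_mac` is a hypothetical helper and not part of plinkQC, and the sample size of 1000 is an assumed value.

```r
# Hypothetical helper (not a plinkQC function): convert a minor allele
# frequency threshold into the equivalent minor allele count, using the
# relation stated in the macTh description (two alleles per sample).
maf_to_mac <- function(mafTh, nrSamples) {
  stopifnot(mafTh >= 0, mafTh <= 0.5, nrSamples > 0)
  mafTh * 2 * nrSamples
}

# Assumed example values: 1000 samples and a MAF threshold of 0.01
mac_example <- maf_to_mac(mafTh = 0.01, nrSamples = 1000)
print(mac_example)
```

Since macTh takes precedence when both thresholds are given, a conversion like this may help when choosing which of the two to set for a particular cohort size.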
Named [list] with i) passIDs, containing a [data.frame] with family [FID] and individual [IID] IDs of samples that pass the QC, ii) failIDs, containing a [data.frame] with family [FID] and individual [IID] IDs of samples that fail the QC.
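
A minimal sketch of how the returned list might be inspected, using a mock object in place of a real cleanData result. The sample IDs are made up, the column names FID and IID are assumed from the description above, and the output file name `data.keep.txt` is hypothetical.

```r
# Mock return value with the documented structure (IDs are invented;
# column names FID/IID are an assumption based on the description).
ids_all <- list(
  passIDs = data.frame(FID = c("F1", "F2"), IID = c("I1", "I2")),
  failIDs = data.frame(FID = "F3", IID = "I3")
)

# Count samples that pass and fail QC
n_pass <- nrow(ids_all$passIDs)
n_fail <- nrow(ids_all$failIDs)
frac_fail <- n_fail / (n_pass + n_fail)

# Write the passing IDs as a two-column, headerless text file
# (hypothetical file name), e.g. for reuse in later analyses.
keep_file <- file.path(tempdir(), "data.keep.txt")
write.table(ids_all$passIDs, keep_file, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```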
package.dir <- find.package('plinkQC')
indir <- file.path(package.dir, 'extdata')
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'

# the following code is not run on package build, as the path2plink on the
# user system is not known.
# NOT RUN {
# Run individual QC checks
fail_individuals <- perIndividualQC(indir=indir, qcdir=qcdir, name=name,
    refSamplesFile=paste(qcdir, "/HapMap_ID2Pop.txt", sep=""),
    refColorsFile=paste(qcdir, "/HapMap_PopColors.txt", sep=""),
    prefixMergedDataset="data.HapMapIII", interactive=FALSE, verbose=FALSE)

# Run marker QC checks
fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name)

# Create new dataset of individuals and markers passing QC
ids_all <- cleanData(indir=indir, qcdir=qcdir, name=name, macTh=15,
    verbose=TRUE, path2plink=path2plink, filterAncestry=FALSE,
    filterRelated=TRUE)
# }
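
After a run, one way to confirm that the cleaned dataset was written is to build the expected paths from the qcdir/name.clean.bim/.bed/.fam pattern given in the description and test for their existence. This is a sketch under the assumption that the files follow exactly that naming; the values of `qcdir` and `name` mirror the example above.

```r
# Same qcdir and name as in the example above
qcdir <- tempdir()
name <- "data"

# Expected output files, following the qcdir/name.clean.* pattern
clean_files <- file.path(qcdir,
                         paste0(name, ".clean", c(".bim", ".bed", ".fam")))

# Logical vector: TRUE for each file that is present on disk
files_present <- file.exists(clean_files)
names(files_present) <- basename(clean_files)
print(files_present)
```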