Code and data archive for Nettle et al. 'Consequences of measurement error in qPCR telomere data: A simulation study'
Description
Main simulation functions are contained in the script ‘simulation.functions.r’. When called, these functions (listed below) return datasets with requested properties containing both the ideal values of the quantities (Cqs, TS, etc.), and their post-error measured values. This allows the user to determine the differences between ideal and measured values, and perform other analyses. All simulation parameter values are user-specifiable. The script ‘paper.results.r’ reproduces all the figures and simulation results from the main paper. 'paper.results.r' also reads in the two .csv files of empirical data (dataset1 and dataset2).
Datasets consist of observations from n individuals. The steps common to all of the simulation functions are as follows:
- A vector of n true single copy gene abundances, true.dna.scg, is defined, drawn from a normal distribution with mean b (a constant) and standard deviation var.sample.size.
- A vector of n relative telomere lengths, true.telo.var, is defined, drawn from a normal distribution with mean 1 and standard deviation telomere.var.
- Hence, the true abundance of the telomere sequence, true.dna.telo, is defined as a*true.dna.scg*true.telo.var. Here, a is a scaling constant representing how many copies of the telomeric sequence there are per single copy gene in the average sample.
- Ideal Cq values for both reactions are defined as f – log2(true.dna.scg) and f – log2(true.dna.telo), where f is a constant representing the chosen fluorescence threshold.
- Measurement errors in the Cqs are generated from a bivariate normal distribution with mean 0, standard deviations error.scg (single copy gene reaction) and error.telo (telomere reaction), and a correlation between the two errors given by error.cor.
- Hence, measured Cqs are generated, which can be compared to the ideal Cq values.
- TS ratios are calculated from both the measured Cqs and the ideal ones.
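The steps above can be sketched in a few lines of base R. This is a minimal illustration, not the code in 'simulation.functions.r'. The parameter values are arbitrary rather than the defaults from the paper, the names err.scg and err.telo are hypothetical, and the TS ratio is assumed to be the raw quantity 2^(Cq.scg - Cq.telo) with perfect amplification efficiency and no normalisation to a reference sample.

```r
set.seed(1)

# Illustrative parameter values only (NOT the defaults used in the paper)
n <- 1000               # number of individuals
a <- 1000               # telomere copies per single copy gene, average sample
b <- 100                # mean single copy gene abundance
f <- 30                 # constant for the fluorescence threshold
var.sample.size <- 10   # sd of single copy gene abundance
telomere.var <- 0.15    # sd of relative telomere length
error.scg <- 0.05       # sd of Cq error, single copy gene reaction
error.telo <- 0.05      # sd of Cq error, telomere reaction
error.cor <- 0.5        # correlation between the two Cq errors

# True abundances
true.dna.scg <- rnorm(n, mean = b, sd = var.sample.size)
true.telo.var <- rnorm(n, mean = 1, sd = telomere.var)
true.dna.telo <- a * true.dna.scg * true.telo.var

# Ideal Cq values
true.cq.scg <- f - log2(true.dna.scg)
true.cq.telo <- f - log2(true.dna.telo)

# Correlated measurement errors built from two independent standard normals
# (err.scg and err.telo are hypothetical names used only in this sketch)
z1 <- rnorm(n)
z2 <- rnorm(n)
err.scg <- error.scg * z1
err.telo <- error.telo * (error.cor * z1 + sqrt(1 - error.cor^2) * z2)

# Measured Cq values
measured.cq.scg <- true.cq.scg + err.scg
measured.cq.telo <- true.cq.telo + err.telo

# TS ratios (assumed form: raw 2^(Cq.scg - Cq.telo), no reference sample)
true.ts <- 2^(true.cq.scg - true.cq.telo)
measured.ts <- 2^(measured.cq.scg - measured.cq.telo)

# Difference between measured and ideal TS ratio
error.computed <- measured.ts - true.ts
summary(error.computed)
```

Under these assumptions the ideal TS ratio reduces to a*true.telo.var, so any departure of measured.ts from that value is due entirely to the simulated Cq errors.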
The following functions are available. Specify desired parameter values in the parentheses, e.g. generate.one.dataset(n=10000, error.telo=0.1, error.scg=0.1, error.cor=0). Default values in the simulation functions are generally those given in Table 1 of the main paper.
- generate.one.dataset() returns a simple dataset (one telomere measurement per individual) for chosen values of all the variables described in section 1. As well as ideal and measured Cqs, it returns ideal and measured TS ratios. It also returns the difference between the ideal and measured TS ratio, calculated in two ways: directly from the simulated values (error.computed), and analytically using equation (11) of online supplement 1 (error.analytic). Both methods produce the same number; the second was included as an additional check on the correctness of the simulation.
- generate.repeated.measure() returns a dataset where telomere lengths from the same individuals are measured twice, via two independent biological samples, and the true telomere length of each individual is assumed not to have changed at all. The data frame it returns is as for generate.one.dataset(), except that there are two of each variable (e.g. true.ts.1, true.ts.2, measured.ts.1, measured.ts.2, etc.).
- calculate.repeatability() calculates the repeatability of the measured T/S ratio (intra-class correlation coefficient) when generate.repeated.measure() is implemented using the given values for all the parameters. It requires prior installation of R package ‘irr’.
- compare.repeatability() returns the repeatability of the T/S ratio and the repeatability calculated on the raw Cq for the telomere reaction, for the given parameter values.
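The idea behind the repeated-measure and repeatability functions can be sketched as follows. This is an illustration in base R under stated assumptions, not the archive's implementation: measure.once() and icc.oneway() are hypothetical helpers, the parameter values are arbitrary, the Cq errors are taken as uncorrelated, and repeatability is computed as a one-way intra-class correlation by hand rather than with the 'irr' package, whose ICC settings as used in the archive are not stated here.

```r
set.seed(2)

# Illustrative parameter values only (NOT the defaults used in the paper)
n <- 2000
a <- 1000
b <- 100
f <- 30
var.sample.size <- 10
telomere.var <- 0.15
error.scg <- 0.05
error.telo <- 0.05      # errors are uncorrelated in this sketch

# True relative telomere length: fixed per individual across both samples
true.telo.var <- rnorm(n, mean = 1, sd = telomere.var)

# Hypothetical helper: one independent biological sample per individual,
# with its own DNA abundance and its own Cq measurement errors
measure.once <- function(true.telo.var) {
  n <- length(true.telo.var)
  dna.scg <- rnorm(n, mean = b, sd = var.sample.size)
  dna.telo <- a * dna.scg * true.telo.var
  cq.scg <- f - log2(dna.scg) + rnorm(n, mean = 0, sd = error.scg)
  cq.telo <- f - log2(dna.telo) + rnorm(n, mean = 0, sd = error.telo)
  data.frame(measured.cq.telo = cq.telo,
             measured.ts = 2^(cq.scg - cq.telo))
}

m1 <- measure.once(true.telo.var)
m2 <- measure.once(true.telo.var)

# Hypothetical helper: one-way ICC from between- and within-individual
# mean squares, for two measurements per individual
icc.oneway <- function(x1, x2) {
  x <- cbind(x1, x2)
  k <- ncol(x)
  n <- nrow(x)
  row.means <- rowMeans(x)
  msb <- k * sum((row.means - mean(x))^2) / (n - 1)
  msw <- sum((x - row.means)^2) / (n * (k - 1))
  (msb - msw) / (msb + (k - 1) * msw)
}

# Repeatability of the TS ratio and of the raw telomere Cq
icc.ts <- icc.oneway(m1$measured.ts, m2$measured.ts)
icc.cq <- icc.oneway(m1$measured.cq.telo, m2$measured.cq.telo)
c(ts = icc.ts, cq.telo = icc.cq)
```

In this sketch the raw telomere Cq also carries the sample-to-sample variation in DNA abundance, which the TS ratio cancels out, so the two repeatabilities can differ even though the true telomere length is unchanged between samples.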