Investigating differential abundance methods in microbiome data: a benchmark study
Description
This is the repository containing the results shown in Cappellato M., Baruzzo G., Di Camillo B. "Investigating differential abundance methods in microbiome data: a benchmark study." (2021).
The GitLab repository here hosts the R package metaBenchDA, which was used to perform the benchmark study cited above. The package contains the data and the code to run the simulation framework, to run all the differential abundance (DA) analysis methods, and to assess the methods' performance. The GitLab repository also contains the Docker image metabenchda:2.0.0, which bundles the R package metaBenchDA and the code to reproduce the results shown in the paper.
Alternatively, to reproduce the results in the main manuscript, follow the instructions here.
The results folder here in Zenodo contains all the files needed to run the code at any point of the framework. The results folder contains:
- lib_unbalanced: results in the scenario with DA features, setting both the intensity threshold \(\theta = 1 \cdot f_{2}\) and unbalanced sequencing depth between conditions.
- metagenomeSeq_res: results running two models implemented in metagenomeSeq, namely the zero-inflated Gaussian (ZIG) model and the zero-inflated Log-Gaussian (ZILG) mixture model, in the scenario with DA features and setting the intensity threshold \(\theta = 1 \cdot f_{2}\).
- new_datasets: results in the scenario with DA features and setting the intensity threshold \(\theta = 1 \cdot f_{2}\) for the AnimalGut and Soil datasets.
- NOth: results in the scenario with DA features and without setting the intensity threshold \(\theta = 1 \cdot 0\).
- NULL: results in the scenario without DA features.
- WITHth: results in the scenario with DA features and setting the intensity threshold \(\theta = 1 \cdot f_{2}\).
- WITHth_GMPR: results in the scenario with DA features and setting the intensity threshold \(\theta = 1 \cdot f_{2}\). We test the methods with GMPR normalisation.
- WITHth_HALFvar: results in the scenario with DA features and decreasing the variability parameter \(\varphi = \dfrac{\varphi}{2}\).
Inside the folders, the results that consider structural zeros as true negatives (TN) are available in the reassign folder, while the results that consider the output of the methods are in the methodoutput folder.
Inside each folder there are:
- simulation: all the simulated datasets, i.e. all results from SECTION 1: GENERATE DATASET in the script file. This folder is absent in metagenomeSeq_res and WITHth_GMPR, since there we run the methods on the WITHth simulation scenario.
  - SSXX_PPYY_FCZ1-Z2, where XX is the sample size, YY is the percentage of DA features, and Z1-Z2 is the fold change limit. In the NULL configuration PPYY is not present.
    - WW_SSXX_PPYY_FCZ1-Z2, where WW is the name of the dataset from which the simulation parameters are estimated.
      - NAMECONF_WW_SSXX_PPYY_FCZ1-Z2_simN.RData, where NAMECONF is the name of the configuration (e.g. NULL, NOth ...) and N is the number of the simulation.
- methods: the output of each method involved in the comparison, i.e. all results from SECTION 2: RUN DA METHODS in the script file.
  - SSXX_PPYY_FCZ1-Z2, where XX is the sample size, YY is the percentage of DA features, and Z1-Z2 is the fold change limit. In the NULL configuration PPYY is not present.
    - WW_SSXX_PPYY_FCZ1-Z2, where WW is the name of the dataset from which the simulation parameters are estimated.
      - METHODNAME: one folder for each method. In the WITHth_GMPR folder each method is labelled with _gmpr.
        - NAMECONF_WW_SSXX_PPYY_FCZ1-Z2_simN_METHODNAME.RData, where NAMECONF is the name of the configuration (e.g. NULL, NOth ...) and N is the number of the simulation.
- metrics: performance evaluation, i.e. all results from SECTION 3: RUN METRICS in the script file.
  - methodoutput/reassign, where methodoutput means that the output of the methods has been considered, while reassign means that the structural zeros are considered as TN.
    - SSXX_PPYY_FCZ1-Z2, where XX is the sample size, YY is the percentage of DA features, and Z1-Z2 is the fold change limit. In the NULL configuration PPYY is not present.
      - WW_SSXX_PPYY_FCZ1-Z2, where WW is the name of the dataset from which the simulation parameters are estimated.
        - NAMECONF_methodoutput/reassign_WW_SSXX_PPYY_FCZ1-Z2_NAMEMETRIC.RData, where NAMECONF is the name of the configuration (e.g. NULL, NOth ...) and NAMEMETRIC is the name of the metric.
- figure: all the figures in the manuscript. All the figures are generated from the script file Figures.R.
  - methodoutput/reassign, where methodoutput means that the output of the methods has been considered, while reassign means that the structural zeros are considered as TN.
    - NAMECONF_methodoutput/reassign_NAMEFIGURE.jpeg, where NAMECONF is the name of the configuration (e.g. NULL, NOth ...) and NAMEFIGURE is the name of the metric and plot type, e.g. RECALL_boxplot shows the recall values of each method in each configuration of SS, PP and dataset.
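The naming scheme described above can be sketched in Python as a small set of path builders, which may help when locating a specific file inside a scenario folder. This is only an illustration under stated assumptions: the function names are hypothetical, paths are relative to one scenario folder (e.g. NOth), and the example values used below (`Soil`, sample size 10, 10% DA features, fold change `2-5`, metric `RECALL`) are placeholders. How XX, YY and Z1-Z2 are actually formatted in the archive should be checked against the real folder names.

```python
from pathlib import PurePosixPath
from typing import Optional


def config_tag(sample_size: int, fold_change: str, pct_da: Optional[int] = None) -> str:
    """Build SSXX_PPYY_FCZ1-Z2; PPYY is omitted when pct_da is None (NULL configuration)."""
    parts = [f"SS{sample_size}"]
    if pct_da is not None:
        parts.append(f"PP{pct_da}")
    parts.append(f"FC{fold_change}")
    return "_".join(parts)


def simulation_file(conf: str, dataset: str, sim: int, tag: str) -> PurePosixPath:
    """Path of one simulated dataset, relative to the scenario folder."""
    leaf = f"{conf}_{dataset}_{tag}_sim{sim}.RData"
    return PurePosixPath("simulation", tag, f"{dataset}_{tag}", leaf)


def method_file(conf: str, dataset: str, sim: int, method: str, tag: str) -> PurePosixPath:
    """Path of one method's output for one simulation, relative to the scenario folder."""
    leaf = f"{conf}_{dataset}_{tag}_sim{sim}_{method}.RData"
    return PurePosixPath("methods", tag, f"{dataset}_{tag}", method, leaf)


def metric_file(conf: str, dataset: str, metric: str, tag: str,
                mode: str = "methodoutput") -> PurePosixPath:
    """Path of one metric file; mode is either 'methodoutput' or 'reassign'."""
    if mode not in ("methodoutput", "reassign"):
        raise ValueError("mode must be 'methodoutput' or 'reassign'")
    leaf = f"{conf}_{mode}_{dataset}_{tag}_{metric}.RData"
    return PurePosixPath("metrics", mode, tag, f"{dataset}_{tag}", leaf)


# Placeholder values, not taken from the archive.
tag = config_tag(10, "2-5", pct_da=10)
print(simulation_file("NOth", "Soil", 1, tag))
print(metric_file("NOth", "Soil", "RECALL", tag, mode="reassign"))
```

Keeping the `SSXX_PPYY_FCZ1-Z2` tag in one helper means the NULL case (no `PPYY`) is handled in a single place rather than in every path builder.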
Files
| Name | Size |
|---|---|
| results.zip (md5:f912cdf33e625e350fda7c835dd496a7) | 3.0 GB |
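After downloading, the archive can be checked against the md5 checksum listed for it. The following is a minimal sketch using only the Python standard library. The helper names and the local path `results.zip` are assumptions for illustration, and the file is read in chunks so that the 3.0 GB archive does not need to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksum as listed on the record page for the archive.
EXPECTED_MD5 = "f912cdf33e625e350fda7c835dd496a7"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's md5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("results.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if archive_matches(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be downloaded again before unpacking.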