SCID Multiomics Post-Processed Data and Analysis
Description
This repository contains the post-processed datasets and analysis code for the SCID Multiomics paper. It is structured as an installable R package, which handles dependency management and dataset loading; the package does not export any functions.
Installation
The easiest way to install the package is to download the repository and run `devtools::install()` from its root directory. Once installed, the datasets can be loaded with the `data()` function, which many of the analysis scripts depend on.
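The installation and loading steps might look like the following minimal sketch. The path in `pkg_dir` is a placeholder, and `read_pkg_name()` is a hypothetical helper (not part of this package) that reads the package name from the `DESCRIPTION` file instead of guessing it. The install step only runs if the directory exists and `devtools` is available.

```r
# Hypothetical helper: read the package name from DESCRIPTION.
read_pkg_name <- function(pkg_dir) {
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Package")
  unname(desc[1, "Package"])
}

# Placeholder path: point this at the unpacked repository.
pkg_dir <- "path/to/unpacked/repository"

if (dir.exists(pkg_dir) && requireNamespace("devtools", quietly = TRUE)) {
  # Install the package together with its declared dependencies.
  devtools::install(pkg_dir)

  pkg_name <- read_pkg_name(pkg_dir)

  # List the datasets bundled with the package.
  print(data(package = pkg_name)$results[, "Item"])

  # Load one dataset (here `tcr`) into the current environment.
  data(list = "tcr", package = pkg_name)
}
```

Reading the name from `DESCRIPTION` keeps the sketch independent of whatever the package is actually called.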
Datasets
In no particular order, the important datasets are described below:
- intsites: Summary statistics from Wang et al. (Blood, 2010) for the timepoints used in this study.
- tcr: Aggregate TCR data from Adaptive Biotechnologies' immunoSEQ pipeline.
- mb: Metadata for the microbiome sampling timepoints, as well as species data from MetaPhlAn (not used).
- agg.mb.kz: Kraken species data for the microbiome samples, after low-complexity filtering.
- agg.vp.kz: Kraken species data for the virome samples, after low-complexity filtering.
- card: Antibiotic resistance gene data from CARD.
- subject_ids.csv: Maps the original sample IDs used in the datasets to the IDs used in the manuscript.
The code for creating these datasets from the original data files is in the `data-raw` directory.
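As an illustration of how the ID mapping in `subject_ids.csv` could be applied, the sketch below relabels a per-sample table with manuscript IDs. The column names (`original_id`, `manuscript_id`), the helper `relabel_subjects()`, and all values are made up for the example; the real file may be laid out differently.

```r
# Toy mapping table standing in for subject_ids.csv (hypothetical columns and values).
id_map <- data.frame(
  original_id   = c("S01", "S02", "S03"),
  manuscript_id = c("P1", "P2", "P3"),
  stringsAsFactors = FALSE
)

# Toy per-sample table keyed by the original IDs (hypothetical values).
samples <- data.frame(
  original_id = c("S02", "S03", "S02", "S01"),
  value       = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Add the manuscript ID to each row, failing loudly on unmapped IDs.
relabel_subjects <- function(df, id_map,
                             from = "original_id", to = "manuscript_id") {
  idx <- match(df[[from]], id_map[[from]])
  if (anyNA(idx)) {
    stop("IDs missing from mapping: ",
         paste(unique(df[[from]][is.na(idx)]), collapse = ", "))
  }
  df[[to]] <- id_map[[to]][idx]
  df
}

relabelled <- relabel_subjects(samples, id_map)
```

Using `match()` preserves the row order of the input table, and the explicit check avoids silently introducing `NA` labels.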
Analysis/Figures
The analysis code is split up by subject and is largely concerned with figure generation. The R
scripts are all located in the `inst` folder. To generate all figures, run each script in the
order specified by the `GenerateFigures.R` file.
Figures are output to the `figures` directory, while tables are output to the `tables` directory.
Please note: many of the figures used in the manuscript were aesthetically modified after generation (text size, color palette, orientation), so the generated figures will not exactly match the published ones.
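Running the scripts in a fixed order could be sketched as follows. `run_scripts_in_order()` is a hypothetical helper, not part of this repository, and the script names passed to it are up to the caller; the actual order is the one given in `GenerateFigures.R`. The sketch assumes the scripts write to `figures` and `tables` relative to the working directory, so it creates both directories under `out_root` first.

```r
# Hypothetical helper: source a list of scripts in the given order.
run_scripts_in_order <- function(script_dir, script_names, out_root = ".") {
  # Make sure the output directories exist before any script writes to them.
  for (d in c("figures", "tables")) {
    dir.create(file.path(out_root, d), showWarnings = FALSE, recursive = TRUE)
  }

  paths <- file.path(script_dir, script_names)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }

  # Run each script in turn; a failure stops the whole run.
  for (p in paths) {
    message("Running ", p)
    source(p)
  }
  invisible(paths)
}
```

A call would pass the `inst` folder and the script names in the order listed in `GenerateFigures.R`. Checking for missing files up front avoids a partial run that leaves only some figures regenerated.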
Files
v0.1.0.zip (280.3 MB)
md5:1d56c70e6c9cf0808ac0bea8fb783115