Immune-microbiome coordination defines interferon setpoints in healthy humans - dataset
Description
This repository contains the data resource described in Babdor, Patel, et al. (Cell, 2026), used to investigate immune–microbiome covariation in healthy individuals from the ImmunoMicrobiome cohort (n = 110) using multi-omic integrative analyses. Blood and stool samples were collected at two time points approximately 1.5 years apart.
Baseline samples (first time point) were processed initially and used to identify immune–microbiome associations in healthy individuals. Follow-up samples were subsequently analyzed using the same workflow to assess the longitudinal stability of the associations identified at baseline.
To minimize technical bias, all baseline samples were reprocessed and analyzed together with the follow-up samples. Baseline samples can be identified by the BLE1 tag in the sample names, whereas follow-up samples are labeled with AAE6 or ABE4 (both corresponding to the same follow-up time point).
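The tag convention above can be applied programmatically when splitting samples by time point. The following is a minimal sketch: the tags come from the description, but the position of the tag within a sample name is not specified here, so the helper (a hypothetical name) only tests for a substring, and the example sample names are invented for illustration.

```python
# Tags are taken from the description; the sample-name layout is an assumption.
BASELINE_TAGS = ("BLE1",)
FOLLOWUP_TAGS = ("AAE6", "ABE4")  # both denote the same follow-up time point


def timepoint_from_sample_name(sample_name: str) -> str:
    """Return 'baseline', 'followup', or 'unknown' for a sample name."""
    if any(tag in sample_name for tag in BASELINE_TAGS):
        return "baseline"
    if any(tag in sample_name for tag in FOLLOWUP_TAGS):
        return "followup"
    return "unknown"


# Hypothetical sample names, for illustration only.
examples = ["HS001_BLE1", "HS001_AAE6", "HS002_ABE4", "HS003"]
labels = {name: timepoint_from_sample_name(name) for name in examples}
```

Names that carry none of the three tags are reported as unknown rather than guessed.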
Processed multiomic data
- MOFA model of baseline data and results of the subsequent analyses
mofa_model_final_object_v1.rds is an R list with the following elements:
- "model" = "MOFA model"
- "model.noOutl" = "MOFA model after removing the outliers"
- "dataset_list" = "List of data.frames of each dataset"
- "factors" = "MOFA factor scores"
- "weights" = "MOFA factor weights"
- "factors.noOutl" = "MOFA factor scores after removing the outliers"
- "weights.noOutl" = "MOFA factor weights after removing the outliers"
- "variance_explained" = "Variance explained by factors"
- "variance_explained.noOutl" = "Variance explained by factors after removing the outliers"
- "projection_coef" = "Projection coefficient from the linear models used for finding association of features with factors"
- "projection_pval" = "Projection pvalues from the linear models used for finding association of features with factors"
- "signif_features" = "Features significantly associated with factors"
- "signif_features_afterCorr" = "Features significantly associated with factors after multiple-test correction"
- "enrichment_results_mod_afterCorr" = "mHG enrichment analysis results using factor weights for features significantly associated with factors after multiple-test correction"
- "all_sig_terms_afterCorr" = "All significantly enriched terms"
- "all_sig_term_composites_afterCorr" = "Composite signatures for the significantly enriched terms"
- "enrichment_results_mod_afterCorr_withModSignatures" = "Enrichment analysis results including the modified signatures"
- "enrichment_results_mod_f3vsf1_selectPathways" = "Enrichment analysis results using f3_vs_f1 log2FC for genes significantly different between f3 and f1 before multiple-test correction in limma"
- "enrichment_results_mod_afterCorr_f3vsf1_selectPathways" = "Enrichment analysis results using f3_vs_f1 log2FC for genes significantly different between f3 and f1 after multiple-test correction in limma"
- "wOtherOmics_weights.noOutl" = "MOFA factor weights after removing the outliers, plus projection coefficients (i.e. coefficients from linear models used for finding association of these features with factors) for the omics that were not included in MOFA"
- "wOtherOmics_projection_pval" = "Projection pvalues from the linear models used for finding association of features with factors, including the omics that were not included in MOFA"
- "wOtherOmics_dataset_list" = "List of data.frames of each dataset, including the omics that were not included in MOFA"
- "wOtherOmics_signif_features" = "Features significantly associated with factors, including features from omics that were not included in MOFA"
- "wOtherOmics_signif_features_afterCorr" = "Features significantly associated with factors after multiple-test correction, including features from omics that were not included in MOFA"
- "wOtherOmics_dataset_list_preScale" = "List of data.frames of each dataset before scaling the data, including the omics that were not included in MOFA. The data is log10-transformed with a pseudocount of 1 (log2-transformed plasmaOlink)."
- "enrichment_results_mod_afterCorr_IFNSignatures" = "Enrichment analysis results for Chronic/Tonic/Acute interferon signatures"
processed_baseline_data.RData: R list of baseline datasets
- datasets: R list of data.frames for non-CITE-seq datasets
- "stoolMetab" = Stool metabolomics
- "asa24" = 24-hour recall survey (ASA24)
- "stoolSpecies" = Stool species abundance (stool metagenomics)
- "stoolPathAbu" = Stool metagenomic pathway abundance (stool metagenomics)
- "plasmaOlink" = Plasma proteomics (Olink)
- "cytof" = PBMC cell population abundance (CyTOF)
- "general_survey" = General survey
- Each dataset contains the following elements:
- "df" = log-transformed and scaled data
- "sample_meta" = sample metadata
- "feat_meta" = feature metadata
- "df_preScale" = log-transformed data (base 2 for plasmaOlink, base 10 otherwise)
- "df_preScale_clr" = log- and CLR-transformed data
- "mad" = median absolute deviation (MAD) of features using "df_preScale"
- "mad.clr" = MAD of features using "df_preScale_clr"
- "norm.mad" = normalized MAD (MAD/mean) of features using "df_preScale"
- "var" = variance of features using "df_preScale"
- "cov" = coefficient of variation of features using "df_preScale"
- "cov.clr" = coefficient of variation of features using "df_preScale_clr"
- "df_clr" = log- and CLR-transformed, and scaled data
- "pcs" = principal components
- "pcvar" = variation explained by each principal component
- unmodnames: list of unmodified feature names for two datasets
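The per-feature summaries listed above ("mad", "norm.mad", "var", "cov") can be reproduced in outline as follows. This is a sketch under stated assumptions, not the authors' code: it assumes the raw (unscaled) MAD, the sample standard deviation, and z-score scaling for "df". R's `mad()` multiplies by 1.4826 by default, and the description does not say whether that constant was applied. The function names and input values are hypothetical.

```python
import math
import statistics


def log10_pseudocount(values, pseudocount=1.0):
    """log10(x + pseudocount), as described for the non-Olink datasets."""
    return [math.log10(v + pseudocount) for v in values]


def feature_summary(values):
    """Summaries of one feature across samples, computed on pre-scaling values."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Raw MAD: median of absolute deviations from the median (no 1.4826 factor).
    mad = statistics.median([abs(v - median) for v in values])
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mad": mad,
        "norm.mad": mad / mean if mean != 0 else math.nan,  # MAD / mean
        "var": sd ** 2,
        "cov": sd / mean if mean != 0 else math.nan,  # coefficient of variation
    }


def zscore(values):
    """Center to mean 0 and scale to unit standard deviation (assumed scaling)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


raw = [0.0, 9.0, 99.0, 999.0]        # hypothetical raw values for one feature
pre_scale = log10_pseudocount(raw)   # analogous to a column of "df_preScale"
summary = feature_summary(pre_scale)
scaled = zscore(pre_scale)           # analogous to a column of "df"
```

The summaries are computed on the pre-scaling values because, after z-scoring, the variance of every feature is 1 and the mean is 0, which would make the variance uninformative and the ratios undefined.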
processed_followup_data.RData: R list of longitudinal datasets
- datasets: R list of data.frames for non-CITE-seq datasets
- "stoolMetab" = Stool metabolomics
- "stoolSpecies" = Stool species abundance (stool metagenomics)
- "stoolPathAbu" = Stool metagenomic pathway abundance (stool metagenomics)
- "cytof" = PBMC cell population abundance (CyTOF)
- "plasmaOlink" = Plasma proteomics (Olink)
- Each dataset contains the following elements:
- "df" = log-transformed and scaled data
- "df_preScale" = log-transformed data (base 2 for plasmaOlink, base 10 otherwise)
- "df_preScale_clr" = log- and CLR-transformed data
- "mad" = median absolute deviation (MAD) of features using "df_preScale"
- "mad.clr" = MAD of features using "df_preScale_clr"
- "norm.mad" = normalized MAD (MAD/mean) of features using "df_preScale"
- "var" = variance of features using "df_preScale"
- "cov" = coefficient of variation of features using "df_preScale"
- "cov.clr" = coefficient of variation of features using "df_preScale_clr"
- "df_clr" = log- and CLR-transformed, and scaled data
- "pcs" = principal components
- "pcvar" = variation explained by each principal component
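For readers unfamiliar with the centered log-ratio (CLR) transform behind "df_preScale_clr" and "df_clr", the following is a minimal sketch of the standard definition applied to one sample's feature vector. The pseudocount, the log base, and the order in which the log and CLR steps were combined in this resource are assumptions; the description does not specify them.

```python
import math


def clr(abundances, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log across features.

    The pseudocount (assumed here) keeps zero abundances finite.
    """
    logs = [math.log10(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical abundances of three features in one sample.
transformed = clr([1.0, 10.0, 100.0], pseudocount=0.0)
```

By construction the CLR values of a sample sum to zero, so each value expresses a feature relative to the sample's geometric mean.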
CyTOF data
- CyTOF data: mass cytometry profiling of PBMCs from healthy individuals
cytof.tar
- baseline.tar.gz: baseline samples from ImmunoMicrobiome cohort
- fcs/: FCS files of live singlet cells
- baseline_processed.rds: R data.frame containing batch-corrected (cyCombine) and arcsinh-transformed (cofactor = 5) baseline data along with cluster ID and sample metadata. Columns: donor_id = ID of the donor, run_id = CyTOF run ID, UMAP1 & UMAP2 = UMAP coordinates, cluster = cluster ID
- file_metadata.csv: FCS file metadata
- followup.tar.gz: samples from the follow-up timepoint
- fcs/: FCS files of live singlet cells
- followup_clusters.rds: R data.frame containing cell type annotation for the followup dataset. Columns: fcs_cell_idx = index of cells in corresponding FCS files, run = CyTOF run ID, donor_id = ID of the donor, tp = timepoint (baseline/followup), clust = cluster ID
- clust_annotation.csv: cell type annotation for clusters in baseline_processed.rds and followup_clusters.rds
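The arcsinh transform with cofactor 5 mentioned for baseline_processed.rds is the usual mass-cytometry transform asinh(x / 5). The sketch below shows the transform and its inverse on hypothetical intensities. It covers only this step, not the cyCombine batch correction, and the function names are illustrative.

```python
import math


def arcsinh_transform(intensities, cofactor=5.0):
    """Apply asinh(x / cofactor) to raw marker intensities."""
    return [math.asinh(x / cofactor) for x in intensities]


def inverse_arcsinh_transform(values, cofactor=5.0):
    """Recover raw-scale intensities from transformed values."""
    return [cofactor * math.sinh(v) for v in values]


raw_intensities = [0.0, 5.0, 50.0, 500.0]  # hypothetical values
transformed = arcsinh_transform(raw_intensities)
recovered = inverse_arcsinh_transform(transformed)
```

The transform is close to linear near zero and close to logarithmic for large intensities, which is why it is preferred over a plain log for data containing zeros.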
CITE-seq data
sobjs_all_both_corr_clust.rds: Seurat object of baseline data with the following assays
- RNA: scRNA-seq data
- ADT: DSB-normalized ADT data
- raw.ADT: CLR-transformed ADT data
- integrated.ADT: RPCA-integrated ADT data across batches
top5kgenes_pseudobulk_rna_baseline.rds: pseudobulked (log10-transformed and scaled) expression of single-cell RNA-seq data for 5000 most highly variable genes for baseline samples
allgenes_pseudobulk_rna_followup.rds: pseudobulked (log10-transformed and scaled) expression of single-cell RNA-seq data for 5000 most highly variable genes for baseline and followup samples
- Raw sequencing data and count matrix are available at GEO: GSE314416
Bulk RNA sequencing data GEO: GSE314922
Whole metagenome sequencing data BioProject: PRJNA1390888 / SRA: SRP656586
NOTE: HS99 and HS109 correspond to the same participant; measurements associated with the identifier HS109 were used for this participant.
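When joining tables by donor, the duplicate identifier noted above can be collapsed onto the retained one. A minimal sketch, assuming donor IDs appear as plain strings such as "HS99"; the helper name is hypothetical:

```python
# HS99 and HS109 are the same participant; HS109 is the identifier that was used.
DONOR_ALIASES = {"HS99": "HS109"}


def canonical_donor_id(donor_id: str) -> str:
    """Map an alias to the retained identifier; leave other IDs unchanged."""
    return DONOR_ALIASES.get(donor_id, donor_id)


donor_ids = ["HS99", "HS109", "HS12"]  # hypothetical input
canonical = [canonical_donor_id(d) for d in donor_ids]
```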
Files
(31.3 GB)
| MD5 checksum | Size |
|---|---|
| md5:52a91dde7f27782813c6977d2b246de9 | 152.2 MB |
| md5:b77104a70009f07ec6c4a91de6dfd9ac | 19.0 GB |
| md5:1b1027d818f67d23bc763209d2f4b4f9 | 918.2 MB |
| md5:1ce369751625b4d6e57e4f10910802b0 | 6.4 MB |
| md5:74cbb9cd937d31bc048a7c26736af1d7 | 9.1 MB |
| md5:57c3eae929990db0e1fbb6da3fa24863 | 11.0 GB |
| md5:22ed5ab99a5561ef16b0fc678d4f6958 | 120.2 MB |
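The md5: values listed for the files can be used to check a download for corruption. The sketch below uses only the Python standard library and reads the file in chunks, which matters for the multi-gigabyte archives. The function names are illustrative and the path passed in is whatever local file was downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:52a91dde7f27782813c6977d2b246de9")` returns True only if the local file is byte-identical to the published one with that checksum.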
Additional details
Related works
- Is source of
- Publication: 10.1016/j.cell.2026.02.003 (DOI)