Immune-microbiome coordination defines interferon setpoints in healthy humans - dataset

Patel, Ravi; Babdor, Joel

doi:10.5281/zenodo.18012243

Published March 1, 2026 | Version v1

Dataset Open

1. University of California, San Francisco
2. University of Pennsylvania

Contributors

Supervisor (2):

1. University of California, San Francisco

This repository contains the data resource described in Babdor, Patel, et al. (Cell, 2026), used to investigate immune–microbiome covariation in healthy individuals from the ImmunoMicrobiome cohort (n = 110) using multi-omic integrative analyses. Blood and stool samples were collected at two time points approximately 1.5 years apart.

Baseline samples (first time point) were processed initially and used to identify immune–microbiome associations in healthy individuals. Follow-up samples were subsequently analyzed using the same workflow to assess the longitudinal stability of the associations identified at baseline.

To minimize technical bias, all baseline samples were reprocessed and analyzed together with the follow-up samples. Baseline samples can be identified by the BLE1 tag in the sample names, whereas follow-up samples are labeled with AAE6 or ABE4 (both corresponding to the same follow-up time point).
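
The tag convention above can be applied programmatically when splitting samples by time point. The following is a minimal sketch, assuming the tag appears as a plain substring of the sample name; the exact sample-name layout is not specified here, and the sample names used in the example are hypothetical.

```python
# Tags are taken from the description; matching them as substrings is an assumption.
TIMEPOINT_TAGS = {"BLE1": "baseline", "AAE6": "followup", "ABE4": "followup"}


def timepoint_from_sample_name(sample_name: str) -> str:
    """Return 'baseline' or 'followup' according to the tag found in the name."""
    matches = {tp for tag, tp in TIMEPOINT_TAGS.items() if tag in sample_name}
    if len(matches) != 1:
        # No tag found, or tags from both time points found.
        raise ValueError(f"cannot determine time point for {sample_name!r}")
    return matches.pop()


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    for name in ["HS1_BLE1", "HS1_AAE6", "HS2_ABE4"]:
        print(name, timepoint_from_sample_name(name))
```

Raising an error on unmatched or ambiguous names, rather than returning a default, makes unexpected naming patterns visible early.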

Processed multiomic data

MOFA model of baseline data and downstream analysis results
- mofa_model_final_object_v1.rds is an R list with the following elements:
  - "model" = "MOFA model"
  - "model.noOutl" = "MOFA model after removing the outliers"
  - "dataset_list" = "List of data.frames of each dataset"
  - "factors" = "MOFA factor scores"
  - "weights" = "MOFA factor weights"
  - "factors.noOutl" = "MOFA factor scores after removing the outliers"
  - "weights.noOutl" = "MOFA factor weights after removing the outliers"
  - "variance_explained" = "Variance explained by factors"
  - "variance_explained.noOutl" = "Variance explained by factors after removing the outliers"
  - "projection_coef" = "Projection coefficients from the linear models used to test the association of features with factors"
  - "projection_pval" = "Projection p-values from the linear models used to test the association of features with factors"
  - "signif_features" = "Features significantly associated with factors"
  - "signif_features_afterCorr" = "Features significantly associated with factors after multiple-test correction"
  - "enrichment_results_mod_afterCorr" = "mHG enrichment analysis results using factor weights for features significantly associated with factors after multiple-test correction"
  - "all_sig_terms_afterCorr" = "All significantly enriched terms"
  - "all_sig_term_composites_afterCorr" = "Composite signatures for the significantly enriched terms"
  - "enrichment_results_mod_afterCorr_withModSignatures" = "Enrichment analysis results including the modified signatures"
  - "enrichment_results_mod_f3vsf1_selectPathways" = "Enrichment analysis results using f3_vs_f1 log2FC for genes significantly different between f3 and f1 before multiple-test correction in limma"
  - "enrichment_results_mod_afterCorr_f3vsf1_selectPathways" = "Enrichment analysis results using f3_vs_f1 log2FC for genes significantly different between f3 and f1 after multiple-test correction in limma"
  - "wOtherOmics_weights.noOutl" = "MOFA factor weights after removing the outliers, plus projection coefficients (i.e. coefficients from linear models used for finding association of these features with factors) for the omics that were not included in MOFA"
  - "wOtherOmics_projection_pval" = "Projection p-values from the linear models used to test the association of features with factors, including the omics that were not included in MOFA"
  - "wOtherOmics_dataset_list" = "List of data.frames of each dataset, including the omics that were not included in MOFA"
  - "wOtherOmics_signif_features" = "Features significantly associated with factors, including features from omics that were not included in MOFA"
  - "wOtherOmics_signif_features_afterCorr" = "Features significantly associated with factors after multiple-test correction, including features from omics that were not included in MOFA"
  - "wOtherOmics_dataset_list_preScale" = "List of data.frames of each dataset before scaling the data, including the omics that were not included in MOFA. The data are log10-transformed with a pseudocount of 1 (plasmaOlink is log2-transformed)."
  - "enrichment_results_mod_afterCorr_IFNSignatures" = "Enrichment analysis results for Chronic/Tonic/Acute interferon signatures"

processed_baseline_data.RData: R list of baseline datasets
- datasets: R list of data.frames for non-CITE-seq datasets
  - "stoolMetab" = Stool metabolomics
  - "asa24" = 24-hour recall survey (ASA24)
  - "stoolSpecies" = Stool species abundance (stool metagenomics)
  - "stoolPathAbu" = Stool metagenomic pathway abundance (stool metagenomics)
  - "plasmaOlink" = Plasma proteomics (Olink)
  - "cytof" = PBMC cell population abundance (CyTOF)
  - "general_survey" = General survey
    - Each dataset contains the following elements:
      - "df" = log-transformed and scaled data
      - "sample_meta" = sample metadata
      - "feat_meta" = feature metadata
      - "df_preScale" = log-transformed data (base 2 for plasmaOlink, base 10 otherwise)
      - "df_preScale_clr" = log- and CLR-transformed data
      - "mad" = median absolute deviation (MAD) of features using "df_preScale"
      - "mad.clr" = MAD of features using "df_preScale_clr"
      - "norm.mad" = normalized MAD (MAD/mean) of features using "df_preScale"
      - "var" = variance of features using "df_preScale"
      - "cov" = coefficient of variation of features using "df_preScale"
      - "cov.clr" = coefficient of variation of features using "df_preScale_clr"
      - "df_clr" = log- and CLR-transformed, and scaled data
      - "pcs" = principal components
      - "pcvar" = variation explained by each principal component
- unmodnames: list of unmodified feature names for two datasets
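
The per-feature summaries listed above follow standard definitions, which can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the MAD here is the raw median absolute deviation (R's `mad()` multiplies by 1.4826 by default, and the record does not say which variant was used), the variance is the sample variance, and CLR is computed on already log-transformed values by subtracting the mean log value within one sample. The input values are invented.

```python
import statistics


def mad(values):
    """Raw median absolute deviation (no consistency constant applied)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def feature_summary(values):
    """Summaries of one feature across samples, named after the list elements."""
    mean = statistics.fmean(values)
    return {
        "mad": mad(values),
        "norm.mad": mad(values) / mean,          # MAD / mean, as described above
        "var": statistics.variance(values),      # sample variance (n - 1)
        "cov": statistics.stdev(values) / mean,  # coefficient of variation
    }


def clr_from_log(log_values):
    """Centered log-ratio for one sample, given log-transformed feature values."""
    centre = statistics.fmean(log_values)
    return [v - centre for v in log_values]


if __name__ == "__main__":
    # Invented values for one feature across five samples.
    print(feature_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
    # Invented log-transformed values for three features in one sample.
    print(clr_from_log([1.0, 2.0, 3.0]))
```

Note that normalized MAD and coefficient of variation divide by the mean, so they are unstable for features whose mean on the log scale is close to zero.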

processed_followup_data.RData: R list of longitudinal datasets
- datasets: R list of data.frames for non-CITE-seq datasets
  - "stoolMetab" = Stool metabolomics
  - "stoolSpecies" = Stool species abundance (stool metagenomics)
  - "stoolPathAbu" = Stool metagenomic pathway abundance (stool metagenomics)
  - "cytof" = PBMC cell population abundance (CyTOF)
  - "plasmaOlink" = Plasma proteomics (Olink)
    - Each dataset contains the following elements:
      - "df" = log-transformed and scaled data
      - "df_preScale" = log-transformed data (base 2 for plasmaOlink, base 10 otherwise)
      - "df_preScale_clr" = log- and CLR-transformed data
      - "mad" = median absolute deviation (MAD) of features using "df_preScale"
      - "mad.clr" = MAD of features using "df_preScale_clr"
      - "norm.mad" = normalized MAD (MAD/mean) of features using "df_preScale"
      - "var" = variance of features using "df_preScale"
      - "cov" = coefficient of variation of features using "df_preScale"
      - "cov.clr" = coefficient of variation of features using "df_preScale_clr"
      - "df_clr" = log- and CLR-transformed, and scaled data
      - "pcs" = principal components
      - "pcvar" = variation explained by each principal component

CyTOF data

CyTOF data: mass cytometry profiling of PBMCs from healthy individuals
- cytof.tar
  - baseline.tar.gz: baseline samples from ImmunoMicrobiome cohort
    - fcs/: FCS files of live singlet cells
    - baseline_processed.rds: R data.frame containing batch-corrected (cyCombine) and arcsinh-transformed (cofactor = 5) baseline data, along with cluster IDs and sample metadata. Columns: donor_id = ID of the donor, run_id = CyTOF run ID, UMAP1 & UMAP2 = UMAP coordinates, cluster = cluster ID
    - file_metadata.csv: FCS file metadata
  - followup.tar.gz: samples from the follow-up timepoint
    - fcs/: FCS files of live singlet cells
    - followup_clusters.rds: R data.frame containing cell type annotation for the follow-up dataset. Columns: fcs_cell_idx = index of cells in the corresponding FCS files, run = CyTOF run ID, donor_id = ID of the donor, tp = time point (baseline/followup), clust = cluster ID
  - clust_annotation.csv: cell type annotation for clusters in baseline_processed.rds and followup_clusters.rds
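
For readers working from the FCS files, the arcsinh transform with cofactor 5 mentioned for baseline_processed.rds can be sketched as follows. The function names are illustrative and not part of the dataset. The inverse is included only to show the relationship between the two scales: because the processed values were also batch-corrected, it would not recover the original raw intensities exactly.

```python
import math

COFACTOR = 5.0  # cofactor stated in the description of baseline_processed.rds


def arcsinh_transform(intensity: float, cofactor: float = COFACTOR) -> float:
    """Standard mass-cytometry transform: asinh(intensity / cofactor)."""
    return math.asinh(intensity / cofactor)


def inverse_arcsinh_transform(value: float, cofactor: float = COFACTOR) -> float:
    """Map a transformed value back to the intensity scale."""
    return cofactor * math.sinh(value)


if __name__ == "__main__":
    # Invented intensities, for illustration only.
    for raw in [0.0, 5.0, 50.0, 500.0]:
        print(raw, round(arcsinh_transform(raw), 4))
```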

CITE-seq data

sobjs_all_both_corr_clust.rds: Seurat object of baseline data with the following assays
- RNA: scRNA-seq data
- ADT: DSB-normalized ADT data
- raw.ADT: CLR-transformed ADT data
- integrated.ADT: RPCA-integrated ADT data across batches
top5kgenes_pseudobulk_rna_baseline.rds: pseudobulked (log10-transformed and scaled) expression of single-cell RNA-seq data for the 5000 most highly variable genes for baseline samples
allgenes_pseudobulk_rna_followup.rds: pseudobulked (log10-transformed and scaled) expression of single-cell RNA-seq data for the 5000 most highly variable genes for baseline and follow-up samples
Raw sequencing data and count matrix are available at GEO: GSE314416

Bulk RNA sequencing data GEO: GSE314922

Whole metagenome sequencing data BioProject: PRJNA1390888 / SRA: SRP656586

NOTE: HS99 and HS109 correspond to the same participant; measurements associated with the identifier HS109 were used for this participant.
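
When joining tables by donor, the note above can be handled with a small alias lookup. This is a sketch only: mapping HS99 to HS109 follows the note, but whether HS99 actually appears in any of the released tables is not stated, so the lookup may be a no-op in practice.

```python
# Alias taken from the note above; HS109 is the identifier that was used.
DONOR_ALIASES = {"HS99": "HS109"}


def canonical_donor_id(donor_id: str) -> str:
    """Return the identifier under which a participant's measurements are stored."""
    return DONOR_ALIASES.get(donor_id, donor_id)


if __name__ == "__main__":
    print(canonical_donor_id("HS99"))
    print(canonical_donor_id("HS12"))
```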

Files

Files (31.3 GB)

Name	MD5	Size
allgenes_pseudobulk_rna_followup.rds	52a91dde7f27782813c6977d2b246de9	152.2 MB
cytof.tar	b77104a70009f07ec6c4a91de6dfd9ac	19.0 GB
mofa_model_final_object_v1.rds	1b1027d818f67d23bc763209d2f4b4f9	918.2 MB
processed_baseline_data.RData	1ce369751625b4d6e57e4f10910802b0	6.4 MB
processed_followup_data.RData	74cbb9cd937d31bc048a7c26736af1d7	9.1 MB
sobjs_all_both_corr_clust.rds	57c3eae929990db0e1fbb6da3fa24863	11.0 GB
top5kgenes_pseudobulk_rna_baseline.rds	22ed5ab99a5561ef16b0fc678d4f6958	120.2 MB
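
Given the size of the larger files, it may be worth checking downloads against the md5 checksums listed above. The following is a minimal sketch using only the Python standard library; it assumes the files were saved under their original names in a single directory, and the function names are illustrative.

```python
import hashlib
from pathlib import Path

# File names and checksums copied from the file listing above.
EXPECTED_MD5 = {
    "allgenes_pseudobulk_rna_followup.rds": "52a91dde7f27782813c6977d2b246de9",
    "cytof.tar": "b77104a70009f07ec6c4a91de6dfd9ac",
    "mofa_model_final_object_v1.rds": "1b1027d818f67d23bc763209d2f4b4f9",
    "processed_baseline_data.RData": "1ce369751625b4d6e57e4f10910802b0",
    "processed_followup_data.RData": "74cbb9cd937d31bc048a7c26736af1d7",
    "sobjs_all_both_corr_clust.rds": "57c3eae929990db0e1fbb6da3fa24863",
    "top5kgenes_pseudobulk_rna_baseline.rds": "22ed5ab99a5561ef16b0fc678d4f6958",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if present with a matching checksum."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5sum(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads(".").items():
        print(("OK      " if ok else "MISSING/MISMATCH"), name)
```

Reading in chunks keeps memory use flat, which matters for the 19.0 GB and 11.0 GB files.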

Additional details

Is source of: Publication: 10.1016/j.cell.2026.02.003 (DOI)
