There is a newer version of the record available.

Published March 1, 2026 | Version v1
Dataset Open

Immune-microbiome coordination defines interferon setpoints in healthy humans - dataset

  • 1. University of California, San Francisco
  • 2. University of Pennsylvania

Description

This repository contains the data resource described in Babdor, Patel, et al. (Cell, 2026), used to investigate immune–microbiome covariation in healthy individuals from the ImmunoMicrobiome cohort (n = 110) using multi-omic integrative analyses. Blood and stool samples were collected at two time points approximately 1.5 years apart.

Baseline samples (first time point) were processed initially and used to identify immune–microbiome associations in healthy individuals. Follow-up samples were subsequently analyzed using the same workflow to assess the longitudinal stability of the associations identified at baseline.

To minimize technical bias, all baseline samples were reprocessed and analyzed together with the follow-up samples. Baseline samples can be identified by the BLE1 tag in the sample names, whereas follow-up samples are labeled with AAE6 or ABE4 (both corresponding to the same follow-up time point).
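
The tag convention above lends itself to a small lookup. The following is a minimal Python sketch, assuming the tag appears somewhere inside each sample name; the example names are hypothetical, since the exact sample-name layout is not given in this description.

```python
# Tags are taken from the description above; the sample-name layout is an assumption.
TIMEPOINT_TAGS = {"BLE1": "baseline", "AAE6": "followup", "ABE4": "followup"}


def timepoint_from_sample_name(sample_name):
    """Return 'baseline' or 'followup' if a known tag occurs in the name, else None."""
    for tag, timepoint in TIMEPOINT_TAGS.items():
        if tag in sample_name:
            return timepoint
    return None


# Hypothetical sample names, for illustration only.
example_names = ["HS001_BLE1", "HS001_AAE6", "HS002_ABE4", "HS003"]
labels = {name: timepoint_from_sample_name(name) for name in example_names}
```

Names that carry none of the three tags map to `None`, which makes unexpected identifiers easy to spot before any downstream analysis.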

Processed multiomic data

  • MOFA model of baseline data and downstream analysis results
    • mofa_model_final_object_v1.rds is an R list with the following elements:
      • "model" = "MOFA model"
      • "model.noOutl" = "MOFA model after removing the outliers"
      • "dataset_list" = "List of data.frames of each dataset"
      • "factors" = "MOFA factor scores"
      • "weights" = "MOFA factor weights"
      • "factors.noOutl" = "MOFA factor scores after removing the outliers"
      • "weights.noOutl" = "MOFA factor weights after removing the outliers"
      • "variance_explained" = "Variance explained by factors"
      • "variance_explained.noOutl" = "Variance explained by factors after removing the outliers"
      • "projection_coef" = "Projection coefficient from the linear models used for finding association of features with factors"
      • "projection_pval" = "Projection pvalues from the linear models used for finding association of features with factors"
      • "signif_features" = "Features significantly associated with factors"
      • "signif_features_afterCorr" = "Features significantly associated with factors after multiple-test correction"
      • "enrichment_results_mod_afterCorr" = "mHG enrichment analysis results using factor weights for features significantly associated with factors after multiple-test correction"
      • "all_sig_terms_afterCorr" = "All significantly enriched terms"
      • "all_sig_term_composites_afterCorr" = "Composite signatures for the significantly enriched terms"
      • "enrichment_results_mod_afterCorr_withModSignatures" = "Enrichment analysis results including the modified signatures"
      • "enrichment_results_mod_f3vsf1_selectPathways" = "Enrichment analysis results using f3_vs_f1 log2FC for genes significantly different between f3 and f1 before multiple-test correction in limma"
      • "enrichment_results_mod_afterCorr_f3vsf1_selectPathways" = "Enrichment analysis results using f3_vs_f1 log2FC for genes significantly different between f3 and f1 after multiple-test correction in limma"
      • "wOtherOmics_weights.noOutl" = "MOFA factor weights after removing the outliers, plus projection coefficients (i.e. coefficients from linear models used for finding association of these features with factors) for the omics that were not included in MOFA"
      • "wOtherOmics_projection_pval" = "Projection pvalues from the linear models used for finding association of features with factors, including the omics that were not included in MOFA"
      • "wOtherOmics_dataset_list" = "List of data.frames of each dataset, including the omics that were not included in MOFA"
      • "wOtherOmics_signif_features" = "Features significantly associated with factors, including features from omics that were not included in MOFA"
      • "wOtherOmics_signif_features_afterCorr" = "Features significantly associated with factors after multiple-test correction, including features from omics that were not included in MOFA"
      • "wOtherOmics_dataset_list_preScale" = "List of data.frames of each dataset before scaling the data, including the omics that were not included in MOFA. The data are log10-transformed with a pseudocount of 1 (plasmaOlink is log2-transformed)."
      • "enrichment_results_mod_afterCorr_IFNSignatures" = "Enrichment analysis results for Chronic/Tonic/Acute interferon signatures"
  • processed_baseline_data.RData: R list of baseline dataset
    • datasets: R list of data.frames for non-CITE-seq datasets
      • "stoolMetab" = Stool metabolomics
      • "asa24" = 24-hour recall survey (ASA24)
      • "stoolSpecies" = Stool species abundance (stool metagenomics)
      • "stoolPathAbu" = Stool metagenomic pathway abundance (stool metagenomics)
      • "plasmaOlink" = Plasma proteomics (Olink)
      • "cytof" = PBMC cell population abundance (CyTOF)
      • "general_survey" = General survey
        • Each dataset contains the following elements:
          • "df" = log-transformed and scaled data
          • "sample_meta" = sample metadata
          • "feat_meta" = feature metadata
          • "df_preScale" = log-transformed data (base 2 for plasmaOlink, base 10 otherwise)
          • "df_preScale_clr" = log- and CLR- transformed data
          • "mad" = median absolute deviation (MAD) of features using "df_preScale"
          • "mad.clr" = MAD of features using "df_preScale_clr"
          • "norm.mad" = normalized MAD (MAD/mean) of features using "df_preScale"
          • "var" = variance of features using "df_preScale"
          • "cov" = coefficient of variation of features using "df_preScale"
          • "cov.clr" = coefficient of variation of features using "df_preScale_clr"
          • "df_clr" = log- and CLR- transformed, and scaled data
          • "pcs" = principal components
          • "pcvar" = variation explained by each principal component
    • unmodnames: list of unmodified feature names for two datasets
  • processed_followup_data.RData: R list of longitudinal datasets
    • datasets: R list of data.frames for non-CITE-seq datasets
      • "stoolMetab" = Stool metabolomics
      • "stoolSpecies" = Stool species abundance (stool metagenomics)
      • "stoolPathAbu" = Stool metagenomic pathway abundance (stool metagenomics)
      • "cytof" = PBMC cell population abundance (CyTOF)
      • "plasmaOlink" = Plasma proteomics (Olink)
        • Each dataset contains the following elements:
          • "df" = log-transformed and scaled data
          • "df_preScale" = log-transformed data (base 2 for plasmaOlink, base 10 otherwise)
          • "df_preScale_clr" = log- and CLR- transformed data
          • "mad" = median absolute deviation (MAD) of features using "df_preScale"
          • "mad.clr" = MAD of features using "df_preScale_clr"
          • "norm.mad" = normalized MAD (MAD/mean) of features using "df_preScale"
          • "var" = variance of features using "df_preScale"
          • "cov" = coefficient of variation of features using "df_preScale"
          • "cov.clr" = coefficient of variation of features using "df_preScale_clr"
          • "df_clr" = log- and CLR-transformed, and scaled data
          • "pcs" = principal components
          • "pcvar" = variation explained by each principal component
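
The per-dataset elements listed above (log transform, CLR, scaling, MAD, normalized MAD, variance, coefficient of variation) can be illustrated with a short Python sketch. It is not the authors' pipeline. It assumes that "scaled" means a per-feature z-score, that CLR is taken per sample across features after adding the pseudocount, and that MAD is the raw median absolute deviation without a consistency constant (R's `mad()` multiplies by 1.4826 by default). All input values are toy numbers.

```python
import math
import statistics


def log_transform(values, base=10, pseudocount=1.0):
    """Log-transform with a pseudocount (base 10 by default, base 2 if requested)."""
    log_fn = math.log2 if base == 2 else math.log10
    return [log_fn(v + pseudocount) for v in values]


def clr(values, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log across the vector."""
    logs = [math.log(v + pseudocount) for v in values]
    mean_log = statistics.fmean(logs)
    return [x - mean_log for x in logs]


def zscore(values):
    """Assumed meaning of 'scaled': centre on the mean, divide by the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def feature_stats(values):
    """Summary statistics named after the list elements described above."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])  # unscaled MAD
    mean = statistics.fmean(values)
    return {
        "mad": mad,
        "norm.mad": mad / mean,  # MAD / mean, as stated in the description
        "var": statistics.variance(values),  # sample variance
        "cov": statistics.stdev(values) / mean,  # coefficient of variation
    }


# Toy feature measured in four samples.
raw_feature = [0.0, 9.0, 99.0, 999.0]
logged = log_transform(raw_feature)  # analogous to "df_preScale"
scaled = zscore(logged)  # analogous to "df"
stats = feature_stats(logged)

# Toy sample measured across three features.
clr_sample = clr([1.0, 3.0, 7.0])
```

A CLR-transformed vector sums to zero by construction, which is a quick sanity check when comparing against "df_preScale_clr".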

 

CyTOF data

  • CyTOF data: mass cytometry profiling of PBMCs from healthy individuals
    • cytof.tar
      • baseline.tar.gz: baseline samples from ImmunoMicrobiome cohort 
        • fcs/: FCS files of live singlet cells
        • baseline_processed.rds: R data.frame containing batch-corrected (cyCombine) and arcsinh-transformed (cofactor=5) baseline data, along with cluster ID and sample metadata. Columns: donor_id = ID of the donor, run_id = CyTOF run ID, UMAP1 and UMAP2 = UMAP coordinates, cluster = cluster ID
        • file_metadata.csv: FCS file metadata
      • followup.tar.gz: samples from the follow-up timepoint
        • fcs/: FCS files of live singlet cells
        • followup_clusters.rds: R data.frame containing cell type annotation for the follow-up dataset. Columns: fcs_cell_idx = index of cells in corresponding FCS files, run = CyTOF run ID, donor_id = ID of the donor, tp = timepoint (baseline/followup), clust = cluster ID
      • clust_annotation.csv: cell type annotation for clusters in baseline_processed.rds and followup_clusters.rds
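
For readers working from the FCS files, the arcsinh transform with cofactor 5 mentioned for baseline_processed.rds can be sketched in Python as below. This shows only the transform itself, asinh(x / 5); batch correction with cyCombine is not reproduced, and the intensities are toy values.

```python
import math


def arcsinh_transform(intensities, cofactor=5.0):
    """Apply asinh(x / cofactor) to each raw intensity."""
    return [math.asinh(x / cofactor) for x in intensities]


def inverse_arcsinh_transform(transformed, cofactor=5.0):
    """Map transformed values back to the raw intensity scale."""
    return [cofactor * math.sinh(v) for v in transformed]


# Toy raw intensities for one marker.
raw_intensities = [0.0, 5.0, 50.0, 500.0]
transformed = arcsinh_transform(raw_intensities)
recovered = inverse_arcsinh_transform(transformed)
```

The transform is roughly linear near zero and logarithmic for large values, which is why it is commonly preferred over a plain log for mass cytometry intensities that include zeros.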

 

CITE-seq data

  • sobjs_all_both_corr_clust.rds: Seurat object of baseline data with the following assays
    • RNA: scRNA-seq data
    • ADT: DSB-normalized ADT data
    • raw.ADT: CLR-transformed ADT data
    • integrated.ADT: RPCA-integrated ADT data across batches
  • top5kgenes_pseudobulk_rna_baseline.rds: pseudobulked (log10-transformed and scaled) expression of single-cell RNA-seq data for the 5000 most highly variable genes for baseline samples
  • allgenes_pseudobulk_rna_followup.rds: pseudobulked (log10-transformed and scaled) expression of single-cell RNA-seq data for the 5000 most highly variable genes for baseline and follow-up samples
  • Raw sequencing data and count matrix are available at GEO: GSE314416
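
As an illustration of what "pseudobulked" means for the two pseudobulk files above, the Python sketch below sums per-cell counts into one profile per donor and then applies a log10 transform. The aggregation by summation and the pseudocount of 1 are assumptions, not details confirmed by this record; donor IDs, gene names, and counts are toy values.

```python
import math
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell gene counts into one profile per donor.

    `cells` is an iterable of (donor_id, {gene: count}) pairs.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for donor_id, counts in cells:
        for gene, count in counts.items():
            totals[donor_id][gene] += count
    return {donor: dict(genes) for donor, genes in totals.items()}


def log10_profile(profile, pseudocount=1.0):
    """Log10-transform one donor's summed counts (pseudocount is an assumption)."""
    return {gene: math.log10(count + pseudocount) for gene, count in profile.items()}


# Toy input: three cells from two hypothetical donors.
toy_cells = [
    ("D1", {"GENE_A": 3, "GENE_B": 0}),
    ("D1", {"GENE_A": 6, "GENE_B": 9}),
    ("D2", {"GENE_A": 99}),
]
bulk = pseudobulk(toy_cells)
bulk_log10 = {donor: log10_profile(profile) for donor, profile in bulk.items()}
```

Scaling across donors would follow as a separate per-gene step once every donor has a profile.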

 

Bulk RNA sequencing data GEO: GSE314922

Whole metagenome sequencing data BioProject: PRJNA1390888 / SRA: SRP656586

 

NOTE: HS99 and HS109 correspond to the same participant; measurements associated with the identifier HS109 were used for this participant.
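
When combining tables by participant, the note above can be encoded as a small alias map. The Python sketch below is one possible way to handle it: it resolves HS99 to HS109 and drops records still labelled HS99, in line with the statement that HS109 measurements were used. Whether any distributed file actually contains HS99 records is not stated, and the record layout with a `donor_id` key is hypothetical.

```python
# HS99 and HS109 are the same participant; HS109 is the identifier that was used.
DONOR_ALIASES = {"HS99": "HS109"}


def canonical_donor_id(donor_id):
    """Return the identifier under which the participant's data were analysed."""
    return DONOR_ALIASES.get(donor_id, donor_id)


def drop_alias_records(records):
    """Keep only records whose donor_id is not a retired alias."""
    return [record for record in records if record["donor_id"] not in DONOR_ALIASES]


# Hypothetical records, for illustration only.
toy_records = [
    {"donor_id": "HS99", "value": 1.0},
    {"donor_id": "HS109", "value": 2.0},
    {"donor_id": "HS12", "value": 3.0},
]
kept_records = drop_alias_records(toy_records)
```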

Files

Files (31.3 GB)

  • md5:52a91dde7f27782813c6977d2b246de9 (152.2 MB)
  • md5:b77104a70009f07ec6c4a91de6dfd9ac (19.0 GB)
  • md5:1b1027d818f67d23bc763209d2f4b4f9 (918.2 MB)
  • md5:1ce369751625b4d6e57e4f10910802b0 (6.4 MB)
  • md5:74cbb9cd937d31bc048a7c26736af1d7 (9.1 MB)
  • md5:57c3eae929990db0e1fbb6da3fa24863 (11.0 GB)
  • md5:22ed5ab99a5561ef16b0fc678d4f6958 (120.2 MB)
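
The md5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal Python sketch using only the standard library; the file is read in chunks so that multi-gigabyte archives do not need to fit in memory. The local path in the usage comment is a placeholder, because the file names are not preserved in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (placeholder path; substitute the file you downloaded):
# matches_checksum("path/to/downloaded_file", "md5:1ce369751625b4d6e57e4f10910802b0")
```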

Additional details

Related works

Is source of
Publication: 10.1016/j.cell.2026.02.003 (DOI)