Published April 15, 2025 | Version v1
Dataset Open

RNA-seq raw count data for: Integrative mapping of pre-existing influenza immune landscapes predicts vaccine response

  • 1. University of Oxford
  • 2. Boston University
  • 3. Stanford University
  • 4. Hochschule Hannover

Description

This dataset supports the findings of the following manuscript:

Title: Integrative mapping of pre-existing influenza immune landscapes for vaccine response prediction

Authors: Hao S, Tomic I, Lindsey BB, Jagne YJ, Hoschler K, Meijer A, Carreño Quiroz JM, Meade P, Sano K, Peno C, Costa-Martins AG, Bogaert D, Kampmann B, Nakaya H, Krammer F, de Silva TI, Tomic A.

Abstract of associated study

Predicting individual vaccine responses remains a significant challenge due to the complexity and variability of immune processes. To address this gap, we developed immunaut, an open-source, data-driven framework implemented as an R package specifically designed for all systems vaccinologists seeking to analyze and predict immunological outcomes across diverse vaccination settings. Leveraging one of the most comprehensive live attenuated influenza vaccine (LAIV) datasets to date - 244 Gambian children enrolled in a phase 4 immunogenicity study - immunaut integrates humoral, mucosal, cellular, transcriptomic, and microbiological parameters collected before and after vaccination, providing an unprecedentedly holistic view of LAIV-induced immunity. Through advanced dimensionality reduction, clustering, and predictive modeling, immunaut identifies distinct immunophenotypic responder profiles and their underlying baseline determinants. [...] By integrating pathway-level analysis, model-derived contribution scores, and hierarchical decision rules, immunaut elucidates how distinct immunological landscapes shape each response trajectory and how key baseline features, including pre-existing immunity, mucosal preparedness, and cellular support, dictate vaccine outcomes.


Dataset description

This repository contains baseline (pre-vaccination, Day 0) transcriptomic data (RNA-Seq) from nasal swab and whole blood samples collected from participants in the Gambian LAIV study cohorts enrolled in 2017 and 2018. The data is provided in the efficient Apache Parquet format. Participant identifiers have been anonymized using hashing.

File Contents:

The data is organized into two main directories: 'blood' and 'nasal'.

1.  Blood transcriptomic data:
    *   Located in the 'blood/' directory.
    *   Files contain gene expression data (integer counts) from whole blood samples.
    *   Format: Each file is a table with genes listed in the 'gene' column and hashed participant IDs as subsequent columns.

    *   'blood/dataset_2017.parquet': Data from the 2017 cohort.
    *   'blood/dataset_2018.parquet': Data from the 2018 cohort.

2.  Nasal transcriptomic data:
    *   Located in the 'nasal/' directory.
    *   Files contain gene expression data from nasal swab samples.
    *   Format: Each file is a matrix with genes as row identifiers (index) and hashed participant IDs as columns.

    *   'nasal/dataset_2017.parquet': rlog-normalized data from the 2017 cohort.
    *   'nasal/dataset_2018.parquet': Count data from the 2018 cohort.
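
The two layouts described above (a 'gene' column for blood, genes as the index for nasal) can be brought into one shape when loading. The following is a minimal sketch, not part of the record itself. It assumes the archive has been extracted to a local directory (the name `transcriptome_data` is hypothetical), that the nasal 2018 file follows the same `dataset_<year>.parquet` naming pattern as the other three files, and that pandas with a Parquet engine such as pyarrow is installed. The helper names `table_path` and `load_expression` are illustrative only.

```python
from pathlib import Path

TISSUES = ("blood", "nasal")


def table_path(root, tissue, year):
    """Build the path of one table under the extracted archive root.

    Assumes the '<tissue>/dataset_<year>.parquet' pattern from the file list.
    """
    if tissue not in TISSUES:
        raise ValueError(f"unknown tissue: {tissue!r}")
    return Path(root) / tissue / f"dataset_{year}.parquet"


def load_expression(root, tissue, year):
    """Return a genes x participants table with genes as the index."""
    # Imported here so the path helper works without pandas installed.
    import pandas as pd  # reading Parquet also needs pyarrow or fastparquet

    table = pd.read_parquet(table_path(root, tissue, year))
    # Blood files keep gene names in a 'gene' column; nasal files already
    # use genes as the index, so only the former needs converting.
    if "gene" in table.columns:
        table = table.set_index("gene")
    return table


if __name__ == "__main__":
    # 'transcriptome_data' is a hypothetical extraction directory.
    blood_2017 = load_expression("transcriptome_data", "blood", 2017)
    print(blood_2017.shape)
```

Note that only the nasal 2017 table holds normalized values; the other three hold counts, so the tables should not be combined without accounting for that difference.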


Methods summary (Excerpt from the manuscript)

Transcriptomic profiles: RNA sequencing was conducted on nasal swabs from 121 participants and blood samples from 93 participants collected before LAIV to generate transcriptomic profiles following the protocol detailed in our previous work [8]. Briefly, Gene Set Enrichment Analysis (GSEA) was performed using the fgsea Bioconductor package, ranking genes by their Spearman correlation coefficients between rlog-normalized expression and LAIV viral loads. Enrichment was assessed separately for Reactome pathways and a cell-subset marker set (50 defining genes per subset), and single-sample GSEA (ssGSEA) was also conducted using pre-vaccination (baseline) gene expression values for each participant. Normalized enrichment scores (NES), adjusted p-values, and leading-edge genes were extracted for each pathway. Pathways with an adjusted p < 0.1 were considered significant, representing a more stringent threshold than the commonly used p < 0.25.
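
The gene-ranking step described above can be illustrated with a small, dependency-free sketch. This is not the authors' pipeline (which used the fgsea package in R); it only shows how genes might be ordered by their Spearman correlation with viral load before being passed to an enrichment tool. The gene names, expression values, and viral loads below are made-up toy data, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_genes(expression, viral_load):
    """Order genes from most positive to most negative correlation.

    expression maps gene -> values per participant, in the same participant
    order as viral_load. Genes with undefined correlation are dropped.
    """
    scores = {gene: spearman(vals, viral_load) for gene, vals in expression.items()}
    defined = [(gene, rho) for gene, rho in scores.items() if not math.isnan(rho)]
    return sorted(defined, key=lambda item: item[1], reverse=True)


# Toy data only: four participants, four hypothetical genes.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 4.0],
    "GENE_FLAT": [5.0, 5.0, 5.0, 5.0],
}
toy_viral_load = [10.0, 20.0, 30.0, 40.0]
ranking = rank_genes(toy_expression, toy_viral_load)
```

In a real analysis the resulting ranked list would serve as the input statistic for a pre-ranked enrichment method; the enrichment scoring and p-value adjustment themselves are not reproduced here.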

Note: This Zenodo record provides the processed baseline gene expression data used as input for the ssGSEA analysis described in the manuscript: rlog-normalized values for the nasal 2017 cohort, and counts for the nasal 2018, blood 2017, and blood 2018 cohorts.

Related resources:

License:

This dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Contact:

https://www.atomic-lab.org

Files

transcriptome_data.zip (16.3 MB)
md5:f616ffdf911a1a3c8de8d57105486175
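
The MD5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The sketch below is one way to do this with the Python standard library. It assumes the archive was saved locally as `transcriptome_data.zip`; the function names are illustrative.

```python
import hashlib

# Checksum as listed for transcriptome_data.zip in this record.
EXPECTED_MD5 = "f616ffdf911a1a3c8de8d57105486175"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the archive sits in the current working directory.
    print(verify_download("transcriptome_data.zip"))
```

A mismatch usually indicates an interrupted or altered download, in which case fetching the archive again is the simplest remedy.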

Additional details

Related works

Is supplement to
Preprint: 10.1101/2025.01.22.634302 (DOI)

Software

Repository URL
https://github.com/atomiclaboratory/immunaut
Development Status
Active