RNA-seq raw count data for: Integrative mapping of pre-existing influenza immune landscapes predicts vaccine response

Tomic, Adriana; Tomic, Ivan; de Silva, Thushan

doi:10.5281/zenodo.15224712

Published April 15, 2025 | Version v1

Dataset Open

1. University of Oxford
2. Boston University
3. Stanford University
4. Hochschule Hannover

This dataset supports the findings of the following manuscript:

Title: Integrative mapping of pre-existing influenza immune landscapes for vaccine response prediction

Authors: Hao S, Tomic I, Lindsey BB, Jagne YJ, Hoschler K, Meijer A, Carreño Quiroz JM, Meade P, Sano K, Peno C, Costa-Martins AG, Bogaert D, Kampmann B, Nakaya H, Krammer F, de Silva TI, Tomic A.

Abstract of associated study

Predicting individual vaccine responses remains a significant challenge due to the complexity and variability of immune processes. To address this gap, we developed immunaut, an open-source, data-driven framework implemented as an R package specifically designed for all systems vaccinologists seeking to analyze and predict immunological outcomes across diverse vaccination settings. Leveraging one of the most comprehensive live attenuated influenza vaccine (LAIV) datasets to date - 244 Gambian children enrolled in a phase 4 immunogenicity study - immunaut integrates humoral, mucosal, cellular, transcriptomic, and microbiological parameters collected before and after vaccination, providing an unprecedentedly holistic view of LAIV-induced immunity. Through advanced dimensionality reduction, clustering, and predictive modeling, immunaut identifies distinct immunophenotypic responder profiles and their underlying baseline determinants. [...] By integrating pathway-level analysis, model-derived contribution scores, and hierarchical decision rules, immunaut elucidates how distinct immunological landscapes shape each response trajectory and how key baseline features, including pre-existing immunity, mucosal preparedness, and cellular support, dictate vaccine outcomes.

Dataset description

This repository contains baseline (pre-vaccination, Day 0) transcriptomic data (RNA-Seq) from nasal swab and whole blood samples collected from participants in the Gambian LAIV study cohorts enrolled in 2017 and 2018. The data is provided in the efficient Apache Parquet format. Participant identifiers have been anonymized using hashing.

File Contents:

The data is organized into two main directories: 'blood' and 'nasal'.

1. Blood transcriptomic data:
* Located in the 'blood/' directory.
* Files contain gene expression data (integer counts) from whole blood samples.
* Format: Each file is a table with genes listed in the 'gene' column and hashed participant IDs as subsequent columns.

* 'blood/dataset_2017.parquet': Data from the 2017 cohort.
* 'blood/dataset_2018.parquet': Data from the 2018 cohort.

2. Nasal transcriptomic data:
* Located in the 'nasal/' directory.
* Files contain gene expression data from nasal swab samples.
* Format: Each file is a matrix with genes as row identifiers (index) and hashed participant IDs as columns.

* 'nasal/dataset_2017.parquet': Normalized data from the 2017 cohort.
* 'nasal/dataset_2018.parquet': Count data from the 2018 cohort.
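
The layout described above could be read in Python roughly as follows. This is a minimal sketch, not part of the record: it assumes the archive has been extracted into a local directory, that pandas and a Parquet engine such as pyarrow are installed, and that the nasal 2018 file follows the same naming pattern as the other three files. The names `FILES`, `dataset_path` and `load_expression` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path

# Relative paths taken from the file listing. The nasal 2018 name is an
# assumption: it is taken to follow the pattern of the other three files.
FILES = {
    ("blood", 2017): "blood/dataset_2017.parquet",
    ("blood", 2018): "blood/dataset_2018.parquet",
    ("nasal", 2017): "nasal/dataset_2017.parquet",
    ("nasal", 2018): "nasal/dataset_2018.parquet",
}


def dataset_path(root, tissue, year):
    """Return the path of one Parquet file under the extracted archive root."""
    return Path(root) / FILES[(tissue, year)]


def load_expression(root, tissue, year):
    """Load one file as a genes x participants DataFrame."""
    # Imported lazily so the path helpers work without pandas installed.
    import pandas as pd

    df = pd.read_parquet(dataset_path(root, tissue, year))
    # Blood files keep gene names in a 'gene' column, whereas nasal files
    # already use genes as the index; move the column so both match.
    if "gene" in df.columns:
        df = df.set_index("gene")
    return df
```

A call such as `load_expression("transcriptome_data", "blood", 2017)` would then return a table with genes as rows and hashed participant IDs as columns, where `"transcriptome_data"` stands for wherever the archive was extracted.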

Methods summary (Excerpt from the manuscript)

Transcriptomic profiles: RNA sequencing was conducted on nasal swabs from 121 participants and blood samples from 93 participants collected before LAIV to generate transcriptomic profiles following the protocol detailed in our previous work [8]. Briefly, Gene Set Enrichment Analysis (GSEA) was performed using the fgsea Bioconductor package, ranking genes by their Spearman correlation coefficients between rlog-normalized expression and LAIV viral loads. Enrichment was assessed separately for Reactome pathways and a cell-subset marker set (50 defining genes per subset), and single-sample GSEA (ssGSEA) was also conducted using pre-vaccination (baseline) gene expression values for each participant. Normalized enrichment scores (NES), adjusted p-values, and leading-edge genes were extracted for each pathway. Pathways with an adjusted p < 0.1 were considered significant, representing a more stringent threshold than the commonly used p < 0.25.
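
The gene-ranking step in the excerpt above can be illustrated with a small, dependency-free sketch. It is not the authors' code (the study used the fgsea package in R); it only shows how a Spearman correlation between each gene's expression and a per-participant viral load yields a ranked gene list of the kind GSEA takes as input. The gene names and numbers are invented toy values, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rank_genes(expression, viral_load):
    """Sort genes by decreasing Spearman correlation with viral load."""
    scores = {gene: spearman(vals, viral_load) for gene, vals in expression.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy values invented purely for illustration; not taken from the dataset.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 4.0],
}
toy_viral_load = [10.0, 20.0, 30.0, 40.0]
ranked = rank_genes(toy_expression, toy_viral_load)
```

In this toy case `GENE_A` rises monotonically with viral load and ranks first, `GENE_B` falls monotonically and ranks last, and `GENE_C` lands in between.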

Note: This Zenodo record provides the processed baseline gene expression data (rlog-normalized for nasal 2017, counts for nasal 2018, and blood 2017/2018) used as input for ssGSEA analysis described in the manuscript.

Related resources:

* immunaut framework: The analytical platform used in the study is available via the PANDORA AI platform and as an R package 'immunaut' on CRAN.
* immunaut documentation: General package documentation is hosted on Atomic Lab's GitHub.
* Analysis code: The specific code used for figure generation and modeling presented in the paper can be found on Atomic Lab's GitHub.
* Full integrated dataset: The complete integrated and de-identified dataset supporting all findings, "Comprehensive Multimodal Immune Response Dataset for LAIV Vaccination in Pediatric Cohorts", is available at Zenodo.

License:

This dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Contact:

https://www.atomic-lab.org

Files

transcriptome_data.zip (16.3 MB)
md5: f616ffdf911a1a3c8de8d57105486175
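
The md5 value listed for transcriptome_data.zip (f616ffdf911a1a3c8de8d57105486175) can be used to check a download before unpacking it. The following is a minimal sketch using only the Python standard library; the function names and the local file path are hypothetical.

```python
import hashlib

# Checksum as listed on the record for transcriptome_data.zip.
EXPECTED_MD5 = "f616ffdf911a1a3c8de8d57105486175"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

For example, `is_intact("transcriptome_data.zip")` would return `True` only if the local copy matches the published checksum.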

Additional details

Is supplement to: Preprint: 10.1101/2025.01.22.634302 (DOI)

Repository URL: https://github.com/atomiclaboratory/immunaut
Development Status: Active

