Published July 16, 2025 | Version v1
Dataset Open

A Case for Absolute Gene Expression Estimates in Microbiome Studies using Metatranscriptomics

  • 1. Flanders Marine Institute
  • 2. VIB-UGent Center for Plant Systems Biology
  • 3. Woods Hole Oceanographic Institution

Contributors

Contact person:

  • 1. Flanders Marine Institute
  • 2. VIB-UGent Center for Plant Systems Biology
  • 3. Woods Hole Oceanographic Institution

Description

This repository contains the metadata and processed data underlying the analyses and figures presented in the manuscript “A Case for Absolute Gene Expression Estimates in Microbiome Studies using Metatranscriptomics” by Perneel et al. The study discusses the advantages and limitations of relative expression measures, such as TPM (Transcripts Per Million), in microbial metatranscriptomics and demonstrates the ecological insights gained through absolute expression measures such as TPL (Transcripts Per Liter), using a seasonal metatranscriptomic dataset generated from surface microeukaryotic plankton in the southern North Sea. The code used to analyse these data is available at https://github.com/MichielPerneel/absolute-metatranscriptomics.

File Descriptions:

  • ERCC_Controls_Analysis.txt: Information on ERCC spike-in controls, including initial ERCC concentrations (concentration_Mix_1) used to calculate conversion factors from TPM to TPL.
  • sampling_RNA_Spike_LibPrep.csv: Sample processing metadata including RNA yield, ERCC spike-in volumes added, and library preparation information.
  • ERCC_tpm.csv: TPM values for ERCC spike-in transcripts only, used to derive per-sample conversion factors for TPM to TPL.
  • samples_env.csv: Metadata for each sample, including environmental measurements.
  • samples.csv: Metadata for each sample.
  • metatranscriptome.fasta: Metatranscriptome constructed by clustering samples' assemblies (rnaSPAdes) at 95% sequence identity using MMseqs2.
  • metatranscriptome.pep: Proteins translated from the metatranscriptome using TransDecoder.
  • EukProt_included_data_sets.v03.2021_11_22.txt: Details of the reference data used for taxonomic annotation.
  • functional_annotation.emapper.annotations: Functional annotation output from EggNOG-mapper for all translated transcripts.
  • eukprot_DB.firsthit.60plus_alnscore.m8: Taxonomic annotation output for all translated transcripts based on best-hit alignments against the EukProt database, filtered for alignment scores ≥ 60 % sequence identity.
  • tpm.csv: Transcripts Per Million (TPM) values for all assembled transcripts across samples, calculated using Kallisto.
  • transcripts_per_L.csv: Estimated Transcripts Per Liter (TPL) values for all assembled transcripts across samples, calculated from TPM values using ERCC spike-in controls and sample metadata.
  • flowcam_data.csv: Cell count and density estimates derived from FlowCam imaging.
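
As an illustration of how the ERCC files relate to transcripts_per_L.csv, the following Python sketch converts TPM to TPL for a single sample. It is not the authors' pipeline (the linked code repository holds that) and it rests on several assumptions: ERCC concentrations are expressed in attomoles per microliter, one per-sample conversion factor is taken as the ratio of spiked-in ERCC molecules to the summed ERCC TPM, and the parameter names spike_volume_ul, dilution, and filtered_volume_l, as well as the toy identifiers and values, are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def spike_molecules(conc_attomol_per_ul, spike_volume_ul, dilution=1.0):
    """Molecules of each spike-in transcript added to one sample.

    Assumes concentrations are in attomoles per microliter and that the
    stock was diluted `dilution`-fold before use (hypothetical parameter).
    """
    return {
        name: conc * 1e-18 * AVOGADRO * spike_volume_ul / dilution
        for name, conc in conc_attomol_per_ul.items()
    }


def tpm_to_tpl(tpm, ercc_tpm, ercc_molecules, filtered_volume_l):
    """Scale TPM values to transcripts per liter for one sample.

    The conversion factor is the number of spiked-in molecules
    represented by one TPM unit, estimated from detected spike-ins only.
    """
    detected = [n for n, v in ercc_tpm.items() if v > 0 and n in ercc_molecules]
    if not detected:
        raise ValueError("no spike-in transcripts detected in this sample")
    factor = sum(ercc_molecules[n] for n in detected) / sum(
        ercc_tpm[n] for n in detected
    )
    return {name: value * factor / filtered_volume_l for name, value in tpm.items()}


# Toy values, not taken from the dataset.
molecules = spike_molecules({"spike_A": 1.0, "spike_B": 2.0}, spike_volume_ul=1.0)
tpl = tpm_to_tpl(
    tpm={"transcript_1": 5.0},
    ercc_tpm={"spike_A": 10.0, "spike_B": 20.0},
    ercc_molecules=molecules,
    filtered_volume_l=2.0,
)
```

A single ratio keeps the sketch short; fitting a regression of observed TPM against known spike-in abundance is a common alternative when individual spike-ins are noisy.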

Files


Files (11.2 GB)

Checksum                              Size
md5:233ef6700a585325d228379531acb901  4.0 kB
md5:f90a7532fe9ff902121db560a0923b4f  44.7 kB
md5:ffed42b3425c1c3cb62fbb7aa15302ab  227.2 MB
md5:f8599a2db3c95ff322e7dd9cb50fbde7  787.5 kB
md5:1168877d52863d604749e4848b99bccc  311.1 kB
md5:af92b7fd9b43238ae3c36507685887ac  2.2 GB
md5:3a8b2b0431df9dfa2544ba201489f224  3.3 GB
md5:3f9f7c44fc7b1e34429dd6cc004c0d61  1.1 GB
md5:417e38b73b7ba8a17fac6471625d03d4  2.8 kB
md5:66c4ec2a864bb53b2f69b99147d4f164  5.0 kB
md5:0f4a56fbd7117566477f38f13c02eb45  3.4 kB
md5:fbb816ba913a12d50e7c7d7b7a15d184  2.2 GB
md5:ea9d8fffc38c73f62db24b1e6141e303  2.2 GB
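
Because several of the files are multiple gigabytes in size, it may be worth checking a download against its listed MD5 checksum. The Python sketch below shows one way to do this with the standard library; the helper names md5sum and verify are hypothetical, and the mapping between checksums and file names has to be taken from the record itself.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's digest with a listed checksum such as 'md5:<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected_hex
```

For example, verify("tpm.csv", "md5:<checksum listed for that file>") returns True when the local copy matches the listed checksum.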

Additional details

Related works

Is supplement to
Computational notebook: https://github.com/MichielPerneel/absolute-metatranscriptomics

Dates

Submitted
2025-07-15