A Case for Absolute Gene Expression Estimates in Microbiome Studies using Metatranscriptomics
Description
This repository contains the metadata and processed data underlying the analyses and figures in the manuscript “A Case for Absolute Gene Expression Estimates in Microbiome Studies using Metatranscriptomics” by Perneel et al. The study discusses the advantages and limitations of relative expression measures, such as TPM (Transcripts Per Million), in microbial metatranscriptomics. It also demonstrates the ecological insights gained from absolute expression measures such as TPL (Transcripts Per Liter), using a seasonal metatranscriptomic dataset generated from surface microeukaryotic plankton in the southern North Sea. The code used to analyse these data is available at https://github.com/MichielPerneel/absolute-metatranscriptomics.
File Descriptions:
- ERCC_Controls_Analysis.txt: Information on ERCC spike-in controls, including initial ERCC concentrations (concentration_Mix_1) used to calculate conversion factors from TPM to TPL.
- sampling_RNA_Spike_LibPrep.csv: Sample processing metadata including RNA yield, ERCC spike-in volumes added, and library preparation information.
- ERCC_tpm.csv: TPM values for ERCC spike-in transcripts only, used to derive per-sample conversion factors for TPM to TPL.
- samples_env.csv: Metadata for each sample, including environmental measurements.
- samples.csv: Metadata for each sample.
- metatranscriptome.fasta: Metatranscriptome constructed by clustering samples' assemblies (rnaSPAdes) at 95% sequence identity using MMseqs2.
- metatranscriptome.pep: Translated proteins from metatranscriptome using TransDecoder.
- EukProt_included_data_sets.v03.2021_11_22.txt: Details of the reference data used for taxonomic annotation.
- functional_annotation.emapper.annotations: Functional annotation output from EggNOG-mapper for all translated transcripts.
- eukprot_DB.firsthit.60plus_alnscore.m8: Taxonomic annotation output for all translated transcripts based on best-hit alignments against the EukProt database, filtered for alignment scores ≥ 60 % sequence identity.
- tpm.csv: Transcripts Per Million (TPM) values for all assembled transcripts across samples, calculated using Kallisto.
- transcripts_per_L.csv: Estimated Transcripts Per Liter (TPL) values for all assembled transcripts across samples, calculated from TPM values using ERCC spike-in controls and sample metadata.
- flowcam_data.csv: Cell count and density estimates derived from FlowCam imaging.
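The TPM-to-TPL conversion mentioned in the file descriptions can be illustrated with a minimal sketch. It is not the authors' implementation, which lives in the linked code repository. It assumes that the per-sample conversion factor is the number of spike-in copies added divided by the summed ERCC TPM, normalised by the volume of seawater the sample represents. The function names, the dictionary layout, and all numbers are hypothetical; they do not reflect the actual column names or values in ERCC_tpm.csv, sampling_RNA_Spike_LibPrep.csv, or tpm.csv.

```python
from typing import Dict


def conversion_factor(
    ercc_tpm: Dict[str, float],
    ercc_copies_added: Dict[str, float],
    volume_l: float,
) -> float:
    """Return transcripts per liter represented by one TPM unit in a sample.

    Assumption (not taken from the source): the factor is
    (spike-in copies added / summed spike-in TPM) / sampled volume in liters.
    """
    # Only use spike-ins that appear in both tables.
    shared = [name for name in ercc_copies_added if name in ercc_tpm]
    total_copies = sum(ercc_copies_added[name] for name in shared)
    total_tpm = sum(ercc_tpm[name] for name in shared)
    if total_tpm <= 0:
        raise ValueError("no spike-in signal: summed ERCC TPM must be positive")
    if volume_l <= 0:
        raise ValueError("sampled volume must be positive")
    return total_copies / total_tpm / volume_l


def tpm_to_tpl(tpm: Dict[str, float], factor: float) -> Dict[str, float]:
    """Scale every transcript's TPM by the per-sample conversion factor."""
    return {transcript: value * factor for transcript, value in tpm.items()}


# Illustrative, made-up values for a single sample.
example_ercc_tpm = {"ERCC-00002": 400.0, "ERCC-00003": 100.0}
example_copies = {"ERCC-00002": 8.0e6, "ERCC-00003": 2.0e6}
example_factor = conversion_factor(example_ercc_tpm, example_copies, volume_l=2.0)
example_tpl = tpm_to_tpl({"transcript_A": 2.5, "transcript_B": 0.0}, example_factor)
print(example_factor, example_tpl)
```

Because the factor is computed per sample, two samples with identical TPM for a transcript can yield different TPL values when their spike-in recovery or sampled volume differs, which is the distinction between relative and absolute measures that the description draws.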
Files
Total size: 11.2 GB

| Checksum | Size |
|---|---|
| md5:233ef6700a585325d228379531acb901 | 4.0 kB |
| md5:f90a7532fe9ff902121db560a0923b4f | 44.7 kB |
| md5:ffed42b3425c1c3cb62fbb7aa15302ab | 227.2 MB |
| md5:f8599a2db3c95ff322e7dd9cb50fbde7 | 787.5 kB |
| md5:1168877d52863d604749e4848b99bccc | 311.1 kB |
| md5:af92b7fd9b43238ae3c36507685887ac | 2.2 GB |
| md5:3a8b2b0431df9dfa2544ba201489f224 | 3.3 GB |
| md5:3f9f7c44fc7b1e34429dd6cc004c0d61 | 1.1 GB |
| md5:417e38b73b7ba8a17fac6471625d03d4 | 2.8 kB |
| md5:66c4ec2a864bb53b2f69b99147d4f164 | 5.0 kB |
| md5:0f4a56fbd7117566477f38f13c02eb45 | 3.4 kB |
| md5:fbb816ba913a12d50e7c7d7b7a15d184 | 2.2 GB |
| md5:ea9d8fffc38c73f62db24b1e6141e303 | 2.2 GB |
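The md5 checksums listed for the files can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the path passed to them is whatever local path the file was saved to.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed_checksum: str) -> bool:
    """Compare a local file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks matters here because several of the files are multiple gigabytes in size and should not be loaded into memory at once.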
Additional details
Related works
- Is supplement to
- Computational notebook: https://github.com/MichielPerneel/absolute-metatranscriptomics
Dates
- Submitted: 2025-07-15
Software
- Repository URL: https://github.com/MichielPerneel/absolute-metatranscriptomics
- Development Status: Active