Data for reproducing "msqrob2TMT: robust linear mixed models for inferring differential abundant proteins in labelled experiments with arbitrarily complex design"

Vanderaa, Christophe; Vandenbulcke, Stijn; Clement, Lieven

doi:10.5281/zenodo.14767905

Published January 29, 2025 | Version v1

Dataset Open

1. Ghent University
2. KU Leuven

Labelling strategies in mass spectrometry (MS)-based proteomics enhance sample throughput by enabling the acquisition of multiplexed samples within a single run. However, contemporary experiments often involve increasingly complex designs, where the number of samples exceeds the capacity of a single run, resulting in a complex correlation structure that must be addressed for accurate statistical inference and reliable biomarker discovery. To this end, we introduce msqrob2TMT, a suite of mixed model-based workflows specifically designed for differential abundance analysis in labelled MS-based proteomics data. msqrob2TMT accommodates both sample-specific and feature-specific (e.g., peptide or protein) covariates, facilitating inference in experiments with arbitrarily complex designs and allowing for explicit correction of feature-specific covariates. We benchmark our innovative workflows against state-of-the-art tools, including DEqMS, MSstatsTMT, and msTrawler, using two spike-in studies. Our findings demonstrate that msqrob2TMT offers greater flexibility, improved modularity, and enhanced performance, particularly through the application of robust ridge regression. Finally, we demonstrate the practical relevance of msqrob2TMT in a real mouse study, highlighting its capacity to effectively account for the complex correlation structure in the data.

Vandenbulcke S, Vanderaa C, Crook O, Martens L, Clement L. Msqrob2TMT: Robust linear mixed models for inferring differential abundant proteins in labeled experiments with arbitrarily complex design. Mol Cell Proteomics. 2025;24(7):101002.

Also available as a preprint

Vandenbulcke, S., Vanderaa, C., Crook, O., Martens, L. & Clement, L. msqrob2TMT: robust linear mixed models for inferring differential abundant proteins in labelled experiments with arbitrarily complex design. bioRxiv 2024.03.29.587218 (2024) doi:10.1101/2024.03.29.587218

This repository provides the data required to reproduce the results shown in the msqrob2TMT study. Data are organised in two main parts: input data and processed data.

Input data

The input data consist of data generated by others that we used for our analyses. Files are organised by prefix, with one prefix for each data set.

spikein1

This data set has been published by Huang et al. 2020 and has been downloaded from the MassIVE repository (RMSV000000265). It contains 2 files:

spikein1_psms.txt: a table with identified and quantified peptide-to-spectrum matches (FTP link: ftp://massive.ucsd.edu/x01/RMSV000000265/2020-06-08_huang704_4336d436/quant/161117_SILAC_HeLa_UPS1_TMT10_5Mixtures_3TechRep_UPSdB_Multiconsensus_PD22_Intensity_03_with_FDR_control_PSMs.txt)
spikein1_annotations.csv: the associated sample annotations (FTP link: ftp://massive.ucsd.edu/v02/MSV000084264/metadata/SpikeIn5mix_PD_annotation.csv)

spikein2

This data set has been published by O'Brien et al. 2024 and has been downloaded from a private Google Cloud Storage. It contains 3 files:

spikein2_psms.csv: a table with identified and quantified peptide-to-spectrum matches (link)
spikein2_annotations.csv: a table with the associated sample annotations (link).
spikein2_covariateFile.csv: a file required to run the msTrawler method (link).

mouse

The data for the mouse study has been published by Plubell et al. 2017 and has been downloaded from the MassIVE RMSV000000264.7 reanalysis repository:

mouse_psms.txt: a table with identified and quantified peptide-to-spectrum matches (FTP link: ftp://massive.ucsd.edu/x01/RMSV000000264/2020-06-07_huang704_518429df/181017_Plubell_mouse_sh_lo_LF_HF_diet_adipocytes_3TMT10_HpH_Fusion_PD22_multi_01_PSMs.txt)
mouse_annotations.csv: the associated sample annotations (FTP link: ftp://massive.ucsd.edu/x01/RMSV000000264/2020-06-07_huang704_518429df/metadata/mouse3mix_PD_annotation.csv)

Processed data

These data were generated during our analyses and are provided in the processed.zip file. Each file is prefixed with the name of the data set it relates to. Here is a comprehensive list:

mouse_model_MsstatsTMT.rds: a data.frame containing the MSstatsTMT statistical inference results for the mouse dataset.
mouse_model_msqrob2tmt.rds: a data.frame containing the msqrob2TMT statistical inference results for the mouse dataset where proteins were summarised within fraction.
mouse_model_msqrob2tmt_mixture.rds: a data.frame containing the msqrob2TMT statistical inference results for the mouse dataset where proteins were summarised within mixture.
spikein1_input_deqms.rds: a data.frame containing the spikein1 data after PSM filtering, ready for analysis by DEqMS.
spikein1_input_msTrawler.txt: a tabular text file containing the spikein1 data after PSM filtering, ready for analysis by msTrawler.
spikein1_input_msqrob2tmt.rds: a QFeatures object containing the spikein1 data after PSM filtering, ready for analysis by msqrob2.
spikein1_input_msstatstmt.rds: a data.frame containing the spikein1 data after PSM filtering, ready for analysis by MSstatsTMT.
spikein1_model_DEqMS.rds: a data.frame containing the DEqMS statistical inference results for the spikein1 dataset.
spikein1_model_MsstatsTMT.rds: a data.frame containing the MSstatsTMT statistical inference results for the spikein1 dataset.
spikein1_model_compare_preprocessing.rds: a data.frame containing the MSstatsTMT and msqrob2TMT statistical inference results for the spikein1 dataset under the different preprocessing workflows carried out by MSstatsTMT.
spikein1_model_msTrawler.rds: a data.frame containing the msTrawler statistical inference results for the spikein1 dataset.
spikein1_model_msqrob2tmt.rds: a data.frame containing the msqrob2TMT statistical inference results for the spikein1 dataset.
spikein2_input.rds: a data.frame containing the spikein2 data after running the custom preprocessing pipeline by O'Brien et al. 2024.
spikein2_input_preprocessed.rds: a data.frame containing the spikein2 data after running the custom preprocessing workflow by O'Brien et al. 2024 and the preprocessing workflow by msTrawler.
spikein2_model_DEqMS.rds: a data.frame containing the DEqMS statistical inference results for the spikein2 dataset.
spikein2_model_msqrob2tmt.rds: a data.frame containing the msqrob2TMT statistical inference results for the spikein2 dataset.
spikein2_model_MSstatsTMT.rds: a data.frame containing the MSstatsTMT statistical inference results for the spikein2 dataset.
spikein2_model_msTrawler.rds: a data.frame containing the msTrawler statistical inference results for the spikein2 dataset.
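
Because every processed file carries its data set name as a prefix, the contents of the archive can be indexed by name alone. The following is a minimal Python sketch, assuming processed.zip has been downloaded locally and that file names follow the pattern <dataset>_<stage>[_<detail>].<ext> seen in the list above. The function names parse_processed_name and index_archive are hypothetical helpers, and the internal folder layout of the archive is not documented here, so the sketch ignores any leading directories. The .rds files themselves are serialised R objects and still need to be loaded in R (for example with readRDS()); this sketch only inspects file names.

```python
import zipfile
from pathlib import PurePosixPath


def parse_processed_name(filename):
    """Split a processed file name into (dataset, stage, detail).

    Assumes the pattern <dataset>_<stage>[_<detail>].<ext>, e.g.
    'spikein1_model_DEqMS.rds'. Leading directories are ignored.
    """
    stem = PurePosixPath(filename).stem
    parts = stem.split("_", 2)
    if len(parts) < 2:
        raise ValueError(f"unexpected file name: {filename}")
    dataset, stage = parts[0], parts[1]
    # Some files, such as 'spikein2_input.rds', have no detail part.
    detail = parts[2] if len(parts) == 3 else None
    return dataset, stage, detail


def index_archive(zip_path):
    """Group the files of a zip archive by their data set prefix."""
    index = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            try:
                dataset, stage, detail = parse_processed_name(name)
            except ValueError:
                continue  # skip names that do not follow the pattern
            index.setdefault(dataset, []).append((stage, detail, name))
    return index
```

Calling index_archive("processed.zip") would return a dictionary keyed by data set prefix, which makes it easy to check that the expected input and model files are present before starting an analysis in R.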

Files (613.9 MB)

Name	MD5	Size
mouse_annotations.csv	b0aaaf766ee0d5885ee6a349639859a3	47.1 kB
mouse_psms.txt	660204e06b1fd58524edd4b269ce1b00	80.9 MB
processed.zip	90a68e6c9d8415b2ee1d0aa60efea562	270.5 MB
spikein1_annotations.csv	06ae01ad1163ee7459a686110d951184	13.0 kB
spikein1_psms.txt	ffdfe99d667017a76409648ed7ad1cdb	238.8 MB
spikein2_annotations.csv	2edbf51b2c7fb4a4da2cf39f4d0a5f6c	3.3 kB
spikein2_covariateFile.csv	624582bd78c5f21be6e5ec116f7d1b65	121 Bytes
spikein2_psms.csv	6143b17432334232464a28060aba2e0b	23.6 MB
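
The MD5 checksums listed with the files above can be used to confirm that downloads are complete and uncorrupted. Below is a minimal Python sketch, assuming the files have been saved under their listed names in a single local directory. The helper names md5sum and verify_downloads are hypothetical and not part of the msqrob2TMT tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "mouse_annotations.csv": "b0aaaf766ee0d5885ee6a349639859a3",
    "mouse_psms.txt": "660204e06b1fd58524edd4b269ce1b00",
    "processed.zip": "90a68e6c9d8415b2ee1d0aa60efea562",
    "spikein1_annotations.csv": "06ae01ad1163ee7459a686110d951184",
    "spikein1_psms.txt": "ffdfe99d667017a76409648ed7ad1cdb",
    "spikein2_annotations.csv": "2edbf51b2c7fb4a4da2cf39f4d0a5f6c",
    "spikein2_covariateFile.csv": "624582bd78c5f21be6e5ec116f7d1b65",
    "spikein2_psms.csv": "6143b17432334232464a28060aba2e0b",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Return {file name: status}, with status 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5sum(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Reading in chunks keeps memory use low for the larger files such as spikein1_psms.txt. Any file reported as 'mismatch' is best downloaded again before running the analyses.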

Additional details

Is supplement to: Publication: 10.1101/2024.03.29.587218 (DOI)

Programming language: R

Vandenbulcke S, Vanderaa C, Crook O, Martens L, Clement L. msqrob2TMT: robust linear mixed models for inferring differential abundant proteins in labelled experiments with arbitrarily complex design. bioRxiv. Published online March 29, 2024:2024.03.29.587218. doi:10.1101/2024.03.29.587218
