Published March 1, 2026 | Version v1
Journal article | Open access

Prospective ICH Q2(R2)-aligned total-error validation of label-free untargeted proteomics for host cell protein quantification in biotherapeutics

  • 1. GlaxoSmithKline (Belgium)

Description

This repository contains the complete downstream statistical analysis pipeline, preprocessing scripts, and two peptide-level quantitative datasets (entrapment search results) supporting the prospective ICH Q2(R2)-aligned total-error validation of label-free ddaPASEF proteomics for host cell protein (HCP) quantification in biotherapeutic matrices.

 

Associated publication: Somar Khalil, Jean-François Dierick, Pascal Bourguignon, and Michel Plisnier. Prospective ICH Q2(R2)-aligned total-error validation of label-free untargeted proteomics for host cell protein quantification in biotherapeutics. Proteomes.

The pipeline implements:

  • Dual entrapment database construction (shuffled-sequence and trimmed foreign-proteome entrapment databases) with empirical false discovery proportion (FDP) estimation, reported with bootstrap percentile bands and Wilson score confidence intervals.
  • Deterministic greedy parsimony protein inference with unique-peptide constraints.
  • Peptide-level quality control filtering (modified Z-score outlier removal, intraprotein intensity deviation screening, replicate CV gating).
  • Hi3 label-free protein quantification with MassPREP response-factor calibration.
  • Weighted least-squares (WLS) calibration with HC3 heteroskedasticity-consistent standard errors for inference.
  • One-way random-effects ANOVA variance decomposition with Welch–Satterthwaite degrees-of-freedom adjustment.
  • Construction of 95% β-expectation and 95/95 β-content tolerance intervals for aggregate total-error accuracy profiling.
  • Abundance-stratified total-error analysis, with nonparametric bootstrap estimation within each abundance stratum, to derive abundance-aware lower and upper limits of quantification (LLOQ and ULOQ).
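As a minimal sketch of the entrapment FDP estimate and its Wilson score interval, the code below assumes the "combined" estimator FDP ≈ (k/n)(1 + 1/r), where k accepted identifications map to the entrapment database, n is the total number accepted, and r is the entrapment-to-target database size ratio. The estimator choice, the function names, and computing the Wilson interval on the raw entrapment proportion k/n are illustrative assumptions rather than the repository's implementation; the bootstrap percentile bands are not shown.

```python
import math
from statistics import NormalDist


def wilson_interval(k: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # two-sided standard normal quantile
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def entrapment_fdp(n_entrapment: int, n_accepted: int, r: float = 1.0) -> float:
    """Combined entrapment FDP estimate (k / n) * (1 + 1 / r).

    n_entrapment: accepted identifications mapping to the entrapment database.
    n_accepted:   all accepted identifications (target + entrapment).
    r:            assumed size ratio of entrapment to target database.
    """
    return (n_entrapment / n_accepted) * (1 + 1 / r)


# Hypothetical counts: 12 entrapment hits among 1,200 accepted peptides.
k, n = 12, 1200
print(entrapment_fdp(k, n), wilson_interval(k, n))
```

The Wilson interval is preferred over the simple Wald interval here because entrapment hit counts are small and the Wald interval can extend below zero.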
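The greedy parsimony step can be sketched as a deterministic set cover. In this hypothetical version, `min_new` stands in for the unique-peptide constraint (a protein must newly explain at least that many peptides to be reported), and ties are broken alphabetically by accession; the repository's exact constraint and tie-breaking rule may differ.

```python
def greedy_parsimony(
    pep2prot: dict[str, set[str]], min_new: int = 1
) -> dict[str, set[str]]:
    """Deterministic greedy parsimony (set cover) protein inference.

    Repeatedly selects the protein explaining the most still-unexplained
    peptides; ties are broken by accession so results are reproducible.
    A protein is kept only if it newly explains at least `min_new` peptides.
    """
    prot2pep: dict[str, set[str]] = {}
    for pep, prots in pep2prot.items():
        for prot in prots:
            prot2pep.setdefault(prot, set()).add(pep)

    unexplained = set(pep2prot)
    selected: dict[str, set[str]] = {}
    while unexplained and prot2pep:
        # Most newly explained peptides first, then alphabetical accession.
        best = min(prot2pep, key=lambda p: (-len(prot2pep[p] & unexplained), p))
        gained = prot2pep[best] & unexplained
        if not gained or len(gained) < min_new:
            break  # the best remaining candidate fails, so all others do too
        selected[best] = gained
        unexplained -= gained
        del prot2pep[best]
    return selected


# Hypothetical peptide-to-protein map: P3 is fully subsumed by P1 and P2.
example = {
    "PEPA": {"P1"},
    "PEPB": {"P1", "P2"},
    "PEPC": {"P2"},
    "PEPD": {"P1", "P2", "P3"},
}
print(greedy_parsimony(example))
```

In the example, P3 is never reported because its only peptide is already explained by P1, which is the intended parsimony behaviour.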
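For the peptide-level QC filters, a sketch of modified Z-score outlier removal followed by replicate CV gating is given below. The 0.6745 scaling constant and the 3.5 cutoff are the values recommended by Iglewicz and Hoaglin; the 20% CV gate, the function names, and applying both filters to one peptide's replicate intensities are assumptions. The intraprotein intensity deviation screen is not shown.

```python
from statistics import mean, median, stdev


def modified_z_scores(values: list[float]) -> list[float]:
    """Iglewicz-Hoaglin modified Z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)  # no spread: flag nothing
    return [0.6745 * (v - med) / mad for v in values]


def filter_replicates(
    values: list[float], z_cut: float = 3.5, cv_max: float = 0.20
) -> tuple[list[float], bool]:
    """Drop modified-Z outliers, then gate on the CV of the remaining replicates.

    Returns the retained values and whether they pass the CV gate.
    """
    kept = [v for v, z in zip(values, modified_z_scores(values)) if abs(z) <= z_cut]
    if len(kept) < 2:
        return kept, False  # CV is undefined with fewer than two replicates
    cv = stdev(kept) / mean(kept)
    return kept, cv <= cv_max


# Hypothetical replicate intensities for one peptide, with one gross outlier.
print(filter_replicates([100.0, 102.0, 98.0, 101.0, 99.0, 250.0]))
```

Using the median and MAD rather than the mean and standard deviation keeps a single extreme replicate from masking itself.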
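Hi3 quantification with a response factor can be sketched as follows, assuming the response factor is the Hi3 signal of a spiked MassPREP standard protein divided by its known on-column amount in fmol. The units, the function names, and averaging whatever peptides are available when a protein has fewer than three are assumptions, not the repository's documented behaviour.

```python
from statistics import mean


def hi3(intensities: list[float], top_n: int = 3) -> float:
    """Mean of the top_n most intense peptide signals (Hi3 when top_n = 3)."""
    if not intensities:
        raise ValueError("no peptide intensities")
    return mean(sorted(intensities, reverse=True)[:top_n])


def response_factor(std_intensities: list[float], std_amount_fmol: float) -> float:
    """Signal per fmol derived from a spiked standard of known amount."""
    return hi3(std_intensities) / std_amount_fmol


def quantify(protein_intensities: list[float], rf: float) -> float:
    """Estimated on-column amount (fmol) of a protein from its Hi3 signal."""
    return hi3(protein_intensities) / rf


# Hypothetical peptide intensities: a standard spiked at 50 fmol and one HCP.
rf = response_factor([4.0e6, 5.0e6, 6.0e6, 1.0e6], 50.0)
print(quantify([2.0e5, 3.0e5, 1.0e5, 4.0e5], rf))
```

The approach relies on the assumption that the three most intense peptides of any protein give a roughly constant signal per mole, so a single standard can calibrate all proteins.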
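A dependency-free sketch of a straight-line WLS calibration with HC3 standard errors is shown below. It applies the HC3 sandwich to the weighted problem, V = A M A with A = (X'WX)^-1; the 1/x² weighting in the example and all data are hypothetical. A point with leverage h = 1 makes HC3 undefined and raises ZeroDivisionError here.

```python
import math


def wls_hc3(x: list[float], y: list[float], w: list[float]) -> tuple[float, float, float, float]:
    """Fit y = b0 + b1 * x by weighted least squares; return (b0, b1, se_b0, se_b1).

    Standard errors use the HC3 sandwich on the weighted problem:
    V = sum_i (w_i e_i / (1 - h_i))^2 (A x_i)(A x_i)', with A = (X'WX)^-1.
    """
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1
    a00, a01, a11 = s2 / det, -s1 / det, s0 / det  # symmetric 2x2 inverse of X'WX
    b0 = a00 * t0 + a01 * t1
    b1 = a01 * t0 + a11 * t1

    v00 = v11 = 0.0
    for xi, yi, wi in zip(x, y, w):
        e = yi - (b0 + b1 * xi)                          # raw residual
        h = wi * (a00 + 2 * a01 * xi + a11 * xi * xi)    # leverage of point i
        u = (wi * e / (1 - h)) ** 2                      # HC3-inflated squared score
        v00 += u * (a00 + a01 * xi) ** 2                 # diagonal of A M A
        v11 += u * (a01 + a11 * xi) ** 2
    return b0, b1, math.sqrt(v00), math.sqrt(v11)


# Hypothetical calibration: nominal amount x vs. measured response y, w = 1/x^2.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.1, 3.9, 8.2, 15.8, 32.5]
print(wls_hc3(x, y, [1 / v ** 2 for v in x]))
```

The same quantities are available from statsmodels via `sm.WLS(y, X, weights=w).fit(cov_type="HC3")`; the hand-rolled version only makes the sandwich explicit. Rescaling all weights by a constant leaves both the estimates and the HC3 standard errors unchanged.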

 

All stochastic procedures use fixed random seeds. Validation experiments employed a SIL-HCP spike-in series in a NISTmAb matrix under a hierarchical replication design (7 concentration levels × 3 preparations × 3 injections).
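Per concentration level, the 3 preparations × 3 injections design maps onto a balanced one-way random-effects ANOVA with preparation as the random factor. The sketch below, under that assumption, derives the repeatability and between-preparation variance components, their sum (intermediate precision), and Welch–Satterthwaite degrees of freedom for the sum. Truncating a negative between-preparation estimate to zero is a common convention but an assumption about this pipeline, as are the function and key names.

```python
from statistics import mean


def random_effects_anova(groups: list[list[float]]) -> dict[str, float]:
    """One-way random-effects ANOVA for a balanced k x n design.

    groups: k preparations at one level, each with n replicate injections.
    Returns variance components and Welch-Satterthwaite df for their sum.
    """
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with k >= 2 and n >= 2")
    means = [mean(g) for g in groups]
    grand = mean(means)  # equals the overall mean in a balanced design
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (n - 1))

    var_within = ms_within                                # repeatability
    var_between = max(0.0, (ms_between - ms_within) / n)  # between-preparation
    var_ip = var_between + var_within                     # intermediate precision

    if ms_between > ms_within:
        # var_ip = MSB/n + (n-1)/n * MSW; Satterthwaite df for that combination.
        c1, c2 = ms_between / n, (n - 1) * ms_within / n
        df_ip = (c1 + c2) ** 2 / (c1 ** 2 / (k - 1) + c2 ** 2 / (k * (n - 1)))
    else:
        df_ip = float(k * (n - 1))  # between component truncated to zero
    return {"var_between": var_between, "var_within": var_within,
            "var_ip": var_ip, "df_ip": df_ip}


# Hypothetical responses at one level: 3 preparations x 3 injections.
print(random_effects_anova([[10.0, 11.0, 12.0], [14.0, 15.0, 16.0], [12.0, 13.0, 14.0]]))
```

The Satterthwaite degrees of freedom are typically non-integer and smaller than the total number of injections, which widens any interval built on the intermediate-precision estimate.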
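Reproducible resampling can be illustrated with a percentile bootstrap driven by a local, explicitly seeded `random.Random` instance and applied separately within each abundance stratum. The seed (12345), the number of resamples, the use of the mean relative error as the statistic, and the hypothetical `strata_within_limits` screening rule are assumptions, not the pipeline's locked parameters or its actual LLOQ/ULOQ derivation.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def bootstrap_percentile(
    values: Sequence[float],
    stat: Callable[[Sequence[float]], float] = mean,
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 12345,
) -> tuple[float, float]:
    """Nonparametric percentile bootstrap interval with a fixed RNG seed."""
    rng = random.Random(seed)  # local generator: reproducible, no global state
    n = len(values)
    boots = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    cut = round(alpha / 2 * n_boot)
    return boots[cut], boots[n_boot - 1 - cut]


def stratified_bootstrap(
    errors_by_stratum: dict[str, list[float]], **kwargs
) -> dict[str, tuple[float, float]]:
    """Apply the percentile bootstrap independently within each abundance stratum."""
    return {s: bootstrap_percentile(v, **kwargs) for s, v in sorted(errors_by_stratum.items())}


def strata_within_limits(intervals: dict[str, tuple[float, float]], limit: float) -> list[str]:
    """Hypothetical rule: strata whose interval lies inside +/- limit (% relative error)."""
    return [s for s, (lo, hi) in intervals.items() if -limit <= lo and hi <= limit]


# Hypothetical relative errors (%) grouped by abundance stratum.
errs = {
    "high": [1.0, -2.0, 0.5, 3.0, -1.5, 2.0, 0.0, -0.5],
    "low": [12.0, -25.0, 30.0, -18.0, 22.0, -9.0, 15.0, 5.0],
}
ci = stratified_bootstrap(errs)
print(ci, strata_within_limits(ci, 5.0))
```

Because the generator is created inside the function, repeated calls return identical intervals regardless of any other random draws made elsewhere in the pipeline.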

Reproducing all numerical results, tables, and figures reported in the manuscript requires the processed peptide- and protein-level intensity matrices generated by the LC–MS/MS workflow described there. Raw mass spectrometry data were acquired on a timsTOF Pro instrument and processed with SpectroMine v5.2 under fixed database-search parameters.

 

All analytical parameters, model specifications, and acceptance criteria were locked prior to execution of the validation campaign.

Execution instructions, dependency versions, and configuration details are provided in the README.

Files

README.md

Files (90.9 MB)

  • md5:94f98477244c1b7cc4efc64a79dff167 (6.1 MB)
  • md5:c2ccfa5ba1272636ea32eaf8b0f35e9e (48.7 MB)
  • md5:848c996baa48369c195f66fb4c59c708 (78.9 kB)
  • md5:f4c683d5581ba3e3259a6347b0dd48a7 (966.4 kB)
  • md5:df1d345ed181fbb736d18fc71f48ef8c (34.8 MB)
  • md5:6c5578deb5af7b9aa280e6bf06ade971 (171.5 kB)
  • md5:23b252d4eba30d2c37090bb5878f6c5c (2.9 kB)
  • md5:375cc2e4fac812b550d92752da2c9d9d (3.5 kB)
  • md5:51c5fbc1eae45a68a2a7bb2dc741472c (687 bytes)
  • md5:527c93a26c851e0efb6efb9a0f40769b (21.0 kB)
  • md5:6b07d7aaaf7797b687809954419e3414 (284 bytes)
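Downloaded files can be checked against the MD5 checksums listed for this record. The sketch below uses only the standard library; the file names are not preserved in this copy of the record, so any path passed in is a placeholder.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file against a listed checksum, with or without the 'md5:' prefix."""
    expected = expected.lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5sum(path) == expected


# Hypothetical usage (placeholder path):
# verify(Path("path/to/downloaded_file"), "md5:94f98477244c1b7cc4efc64a79dff167")
```

Reading in chunks keeps memory use constant even for the larger (tens of MB) files in the record.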

Additional details

Related works

Is supplemented by
Publication: 10.64898/2026.03.06.710150 (DOI)

Software

Programming language
Python