Translational Readiness of Sweat-Based Cancer Biomarkers: A Novel Scoring Instrument and AI-Assisted NLP Field Analysis
Description
This dataset supports a systematic review examining the translational readiness of sweat-based cancer biomarkers published between 2015 and 2025. Sweat is an underexplored diagnostic biofluid that can be collected non-invasively and contains a rich repertoire of molecules — including metabolites, proteins, nucleic acids, and extracellular vesicles — that may reflect systemic disease states including cancer. Despite growing interest in sweat diagnostics, no prior study has systematically evaluated how close these biomarkers are to clinical use.
The review was conducted in accordance with PRISMA 2020 guidelines and registered on PROSPERO. A validated Boolean search string was applied across PubMed, Scopus, and Google Scholar covering the period 2015 to 2025. A total of 15 records were screened, of which 9 primary research studies met the inclusion criteria and were carried forward for data extraction. Fifteen variables were extracted per biomarker entry, including biomarker identity, chemical class, cancer type, sample sizes, detection method, and diagnostic performance metrics (sensitivity, specificity, AUC, and limit of detection). Each biomarker was then scored using a novel instrument developed for this study, the Sweat Biomarker Readiness Scorecard (SBRS). The SBRS is a seven-dimension weighted scoring tool: D1 analytical sensitivity, D2 diagnostic performance, D3 external validation, D4 wearable platform integration, D5 cancer specificity, D6 AI and machine learning use, and D7 regulatory pathway. It assigns each biomarker a composite score from 0 to 100 and classifies it into one of four translational tiers: pre-discovery (0–30), emerging (31–55), promising (56–75), or translational-ready (76–100). A complementary BERTopic natural language processing analysis was applied to the full PubMed abstract corpus to map research clusters and identify temporal trends in the field.
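The SBRS scoring logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the instrument's actual implementation. The tier boundaries come from the description, but the per-dimension weights are not stated here, so the equal weights below are a placeholder assumption. The sketch also assumes each dimension is scored on a 0 to 100 scale, and that a fractional score lying between two tier ranges belongs to the higher tier. The names `composite_score`, `classify_tier`, and `HYPOTHETICAL_WEIGHTS` are hypothetical.

```python
from typing import Mapping

# The seven SBRS dimensions named in the description.
DIMENSIONS = ("D1", "D2", "D3", "D4", "D5", "D6", "D7")

# ASSUMPTION: equal weights. The description says the tool is weighted
# but does not list the weights, so these values are placeholders.
HYPOTHETICAL_WEIGHTS = {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}

# Inclusive upper bounds of the four tiers, as given in the description.
TIERS = (
    (30, "pre-discovery"),
    (55, "emerging"),
    (75, "promising"),
    (100, "translational-ready"),
)


def composite_score(
    dim_scores: Mapping[str, float],
    weights: Mapping[str, float] = HYPOTHETICAL_WEIGHTS,
) -> float:
    """Weighted mean of the seven dimension scores.

    ASSUMPTION: every dimension score is on a 0-100 scale.
    """
    missing = [d for d in DIMENSIONS if d not in dim_scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for d in DIMENSIONS:
        if not 0 <= dim_scores[d] <= 100:
            raise ValueError(f"{d} score must be within 0-100")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted_sum = sum(weights[d] * dim_scores[d] for d in DIMENSIONS)
    return weighted_sum / total_weight


def classify_tier(score: float) -> str:
    """Map a composite score (0-100) to its translational tier."""
    if not 0 <= score <= 100:
        raise ValueError("composite score must be within 0-100")
    for upper_bound, tier in TIERS:
        if score <= upper_bound:
            return tier
    raise AssertionError("unreachable: score already validated")


# The highest composite scores reported in the description are 44-48.
print(classify_tier(44), classify_tier(48))
```

Anyone reusing the instrument should replace the placeholder weights with the values defined in the SBRS itself before comparing results against this dataset.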
The dataset contains 51 rows across 14 screened articles with 40 included biomarker entries. Biomarker classes represented include metabolites (volatile organic compounds and amino acids), proteins, nucleic acids (microRNAs, piRNAs, and cell-free DNA), and extracellular vesicles. Cancer types studied include lung, breast, colorectal, and prostate cancer, along with entries that span multiple cancer types. Key findings show that 85 percent of scored biomarkers fall in the pre-discovery tier and 15 percent in the emerging tier. No biomarker in the current literature has reached the promising or translational-ready tier. The highest-scoring biomarkers are sweat VOC panels analysed by GC-MS and electronic nose platforms (SBRS composite scores 44–48). No study in the dataset reports a regulatory pathway, and independent external validation is present in only 2 of 40 biomarker entries.
These findings reveal a significant and measurable gap between research activity and clinical translational readiness across the entire sweat-based cancer biomarker field. The SBRS instrument provides a reusable, standardised framework for future researchers to benchmark new sweat biomarker discoveries against a consistent translational readiness scale. The openly available dataset enables independent replication, meta-analysis extension, and integration into future systematic reviews as the field grows. This work was conducted as part of an independent study project in the MS Biotechnology Management and Entrepreneurship programme at Yeshiva University, May 2026.
Files
| Name | Size | MD5 |
|---|---|---|
| rayyan_import_sweat_cancer_review.csv | 7.3 kB | 471862ce95d359fb74b671ab8fabd93b |
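The MD5 checksum listed for the file can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using only Python's standard library. It assumes the CSV has been saved in the current working directory under its published filename. The helper names `md5_of_file` and `matches_published_checksum` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum and filename as published in the file listing.
EXPECTED_MD5 = "471862ce95d359fb74b671ab8fabd93b"
FILENAME = "rayyan_import_sweat_cancer_review.csv"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # ASSUMPTION: the download sits in the current working directory.
    local_copy = Path(FILENAME)
    if local_copy.exists():
        status = "OK" if matches_published_checksum(local_copy) else "MISMATCH"
        print(f"{FILENAME}: {status}")
    else:
        print(f"{FILENAME} not found; download it first.")
```

A mismatch usually indicates an incomplete or altered download, in which case the file should be fetched again before analysis.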