Dataset Overview

This repository contains the raw data underlying the publication:
Polyphenol Diversity and Chemotype Variation in Origanum majorana and Related species: Implications for Chemotaxonomic Differentiation, Standardisation and Genotype Selection
Brigitte Lukas, Johannes Novak, Magdalena Neumüller, Jennifer Romana Valek, Salme Ahmed, Zehra Aytaç, Ahmet Gümüş

Zenodo DOI: 10.5281/zenodo.19002003

The dataset includes peak area data for 1185 extracts and 122 chromatographic peaks, quantified values for eight major and one minor phenolic compounds, calibration curves for these nine components, and UV spectra of the analytical standards used for quantification and of the 20 main components detected in the study.
These files support re-analysis, method comparison, and reuse in phytochemical, metabolomics, and chemotaxonomic research.

File Description

1. File: 2025_HPLC-Origanum_PeakAreas_Quant.xlsx
Sheet: HPLC_Origanum_PeakAreas
Contains peak area data for all 1185 extracts and 122 peaks.
Rows: individual extracts/samples
Columns:
Sample metadata: accession (sample ID), taxon, population, country, organ (tissue type; F = inflorescences, L = leaves), year of plant material collection, season (1 = January-March, 2 = April-June, 3 = July-September, 4 = October-December), month, location (W = wild collected, GH = cultivated)
Peak columns: 122 peaks, named according to compound identity or retention-time-based peak ID
Units: arbitrary DAD peak area units
Note: x denotes compounds that have not yet been identified
Sheet: HPLC_Origanum_Quant
Contains quantified values for nine compounds.
Rows: individual extracts/samples
Columns:
Same sample metadata as above
Compound columns: 9 quantified compounds 
Units: milligram per gram dry weight (mg/g dry wt)
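
For reuse, the coded metadata fields can be spelled out programmatically. The following is a minimal sketch, not part of the dataset: it assumes that the columns are named organ, location and season, that month is stored as a number from 1 to 12, and that each row has already been read into a Python dict (for example with a spreadsheet library of your choice). The function names are hypothetical.

```python
# Code tables copied from the column description above.
ORGAN = {"F": "inflorescences", "L": "leaves"}
LOCATION = {"W": "wild collected", "GH": "cultivated"}
SEASON = {1: "Jan-Mar", 2: "Apr-Jun", 3: "Jul-Sep", 4: "Oct-Dec"}


def season_from_month(month: int) -> int:
    """Return the season code (1-4) for a calendar month (1-12).

    Assumption: months are numeric; the ReadMe does not state the format.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month!r}")
    return (month - 1) // 3 + 1


def decode_row(row: dict) -> dict:
    """Return a copy of one metadata row with coded fields spelled out.

    The keys 'organ', 'location' and 'season' are assumed column names.
    """
    out = dict(row)
    out["organ"] = ORGAN[row["organ"]]
    out["location"] = LOCATION[row["location"]]
    out["season"] = SEASON[int(row["season"])]
    return out
```

The season helper can also be used as a consistency check between the month and season columns.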

2. File: 2025_HPLC-Origanum_CalibrationCurves.xlsx
Contains calibration data for the following compounds: arbutin, apigenin 6,8-di-glucopyranoside, luteolin 7-diglucuronide, luteolin 7-glucuronide, rosmarinic acid, apigenin 7-glucuronide, lithospermic acid, salvianolic acid B, blumeatin
Each sheet includes concentration, peak area, regression parameters (slope, intercept, R²), calibration range, LOD and LOQ.

3. File: AnalyticalStandardsQuant.pdf
Contains UV spectra of all analytical standards used for main compound identification and quantification.

4. File: MainComponents.pdf
Contains UV spectra of the 20 major components.

Sample ID Structure

Sample IDs (accessions) for wild-collected plants encode the specimen number and tissue type.
Example: M0009F = wild-collected specimen 9, inflorescences.
Sample IDs (accessions) for cultivated plants encode the seed accession, the individual plant, and the tissue type.
Example: cmaj10-02F = seed accession cmaj10, plant individual 2, inflorescences.
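
The ID structure described above can be split into its parts with a short parser. This is a sketch under assumptions: the two patterns are generalised from the two examples only, the final letter is taken to be the organ code (F or L) from the file description, and the meaning of the leading letter in wild-collected IDs is not documented here, so it is returned unchanged as "prefix". The function name is hypothetical.

```python
import re

# Cultivated: <seed accession>-<plant number><organ code>, e.g. cmaj10-02F
CULTIVATED = re.compile(r"^(.+)-(\d+)([FL])$")
# Wild: <letter prefix><specimen number><organ code>, e.g. M0009F
WILD = re.compile(r"^([A-Za-z]+)(\d+)([FL])$")


def parse_accession(sample_id: str) -> dict:
    """Split a sample ID into its documented components.

    Patterns are inferred from the two ReadMe examples; IDs that follow
    another scheme raise ValueError instead of being guessed at.
    """
    match = CULTIVATED.match(sample_id)
    if match:
        seed, plant, organ = match.groups()
        return {"origin": "cultivated", "seed_accession": seed,
                "plant": int(plant), "organ_code": organ}
    match = WILD.match(sample_id)
    if match:
        prefix, specimen, organ = match.groups()
        return {"origin": "wild", "prefix": prefix,
                "specimen": int(specimen), "organ_code": organ}
    raise ValueError(f"unrecognised sample ID: {sample_id!r}")
```

For example, parse_accession("cmaj10-02F") yields seed accession cmaj10, plant 2 and organ code F.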

Methods Summary

High-Performance Liquid Chromatography (HPLC) analyses were performed on a Shimadzu Nexera XR system equipped with a CBM-20A controller, DGU-20A5R degasser, LC-20ADXR quaternary pump, SIL-20AXR autosampler, CTO-20AC column oven, and SPD-M20A photodiode array detector. Data acquisition and processing were carried out using LabSolutions 5.97 (Shimadzu, Austria).
Chromatographic separations were conducted on a Symmetry Shield RP18 column (5 µm, 4.6 × 250 mm; Waters, Austria) with a C18 guard column (ODS Octadecyl, 4 × 0.3 mm; Phenomenex, Germany). The mobile phases consisted of solvent A (0.25% formic acid in acetonitrile:methanol, 60:40) and solvent B (0.25% formic acid in Milli-Q water). A gradient elution was applied at 0.8 mL/min and 25 °C using the following program:
0–7 min: 5–25% A
7–35 min: 25–35% A
35–40 min: 35–55% A
40–50 min: 70% A (isocratic)
50–55 min: re-equilibration to 5% A
Injection volume was 20 µL.
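
For method comparison, the gradient program can be written as a piecewise-linear function of run time. The sketch below is illustrative only. It reads the program as 0 to 7 min 5 to 25% A, 7 to 35 min 25 to 35% A, 35 to 40 min 35 to 55% A, and 40 to 50 min 70% A. It further assumes that the change to 70% A at 40 min is a step and that re-equilibration is a step back to 5% A held from 50 to 55 min; neither detail is stated in this ReadMe. The names GRADIENT and percent_a are hypothetical.

```python
# (start_min, end_min, start_%A, end_%A), transcribed from the program above.
GRADIENT = [
    (0.0, 7.0, 5.0, 25.0),
    (7.0, 35.0, 25.0, 35.0),
    (35.0, 40.0, 35.0, 55.0),
    (40.0, 50.0, 70.0, 70.0),  # isocratic hold
    (50.0, 55.0, 5.0, 5.0),    # assumption: step to 5% A, then hold
]


def percent_a(t_min: float) -> float:
    """Return the nominal percentage of solvent A at run time t_min."""
    if not 0.0 <= t_min <= GRADIENT[-1][1]:
        raise ValueError(f"time outside the 0-55 min program: {t_min!r}")
    for start, end, a_start, a_end in GRADIENT:
        if start <= t_min < end:
            # Linear interpolation within the segment.
            return a_start + (a_end - a_start) * (t_min - start) / (end - start)
    return GRADIENT[-1][3]  # t_min equals the end of the program


def percent_b(t_min: float) -> float:
    """Solvent B makes up the remainder of the binary mobile phase."""
    return 100.0 - percent_a(t_min)
```

These are programmed values at the pump; the composition reaching the column lags by the system dwell volume, which is not modelled here.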

A total of 122 recurring or diagnostically relevant peaks were defined based on comparative chromatographic evaluation across all Origanum species. Peak detection was performed at 280, 330, or 354 nm depending on the UV absorption maxima of each compound. Identification was based on retention time and UV spectra and supported by analytical standards (full list in the publication). Minor peaks could not always be unambiguously assigned due to retention time shifts, overlapping peaks, or insufficient UV signal quality.
Quantification of arbutin, apigenin 6,8-di-glucopyranoside, luteolin 7-O-diglucuronide, luteolin 7-O-glucuronide, rosmarinic acid, apigenin 7-O-glucuronide, lithospermic acid, salvianolic acid B, and blumeatin was performed using external calibration. Calibration curves were constructed from six concentration levels, each prepared in triplicate and injected twice (six injections per calibration point). Calibration parameters (equation, r, LOD, LOQ) are provided in the dataset. Quantified values are expressed as mg/g dry weight.
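
To illustrate how external calibration of this kind is typically applied, the sketch below fits a straight line to calibration points and inverts it for a sample peak area. It is not the authors' processing script. It assumes an unweighted least-squares fit of the form area = slope × concentration + intercept, and the conversion to mg/g dry weight uses an extract volume and a dry sample mass that are hypothetical parameters not given in this ReadMe. All function names are illustrative.

```python
import math


def fit_line(conc, area):
    """Unweighted least-squares fit of area = slope * conc + intercept.

    Returns (slope, intercept, r), where r is Pearson's correlation.
    """
    n = len(conc)
    if n != len(area) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(conc) / n
    mean_y = sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    syy = sum((y - mean_y) ** 2 for y in area)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def area_to_conc(peak_area, slope, intercept):
    """Invert the calibration line to get concentration from peak area."""
    return (peak_area - intercept) / slope


def to_mg_per_g(conc_mg_per_ml, extract_volume_ml, dry_mass_g):
    """Convert extract concentration to mg per g dry weight.

    extract_volume_ml and dry_mass_g are hypothetical inputs; take the
    real values from the publication's sample preparation section.
    """
    return conc_mg_per_ml * extract_volume_ml / dry_mass_g
```

Values computed this way are only meaningful within the calibration range and above the LOQ listed for each compound in the calibration file.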

Limitations and Notes

Peaks not fully identified are marked as x.
Quantification is limited to fully identified compounds.
UV spectra of extract-derived peaks may differ slightly from pure standards.

Reuse and Licensing

This dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
Please cite both the publication and the Zenodo record when reusing the data.
Version: This ReadMe corresponds to Zenodo record version V1.

