Published September 7, 2025
Version v1 | Dataset | Open access
SESAME Beam Availability Dataset
Creators
Description
This dataset supports a data-driven approach to improving the reliability of synchrotron operations. It was carefully curated from SESAME’s machine operation records between March 2020 and December 2023 and combines operator-logged trip events with stable no-trip intervals, capturing the dynamics of machine performance over several years. More than 16,800 Process Variables (PVs) were originally archived through SESAME’s EPICS control system. With the guidance of domain experts, this pool was narrowed down to 263 meaningful signals, of which 169 were consistently available across all years. Each recorded trip event was validated by aligning the operator logs with the actual machine data in the EPICS Archiver. Using this pipeline, the dataset was expanded into multiple time-windowed samples (10 to 300 seconds), each labeled as either a trip or a no-trip event.
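The windowing idea can be illustrated with a minimal sketch. It assumes a PV's archived history is available as a list of (timestamp in seconds, value) pairs and keeps the samples that fall within the W seconds up to an event time. The helper names `extract_window` and `windows_for_event`, the data layout, and the inclusive boundaries are assumptions for illustration; this is not SESAME's actual extraction pipeline.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, PV value); assumed layout

# Window sizes (seconds) listed in the dataset description.
WINDOW_SIZES = [10, 20, 30, 40, 50, 60, 120, 180, 240, 300]


def extract_window(history: List[Sample], event_time: float, window_s: float) -> List[Sample]:
    """Hypothetical helper: keep samples with event_time - window_s <= t <= event_time.

    Both boundaries are inclusive here; that convention is an assumption.
    """
    start = event_time - window_s
    return [(t, v) for t, v in history if start <= t <= event_time]


def windows_for_event(history: List[Sample], event_time: float) -> Dict[int, List[Sample]]:
    """Hypothetical helper: build one windowed sample per window size for the same event."""
    return {w: extract_window(history, event_time, w) for w in WINDOW_SIZES}
```

Because every window ends at the same event time, the shorter windows are suffixes of the longer ones, which mirrors how one validated event yields samples in every WS_XX folder.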
Dataset description
The pvData directory contains two main subfolders:
- trip_raw: This folder contains trip-event data that has been filtered and validated against the operators' handwritten logs. The raw operator-reported statistics were cleaned, corrected, and aligned with the actual machine signals from the EPICS archiver to ensure precise timestamps and reliable labels.
- noTrip_raw: This folder contains samples of stable machine operation, randomly collected from the EPICS archiver between 2020 and 2023 during periods of full and reliable beamtime. These no-trip intervals serve as the negative-class examples (label = 0) in predictive modeling.
Both pvData/trip_raw and pvData/noTrip_raw contain 10 subfolders named WS_XX, where XX is the time-window size in seconds. The available window sizes are 10, 20, 30, 40, 50, 60, 120, 180, 240, and 300 seconds.
- For trip data, each WS_XX folder contains PV samples covering the given window size immediately before the validated trip timestamp.
- For no-trip data, the WS_XX folders store PV samples collected during stable operation, with the same time-window sizes applied, ensuring consistency with the trip dataset.
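- Each WS_XX subfolder contains a set of CSV files, organized using the following naming convention:
  - WS_XX: the folder name gives the window size in seconds; for example, WS_10 indicates a 10-second window.
  - YYYYMMDDTHHMMSS.csv: each file under WS_XX is named after its event time; for example, 20231203T192156.csv.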
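A minimal sketch of indexing the extracted archive, assuming pvData.zip unpacks to a pvData folder with exactly the trip_raw/noTrip_raw and WS_XX layout described above. The names `index_pvdata` and `SampleFile` are hypothetical; the labels follow the trip = 1, no-trip = 0 convention stated in the description.

```python
import re
from pathlib import Path
from typing import List, NamedTuple

# Label convention from the description: trip = 1, no-trip (negative class) = 0.
LABELS = {"trip_raw": 1, "noTrip_raw": 0}
WS_PATTERN = re.compile(r"^WS_(\d+)$")


class SampleFile(NamedTuple):
    path: Path
    label: int      # 1 = trip, 0 = no-trip
    window_s: int   # window size parsed from the WS_XX folder name


def index_pvdata(root: str) -> List[SampleFile]:
    """Walk <root>/<class>/WS_XX/*.csv and return one entry per CSV file."""
    entries: List[SampleFile] = []
    for class_dir, label in LABELS.items():
        base = Path(root) / class_dir
        if not base.is_dir():
            continue
        for ws_dir in sorted(base.iterdir()):
            match = WS_PATTERN.match(ws_dir.name)
            # Skip anything that is not a WS_XX directory.
            if not (ws_dir.is_dir() and match):
                continue
            for csv_path in sorted(ws_dir.glob("*.csv")):
                entries.append(SampleFile(csv_path, label, int(match.group(1))))
    return entries
```

For example, `[e for e in index_pvdata("pvData") if e.window_s == 60]` would select every trip and no-trip file for the 60-second window.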
The date and time stamp in the filename indicates the exact event time:
- For trip data, this corresponds to the validated timestamp of a beam interruption.
- For no-trip data, it marks the precise moment when the sample of stable operation was collected.
This means every file can be traced directly back to its collection time, supporting reproducibility and chronological analyses.
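The filename timestamps can be parsed with the standard library, as in the sketch below. The description does not state a time zone, so this sketch returns naive `datetime` objects; treat the time zone as an open assumption. The names `event_time_from_filename` and `sort_chronologically` are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, List

FILENAME_FORMAT = "%Y%m%dT%H%M%S"  # YYYYMMDDTHHMMSS, as in 20231203T192156.csv


def event_time_from_filename(path: str) -> datetime:
    """Parse the event time encoded in a CSV file name (naive datetime, time zone unspecified)."""
    return datetime.strptime(Path(path).stem, FILENAME_FORMAT)


def sort_chronologically(paths: Iterable[str]) -> List[str]:
    """Order sample files by their encoded event time, e.g. for time-based train/test splits."""
    return sorted(paths, key=event_time_from_filename)
```

For example, `event_time_from_filename("pvData/trip_raw/WS_10/20231203T192156.csv")` gives `datetime(2023, 12, 3, 19, 21, 56)`.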
- When opening a CSV file, some columns may contain the tag NATRD, which indicates that no data was available for that PV at the collection time. This can occur for several reasons:
- A device was not yet installed or in use at SESAME during the given year (e.g., unavailable in 2022 but operational and archived from 2023 onwards).
- A PV was not being archived at that time but was added to the EPICS archiver in later years.
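When loading a file, NATRD cells can be mapped to a missing-value marker so that they are not mistaken for real readings. The sketch below uses only the standard `csv` module and assumes the first row of each CSV is a header of column names; the column layout and value types are not documented here, so values are kept as strings. The helper names are hypothetical.

```python
import csv
from typing import Dict, List, Optional

MISSING_TAG = "NATRD"  # tag used in the CSV files when a PV had no data

Row = Dict[str, Optional[str]]


def read_pv_csv(path: str) -> List[Row]:
    """Read one sample CSV, replacing NATRD cells with None (assumes a header row)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [
            {col: (None if val == MISSING_TAG else val) for col, val in row.items()}
            for row in reader
        ]


def fully_missing_columns(rows: List[Row]) -> List[str]:
    """Columns whose every value is NATRD, e.g. a PV not yet archived in that year."""
    if not rows:
        return []
    return [col for col in rows[0] if all(r[col] is None for r in rows)]
```

With pandas, passing `na_values=["NATRD"]` to `read_csv` has a similar effect, turning those cells into NaN.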
Important Notes About the Dataset
It is important to note that the dataset is not balanced between trip and no-trip events. Trip events, which represent actual beam interruptions, are naturally much less frequent than the long periods of stable operation (no-trip). For example, over the 2020–2023 collection period only 222 validated trip events were identified, whereas thousands of no-trip intervals were available. This imbalance reflects the real operating environment of the synchrotron, where beam interruptions are rare but critical.

In addition, the number of rows per file is not uniform, because it depends on how each PV was archived. Monitored PVs are recorded only when their values change (event-driven), while scanned PVs are archived at fixed sampling rates (e.g., 10 readings per second). As a result, some files contain denser time series than others. Users of the dataset should take both characteristics into account when training models.
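Two common ways to account for these characteristics are sketched below; neither is prescribed by the dataset authors. `balanced_class_weights` computes inverse-frequency weights, n_samples / (n_classes * n_c), the same heuristic as scikit-learn's `class_weight="balanced"`; the counts must come from the files you actually select. `resample_hold` places an irregular PV series onto a fixed time grid by carrying the last archived value forward (zero-order hold), which suits monitored PVs that are only written on change. Both helper names and the (sorted timestamps, values) input layout are assumptions.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence


def balanced_class_weights(counts: Dict[int, int]) -> Dict[int, float]:
    """Inverse-frequency class weights: n_samples / (n_classes * n_c)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def resample_hold(times: Sequence[float], values: Sequence[float],
                  start: float, step: float, n_points: int) -> List[Optional[float]]:
    """Zero-order-hold resampling onto the grid start + k * step, k = 0..n_points-1.

    Each grid point takes the most recent archived value at or before it.
    Grid points before the first archived value are None.
    `times` must be sorted in ascending order.
    """
    out: List[Optional[float]] = []
    for k in range(n_points):
        t = start + k * step
        idx = bisect_right(times, t)  # number of samples with time <= t
        out.append(values[idx - 1] if idx > 0 else None)
    return out
```

Resampling every PV in a window onto the same grid gives fixed-size inputs regardless of whether a PV was monitored or scanned; the grid step is a modeling choice, not something the dataset specifies.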
This is a raw dataset
Apart from validating the trip timestamps against the EPICS archiver, the data provided in this dataset are not processed, cleaned, or augmented; they are raw data collected directly from the source without any modifications.
Need more clarification?
For further clarification, please contact Mustafa Alzubi (mostafa.zoubi at sesame.org.jo), using "SESAME Beam Availability Dataset @ Zenodo" as the email subject.
Files
Name | Size | Checksum |
---|---|---|
pvData.zip | 3.1 GB | md5:bcd5333615d11abfc0d5a642eacd9e88 |
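Because pvData.zip is about 3.1 GB, it is worth checking the download against the published MD5 checksum (bcd5333615d11abfc0d5a642eacd9e88) before extracting it. A minimal sketch using Python's `hashlib`, reading the file in chunks so it never has to fit in memory; the function names are hypothetical.

```python
import hashlib

EXPECTED_MD5 = "bcd5333615d11abfc0d5a642eacd9e88"  # checksum published for pvData.zip


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a (possibly large) file in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("pvData.zip")` should return `True` for an intact download.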