CRLBD - The Corinth Rift Laboratory Benchmark Dataset
Description
We present the Corinth Rift Laboratory Benchmark Dataset (CRLBD), a collection of waveforms and associated phase arrival times from the Western Gulf of Corinth (WGoC), Greece, the area monitored by the Corinth Rift Laboratory (CRL) Near-Fault Observatory (NFO). Multiple institutes have installed a large number of seismic instruments in the WGoC to record its high seismic activity, which is characterized by swarms. CRLBD comprises 224,573 three-component records from 13,918 local earthquakes and 67,371 noise recordings, spanning June 2018 to December 2024. The dataset is predominantly local: over 99% of the traces were recorded at hypocentral distances of 100 km or less. CRLBD was created primarily for Deep-Learning (DL) phase pickers and is fully compatible with the SeisBench toolbox.
The dataset includes
- 13,918 earthquakes located within the boundaries of the CRL-NFO area in WGoC
- 224,573 3-channel waveforms, from broadband and short-period seismometers, as well as accelerometers
- 121 (for events) and 46 (for noise) metadata fields
- 8 networks (CL, HA, HP, HL, HT, HI, HC, 1Y)
- 128 stations
- 224,573 P and 127,629 S manually picked phase arrivals
The seismic catalogue was located using picked phase arrivals from the Geodynamic Institute of the National Observatory of Athens (GI-NOA) and the Seismological Laboratory of the National and Kapodistrian University of Athens (SL-NKUA).
The dataset was created in the context of the TRANSFORM² project.
Files
The following files comprise the full dataset:
- crlbd.tar.gz: Compressed archive that contains the event waveforms in HDF5 format (waveforms.hdf5) and their metadata (metadata.csv) according to the structure expected by SeisBench.
- crlbdnoise.tar.gz: Compressed archive that contains the noise waveforms, structured similarly to the events subset.
- crlbd_inventory.xml: StationXML file containing full station information (including instrument responses) post-June 2018, as acquired from the GI-NOA and RESIF EIDA nodes.
- metadata_fields.csv: Descriptions of metadata fields.
- LICENSE.txt: CC-BY-4.0 legal code.
- README.md: The README file.
How to use
After downloading the record, uncompress the two archives from within the record's folder:
tar -xvzf crlbd.tar.gz
tar -xvzf crlbdnoise.tar.gz
The datasets may then be loaded with SeisBench:
# import the relevant module from SeisBench
from seisbench import data as sbd
# Load the events dataset - run from the directory which includes the uncompressed folders
print("Loading the events CRLBD dataset...")
ds = sbd.WaveformDataset("crlbd", component_order="ZNE")
# Load the noise dataset
print("Loading the noise CRLBD dataset...")
dsn = sbd.WaveformDataset("crlbdnoise", component_order="ZNE")
# Check the first five records in the metadata
# - You will see 123/48 columns in the metadata, as `index` and `trace_chunk` are assigned by SeisBench on loading.
print(ds.metadata.head(5))
print(dsn.metadata.head(5))
# show some dataset info
print("\n=== Dataset Summary ===")
print(f"Unique events: {len(ds.metadata['source_id'].unique()):,}")
print(f"Unique networks: {len(ds.metadata['station_network_code'].unique()):,}")
print(f"Unique stations: {len(ds.metadata['station_code'].unique()):,}")
print(f"P picks: {ds.metadata.trace_P_arrival_sample.notna().sum():,}") # all records have non-null P
print(f"S picks: {ds.metadata.trace_S_arrival_sample.notna().sum():,}") # only records with picked S
print(f"epi <= 100 km: {100 * sum(ds.metadata['path_ep_distance_km'] <= 100) / len(ds.metadata):.2f}%")
print(f"Noise records: {len(dsn.metadata):,}")
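The pick columns used above (`trace_P_arrival_sample`, `trace_S_arrival_sample`) follow the SeisBench convention of storing arrivals as sample indices counted from the start of the trace, not as times. The following is a minimal sketch of converting them to seconds and deriving an S-P time. The helper names are hypothetical, and the sampling rate must be supplied by the user: SeisBench datasets commonly store it in a `trace_sampling_rate_hz` column, but whether CRLBD does so is an assumption to be checked against metadata_fields.csv.

```python
import math


def sample_to_seconds(sample, sampling_rate_hz):
    """Convert a sample index to an offset in seconds from the trace start."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling_rate_hz must be positive")
    return float(sample) / float(sampling_rate_hz)


def s_minus_p_seconds(p_sample, s_sample, sampling_rate_hz):
    """Return the S-P time in seconds, or None when the S pick is missing.

    Missing picks are expected to be None or NaN, which is how they
    appear in the metadata once it is loaded as a table.
    """
    if s_sample is None or math.isnan(float(s_sample)):
        return None
    p_seconds = sample_to_seconds(p_sample, sampling_rate_hz)
    s_seconds = sample_to_seconds(s_sample, sampling_rate_hz)
    return s_seconds - p_seconds


# Hypothetical values for illustration only (not taken from CRLBD):
# a P pick at sample 500 and an S pick at sample 900, sampled at 100 Hz.
example_sp = s_minus_p_seconds(500, 900, 100.0)
```

Since only part of the records carry an S pick, returning `None` for a missing S arrival keeps such records from silently producing NaN values in later calculations.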
Alternatively, the files can be accessed independently of SeisBench. Options include other Python tools (e.g., h5py for the waveform data and Pandas for the metadata) or graphical tools (e.g., any spreadsheet software for the metadata).
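As an illustration of working with the metadata without SeisBench, the sketch below reproduces part of the dataset summary using only Python's standard `csv` module. The column names are those used in the SeisBench example above; the function names and the `crlbd/metadata.csv` path are assumptions about the extracted layout, not something the record specifies.

```python
import csv
from typing import Dict, Iterable, Mapping, Optional


def has_value(cell: Optional[str]) -> bool:
    """Return True if a CSV cell holds a value (neither empty nor 'nan')."""
    text = (cell or "").strip().lower()
    return text not in ("", "nan")


def summarize_metadata(rows: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count records, unique events/networks/stations, and P/S picks."""
    events, networks, stations = set(), set(), set()
    records = p_picks = s_picks = 0
    for row in rows:
        records += 1
        events.add(row["source_id"])
        networks.add(row["station_network_code"])
        stations.add(row["station_code"])
        # bool is an int subclass, so True adds 1 and False adds 0
        p_picks += has_value(row.get("trace_P_arrival_sample"))
        s_picks += has_value(row.get("trace_S_arrival_sample"))
    return {
        "records": records,
        "events": len(events),
        "networks": len(networks),
        "stations": len(stations),
        "p_picks": p_picks,
        "s_picks": s_picks,
    }


def summarize_csv(path: str) -> Dict[str, int]:
    """Stream a metadata CSV from disk and summarize it row by row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_metadata(csv.DictReader(handle))


# Example call (the path is an assumption about the extracted folder):
# print(summarize_csv("crlbd/metadata.csv"))
```

Reading row by row avoids loading the whole table into memory; for anything beyond simple counts, Pandas is likely the more convenient choice.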
License
CRLBD is distributed under the CC-BY-4.0 license.
Funding
- TRANSFORM² is funded by the European Union under project number 101188365 within the HORIZON-INFRA-2024-DEV-01-01 call.
Version
- v1.0.0
Publication date
- 2026-02-25
Size and formats
- Total size: ~15 GB (compressed), ~40 GB (uncompressed)
- File formats: HDF5, CSV, StationXML
How to cite
To cite the dataset itself:
Spingos, I., Kapetanidis, V., Münchmeyer, J., Zymvragakis, A., Karakonstantis, A., Voulgaris, N., Kaviris, G. (2026). CRLBD - The Corinth Rift Laboratory Benchmark Dataset. Zenodo. doi: 10.5281/zenodo.18768358
An article is currently being prepared for submission; this README file will be updated with the relevant citation information when available.
Contact
Ioannis Spingos, PhD: ispingos@geol.uoa.gr
Files
(15.6 GB)
| MD5 checksum | Size |
|---|---|
| md5:550b8a1fcb2aea79fc4a2ebb6246d4fb | 12.0 GB |
| md5:e250a1feb14da91e6ff09e259e14eeac | 32.5 MB |
| md5:7525b63e7f0953e939f2b8235e2750cc | 3.5 GB |
| md5:8ff88658cb98a717f8e36576bb447977 | 19.0 kB |
| md5:c2e6229b1bf444a0601f1cec4e37f7ac | 7.5 kB |
| md5:acc16340704ec409a5c8a43bb58320f6 | 6.4 kB |
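To check a downloaded file against the MD5 values in the listing above, a minimal sketch using Python's standard `hashlib` module follows. The function names are hypothetical, and the listing here does not state which checksum belongs to which file, so the expected value has to be taken from the record page entry for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file's MD5 with a listed value such as 'md5:550b...'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example call (the checksum placeholder must be replaced by the listed value):
# matches_listing("crlbd.tar.gz", "md5:<value from the listing>")
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte archives.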