Published February 25, 2026 | Version 1.0.0
Dataset Open

CRLBD - The Corinth Rift Laboratory Benchmark Dataset

  • 1. National and Kapodistrian University of Athens
  • 2. Université Grenoble Alpes
  • 3. Institute of Physics of the Earth's Interior & Geohazards of the Hellenic Mediterranean University

Description

We present the Corinth Rift Laboratory Benchmark Dataset (CRLBD), a collection of waveforms and associated phase arrival times from the Western Gulf of Corinth (WGoC), Greece, the area monitored by the Corinth Rift Laboratory (CRL) Near-Fault Observatory (NFO). Multiple institutes have installed numerous seismic instruments in the WGoC to record its high seismic activity, which is characterized by swarms. CRLBD comprises 224,573 three-component records from 13,918 local earthquakes and 67,371 noise recordings, spanning June 2018 to December 2024. The dataset has a dominantly local character: over 99% of the traces were recorded at hypocentral distances less than or equal to 100 km. CRLBD was created primarily for Deep-Learning (DL) phase pickers and is fully compatible with the SeisBench toolbox.

The dataset includes:

  • 13,918 earthquakes located within the boundaries of the CRL-NFO area in WGoC
  • 224,573 3-channel waveforms, from broadband and short-period seismometers, as well as accelerometers
  • 121 (for events) and 46 (for noise) metadata fields
  • 8 networks (CL, HA, HP, HL, HT, HI, HC, 1Y)
  • 128 stations
  • 224,573 P and 127,629 S manually picked phase arrivals

The seismic catalogue was located using picked phase arrivals from the Geodynamic Institute of the National Observatory of Athens (GI-NOA) and the Seismological Laboratory of the National and Kapodistrian University of Athens (SL-NKUA).

The dataset was created in the context of the TRANSFORM² project.

Files

The following files comprise the full dataset:

  • crlbd.tar.gz: Compressed archive that contains the event waveforms in HDF5 format (waveforms.hdf5) and their metadata (metadata.csv) according to the structure expected by SeisBench.
  • crlbdnoise.tar.gz: Compressed archive that contains the noise waveforms, structured similarly to the events subset.
  • crlbd_inventory.xml: StationXML file containing full station information (including instrument responses) post-June 2018, as acquired from the GI-NOA and RESIF EIDA nodes.
  • metadata_fields.csv: Descriptions of metadata fields.
  • LICENSE.txt: CC-BY-4.0 legal code.
  • README.md: The README file.

How to use

After downloading the record, uncompress the two archives from within the record's folder:

tar -xvzf crlbd.tar.gz
tar -xvzf crlbdnoise.tar.gz
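
Where the `tar` command is not available, the same extraction can be sketched with Python's standard `tarfile` module. This is a minimal sketch and not part of the original record: the helper name `extract_archive` is hypothetical, and the archive names are the ones listed under Files.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=str(dest), filter="data")
        else:
            tar.extractall(path=str(dest))
    return names


if __name__ == "__main__":
    # Assumes both archives sit in the current working directory.
    for name in ("crlbd.tar.gz", "crlbdnoise.tar.gz"):
        if Path(name).exists():
            members = extract_archive(name)
            print(f"{name}: extracted {len(members)} members")
        else:
            print(f"{name}: not found, skipping")
```

Run from the record's folder, this should leave the same uncompressed folders that the `tar` commands produce.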

The datasets may then be loaded with SeisBench:

# import the relevant module from SeisBench
from seisbench import data as sbd

# Load the events dataset - run from the directory which includes the uncompressed folders
print("Loading the events CRLBD dataset...")
ds = sbd.WaveformDataset("crlbd", component_order="ZNE")

# Load the noise dataset
print("Loading the noise CRLBD dataset...")
dsn = sbd.WaveformDataset("crlbdnoise", component_order="ZNE")

# Check the first five records in the metadata
# - You will see 123/48 columns in the metadata, as `index` and `trace_chunk` are assigned by SeisBench on loading.
print(ds.metadata.head(5))
print(dsn.metadata.head(5))

# show some dataset info
print("\n=== Dataset Summary ===")
print(f"Unique events:   {len(ds.metadata['source_id'].unique()):,}")
print(f"Unique networks: {len(ds.metadata['station_network_code'].unique()):,}")
print(f"Unique stations: {len(ds.metadata['station_code'].unique()):,}")
print(f"P picks:         {ds.metadata.trace_P_arrival_sample.notna().sum():,}")  # all records have non-null P
print(f"S picks:         {ds.metadata.trace_S_arrival_sample.notna().sum():,}")  # only records with picked S
print(f"epi <= 100 km:   {100 * sum(ds.metadata['path_ep_distance_km'] <= 100) / len(ds.metadata):.2f}%")
print(f"Noise records:   {len(dsn.metadata):,}")

Alternatively, you may access the files independently. Options include other Python tools (e.g., h5py for the waveform data and Pandas for the metadata) or even graphical tools (e.g., any spreadsheet software for the metadata).
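
As one illustration of working without SeisBench, the metadata can be summarized using only the Python standard library (swapped in here for Pandas to keep the example dependency-free). The sketch rests on several assumptions: the helper `summarize_metadata` is hypothetical, the column names are those used in the SeisBench example above, and a missing pick is assumed to be stored as an empty cell or a literal NaN.

```python
import csv


def summarize_metadata(csv_path):
    """Count unique events, networks and stations, and the available P/S picks."""
    events, networks, stations = set(), set(), set()
    picks = {"P": 0, "S": 0}
    n_records = 0
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            n_records += 1
            events.add(row["source_id"])
            networks.add(row["station_network_code"])
            stations.add(row["station_code"])
            for phase in picks:
                value = (row.get(f"trace_{phase}_arrival_sample") or "").strip()
                # Assumption: a missing pick is an empty cell or a literal NaN.
                if value and value.lower() != "nan":
                    picks[phase] += 1
    return {
        "records": n_records,
        "events": len(events),
        "networks": len(networks),
        "stations": len(stations),
        "P_picks": picks["P"],
        "S_picks": picks["S"],
    }
```

Assuming the events archive was extracted into a `crlbd` folder, `summarize_metadata("crlbd/metadata.csv")` should report counts comparable to the dataset summary printed by the SeisBench example.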

License

CRLBD is distributed under the CC-BY-4.0 license.

Funding

  • TRANSFORM² is funded by the European Union under project number 101188365 within the HORIZON-INFRA-2024-DEV-01-01 call.

Version

  • v1.0.0

Publication date

  • 2026-02-25

Size and formats

  • Total size: ~15 GB (compressed), ~40 GB (uncompressed)
  • File formats: HDF5, CSV, StationXML

How to cite

To cite the dataset itself:

Spingos, I., Kapetanidis, V., Münchmeyer, J., Zymvragakis, A., Karakonstantis, A., Voulgaris, N., Kaviris, G. (2026). CRLBD - The Corinth Rift Laboratory Benchmark Dataset. Zenodo. doi: 10.5281/zenodo.18768358

An article is currently in the process of submission, and this README file will be updated with the relevant citation information when available.

Contact

Ioannis Spingos, PhD: ispingos@geol.uoa.gr

Files (15.6 GB)

Checksum Size
md5:550b8a1fcb2aea79fc4a2ebb6246d4fb 12.0 GB
md5:e250a1feb14da91e6ff09e259e14eeac 32.5 MB
md5:7525b63e7f0953e939f2b8235e2750cc 3.5 GB
md5:8ff88658cb98a717f8e36576bb447977 19.0 kB
md5:c2e6229b1bf444a0601f1cec4e37f7ac 7.5 kB
md5:acc16340704ec409a5c8a43bb58320f6 6.4 kB
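
The MD5 checksums listed above can be used to confirm that a download completed correctly. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and each file must be paired with its checksum as shown on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

Reading in chunks matters here because the largest archive is about 12 GB. A call such as `matches_checksum(path_to_file, checksum_from_record)` returns `True` only when the digests agree.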

Additional details

Funding

European Commission
TRANSFORM2 - TowaRds AdvaNced multidiSciplinary Fault ObseRvatory systeMs² 101188365