Published September 30, 2025 | Version 1.0.0
Dataset Open

Western Gulf of Corinth – Seismic Benchmark Dataset (WGoC-SBD)

  • 1. National and Kapodistrian University of Athens

Description

1. Introduction

Deep-learning (DL) pickers have revolutionized seismology by enabling the creation of enhanced seismic catalogues that contain several times as many phase arrivals as catalogues produced manually or generated automatically with conventional methods. The rapid evolution of DL picking has created a need for datasets with which to benchmark and specialize algorithms. General-purpose datasets contain data from broad areas, either global or regional. This dataset is an attempt to build a localized dataset for specializing generalized models to the Western Gulf of Corinth (WGoC), an area monitored by the Corinth Rift Laboratory (CRL) Near-Fault Observatory (NFO).

The dataset has been built with the methodology used by INSTANCE (Michelini et al., 2021) and structured to be compatible with the seismic benchmarking toolbox SeisBench (Woollam et al., 2022). WGoC-SBD contains:

  • 3,757 earthquakes located within the boundaries of the CRL-NFO area in WGoC
  • 38,109 3-channel waveforms
  • 119 metadata fields (to be fully populated in the next dataset version)
  • 6 networks (CL, HA, HP, HL, HT, HI)
  • 84 stations

The seismic catalogue was obtained from Serpetsidaki et al. (2023), a publication detailing the 2020-2021 Trizonia seismic crisis in WGoC. The following adjustments have been made to the INSTANCE workflow:

  • During event selection, the accepted time interval was set to 60 s, a more suitable value for a local dataset.
  • Because the initial number of earthquakes was limited, no random selection was applied to small-magnitude events (all events that passed the time filter were kept).
  • To accommodate misoriented components, we introduced a new metadata field (station_component_map), which maps the sensor's actual component codes to the ones defined in the metadata.
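
The role of station_component_map can be illustrated with a minimal sketch. The encoding of the field is not described here, so the example assumes a hypothetical JSON object such as {"1": "N", "2": "E", "Z": "Z"} (actual sensor component code to the code defined in the metadata); the function name relabel_components and the sample values are illustrative only.

```python
import json


def relabel_components(traces, component_map_json):
    """Relabel per-component traces using a component map.

    traces: dict mapping the sensor's actual component code to its samples.
    component_map_json: ASSUMED JSON encoding of station_component_map,
        e.g. '{"1": "N", "2": "E", "Z": "Z"}'. The real encoding may differ.
    """
    component_map = json.loads(component_map_json)
    relabelled = {}
    for actual_code, samples in traces.items():
        # Codes that are missing from the map are kept unchanged.
        target = component_map.get(actual_code, actual_code)
        if target in relabelled:
            raise ValueError(f"two components map to {target!r}")
        relabelled[target] = samples
    return relabelled


example = relabel_components(
    {"Z": [0.0, 0.1], "1": [0.2, 0.3], "2": [0.4, 0.5]},
    '{"1": "N", "2": "E", "Z": "Z"}',
)
print(sorted(example))  # ['E', 'N', 'Z']
```

Raising an error when two components map to the same code is a design choice of this sketch; it prevents one trace from silently overwriting another.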

2. Files

The following files comprise the full dataset:

  • wgoc-sbd-v1.tar.bz: Compressed archive that contains the waveforms in HDF5 format (waveforms.hdf5) and their metadata (metadata.csv) according to the structure expected by SeisBench.
  • metadata_fields.csv: Descriptions of metadata fields.
  • LICENSE: CC-BY-SA-4.0 legal code.
  • README.md: The current README file.
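
As a quick way to inspect the metadata without unpacking the whole archive, the following sketch reads metadata.csv directly from the compressed file using only the Python standard library. The directory layout inside the archive is not stated here, so the code matches on the file name alone; the function name read_metadata is illustrative.

```python
import csv
import io
import tarfile


def read_metadata(archive_path, member_name="metadata.csv"):
    """Return the rows of metadata.csv from the archive as a list of dicts."""
    # Mode "r:*" lets tarfile detect the compression (bzip2, gzip, ...) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive:
            # The layout inside the archive is an assumption, so match on
            # the file name rather than on a full path.
            if member.isfile() and member.name.rsplit("/", 1)[-1] == member_name:
                raw = archive.extractfile(member)
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                return list(csv.DictReader(text))
    raise FileNotFoundError(f"{member_name} not found in {archive_path}")
```

Calling read_metadata("wgoc-sbd-v1.tar.bz") should return one dictionary per metadata row, keyed by the column names. Once the archive is extracted, the resulting directory can presumably also be opened with SeisBench's seisbench.data.WaveformDataset, since the files follow the structure that SeisBench expects.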

Files (732.1 MB)

Checksum and size of each file:

  • md5:014fcac9ff75bc4bcd1ac28ed93d748b (20.6 kB)
  • md5:22160c18713fd2ef336a50f44dab33b1 (6.4 kB)
  • md5:4b4692922ffedd5549e76695666de871 (4.1 kB)
  • md5:92256f50b4dac7b8082caafe961c4907 (732.1 MB)
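
The MD5 checksums listed above can be used to confirm that a download is intact. The sketch below uses Python's hashlib; the function name md5_matches and the chunk size are illustrative choices. The listing does not name the file behind each checksum, so which checksum belongs to which file is left to the reader (the 732.1 MB entry is presumably the archive).

```python
import hashlib


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the MD5 digest of the file at `path` equals `expected`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that a large archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Accept the checksum with or without the "md5:" prefix used in the listing.
    return digest.hexdigest() == expected.lower().split(":", 1)[-1]
```

For example, md5_matches("wgoc-sbd-v1.tar.bz", "md5:92256f50b4dac7b8082caafe961c4907") would check the archive under the assumption stated above.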

Additional details

Funding

European Commission
OSCARS - O.S.C.A.R.S. - Open Science Clusters’ Action for Research and Society (grant 101129751)