STEAD subsample 4 CDiffSD

Trappolini, Daniele

doi:10.5281/zenodo.11094536

Published April 30, 2024 | Version v3

Dataset (open access)

Trappolini, Daniele (Other) [1, 2]

1. Sapienza University of Rome
2. INGV Osservatorio Nazionale Terremoti

STEAD Subsample Dataset for CDiffSD Training

Overview

This dataset is a subsample of the STEAD dataset (STanford EArthquake Dataset), tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four HDF5 files, which can be opened in Python with the `h5py` library.

Dataset Files

The dataset includes the following files:

train (train.hdf5): Used for both the training and validation phases (the validation set is obtained by splitting this file). Contains the ground-truth earthquake traces.
train_noise (train_noise.hdf5): Used for both the training and validation phases. Contains the noise used to contaminate the earthquake traces.
test (test.hdf5): Used for the testing phase; structured like train.
test_noise (test_noise.hdf5): Used for the testing phase; contains the noise data for testing.

Each file is structured to support the training and evaluation of seismic denoising models.
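The train file serves both training and validation, but the split ratio is not specified here. The following is a minimal sketch, assuming a hypothetical 10% validation fraction and a fixed seed, that splits trace indices along the first axis of the traces dataset. The helper name `train_val_split` is illustrative, not part of the CDiffSD code.

```python
import numpy as np

def train_val_split(n_traces, val_fraction=0.1, seed=42):
    """Split trace indices 0..n_traces-1 into train and validation sets.

    Hypothetical helper: the validation fraction and seed are illustrative
    defaults, not values taken from CDiffSD.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_traces)
    n_val = int(round(n_traces * val_fraction))
    # Sort each subset: h5py list-based selection requires increasing indices
    val_idx = np.sort(indices[:n_val])
    train_idx = np.sort(indices[n_val:])
    return train_idx, val_idx

# With real data, n_traces would be f['traces'].shape[0] for train.hdf5
train_idx, val_idx = train_val_split(1000)
print(len(train_idx), len(val_idx))  # 900 100
```

Sorting matters because the index arrays can then be passed straight to an h5py dataset, e.g. `f['traces'][train_idx[:32]]`, which h5py only accepts when the indices are increasing.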

Data

The noise files (train_noise.hdf5 and test_noise.hdf5) contain two main datasets:

traces: N traces, each 6000 samples long. Each trace is organized into three channels in the following order: E (East-West), N (North-South), Z (Vertical).
metadata: The name of each trace.
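The channel order is documented (E, N, Z), but the array layout on disk (samples-first or channels-first) is not stated here. The sketch below therefore assumes the channel axis is whichever axis of a single trace has length 3; `CHANNELS` and `get_channel` are hypothetical names.

```python
import numpy as np

# Channel order documented for this dataset
CHANNELS = ("E", "N", "Z")

def get_channel(trace, name):
    """Return one component ("E", "N" or "Z") of a single 3-channel trace.

    Assumption: the on-disk layout is not documented, so the channel axis is
    taken to be whichever axis has length 3, e.g. (6000, 3) or (3, 6000).
    """
    trace = np.asarray(trace)
    if trace.ndim != 2 or 3 not in trace.shape:
        raise ValueError(f"expected a 2-D trace with a size-3 axis, got {trace.shape}")
    channel_axis = trace.shape.index(3)
    # np.take with a scalar index drops the channel axis, leaving a 1-D signal
    return np.take(trace, CHANNELS.index(name), axis=channel_axis)

# Usage on a synthetic (6000, 3) trace whose vertical component is all ones
trace = np.zeros((6000, 3))
trace[:, 2] = 1.0
z = get_channel(trace, "Z")
print(z.shape, z[0])  # (6000,) 1.0
```

Checking `f['traces'].shape` on a real file is the quickest way to confirm which layout applies before relying on this.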

Similarly, the train and test files, which contain earthquake data, include the same traces and metadata datasets, but also feature two additional datasets:

p_arrival: The P-wave arrival index of each trace, in samples (counts) from the start of the trace.
s_arrival: The S-wave arrival index of each trace, also in samples.
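Because the arrivals are sample indices, converting them to time requires the sampling rate, which is not stated here. STEAD waveforms are 60 s long at 100 Hz, which matches the 6000-sample trace length, so the sketch below assumes 100 Hz; verify this before relying on it. The helper name `arrivals_to_seconds` is hypothetical.

```python
import numpy as np

# Assumption: STEAD's 100 Hz sampling rate (6000 samples = 60 s)
SAMPLING_RATE_HZ = 100.0

def arrivals_to_seconds(p_arrival, s_arrival, fs=SAMPLING_RATE_HZ):
    """Convert P/S arrival sample indices to seconds and compute S-P times."""
    p_sec = np.asarray(p_arrival, dtype=float) / fs
    s_sec = np.asarray(s_arrival, dtype=float) / fs
    # The S-P time grows with source-receiver distance
    return p_sec, s_sec, s_sec - p_sec

# Example with made-up indices for two traces
p_sec, s_sec, sp = arrivals_to_seconds([500, 1200], [900, 2000])
print(p_sec, s_sec, sp)
```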

Usage

To load these files in a Python environment, use the following approach:

```python
import h5py
import numpy as np

# Open the HDF5 file in read-only mode
with h5py.File('train_noise.hdf5', 'r') as file:
    # Print all the top-level keys (datasets) in the file
    print("Keys in the HDF5 file:", list(file.keys()))

    if 'traces' in file:
        # Slicing reads only the first 10 traces into a NumPy array
        data = file['traces'][:10]

    if 'metadata' in file:
        # Load the names of the first 10 traces
        trace_name = file['metadata'][:10]
```

Make sure the file path is correct: a relative path is resolved against the current working directory, not the script's location, so use an absolute path if in doubt.
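The noise files are meant for contaminating the earthquake traces. A common way to do this, shown here as an illustrative sketch rather than the exact CDiffSD procedure, is to scale a noise trace so that the mixture reaches a chosen signal-to-noise ratio and then add it to the clean trace. The function name `contaminate`, the SNR definition, and the 10 dB example are assumptions.

```python
import numpy as np

def contaminate(signal, noise, snr_db):
    """Add a noise trace to an earthquake trace at a target SNR in dB.

    SNR is defined here as 10 * log10(P_signal / P_noise), where power is the
    mean squared amplitude over the whole trace (all three channels).
    Illustrative recipe only; not necessarily the one used by CDiffSD.
    """
    signal = np.asarray(signal, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if signal.shape != noise.shape:
        raise ValueError("signal and noise must have the same shape")
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise trace is all zeros")
    # Choose k so that p_signal / (k**2 * p_noise) == 10 ** (snr_db / 10)
    k = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + k * noise

# Synthetic stand-ins for one earthquake trace and one noise trace,
# shaped (6000, 3) here purely for illustration
rng = np.random.default_rng(0)
quake = np.sin(np.linspace(0, 20 * np.pi, 6000))[:, None] * np.ones((1, 3))
noise = rng.normal(size=(6000, 3))
noisy = contaminate(quake, noise, snr_db=10.0)
```

With real data, apply it per trace, e.g. `contaminate(eq[i], nz[i], snr_db)`, where `eq` and `nz` are arrays sliced from train.hdf5 and train_noise.hdf5 as in the loading example; pairing earthquake trace i with noise trace i is itself a hypothetical choice.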

Requirements

To use this dataset, ensure you have Python installed along with the NumPy and h5py libraries, which can be installed via pip if not already available:

```bash
pip install numpy
pip install h5py
```

Files

Files (5.8 GB)

Name	MD5	Size
test.hdf5	d3123e11abce4b3e0fa96b3d6228110b	432.5 MB
test_noise.hdf5	b94ca2ccf9b26f9d3c645d5b86e66a42	432.4 MB
train.hdf5	c120395601050199723326310cbff7aa	2.5 GB
train_noise.hdf5	a48fca0caf250cf3679ddcbd8b2166cb	2.5 GB
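The MD5 checksums listed above can be used to confirm that each download is complete. A minimal sketch using only the Python standard library; it assumes the four files sit in one directory under the names shown, and `md5sum` and `verify` are hypothetical helper names. Files are hashed in 1 MiB chunks so the 2.5 GB files are never read into memory at once.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for this record
EXPECTED_MD5 = {
    "test.hdf5": "d3123e11abce4b3e0fa96b3d6228110b",
    "test_noise.hdf5": "b94ca2ccf9b26f9d3c645d5b86e66a42",
    "train.hdf5": "c120395601050199723326310cbff7aa",
    "train_noise.hdf5": "a48fca0caf250cf3679ddcbd8b2166cb",
}

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(directory="."):
    """Return {file name: True/False/None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5sum(path) == expected if path.exists() else None
    return results

if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'missing' if ok is None else 'OK' if ok else 'MISMATCH'}")
```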

Additional details

DOI: 10.5281/zenodo.10972601

Is version of: Dataset: 10.1109/ACCESS.2019.2947848 (DOI)

Available: 2024-04-15

Repository URL: https://github.com/Daniele-Trappolini/Diffusion-Model-for-Earthquake
