Published July 11, 2025 | Version 1.0.0
Dataset Open

SED-Augmented Adobe Audition Sound Effects dataset (ASFX-SED)

  • 1. Université de Montréal
  • 2. Adobe Research
  • 3. Massachusetts Institute of Technology

Description

Overview

This repository contains the SED-Augmented SFX dataset (ASFX-SED) used in the paper [FLAM: Frame-Wise Language-Audio Modeling](https://arxiv.org/abs/2505.05335). The dataset is designed for research and development in open-set sound event detection, and can also be used for event separation and other related machine learning tasks.

- **Original Source:** Adobe Audition sound effects dataset (https://www.adobe.com/products/audition/offers/adobeauditiondlcsfx.html). The same dataset is also used in other audio research (e.g. https://arxiv.org/abs/2308.09089).
- **Format:** Parquet (tabular metadata) + JSON (per-sample metadata) + WAV (audio files)
- **License:** ADOBE RESEARCH LICENSE (see LICENSE.md)

Dataset Structure

```
├── asfx_sed_metadata.parquet   # Metadata (Parquet)
├── asfx_sed/                   # Dataset folder
│   ├── 0000000.json           # Per-sample metadata (JSON)
│   ├── 0000000_mix.wav        # Mixed audio
│   ├── 0000000_event_0.wav    # Event audio
│   └── ...
```

All audio files are mono with a 48 kHz sample rate.
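
The file naming shown in the tree can be turned into a small path helper. This is a minimal sketch: `sample_paths` is an illustrative name, and it assumes that additional events for a sample follow the same `<id>_event_<n>.wav` pattern as `0000000_event_0.wav`.

```python
from pathlib import Path


def sample_paths(root, sample_id):
    """Return the JSON, mixture, and event audio paths for one sample ID."""
    folder = Path(root) / "asfx_sed"
    # Assumption: every event file is named <id>_event_<n>.wav.
    # Sort numerically so that event_10 comes after event_2.
    events = sorted(
        folder.glob(f"{sample_id}_event_*.wav"),
        key=lambda p: int(p.stem.rsplit("_", 1)[1]),
    )
    return {
        "json": folder / f"{sample_id}.json",
        "mix": folder / f"{sample_id}_mix.wav",
        "events": events,
    }
```

For example, `sample_paths(".", "0000000")["mix"]` points at `asfx_sed/0000000_mix.wav` relative to the current directory.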

Parquet File (`asfx_sed_metadata.parquet`)


Each row corresponds to a single audio sample. The following fields are included:

  • events (list): List of event dicts before RMS relabeling (see below)

  • background (dict): Background audio metadata

  • background_caption (str): Description of the background audio

  • events_loudness (list): Loudness values for each event (in dB) before RMS relabeling

  • events_caption (list): Caption for each event

  • events_ucs_category (list): UCS category for each event (https://universalcategorysystem.com/)

  • events_caption_range (list): Start and end times for each event occurrence, in seconds

  • events_id (list): Event IDs

  • id (str): Unique sample ID for mixture
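
For frame-wise training targets, the time ranges in `events_caption_range` can be rasterized into per-frame labels. The sketch below assumes each entry is a single `(start, end)` pair in seconds for one event; the actual nesting in the files is not documented here and should be checked against a sample. `frame_labels` and `frame_rate` are illustrative names, not part of the dataset.

```python
import math


def frame_labels(ranges, clip_duration, frame_rate=10.0):
    """Mark a frame as active (1) if it overlaps any (start, end) range."""
    n_frames = int(round(clip_duration * frame_rate))
    labels = [0] * n_frames
    for start, end in ranges:
        # Assumption: start and end are given in seconds.
        first = max(0, int(start * frame_rate))
        last = min(n_frames, math.ceil(end * frame_rate))
        for i in range(first, last):
            labels[i] = 1
    return labels


# Two ranges on a 2-second clip at 4 frames per second.
print(frame_labels([(0.5, 1.5), (1.75, 3.0)], clip_duration=2.0, frame_rate=4.0))
# → [0, 0, 1, 1, 1, 1, 0, 1]
```

Ranges that extend past the clip duration are clipped to the last frame.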

 

RMS relabeling:


During dataset synthesis, we analyze the RMS (root mean square) energy of each event to identify silent segments and relabel them as negative examples. As a result, a single original event may be split into two or more events after relabeling. The `events` and `events_loudness` fields describe each event before RMS relabeling, while `events_caption`, `events_ucs_category`, `events_caption_range`, and `events_id` describe each event after relabeling. If an event is split into multiple segments, the lists in the latter fields are longer than those in the former.
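
To illustrate how an RMS analysis can split one event into several, here is a minimal sketch of frame-wise RMS thresholding. It is not the procedure used to build the dataset: the frame length, the -60 dB threshold, and the function name `active_segments` are all assumptions chosen for illustration.

```python
import math


def active_segments(samples, sample_rate, frame_len=0.05, threshold_db=-60.0):
    """Return (start, end) times of runs of frames whose RMS exceeds threshold_db."""
    hop = max(1, int(frame_len * sample_rate))  # non-overlapping frames
    segments, run_start = [], None
    for offset in range(0, len(samples), hop):
        frame = samples[offset:offset + hop]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        t = offset / sample_rate
        if level_db > threshold_db and run_start is None:
            run_start = t                      # a non-silent run begins
        elif level_db <= threshold_db and run_start is not None:
            segments.append((run_start, t))    # the run ends at a silent frame
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(samples) / sample_rate))
    return segments


# One "event" with a silent gap in the middle yields two segments.
signal = [0.5] * 20 + [0.0] * 20 + [0.5] * 10
print(active_segments(signal, sample_rate=100, frame_len=0.1))
# → [(0.0, 0.2), (0.4, 0.5)]
```

In this toy case the single input event yields two active segments, which mirrors why the post-relabeling lists can be longer than `events`.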

Example of an `events` entry (list of dicts):

```
[
  {
    "id": "...",
    "sample_rate": 48000,
    "wav": "...wav",
    "duration": 1.23,
    "caption": "...",
    "ucs_category": "...",
    "start_time": 0.0,
    "end_time": 1.23
  },
  ...
]
```

Example of a `background` entry (dict):


```
{
  "id": "...",
  "sample_rate": 48000,
  "wav": "...wav",
  "duration": 90.1,
  "caption": "...",
  "ucs_category": "..."
}
```

JSON Files


Each JSON file in `asfx_sed/` contains the same fields as a row in the Parquet file, but for a single sample. The corresponding audio files are in the same folder.

Usage Example

Loading the Parquet Metadata


```python
import pandas as pd
metadata = pd.read_parquet('asfx_sed_metadata.parquet')
print(metadata.head())
```

Accessing Audio and JSON


```python
import json
with open('asfx_sed/0000000.json', 'r') as f:
    sample = json.load(f)
print(sample['background_caption'])
```
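
The audio files that accompany a JSON file can be inspected in a similar way. The sketch below uses Python's standard-library `wave` module, which reads uncompressed PCM WAV files; it assumes the dataset's files are PCM-encoded (if they use another encoding, a dedicated audio library would be needed). `wav_info` is an illustrative helper name.

```python
import wave


def wav_info(path):
    """Read channel count, sample rate, and duration from a PCM WAV header."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration": frames / rate,  # seconds
        }


# Usage (path taken from the dataset layout above):
# print(wav_info('asfx_sed/0000000_mix.wav'))
```

Based on the description above, the files should report one channel and a sample rate of 48000.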

PyTorch DataLoader Example

 

A simple PyTorch `Dataset` and `DataLoader` example for this dataset is provided in `dataloader_example.py`.

Example Usage


```python
from dataloader_example import ASFX_SED_Dataset
from torch.utils.data import DataLoader

dataset = ASFX_SED_Dataset(
    parquet_path='asfx_sed_metadata.parquet',
    audio_dir='asfx_sed/'
)
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

for batch in dataloader:
    print(batch['id'])
    print(batch['background_caption'])
    # batch['audio'] is a list of numpy arrays (waveforms)
    break
```

See the code and comments in `dataloader_example.py` for details on how to customize loading, audio processing, and batching.

Citation

 

If you use this dataset in your research or find it helpful, please cite the following paper:

```
@inproceedings{wu2025flam,
title={{FLAM}: Frame-Wise Language-Audio Modeling},
author={Yusong Wu and Christos Tsirigotis and Ke Chen and Cheng-Zhi Anna Huang and 
Aaron Courville and Oriol Nieto and Prem Seetharaman and Justin Salamon},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
}
```

---

**Contact:** Yusong Wu (wu.yusong@mila.quebec), Justin Salamon (salamon@adobe.com)

**License:** ADOBE RESEARCH LICENSE (see LICENSE.md)

Files (15.8 GB)

Checksum (Size)
md5:bcd40f74c0d4cd7cb7a62e991d1e31a3 (15.8 GB)
md5:c35fa08053a526604e21e9eb6cf52fb9 (4.3 MB)
md5:1950dbb5ffc9cabdd866865f666b4d60 (3.3 kB)
md5:44f444e2d3d55cb3cbc9ec8a8511e278 (2.3 kB)
md5:4f5cbb1639c21f2f98ca3a20800b6ad0 (5.2 kB)
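
The MD5 checksums listed above can be used to verify a download. This is a minimal sketch using Python's standard `hashlib`; `md5sum` is an illustrative helper, and the file path in the usage comment is a placeholder rather than a file name from this record.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: compare against the listed value without the "md5:" prefix.
# assert md5sum("path/to/downloaded_file") == "bcd40f74c0d4cd7cb7a62e991d1e31a3"
```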

Additional details

Related works

Is described by
Conference paper: arXiv:2505.05335 (arXiv)