Published July 11, 2025 | Version 1
Dataset Open

ID2 : Indonesian Dataset2

  • 1. Universitas Diponegoro
  • 2. Universitas Gadjah
  • 3. Universitas Gadjah Mada

Description

ID2 (Indonesian Dataset 2) is an Indonesian speech dataset featuring dialectal variation, recorded from 31 speakers belonging to five ethnic groups in Indonesia: Javanese, Sundanese, Batak, Balinese, and Minang. The speakers are male and female individuals aged between 17 and 25 years. The dataset covers 330 sentences from diverse domains, accompanied by manually created transcriptions. In total it contains 10,230 recorded utterances (31 speakers × 330 sentences), spanning 7 hours, 40 minutes, and 48 seconds.
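The totals in the description can be cross-checked with a few lines of arithmetic. The sketch below assumes that every speaker recorded all 330 sentences, which is consistent with the stated total of 10,230; the mean utterance length it derives is not a figure reported by the authors.

```python
# Figures taken from the dataset description.
NUM_SPEAKERS = 31
SENTENCES_PER_SPEAKER = 330
TOTAL_DURATION_S = 7 * 3600 + 40 * 60 + 48  # 7 h 40 min 48 s, in seconds

# Assumption: each speaker read every sentence exactly once.
total_utterances = NUM_SPEAKERS * SENTENCES_PER_SPEAKER

# Derived value, not stated in the source: average length of one recording.
mean_utterance_s = TOTAL_DURATION_S / total_utterances

print(f"utterances: {total_utterances}")
print(f"total duration: {TOTAL_DURATION_S} s")
print(f"mean utterance length: {mean_utterance_s:.2f} s")
```

Under that assumption the recordings average roughly 2.7 seconds each, which may help when choosing padding or batching settings for short-utterance speech models.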

Notes

The recordings were made in Audacity at a sampling rate of 44,100 Hz, in 32-bit float sample format, on a single (mono) channel. Each audio file was saved in WAV (Waveform Audio File Format). Recording took place in quiet environments and in a closed, soundproof studio to minimize noise interference. The number of speakers from each ethnic group is as follows:
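Before training on the audio, it can be worth confirming that a file matches the recording parameters stated above. The following is a minimal sketch that reads the 'fmt ' chunk of a WAV header using only the standard library. The function names `read_wav_format` and `matches_id2_spec` are hypothetical, and the check assumes the files carry the plain IEEE-float format tag (3); files written with an extensible header would need an additional sub-format check.

```python
import struct

WAVE_FORMAT_IEEE_FLOAT = 3  # standard WAV format tag for floating-point samples


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the leading bytes of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    # Walk the chunk list until the 'fmt ' chunk is found.
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_tag": tag,
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # Chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")


def matches_id2_spec(fmt: dict) -> bool:
    """Check for 44,100 Hz, 32-bit float, mono, as stated in the notes."""
    return (
        fmt["format_tag"] == WAVE_FORMAT_IEEE_FLOAT
        and fmt["channels"] == 1
        and fmt["sample_rate"] == 44100
        and fmt["bits_per_sample"] == 32
    )
```

A typical call would read the first few kilobytes of a file, for example `read_wav_format(open(path, "rb").read(4096))`, since the 'fmt ' chunk normally sits near the start. Note that Python's built-in `wave` module is limited to integer PCM, which is why the header is parsed by hand here.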

Ethnic group (dialect)   Male   Female
Javanese                 2      3
Sundanese                5      3
Batak                    2      2
Balinese                 4      5
Minang                   3      2
Total                    16     15

Audio File Naming Format

For example, ID2JF06OR-0001 follows the format:

  • The first three characters (ID2) indicate the name of the dataset.
  • The next single character denotes the ethnic group of the speaker of the uttered sentence:
    • L: Balinese
    • T: Batak
    • J: Javanese
    • M: Minang
    • S: Sundanese
  • The subsequent three characters (F06 in the example) identify the recorded individual, where F denotes Female and M denotes Male.
  • ‘OR’ stands for Original, indicating that the data is not augmented.
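The naming convention above can be turned into a small parser for building metadata tables. This is a sketch under assumptions, not the authors' tooling: the name `parse_id2_name` is hypothetical, the speaker code is assumed to be the gender letter followed by a two-digit number (as in the example), and the four digits after the hyphen are assumed to be an utterance index, which the description does not document.

```python
import re
from typing import NamedTuple

# Letter codes as listed in the naming format.
ETHNIC_GROUPS = {
    "L": "Balinese",
    "T": "Batak",
    "J": "Javanese",
    "M": "Minang",
    "S": "Sundanese",
}
GENDERS = {"F": "Female", "M": "Male"}

# Assumed layout: ID2 | group letter | gender letter + 2 digits | OR | -NNNN
PATTERN = re.compile(r"^(ID2)([LTJMS])([FM])(\d{2})(OR)-(\d{4})$")


class ID2Name(NamedTuple):
    dataset: str
    ethnic_group: str
    gender: str
    speaker_code: str  # e.g. "F06"
    variant: str       # "OR" = original, not augmented
    index: int         # assumption: trailing digits are an utterance index


def parse_id2_name(stem: str) -> ID2Name:
    """Split a file name (without extension) into its documented fields."""
    match = PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected ID2 file name: {stem!r}")
    dataset, group, gender, number, variant, index = match.groups()
    return ID2Name(
        dataset=dataset,
        ethnic_group=ETHNIC_GROUPS[group],
        gender=GENDERS[gender],
        speaker_code=gender + number,
        variant=variant,
        index=int(index),
    )


print(parse_id2_name("ID2JF06OR-0001"))
```

Raising on any name that does not match keeps unexpected files, such as augmented variants with a different tag, from being silently mislabeled.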

Files

Size: 1.3 GB
md5:fa05ecfdde7661ac16568aa0dde1f4ea

Additional details

Funding

Ministry of Education and Culture
Indonesian Education Scholarship, Center for Higher Education Funding and Assessment, and Indonesian Endowment Fund for Education, Indonesia, grant no. 01431/J5.2.3./BPI.06/9/2022