Dataset Open Access

Jingju a cappella singing dataset part2

Rong Gong; Rafael Caro Repetto

This is a jingju (also known as Beijing or Peking opera) a cappella singing audio dataset consisting of 120 arias, which together contain 1265 melodic lines. It extends the existing CompMusic jingju corpora (http://compmusic.upf.edu/corpora) and datasets (http://compmusic.upf.edu/datasets), for example the Jingju a cappella singing dataset part1 (https://doi.org/10.5281/zenodo.344932). Both professional and amateur singers were invited to the recording sessions, and the most common jingju musical elements are covered. The dataset is accompanied by metadata for each aria and each melodic line, annotated for automatic singing evaluation research.

 

Artists:

We invited 5 professional singers from NACTA (National Academy of Chinese Theatre Arts), all of whom have rich experience in stage performance and teaching, and 4 amateur singers from jingju associations at non-art universities to the recording sessions.

 

Accompaniment:

7 singers (3 professionals and 4 amateurs) sang along with commercial recordings of the accompaniment; the other 2 professional singers were accompanied by 2 professional jinghu players (NACTA).

 

Coverage, completeness, quality and reusability:

  1. Coverage: The dataset includes the three main role-types - laosheng, dan and jing; the two main shengqiang - xipi and erhuang, and a few auxiliary ones, such as sipingdiao and nanbangzi; and the whole range of metered banshi - yuanban, manban, kuaiban, erliu, liushui, sanyan and its three variations.
  2. Completeness: The dataset contains metadata and annotations at both the recording level and the line level, stored in separate Excel spreadsheets. For the recordings, the metadata contains the title of the work in Chinese, the role-type, shengqiang and banshi, and whether the recording contains jinghu accompaniment. Each line is annotated with the role-type, shengqiang, banshi, line type (opening or closing), the lyrics of the whole line, and the related MusicXML score in the score collection (scores are available from the authors on request).
  3. Quality: A small number of the recordings contain medium room reverberation and minor background noise. The other recordings are dry, clean and of good quality.
  4. Reusability: All the audio and metadata files in this dataset are licensed under Creative Commons Attribution-NonCommercial 4.0 International.

 

Annotation:

The dataset contains line and syllable boundary annotations for a part of the recordings, in Praat TextGrid format. The line annotation contains the lyrics of each line, which are extracted from the score and might not be consistent with the actual singing; the syllable annotation contains pinyin, corrected by the author to be consistent with the actual singing. The statistics of the annotation are:

  • laosheng num. of lines, num. of syllables, average syllable duration (s), standard deviation of syllable duration (s): 405, 3941, 1.32, 2.15
  • dan num. of lines, num. of syllables, average syllable duration (s), standard deviation of syllable duration (s): 467, 4394, 1.63, 3.25
  • Overall num. of lines, num. of syllables, average syllable duration (s), standard deviation of syllable duration (s): 872, 8335, 1.48, 2.79
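
The statistics above can be recomputed from the TextGrid files. The following is a minimal sketch, not the authors' tooling. It assumes the annotations are stored in Praat's long text format and already decoded to a Python string; the tier name "syllable" and the labels in SAMPLE are invented for illustration, and the real files may use other tier names, the short format, or a different encoding. It also assumes the population form of the standard deviation. The last lines combine the reported laosheng and dan rows into an overall row as a consistency check.

```python
import math
import re

# Hypothetical sample in Praat's long text format; the tier name and labels
# are invented for illustration and are not taken from the dataset.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 3.0
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "syllable"
        xmin = 0
        xmax = 3.0
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 1.0
            text = ""
        intervals [2]:
            xmin = 1.0
            xmax = 1.5
            text = "wo"
        intervals [3]:
            xmin = 1.5
            xmax = 3.0
            text = "ben"
'''

# One interval entry: index line, start time, end time, quoted label.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*xmin = (\S+)\s*xmax = (\S+)\s*text = "((?:[^"]|"")*)"'
)


def parse_tiers(text):
    """Map tier name -> list of (start, end, label) for non-empty intervals."""
    tiers = {}
    for chunk in re.split(r'item \[\d+\]:', text)[1:]:
        name = re.search(r'name = "([^"]*)"', chunk)
        if name is None:
            continue
        tiers[name.group(1)] = [
            (float(start), float(end), label.strip())
            for start, end, label in INTERVAL_RE.findall(chunk)
            if label.strip()
        ]
    return tiers


def duration_stats(intervals):
    """Return (count, mean, population std) of interval durations in seconds."""
    durations = [end - start for start, end, _ in intervals]
    n = len(durations)
    mean = sum(durations) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in durations) / n)
    return n, mean, std


def pooled_stats(groups):
    """Combine (count, mean, std) groups into one overall (count, mean, std)."""
    n = sum(c for c, _, _ in groups)
    mean = sum(c * m for c, m, _ in groups) / n
    second_moment = sum(c * (s ** 2 + m ** 2) for c, m, s in groups) / n
    return n, mean, math.sqrt(second_moment - mean ** 2)


print(duration_stats(parse_tiers(SAMPLE)["syllable"]))

# Reported laosheng and dan rows: (num. of syllables, mean duration, std).
n, mean, std = pooled_stats([(3941, 1.32, 2.15), (4394, 1.63, 3.25)])
print(n, round(mean, 2), round(std, 2))
```

Under these assumptions the pooled values round to 8335 syllables, a mean of 1.48 s and a standard deviation of 2.79 s, which agrees with the overall row reported above.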

 

For more information, please refer to the following publication. If you use this dataset in your work, please cite it:

Rong Gong, Rafael Caro Repetto, Xavier Serra, “Creating an A Cappella Singing Audio Dataset for Automatic Jingju Singing Evaluation Research,” in 4th International Digital Libraries for Musicology workshop (DLfM 2017), Shanghai, China.

 

Contact information:

If you have any questions, please contact the authors:

龚嵘 Rong Gong: Email - rong<dot>gong<at>upf<dot>edu, Wechat id - gongr86

贵云飞 Rafael Caro Repetto: Email - rafael<dot>caro<at>upf<dot>edu 

 

If you want to contact the jingju singers:

廖佳尼 Jiani Liao: Wechat id - v1307624197

邵雅昆 Yakun Shao: Wechat id - S_yakun-

 

Or the jinghu players:

张蓝天 Lantian Zhang: Wechat id - tian576632395

Files (6.0 GB):

  • annotation.zip (256.3 kB), md5: bb14c8e95b82f431a7f9e62724c495e6
  • audio_part1.zip (1.2 GB), md5: 6199d202c8327b47dabc8219fce590be
  • audio_part2.zip (2.0 GB), md5: 6513ffdb3d51312608f777ba0001eed6
  • audio_part3.zip (1.1 GB), md5: 6397fa9532bcf0b497f7033ccd56762a
  • audio_part4.zip (888.6 MB), md5: 88840bfb58919bb7d46af1f73c920d88
  • audio_part5.zip (830.4 MB), md5: 514bb7c6bf5a43aed27a94228f0c051d
  • metadata.zip (145.4 kB), md5: 45d3ac368f4d58a4348a2ed283f44717
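
Because the archives are large, it can be worth checking each download against the MD5 checksums listed above before unpacking. The following is a minimal sketch using only the Python standard library; the function names and the directory argument are illustrative assumptions, not part of the dataset.

```python
import hashlib
import os

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "annotation.zip": "bb14c8e95b82f431a7f9e62724c495e6",
    "audio_part1.zip": "6199d202c8327b47dabc8219fce590be",
    "audio_part2.zip": "6513ffdb3d51312608f777ba0001eed6",
    "audio_part3.zip": "6397fa9532bcf0b497f7033ccd56762a",
    "audio_part4.zip": "88840bfb58919bb7d46af1f73c920d88",
    "audio_part5.zip": "514bb7c6bf5a43aed27a94228f0c051d",
    "metadata.zip": "45d3ac368f4d58a4348a2ed283f44717",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {file name: True/False}, skipping archives that are not present."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            results[name] = md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # "." is a placeholder for wherever the archives were saved.
    for name, ok in verify_downloads(".").items():
        print(name, "OK" if ok else "MISMATCH")
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the archive again is the simplest fix.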
