Dataset Open Access

Jingju Lyrics Datasets

R. Caro Repetto

To study the expressive functions of jingju metrical patterns through their lyrics, a series of datasets has been created from the Jingju Lyrics Collection, which was gathered by scraping the online repository of jingju libretti Zhongguo jingju xikao 中国京剧戏考. The datasets support the analysis of lyrics in the banshi yuanban, manban, kuaiban and yaoban, in both the xipi and erhuang shengqiang (kuaiban is not used in erhuang), by applying NLP techniques, namely topic modelling and document classification.

Using this dataset

We are interested in knowing whether you find our datasets useful! If you use them, please email us and tell us about your research.

Files (15.4 MB)


Cite as