Raw frequency data: Thoughts on "Reliable" Learner's Vocabularies for Classical and Literary Chinese
Description
This dataset includes the raw frequency counts (classical_chinese_learners_vocabularies_raw_frequencies.zip) used in the article Thoughts on “Reliable” Learner’s Vocabularies for Classical and Literary Chinese.
Corpus I – Michael Loewe's (1993) Early Chinese Texts
Corpus II – Official Histories (zhengshi 正史)
Corpus III – Six Novels (xiaoshuo 小說), as defined in Hsia 1968
The download includes one folder per corpus, structured as follows:
- xx_corpus.csv > list of texts and sources / used versions, token and type counts
- xx_freq_1-1.csv > unigram / character frequencies and counts
- xx_freq_1-4.csv > 1 to 4 character word frequencies and counts ("words" according to Hanyu da cidian 漢語大詞典 (Luo 1986–1994))
- xx_freq_2-4.csv > 2 to 4 character words
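The layout above can be explored with a few lines of Python. This is a minimal sketch, not part of the dataset: the helper names `find_corpus_files` and `read_rows` are hypothetical, and the UTF-8 encoding, the comma delimiter, and the presence of a header row are assumptions, since the column layout of the CSV files is not documented in this description.

```python
import csv
from pathlib import Path

# Suffixes taken from the file list above; "xx" stands for a corpus prefix
# whose actual values are not stated in this description.
SUFFIXES = ("_corpus.csv", "_freq_1-1.csv", "_freq_1-4.csv", "_freq_2-4.csv")


def find_corpus_files(folder):
    """Map each known suffix to the matching file in one corpus folder."""
    found = {}
    for path in sorted(Path(folder).iterdir()):
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                found[suffix] = path
    return found


def read_rows(path, encoding="utf-8"):
    """Read a CSV file and return (first row, remaining rows).

    Assumption: the first row is a header. Nothing is assumed about
    column names or order; encoding and delimiter are guesses.
    """
    with open(path, newline="", encoding=encoding) as handle:
        rows = list(csv.reader(handle))
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

Inspecting the first row returned by `read_rows` before any further processing is a cheap way to confirm the actual column names.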
Additionally, pca_zhengshi_vs_loewe_vs_xiaoshuo.html is an interactive version of the Principal Component Analysis (PCA) presented in the article. Texts from the three corpora are represented using the 1,000 most frequent 1–4 character combinations from the dataset.
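To illustrate the kind of text representation such a PCA starts from, the sketch below builds one vector per text over the most frequent items. It is an illustration under stated assumptions, not the article's procedure: the function names are hypothetical, the input is assumed to be a mapping from text name to raw counts, and the normalization to relative frequencies is a guess at a common choice rather than something this description confirms.

```python
from collections import Counter


def top_items(counts_by_text, n=1000):
    """Return the n most frequent items, summed over all texts."""
    total = Counter()
    for counts in counts_by_text.values():
        total.update(counts)
    return [item for item, _ in total.most_common(n)]


def feature_matrix(counts_by_text, n=1000):
    """Build one relative-frequency vector per text over the top-n items.

    Assumption: each count is divided by the text's total count; the
    article may use a different normalization.
    """
    vocab = top_items(counts_by_text, n)
    matrix = {}
    for name, counts in counts_by_text.items():
        size = sum(counts.values()) or 1  # avoid division by zero for empty texts
        matrix[name] = [counts.get(item, 0) / size for item in vocab]
    return vocab, matrix
```

The resulting vectors can then be passed to any PCA implementation; the choice of library is left open here.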
Files
(14.7 MB)
Checksum | Size |
---|---|
md5:ab0b7a78e71838e41a3f313d66de3d02 | 11.8 MB |
md5:628130805a97c8d93246d8f0444bfd20 | 2.9 MB |
Additional details
References
- Schalmey, Tilman (2013). "Corpus-driven Creation of a Reliable Learner's Vocabulary for Classical Chinese", Taiwan huayu jiaoxue yanjiu 臺灣華語教學研究 / Taiwan Journal of Chinese as a Second Language 7.2, 109–137.
- Loewe, Michael, editor (1993). Early Chinese Texts: A Bibliographical Guide. The Society for the Study of Early China; The Institute of East Asian Studies, Berkeley.
- Hsia Chih-tsing (1968). The Classic Chinese Novel: A Critical Introduction. Columbia University Press, New York and London.
- Luo Zhufeng, editor (1986–1994). Hanyu da cidian 漢語大詞典, volumes 1–13. Cishu chubanshe, Shanghai.