Published September 16, 2021 | Version 1.0.0
Dataset Open

Cross-Register Authorship Attribution Corpus

  • 1. Indiana University Bloomington
  • 2. Shanghai Normal University

Description

This corpus contains writings by eight authors who are known to have written in both vernacular and classical Chinese. It comprises 4.2 million Chinese characters and can support research on authorship identification.

The file README.md contains a full description of the data.

All materials in this archive are in the public domain.


Files

cross-register-authorship-attribution-corpus-v1.0.0.zip (8.0 MB)
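For a first look at the downloaded archive, the following is a minimal sketch that lists the text-like members of a zip file and counts the Chinese characters in each. The function names, the UTF-8 encoding, and the `.txt`/`.md` suffix filter are assumptions made for illustration; the actual layout of the archive is described in README.md and may differ.

```python
import zipfile


def count_han_characters(text: str) -> int:
    """Count characters in the basic CJK Unified Ideographs block (U+4E00..U+9FFF).

    Extension blocks and punctuation are deliberately not counted.
    """
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def summarize_archive(zip_path, text_suffixes=(".txt", ".md")):
    """Return {member_name: han_character_count} for text-like archive members.

    Assumptions (not confirmed by the record): members are UTF-8 encoded and
    text files end in one of `text_suffixes`.
    """
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directories and anything that does not look like text.
            if info.is_dir() or not info.filename.lower().endswith(text_suffixes):
                continue
            text = archive.read(info).decode("utf-8", errors="replace")
            counts[info.filename] = count_han_characters(text)
    return counts
```

Calling `summarize_archive("cross-register-authorship-attribution-corpus-v1.0.0.zip")` on the downloaded file and summing the returned values gives a rough character total to compare against the figure in the description. The result depends on which files and which Unicode ranges are counted, so an exact match should not be expected.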