Understanding the impact of three derived text formats on authorship classification with Delta
Contributors
Editors:
Project members:
- 1. Universität Potsdam, Germany
- 2. Digital Humanities im deutschsprachigen Raum e.V., Germany
- 3. University of Luxembourg
- 4. Universität Trier, Germany
Description
Under copyright law, text and data mining with copyrighted texts faces many restrictions on the storage, publication, and follow-up use of the resulting corpora, which runs counter to the spirit of open data in the digital humanities. As a solution to this problem, the concept of derived text formats (DTFs) has been suggested and discussed. This paper presents an empirical study that transforms texts into token-based DTFs and reviews the usefulness of the transformed texts for authorship classification. The results show that selectively reducing information on individual tokens can, to a certain extent, keep the authorship classification results from being strongly affected. A contribution to the 9th conference of the association "Digital Humanities im deutschsprachigen Raum" - DHd 2023 Open Humanities Open Culture.
Files
DU_Keli_Evaluating_token_based_DTFs_on_authorship_classifica.pdf
Total size: 247.6 kB
Checksum | Size |
---|---|
md5:61c7bbdfec4cb017cf99f4f891e8987b | 213.4 kB |
md5:7d7dfa06cd353fbf473fb85f86776c70 | 34.2 kB |
Additional details
Related works
- Is part of
- Book: 10.5281/zenodo.7688632 (DOI)