Published March 10, 2023 | Version v1
Conference paper Open

Understanding the impact of three derived text formats on authorship classification with Delta

Creators

  • 1. Universität Trier, Germany
  • 1. Universität Potsdam, Germany
  • 2. Digital Humanities im deutschsprachigen Raum e.V., Germany
  • 3. University of Luxembourg
  • 4. Universität Trier, Germany

Description

Copyright law places many restrictions on the storage, publication and follow-up use of corpora produced by Text and Data Mining with copyrighted texts, which runs against the spirit of open data in the digital humanities. As a solution to this problem, the concept of derived text formats (DTFs) has been suggested and discussed. This paper presents an empirical study that transforms texts into token-based DTFs and reviews how useful the transformed texts are for authorship classification. The results show that selectively reducing information on individual tokens can, to a certain extent, keep the authorship classification results from being affected too much. A contribution to the 9th conference of the association "Digital Humanities im deutschsprachigen Raum", DHd 2023: Open Humanities Open Culture.
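
As a rough illustration of the setup described above, the following Python sketch computes Burrows's Delta on a toy corpus before and after a token-level reduction. The masking function, the placeholder symbol, the toy texts and the word list are all assumptions made for this example; the three DTFs actually evaluated are defined in the paper and are not reproduced here.

```python
from collections import Counter
from statistics import mean, pstdev


def mask_tokens(tokens, keep, placeholder="_"):
    """Hypothetical token-based reduction: every token outside `keep`
    is replaced by a placeholder, so text length is preserved."""
    return [t if t in keep else placeholder for t in tokens]


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def burrows_delta(corpus, test_tokens, vocab):
    """Burrows's Delta between a test text and each reference text.

    corpus: dict mapping an author label to a list of tokens.
    Returns a dict mapping each author label to a Delta value
    (mean absolute difference of z-scored word frequencies).
    """
    profiles = {a: relative_freqs(toks, vocab) for a, toks in corpus.items()}
    columns = list(zip(*profiles.values()))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 when a word has zero variance across references.
    sigmas = [pstdev(col) or 1.0 for col in columns]

    def z_scores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, mus, sigmas)]

    z_test = z_scores(relative_freqs(test_tokens, vocab))
    return {
        a: mean([abs(x - y) for x, y in zip(z_scores(p), z_test)])
        for a, p in profiles.items()
    }


# Toy data (invented for this sketch, not taken from the paper).
vocab = ["the", "and", "a", "or"]
corpus = {
    "A": "the cat and the dog and the bird".split(),
    "B": "a cat or a dog or a bird of the sea".split(),
}
test_text = "the fox and the hen and the owl".split()

plain = burrows_delta(corpus, test_text, vocab)
masked = burrows_delta(
    {a: mask_tokens(t, set(vocab)) for a, t in corpus.items()},
    mask_tokens(test_text, set(vocab)),
    vocab,
)
print(min(plain, key=plain.get), min(masked, key=masked.get))
```

In this toy case the masked texts keep the counts of the listed words and the overall text length, so the Delta values are unchanged. A reduction that discards more information per token would be expected to change them, which is the kind of effect the paper measures.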

Files

DU_Keli_Evaluating_token_based_DTFs_on_authorship_classifica.pdf

Additional details

Related works

Is part of
Book: 10.5281/zenodo.7688632 (DOI)