Zeta & Eta: An Exploration and Evaluation of two Dispersion-based Measures of Distinctiveness

doi:10.5281/zenodo.5532519

Published September 27, 2021 | Version v2

Preprint Open

1. University of Trier

In corpus linguistics, numerous statistical measures have been adopted to analyze large amounts of textual data contrastively in order to extract characteristic or “distinctive” features. While the most widely used keyness measures are based on word frequency, a growing number of recent research papers have proposed dispersion-based measures as a better solution. Such measures, however, are not new to Computational Literary Studies (CLS): in 2007, John Burrows introduced Zeta, a statistical measure based mainly on the degree of dispersion of a feature in a text corpus. In this paper, we introduce Eta, a new measure of distinctiveness based on the deviation of proportions (DP) suggested by Stefan Gries. By comparing Eta with Zeta, we demonstrate that both measures are able to identify relevant, interpretable distinctive words in a target corpus. Additionally, we make a first attempt to identify the key differences between the two measures by interpreting the top distinctive words.
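To make the two underlying notions of dispersion concrete, the following is a minimal sketch, not the authors' implementation. It assumes each corpus has already been split into segments represented as lists of tokens. `zeta` follows the commonly used formulation of Zeta as the difference between the share of target segments and the share of comparison segments that contain a word; `deviation_of_proportions` follows Gries's DP. The abstract does not say how Eta combines DP values across the two corpora, so the sketch stops at DP. All function names are hypothetical.

```python
from typing import List, Sequence

Segment = Sequence[str]  # one text segment as a sequence of tokens (assumption)


def document_proportion(segments: List[Segment], word: str) -> float:
    """Share of segments that contain `word` at least once."""
    if not segments:
        raise ValueError("need at least one segment")
    return sum(1 for seg in segments if word in seg) / len(segments)


def zeta(target: List[Segment], comparison: List[Segment], word: str) -> float:
    """Zeta as a difference of document proportions, ranging from -1 to 1.

    Positive values mean the word is spread more consistently across
    the target segments than across the comparison segments.
    """
    return document_proportion(target, word) - document_proportion(comparison, word)


def deviation_of_proportions(segments: List[Segment], word: str) -> float:
    """Gries's DP: 0 means perfectly even dispersion, values near 1 mean clumped.

    For every segment, compare the share of the word's occurrences that
    fall into it with the share of the corpus the segment makes up.
    """
    total_tokens = sum(len(seg) for seg in segments)
    total_hits = sum(list(seg).count(word) for seg in segments)
    if total_tokens == 0 or total_hits == 0:
        raise ValueError("DP is undefined when the word or the corpus is empty")
    return 0.5 * sum(
        abs(list(seg).count(word) / total_hits - len(seg) / total_tokens)
        for seg in segments
    )
```

Called on two small segmented corpora, `zeta(target, comparison, "sword")` scores the word by how consistently it appears across segments, while `deviation_of_proportions(target, "sword")` describes how evenly its occurrences are distributed within a single corpus.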

DFG Priority Programme SPP 2207 "Computational Literary Studies"

Online:

Subproject: "Zeta und Konsorten. Distinktivitätsmaße für die Digitalen Literaturwissenschaften" (Zeta and Company: Measures of Distinctiveness for Computational Literary Studies)

Online:

Files (1.2 MB)

Name	Size
Du-Dudar-Rok-Schoech_2021_Zeta-and-Eta-CHR2021.pdf (md5:75cc8eb143cd42767a1ca3f7cc211c28)	1.2 MB

	All versions	This version
Views	289	228
Downloads	103	70
Data volume	145.4 MB	94.8 MB