README - TTR Data of the ChildPoeDE Corpus
(Lehmann, Heumann, Kuijpers, Lauer & Lüdtke, 2023)
All TTR and MATTR values were calculated in R using the quanteda (http://quanteda.io/) package.
Poem titles were included if present.
doc_id
ID combining poem ID and file name.
types_per_doc
Number of types in the document/poem.
token_per_doc
Number of tokens in the document/poem.
TTR
Type-Token Ratio calculated by dividing the number of types by the total number of tokens.
(https://quanteda.io/reference/textstat_lexdiv.html).
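As an illustration of the definition above, here is a minimal Python sketch of the ratio. It is not the quanteda code used to produce the values in this dataset. The function name type_token_ratio and the whitespace tokenization with lowercasing are assumptions made for the example only.

```python
def type_token_ratio(tokens):
    """Return the number of distinct tokens (types) divided by the number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty document")
    return len(set(tokens)) / len(tokens)


# Toy input; splitting on whitespace and lowercasing are assumptions,
# not the tokenization settings used for the corpus.
tokens = "The cat and the dog and the bird".lower().split()
ttr = type_token_ratio(tokens)  # 5 types / 8 tokens
```

The values in types_per_doc and token_per_doc correspond to the numerator and denominator of this ratio.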
MATTR_w8
Moving-Average Type-Token Ratio with a window of 8.
A window of 8 was chosen so that a MATTR can be computed for every poem; the shortest poem is 9 tokens long.
The Moving-Average Type-Token Ratio (Covington & McFall, 2010) slides a fixed-size window over the text one token at a time, from the first token to the last, and computes a TTR for each window position. The MATTR is the mean of these window TTRs.
(https://quanteda.io/reference/textstat_lexdiv.html)
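The moving-window procedure described above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the quanteda function itself: the name mattr is hypothetical, and raising an error for a window longer than the text is a choice made for this example.

```python
def mattr(tokens, window):
    """Mean of the TTRs of all windows of `window` consecutive tokens."""
    n = len(tokens)
    if window < 1:
        raise ValueError("window must be at least 1")
    if window > n:
        # Assumption for this sketch: no window fits, so no value is defined.
        raise ValueError("window is longer than the text")
    # One TTR per window position, moving forward one token at a time.
    ratios = [
        len(set(tokens[start:start + window])) / window
        for start in range(n - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Toy input with three windows of size 4:
# [a b a b] -> 2/4, [b a b c] -> 3/4, [a b c d] -> 4/4
tokens = "a b a b c d".split()
value = mattr(tokens, window=4)
```

Because every window has the same length, the MATTR is less sensitive to text length than the plain TTR.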
MATTR_w15
Moving-Average Type-Token Ratio with a window of 15 (first quartile, i.e. 25th percentile, of stanza length).
Cannot be calculated for poems shorter than the window. The value is NA in that case.
MATTR_w21
Moving-Average Type-Token Ratio with a window of 21 (median of stanza length).
Cannot be calculated for poems shorter than the window. The value is NA in that case.
MATTR_w25
Moving-Average Type-Token Ratio with a window of 25 (arithmetic mean of stanza length).
Cannot be calculated for poems shorter than the window. The value is NA in that case.
MATTR_w29
Moving-Average Type-Token Ratio with a window of 29 (third quartile, i.e. 75th percentile, of stanza length).
Cannot be calculated for poems shorter than the window. The value is NA in that case.
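To show how the missing values in the MATTR columns can arise, the following Python sketch builds one row of MATTR values for the five window sizes listed in this README. It is a hedged illustration, not the original R code. The helper names mattr_or_none and mattr_columns are hypothetical, None stands in for NA, and treating a window equal to the poem length as computable is an assumption.

```python
WINDOWS = (8, 15, 21, 25, 29)  # window sizes listed in this README


def mattr_or_none(tokens, window):
    """MATTR for one window size, or None (standing in for NA) if the text is too short."""
    n = len(tokens)
    if window > n:
        return None
    ratios = [
        len(set(tokens[start:start + window])) / window
        for start in range(n - window + 1)
    ]
    return sum(ratios) / len(ratios)


def mattr_columns(tokens):
    """Map each column name (MATTR_w8, MATTR_w15, ...) to its value or None."""
    return {f"MATTR_w{w}": mattr_or_none(tokens, w) for w in WINDOWS}


# Hypothetical 9-token poem with no repeated tokens (toy data, not from the corpus).
tokens = [f"t{i}" for i in range(9)]
row = mattr_columns(tokens)
```

For this toy poem only MATTR_w8 receives a value; the four larger windows yield None.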
Note: The correlation between TTR and MATTR increases with increasing window size (from r = .52 to r = .68).
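A correlation of this kind could be computed along the lines of the Python sketch below. It assumes a Pearson correlation and pairwise removal of rows in which either value is missing; this README does not state how the reported coefficients were obtained, so both are assumptions. The function name pearson_r and the input values are hypothetical toy data, not values from the corpus.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two columns, skipping rows where either value is None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


# Toy values only; None marks a poem for which the MATTR is NA.
ttr_values = [0.50, 0.70, 0.80, 0.90]
mattr_values = [0.60, None, 0.85, 0.95]
r = pearson_r(ttr_values, mattr_values)
```

With the larger windows, more poems have NA values, so the number of complete pairs entering the correlation differs between columns.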