README - TTR Data of the ChildPoeDE Corpus
(Lehmann, Heumann, Kuijpers, Lauer & Lüdtke, 2023)

All TTR and MATTR values were calculated in R using the quanteda (http://quanteda.io/) package.
Poem titles were included if present.

doc_id
	ID combining poem ID and file name.

types_per_doc
	Number of types in the document/poem.

token_per_doc
	Number of tokens in the document/poem.

TTR
	Type-Token Ratio, calculated by dividing the number of types by the total number of tokens
	(https://quanteda.io/reference/textstat_lexdiv.html).
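
	The ratio can be illustrated with a minimal Python sketch. This is not the quanteda code used to produce the data: the function names and the lowercase, whitespace-based tokenizer are assumptions made for illustration, and quanteda tokenizes differently.

```python
def tokenize(text):
    # Assumption: lowercase and split on whitespace (not quanteda's tokenizer).
    return text.lower().split()


def type_token_ratio(tokens):
    # Types are the distinct tokens; TTR = number of types / number of tokens.
    if not tokens:
        return None  # undefined for an empty document
    return len(set(tokens)) / len(tokens)


# Invented example line, not taken from the corpus.
tokens = tokenize("der Hund und die Katze und der Vogel")
ttr = type_token_ratio(tokens)  # 6 types / 8 tokens
```

	In this invented example, "der" and "und" each occur twice, so the 8 tokens contain 6 types.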

MATTR_w8
	Moving-Average Type-Token Ratio with a window of 8.
	A window of 8 is the smallest possible unit for which a MATTR can be computed for all poems, since the shortest poem is 9 tokens long.

	The Moving-Average Type-Token Ratio (Covington & McFall, 2010) moves a window of fixed length through the text from the first to the last token and computes a TTR for each window position. The MATTR is the mean of these window TTRs.
	(https://quanteda.io/reference/textstat_lexdiv.html)
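
	The moving-window procedure can be sketched in Python as follows. This is an illustrative sketch, not the quanteda implementation: the function name `mattr` is hypothetical, and returning None for documents shorter than the window is an assumption that mirrors the NA values in the data.

```python
def mattr(tokens, window):
    # Moving-Average Type-Token Ratio: mean of the TTRs of all
    # consecutive windows of `window` tokens, shifted one token at a time.
    n = len(tokens)
    if window < 1 or n < window:
        return None  # assumption: stands in for NA when the text is too short
    ratios = []
    for start in range(n - window + 1):
        segment = tokens[start:start + window]
        ratios.append(len(set(segment)) / window)
    return sum(ratios) / len(ratios)


# Invented token sequence, not taken from the corpus.
tokens = "a b a c a b d a".split()
value_w4 = mattr(tokens, 4)  # five windows of 4 tokens each
value_w9 = mattr(tokens, 9)  # None: only 8 tokens available
```

	With 8 tokens and a window of 4 there are five window positions; a window of 9 cannot be placed at all, which corresponds to the NA case.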

MATTR_w15
	Moving-Average Type-Token Ratio with a window of 15 (first quartile, i.e. 25th percentile, of stanza length).
	Cannot be calculated for poems with fewer tokens than the window; the value is NA in those cases.

MATTR_w21
	Moving-Average Type-Token Ratio with a window of 21 (median of stanza length).
	Cannot be calculated for poems with fewer tokens than the window; the value is NA in those cases.

MATTR_w25
	Moving-Average Type-Token Ratio with a window of 25 (arithmetic mean of stanza length).
	Cannot be calculated for poems with fewer tokens than the window; the value is NA in those cases.

MATTR_w29
	Moving-Average Type-Token Ratio with a window of 29 (third quartile, i.e. 75th percentile, of stanza length).
	Cannot be calculated for poems with fewer tokens than the window; the value is NA in those cases.

Note: The correlation between TTR and MATTR increases with window size (from r = .52 to r = .68).
