Published April 28, 2021 | Version 0.1.0
Dataset Open

Leibniz-ZAS corpus of MAIN

  • 1. Leibniz-Centre General Linguistics (ZAS)


The presented dataset is part of the narrative corpus collected at the Leibniz-Centre General Linguistics (Leibniz-ZAS). It contains transcriptions of oral narratives elicited with the Multilingual Assessment Instrument for Narratives (MAIN), developed as part of the LITMUS battery of tests in the framework of COST Action IS0804 Language Impairment in a Multilingual Society: Linguistic Patterns and the Road to Assessment. Narratives were elicited in Russian, Turkish and German in the telling elicitation mode using two MAIN picture stories, Baby Birds and Baby Goats. The data were collected during two large-scale longitudinal studies conducted at ZAS in the framework of the Berlin Interdisciplinary Network for Multilingualism (BIVEM) and Interdisciplinary Research Alliance (IFV) projects. The participants were Russian-German and Turkish-German bilingual children from different areas of Berlin. Their language development was closely documented every year from early kindergarten up to the end of the third grade of primary school (age 2;9 to 10;4 years). It is the longest and largest study of language development in bilingual children in Germany, allowing for cross-sectional and longitudinal analyses from a cross-linguistic perspective.

The narratives were audio-recorded and transcribed for later analysis in the standardized CHAT format (MacWhinney, 2000), using the CLAN program and following the CHILDES transcription rules. The transcriptions can be used to analyze the narrative abilities of bilingual children at the macro- and microstructural levels.
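For readers who want to inspect the transcriptions outside CLAN, the following is a minimal sketch of reading a CHAT file in Python. It relies only on the general CHAT line conventions (header lines start with @, main speaker tiers with *, dependent tiers with %, and continuation lines with a tab). The function name read_chat and the sample text are invented for illustration and are not taken from the corpus; the sketch is a simplification and not a substitute for CLAN.

```python
from typing import List, Tuple


def read_chat(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split CHAT text into header lines and (speaker, utterance) pairs."""
    headers: List[str] = []
    utterances: List[Tuple[str, str]] = []
    last = None  # kind of line that a tab-indented continuation belongs to
    for line in text.splitlines():
        if line.startswith("@"):
            headers.append(line)
            last = "header"
        elif line.startswith("*"):
            speaker, _, words = line[1:].partition(":")
            utterances.append((speaker, words.strip()))
            last = "main"
        elif line.startswith("%"):
            last = "dependent"  # dependent tiers are skipped in this sketch
        elif line.startswith("\t") and last == "main":
            speaker, words = utterances[-1]
            utterances[-1] = (speaker, words + " " + line.strip())
        elif line.startswith("\t") and last == "header":
            headers[-1] += " " + line.strip()
    return headers, utterances


# Invented sample, NOT from the corpus; it only mimics the line layout.
SAMPLE = (
    "@Begin\n"
    "@Languages:\tdeu\n"
    "@Participants:\tCHI Target_Child, INV Investigator\n"
    "*INV:\twas passiert hier ?\n"
    "*CHI:\tda ist ein Vogel .\n"
    "%com:\tinvented sample line\n"
    "*CHI:\tund die Katze\n"
    "\tkommt .\n"
    "@End\n"
)

headers, utterances = read_chat(SAMPLE)
child_turns = [words for speaker, words in utterances if speaker == "CHI"]
print(child_turns)
```

Filtering on the speaker code, as in the last lines, is one way to separate the child's narrative from the experimenter's prompts; the speaker codes actually used in the corpus files should be checked in their @Participants headers.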

In total, the dataset contains 210 transcriptions of narratives from 29 participants (10 Russian-German bilingual children and 19 Turkish-German bilingual children). Testing points after the initial testing (pretest) are referred to as posttests and numbered post1 to post6. The dataset covers 5 of them (post1, post2, post3, post4 and post6); it does not contain data from post5, as oral narratives were not elicited at the end of the second grade. The corresponding age ranges at all testing points are given below for each part of the dataset. The dataset is divided into two parts, the Russian-German and the Turkish-German narrative corpus.

The narrative corpus of Russian-German bilingual children includes two folders with narratives elicited in Russian and German, at 5 testing points.

Total number of transcriptions=100

Number of children=10

Total age range=2;9-10;4

Age range of children for narratives in Russian at each testing point:

post 1: 2;9-4;3 (kindergarten)

post 2: 3;9-5;2 (kindergarten)

post 3: 4;9-6;1 (kindergarten)

post 4: 6;9-7;6 (end of first grade)

post 6: 8;7-9;10 (end of third grade)

Age range of children for narratives in German at each testing point:

post 1: 2;10-4;3 (kindergarten)

post 2: 3;9-5;3 (kindergarten)

post 3: 4;9-6;2 (kindergarten)

post 4: 6;9-7;6 (end of first grade)

post 6: 8;8-10;4 (end of third grade)

The narrative corpus of Turkish-German bilingual children includes two folders.

One folder contains narratives elicited in German at the first 3 testing points (post1 to post3), which allows the analysis of early narrative development in one language.

Total number of transcriptions=30

Number of children=10

Total age range=3;5-6;4

Age range of children for narratives in German at each testing point:

post 1: 3;5-4;3 (kindergarten)

post 2: 4;4-5;4 (kindergarten)

post 3: 5;3-6;4 (kindergarten)

The other folder contains narratives elicited in both languages, Turkish and German, at 4 testing points (post2, post3, post4 and post6), which allows the analysis of narrative development up to the third grade in both languages.

Total number of transcriptions=80

Number of children=10

Total age range=3;10-9;9

Age range of children for narratives in Turkish at each testing point:

post 2: 3;10-5;1 (kindergarten)

post 3: 4;9-6;1 (kindergarten)

post 4: 6;5-7;8 (end of first grade)

post 6: 8;6-9;9 (end of third grade)

Age range of children for narratives in German at each testing point:

post 2: 4;1-5;4 (kindergarten)

post 3: 5;1-6;4 (kindergarten)

post 4: 6;6-7;8 (end of first grade)

post 6: 8;5-9;8 (end of third grade)
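The folder totals above can be cross-checked with a short calculation. The sketch below assumes one narrative per child, language and testing point; that reading is an inference from the stated numbers, not something the description spells out. The dictionary labels are ad hoc names, not the actual folder names.

```python
# (children, languages, testing points) per folder, as stated in the description.
FOLDERS = {
    "Russian-German (ru + de)": (10, 2, 5),
    "Turkish-German (de only)": (10, 1, 3),
    "Turkish-German (tr + de)": (10, 2, 4),
}


def expected_count(children: int, languages: int, testing_points: int) -> int:
    # Assumption: one narrative per child, language and testing point.
    return children * languages * testing_points


counts = {name: expected_count(*shape) for name, shape in FOLDERS.items()}
total = sum(counts.values())
print(counts, total)
```

Under that assumption the three folders yield 100, 30 and 80 transcriptions, which add up to the 210 reported for the whole dataset.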

The files are named according to the following pattern: child’s code (the initial letter refers to the child’s first language: r=Russian, t=Turkish), test (MAIN), story (bb=Baby Birds, bg=Baby Goats), language of elicitation (de/ru/tr), testing point (1=post1, 2=post2, etc.), and child’s age (years and months). Example: r009_MAIN_bb_de_4_610.
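The naming pattern lends itself to automatic parsing, for example when building a metadata table. The sketch below is based only on the single documented example, so the regular expression, the .cha extension and all names (parse_name, decode_age, Transcript) are assumptions for illustration. In particular, decode_age assumes the age is written as years followed by months without a separator, and uses the corpus age range (2 to 10 years) to resolve the split; under that reading 610 is 6;10, which falls inside the post4 range given for German.

```python
import re
from typing import NamedTuple, Optional, Tuple

STORIES = {"bb": "Baby Birds", "bg": "Baby Goats"}
FIRST_LANGUAGES = {"r": "Russian", "t": "Turkish"}

# Pattern inferred from the one documented example; real files may deviate.
NAME_RE = re.compile(
    r"^(?P<child>[rt]\d+)_MAIN_(?P<story>bb|bg)_"
    r"(?P<lang>de|ru|tr)_(?P<point>\d)_(?P<age>\d+)$"
)


class Transcript(NamedTuple):
    child: str
    first_language: str
    story: str
    language: str
    testing_point: str
    age_digits: str


def parse_name(name: str) -> Optional[Transcript]:
    """Parse a file name; return None if it does not fit the pattern."""
    # Assumption: CHAT files carry the usual .cha extension.
    stem = name[:-4] if name.endswith(".cha") else name
    m = NAME_RE.match(stem)
    if m is None:
        return None
    return Transcript(
        child=m["child"],
        first_language=FIRST_LANGUAGES[m["child"][0]],
        story=STORIES[m["story"]],
        language=m["lang"],
        testing_point="post" + m["point"],
        age_digits=m["age"],
    )


def decode_age(digits: str) -> Optional[Tuple[int, int]]:
    """Heuristic (assumption): years first, months last, years in 2..10."""
    for cut in (1, 2):
        years, months = digits[:cut], digits[cut:]
        if months and 2 <= int(years) <= 10 and 0 <= int(months) <= 11:
            return int(years), int(months)
    return None


info = parse_name("r009_MAIN_bb_de_4_610")
print(info)
print(decode_age(info.age_digits))  # (6, 10) under the stated assumption
```

Keeping the raw age digits in the parsed record and decoding them in a separate step makes it easy to swap in the correct rule if the actual encoding turns out to differ.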


Files (220.9 kB)


Additional details


  • Gagarina, Natalia, Klop, Daleen, Kunnari, Sari, Tantele, Koula, Välimaa, Taina, Balčiūnienė, Ingrida, Bohnacker, Ute & Walters, Joel. 2012. MAIN: Multilingual Assessment Instrument for Narratives. ZAS Papers in Linguistics, 56. Berlin: ZAS.
  • Narrative abilities in bilingual children. 2016. Applied Psycholinguistics, 37 (special issue).
  • Bohnacker, Ute & Gagarina, Natalia (Eds.). 2020. Developing Narrative Comprehension: Multilingual Assessment Instrument for Narratives. [Studies in Bilingualism, 61]. Amsterdam/Philadelphia: John Benjamins.