Exploring Vocabulary Development of European and Latin American Spanish-Speaking Children: Insights from the Wordbank Dataset
May, 2023
1 Introduction
This paper presents a study of vocabulary development in children, focusing on European and Latin American Spanish-speaking populations. The research was conducted as part of a data analysis class with a specific emphasis on data visualization using RStudio. Vocabulary development plays a critical role in language acquisition and cognitive growth in children. Understanding how vocabulary development varies among populations is important for gaining insight into the language acquisition process and for identifying potential linguistic and cultural influences. The primary objective of this study is to explore and compare vocabulary development across age groups among European Spanish-speaking and Latin American Spanish-speaking children. By addressing this research question, we aim to contribute to existing knowledge on vocabulary development and to describe differences between these two populations.
2 Datasets
To investigate this research question, we employ the Wordbank dataset, a valuable resource that provides comprehensive data on children’s language development across various linguistic contexts. The Wordbank dataset comprises a large collection of parental reports, offering insights into children’s vocabulary acquisition, language exposure, and linguistic milestones. For this study, we extract data from the Wordbank dataset specifically for children growing up in European Spanish-speaking countries, i.e. Spain, and Latin American Spanish-speaking countries, i.e. Mexico. We aim to capture potential variations in vocabulary development influenced by linguistic, cultural, and environmental factors.
Wordbank is a site for archiving, sharing, and exploring anonymized MacArthur-Bates Communicative Development Inventories (CDIs) data from the original English form and from CDI adaptations in many languages (such as Croatian, Danish, English, German, Italian, Norwegian, Russian, Spanish, Swedish, and Turkish). Wordbank compiles responses from norming studies and also includes data that individual researchers have contributed from various research projects, large and small (Frank et al. 2021).
Of the 16,868 entries in the admins data frame, we keep only those for Spanish as spoken in Europe and in Mexico, which leaves 2,939 entries.
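The filtering step can be sketched as follows. The analysis in this paper was carried out in R; this Python sketch is only an illustration, and the rows and field names (`language`, `age`) are hypothetical stand-ins rather than the actual Wordbank schema. Only the two language labels are taken from the tables in this paper.

```python
from collections import Counter

# Hypothetical rows standing in for the administrations table.
# The field names are assumptions made for this illustration.
admins = [
    {"language": "Spanish (European)", "age": 18},
    {"language": "Spanish (Mexican)", "age": 24},
    {"language": "English (American)", "age": 20},
    {"language": "Spanish (Mexican)", "age": 30},
]

# The two language labels compared in this study.
TARGET_LANGUAGES = {"Spanish (European)", "Spanish (Mexican)"}


def filter_by_language(rows, languages):
    """Keep only the rows whose language is in the given set."""
    return [row for row in rows if row["language"] in languages]


spanish = filter_by_language(admins, TARGET_LANGUAGES)

# Count the remaining entries per language to check the sample sizes.
counts = Counter(row["language"] for row in spanish)
print(len(spanish), dict(counts))
```

Counting entries per language immediately after filtering makes any imbalance between the two samples visible before further analysis.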
3 Methods
Once the data are extracted, we use statistical analyses and data visualization methods in R to compare measures such as vocabulary size, growth trajectories, and specific word types between these two varieties of Spanish.
4 Results
Several variables in the dataset could affect the vocabulary development of Spanish-speaking babies in Europe and Mexico. These variables include age, gender, maternal education, and birth order.
4.1 Age
4.2 Age and Gender
The charts in Sections 4.1 and 4.2 show that during the first 20 months, Spanish-speaking babies comprehend more words than they produce. Their vocabularies continue to grow, and by 25 months they express about 300 words on average, except for girls in both Spain and Mexico, who average only around 200 words. By 30 months, all groups reach around 400 words in both comprehension and production.
4.3 Mother Education
We can examine whether the mother’s educational level correlates with a child’s vocabulary development.
The bar chart gives an overview of the education of the mothers of Spanish-speaking children in Spain and Mexico, ranging from the lowest level, “None,” to the highest level, “Graduate.” Most mothers in Spain hold a graduate or a college degree (514 and 263 respectively), while in Mexico most mothers have attended some college (361) or have less than a college education (1,879). This suggests that mothers in the Spain sample are, on average, more highly educated than those in the Mexico sample.
4.3.1 Mother’s Education Level and Comprehended Words
To examine the relationship between the mother’s education level and the child’s vocabulary development, we produce two visualizations: first a box plot, then a correlation plot, for which we also calculate a correlation coefficient.
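A correlation coefficient of this kind could be computed as in the sketch below. It is an illustration in Python rather than the R code used for the analysis. It assumes that education levels are coded as ordinal ranks and that a Pearson coefficient is used; the source states neither choice. The level ordering and the child records are hypothetical.

```python
from math import sqrt

# Assumed ordering of education levels, lowest to highest (hypothetical coding).
EDUCATION_ORDER = ["None", "Primary", "Secondary", "Some College", "College", "Graduate"]


def education_rank(level):
    """Map an education label to its ordinal rank (0 = lowest)."""
    return EDUCATION_ORDER.index(level)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (mother's education, words comprehended) pairs, one per child.
children = [
    ("Primary", 120),
    ("Secondary", 150),
    ("College", 140),
    ("Graduate", 210),
    ("None", 90),
]

ranks = [education_rank(level) for level, _ in children]
words = [count for _, count in children]
r = pearson(ranks, words)
print(round(r, 3))
```

Because education is an ordered category rather than a measured quantity, a rank-based coefficient such as Spearman's would be a reasonable alternative to Pearson's.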
4.3.2 Mother’s Education Level and Produced Words
The charts in Sections 4.3.1 and 4.3.2 show that babies whose mothers completed primary education understand and express more words in Spain than in Mexico (199.87 and 163.61 vs. 188.82 and 132.44 respectively). On the other hand, babies whose mothers completed secondary school comprehend and produce fewer words in Spain than in Mexico (188.59 and 154.49 vs. 254.24 and 254.24 respectively). The same pattern appears at the college level, with 182.27 and 152.65 words in Spain vs. 525.17 and 525.17 words in Mexico. It is worth noting that these findings might be biased because the Mexico dataset contains many more entries than the Spain dataset (391,419 vs. 169,446 for comprehension and 312,395 vs. 136,242 for production), as in the table below.
| language | comprehension_count | production_count |
|---|---|---|
| Spanish (European) | 169,446 | 136,242 |
| Spanish (Mexican) | 391,419 | 312,395 |
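Because the two samples differ so much in size, it helps to report each group mean together with the number of observations behind it. The following Python sketch illustrates this idea and is not the code used for the analysis; the records are hypothetical, and only the language labels come from the table above.

```python
from collections import defaultdict

# Hypothetical (language, mother's education, word count) records.
records = [
    ("Spanish (European)", "Primary", 200),
    ("Spanish (European)", "Primary", 180),
    ("Spanish (Mexican)", "Primary", 150),
    ("Spanish (Mexican)", "College", 300),
    ("Spanish (Mexican)", "College", 320),
    ("Spanish (Mexican)", "College", 310),
]


def group_means(rows):
    """Return {(language, education): (mean word count, group size)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for language, education, words in rows:
        entry = totals[(language, education)]
        entry[0] += words  # running sum of word counts
        entry[1] += 1      # running number of observations
    return {key: (total / n, n) for key, (total, n) in totals.items()}


means = group_means(records)
for (language, education), (mean, n) in sorted(means.items()):
    print(f"{language:20s} {education:10s} mean={mean:.2f} n={n}")
```

Reporting `n` next to each mean lets a reader judge how much weight a comparison between groups can bear.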
4.3.3 Correlation between Mother’s Education and Vocabulary Development of Spanish-speaking Babies in Europe and Mexico
The correlation between the mother’s education level and the child’s vocabulary development, for both production and comprehension, is weak to moderate (coefficient below 0.60) in both Spain and Mexico.
4.4 Birth Order
For Spanish-speaking babies in Europe, where mothers typically have up to three children, there is a consistent pattern in vocabulary development across birth order. The mean of vocabulary production and comprehension shows only a slight decline from the first-born to the second-born and further to the third-born child. This suggests that as the number of siblings increases, there might be a slight decrease in vocabulary development.
In contrast, for babies who speak Spanish (Mexican), mothers can have up to seven children, and vocabulary development tends to decrease as the number of children increases, particularly in terms of language production. The mean vocabulary production shows a noticeable decline as birth order increases, indicating a potential impact of larger family size on language development.
These findings suggest that the cultural and social context, including family size and dynamics, may play a role in shaping vocabulary development among Spanish-speaking children.
4.4.1 Correlation between Birth Order and Vocabulary Development of Spanish-speaking Babies in Europe and Mexico
| Language | Production | Comprehension |
|---|---|---|
| Spanish (European) | -0.9975905 | -0.9453150 |
| Spanish (Mexican) | -0.4564367 | -0.5220962 |
The results show negative correlations between birth order and both vocabulary production and comprehension. In Spanish (European), a strong negative correlation was observed for both production (-0.998) and comprehension (-0.945), indicating that higher birth order was associated with lower levels of vocabulary production and comprehension. In Spanish (Mexican), the correlations were also negative but weaker, with production showing a moderate negative correlation (-0.456) and comprehension a slightly stronger moderate negative correlation (-0.522). These findings suggest that birth order may be associated with vocabulary development, particularly in Spanish (European), where the relationship appears to be more pronounced.
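When reading these coefficients, it matters whether they were computed over individual children or over the mean vocabulary at each birth order; the source does not state which. With only a few birth-order levels, a correlation over group means rests on very few points and can approach -1 even when the child-level relationship is weak. The Python sketch below contrasts the two computations on hypothetical data; it illustrates the arithmetic only and does not reconstruct the analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (birth order, words produced) pairs, one per child.
children = [
    (1, 100), (1, 300), (1, 200),
    (2, 90), (2, 290), (2, 190),
    (3, 80), (3, 280), (3, 185),
]

# Correlation over individual children.
orders = [order for order, _ in children]
words = [count for _, count in children]
r_children = pearson(orders, words)

# Correlation over the mean word count at each birth order.
levels = sorted(set(orders))
level_means = [
    sum(w for o, w in children if o == level) / sum(1 for o, _ in children if o == level)
    for level in levels
]
r_means = pearson(levels, level_means)

print(f"child-level r = {r_children:.3f}, group-mean r = {r_means:.3f}")
```

In this hypothetical data the group means decline steadily, so the group-mean coefficient is close to -1, while the spread among children within each birth order keeps the child-level coefficient near zero.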
5 Conclusions and Discussions
In this study, we explored the vocabulary development of European and Latin American Spanish-speaking children using the Wordbank dataset. Our research question focused on understanding the variations in vocabulary development across age groups between these two populations.
Our analysis revealed interesting insights into the vocabulary development of Spanish-speaking children. Firstly, we found that during the first 20 months, babies comprehend more words than they produce, and their vocabulary continues to grow steadily. By the age of 30 months, children in both groups reached a vocabulary size of around 400 words.
Additionally, we examined the influence of the mother’s education on vocabulary development. The pattern differed by country: in the Mexico sample, children of mothers with secondary or college education had larger mean vocabularies in both comprehension and production than children of mothers with primary education, whereas in the Spain sample the means were slightly lower at the higher education levels. It is also important to note that the educational levels of mothers differed between Spain and Mexico, with a higher proportion of mothers in Spain having graduate or college degrees.
Furthermore, the correlation analysis showed a weak to moderate correlation between the mother’s education level and vocabulary development in both Spain and Mexico. This indicates that while there is some association between the two variables, other factors may also contribute to vocabulary development.
Lastly, the analysis of birth order reveals patterns in vocabulary development among Spanish-speaking children in both Spain and Mexico. In Spain, there is a consistent decline in vocabulary production and comprehension from the first-born to the third-born child. Similarly, in Mexico, vocabulary production decreases as birth order increases. These findings suggest that family size may influence language development, with larger families potentially impacting vocabulary growth. Understanding the influence of birth order can inform interventions to support Spanish-speaking children’s language development based on their family dynamics.
Overall, our findings suggest that vocabulary development varies across age groups and is influenced by factors such as birth order. However, further research is needed to explore additional factors that may contribute to the observed variations in vocabulary development among Spanish-speaking children.
This study provides valuable insights into the vocabulary development of European and Latin American Spanish-speaking children and highlights the importance of understanding language acquisition processes in different populations. These findings can inform educational practices and interventions aimed at promoting optimal language development in Spanish-speaking children.
6 References
Frank, Michael C., Braginsky, Mika, Yurovsky, Daniel & Marchman, Virginia A. 2021. Variability and Consistency in Early Language Learning: The Wordbank Project. Cambridge, MA: MIT Press.