This Rmarkdown script contains the full results supporting the main paper (but very little interpretation). As described in detail in the README.md document, this script uses various types of input data (linguistic and genetic) and multiple methods of analysis.
WALS uses a categorical classification with 3 ordered categories ‘None’ < ‘Simple’ < ‘Complex.’ There are 513 languages with data.
None | Simple | Complex |
---|---|---|
301 | 127 | 85 |
LAPSyD gives both a categorical classification with 5 ordered categories ‘None’ < ‘Marginal’ < ‘Simple’ < ‘Moderately complex’ < ‘Complex,’ and the actual count of tones. There are 569 languages with data.
None | Marginal | Simple | Moderately complex | Complex |
---|---|---|---|---|
386 | 8 | 94 | 39 | 42 |
Dediu & Ladd (2007) use a categorical classification with 2 (presence/absence) categories ‘No’ and ‘Yes.’ There are 60 languages with data.
No | Yes |
---|---|
30 | 30 |
PHOIBLE gives the actual count in 2030 languages with data.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
1495 | 4 | 148 | 173 | 101 | 60 | 25 | 11 | 4 | 6 | 3 |
WPHON gives the actual count in 3160 languages with data.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2193 | 3 | 427 | 222 | 174 | 66 | 43 | 11 | 15 | 2 | 1 | 2 | 1 |
WALS (rows) vs LAPSyD (columns) | None | Marginal | Simple | Moderately complex | Complex |
---|---|---|---|---|---|
None | 229 | 2 | 1 | 0 | 0 |
Simple | 4 | 4 | 59 | 12 | 4 |
Complex | 1 | 0 | 2 | 16 | 24 |
Test statistic | df | P value |
---|---|---|
515.4 | 8 | 3.407e-106 * * * |
Test statistic | df | P value |
---|---|---|
515.4 | NA | 9.999e-05 * * * |
WALS (rows) vs Dediu & Ladd (2007) (columns) | No | Yes |
---|---|---|
None | 12 | 1 |
Simple | 0 | 5 |
Complex | 0 | 5 |
Test statistic | df | P value |
---|---|---|
19.3 | 2 | 6.44e-05 * * * |
Test statistic | df | P value |
---|---|---|
19.3 | NA | 9.999e-05 * * * |
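Each cross-tabulation in this section is followed by two tests: an asymptotic chi-squared test (with df) and a simulation-based one (df = NA). The sketch below, written in Python rather than the R used by this script, recomputes the statistic for the 3×2 table above and estimates a permutation p-value. The function names, the number of simulations and the seed are illustrative assumptions, not the script's settings.

```python
import random

def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def monte_carlo_p(table, n_sim=2000, seed=1):
    """Permutation p-value: shuffle the column labels, keeping both margins fixed."""
    rng = random.Random(seed)
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            rows.extend([i] * n)
            cols.extend([j] * n)
    observed = chi_squared(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(cols)
        sim = [[0] * len(table[0]) for _ in table]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        if chi_squared(sim) >= observed - 1e-9:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return (hits + 1) / (n_sim + 1)

# Rows: None, Simple, Complex; columns: No, Yes (the table above).
table_3x2 = [[12, 1], [0, 5], [0, 5]]
statistic = chi_squared(table_3x2)
p_value = monte_carlo_p(table_3x2)
```

With this estimator the smallest attainable p-value is 1/(n_sim + 1). The value 9.999e-05 reported throughout is what 1/10001 gives, which is consistent with (but does not prove) 10000 simulated tables in which none was as extreme as the observed one.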
LAPSyD (rows) vs Dediu & Ladd (2007) (columns) | No | Yes |
---|---|---|
None | 12 | 0 |
Marginal | 0 | 1 |
Simple | 0 | 1 |
Moderately complex | 0 | 2 |
Complex | 0 | 4 |
Test statistic | df | P value |
---|---|---|
20 | 4 | 0.0004994 * * * |
Test statistic | df | P value |
---|---|---|
20 | NA | 9.999e-05 * * * |
PHOIBLE (rows) vs WALS (columns) | None | Simple | Complex |
---|---|---|---|
0 | 272 | 70 | 41 |
1 | 1 | 0 | 0 |
2 | 1 | 17 | 4 |
3 | 2 | 7 | 12 |
4 | 0 | 6 | 6 |
5 | 0 | 4 | 7 |
6 | 1 | 6 | 3 |
7 | 0 | 1 | 2 |
8 | 0 | 0 | 1 |
9 | 0 | 0 | 3 |
10 | 0 | 0 | 2 |
Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
wa_tone | 2 | 371.4 | 185.7 | 76.53 | 1.826e-29 |
Residuals | 466 | 1131 | 2.427 | NA | NA |
Comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Simple-None | 1.225 | 0.8137 | 1.637 | 3.976e-11 |
Complex-None | 2.292 | 1.829 | 2.754 | 1.3e-11 |
Complex-Simple | 1.066 | 0.5312 | 1.602 | 1.099e-05 |
PHOIBLE (rows) vs LAPSyD (columns) | None | Marginal | Simple | Moderately complex | Complex |
---|---|---|---|---|---|
0 | 314 | 6 | 57 | 13 | 15 |
1 | 1 | 0 | 0 | 0 | 0 |
2 | 2 | 1 | 13 | 0 | 1 |
3 | 0 | 0 | 3 | 9 | 3 |
4 | 0 | 0 | 2 | 4 | 2 |
5 | 0 | 0 | 1 | 3 | 6 |
6 | 1 | 0 | 1 | 3 | 0 |
7 | 0 | 0 | 1 | 0 | 1 |
8 | 0 | 0 | 0 | 0 | 1 |
9 | 0 | 0 | 0 | 0 | 2 |
10 | 0 | 0 | 0 | 0 | 1 |
Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
la_tone | 4 | 368.3 | 92.06 | 61.43 | 1.324e-41 |
Residuals | 462 | 692.3 | 1.499 | NA | NA |
Comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Marginal-None | 0.2511 | -1.03 | 1.532 | 0.9835 |
Simple-None | 0.7475 | 0.3239 | 1.171 | 1.812e-05 |
Moderately complex-None | 2.34 | 1.719 | 2.962 | 4.919e-12 |
Complex-None | 2.84 | 2.219 | 3.462 | 4.874e-12 |
Simple-Marginal | 0.4963 | -0.8264 | 1.819 | 0.8425 |
Moderately complex-Marginal | 2.089 | 0.6904 | 3.488 | 0.0004863 |
Complex-Marginal | 2.589 | 1.19 | 3.988 | 5.735e-06 |
Moderately complex-Simple | 1.593 | 0.8892 | 2.297 | 1.263e-08 |
Complex-Simple | 2.093 | 1.389 | 2.797 | 4.965e-12 |
Complex-Moderately complex | 0.5 | -0.3381 | 1.338 | 0.4765 |
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
13.86 | 465 | 8.444e-37 * * * | two.sided | 0.5406 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
7516233 | 1.916e-39 * * * | two.sided | 0.5572 |
PHOIBLE (rows) vs Dediu & Ladd (2007) (columns) | No | Yes |
---|---|---|
0 | 20 | 7 |
1 | 0 | 0 |
2 | 1 | 3 |
3 | 0 | 3 |
4 | 0 | 2 |
5 | 0 | 3 |
6 | 0 | 0 |
7 | 0 | 0 |
8 | 0 | 1 |
9 | 0 | 0 |
10 | 0 | 0 |
Test statistic | df | P value | Alternative hypothesis |
---|---|---|---|
-4.264 | 19.13 | 0.0004133 * * * | two.sided |
mean in group No | mean in group Yes |
---|---|
0.09524 | 2.421 |
WPHON (rows) vs WALS (columns) | None | Simple | Complex |
---|---|---|---|
0 | 270 | 9 | 3 |
1 | 0 | 0 | 0 |
2 | 13 | 73 | 9 |
3 | 6 | 24 | 18 |
4 | 1 | 11 | 20 |
5 | 0 | 2 | 10 |
6 | 0 | 1 | 8 |
7 | 0 | 0 | 2 |
8 | 0 | 0 | 3 |
9 | 0 | 0 | 1 |
10 | 0 | 0 | 0 |
11 | 0 | 0 | 0 |
12 | 0 | 0 | 1 |
Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
wa_tone | 2 | 1094 | 546.8 | 490 | 7.358e-117 |
Residuals | 482 | 537.9 | 1.116 | NA | NA |
Comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Simple-None | 2.151 | 1.882 | 2.421 | 5.838e-11 |
Complex-None | 3.954 | 3.633 | 4.276 | 5.838e-11 |
Complex-Simple | 1.803 | 1.438 | 2.169 | 5.838e-11 |
WPHON (rows) vs LAPSyD (columns) | None | Marginal | Simple | Moderately complex | Complex |
---|---|---|---|---|---|
0 | 334 | 1 | 11 | 2 | 1 |
1 | 0 | 0 | 1 | 0 | 0 |
2 | 19 | 6 | 49 | 6 | 6 |
3 | 3 | 0 | 11 | 20 | 6 |
4 | 2 | 0 | 6 | 6 | 9 |
5 | 0 | 0 | 2 | 1 | 7 |
6 | 0 | 0 | 1 | 0 | 4 |
7 | 0 | 0 | 0 | 0 | 1 |
8 | 0 | 0 | 0 | 0 | 3 |
9 | 0 | 0 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 0 | 0 |
Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
la_tone | 4 | 868.5 | 217.1 | 278.1 | 6.119e-127 |
Residuals | 513 | 400.6 | 0.7808 | NA | NA |
Comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Marginal-None | 1.561 | 0.6375 | 2.484 | 4.595e-05 |
Simple-None | 1.97 | 1.672 | 2.267 | 2.014e-10 |
Moderately complex-None | 2.732 | 2.304 | 3.16 | 2.014e-10 |
Complex-None | 4.063 | 3.645 | 4.48 | 2.014e-10 |
Simple-Marginal | 0.4092 | -0.5438 | 1.362 | 0.7655 |
Moderately complex-Marginal | 1.171 | 0.1699 | 2.173 | 0.01257 |
Complex-Marginal | 2.502 | 1.505 | 3.499 | 3.885e-10 |
Moderately complex-Simple | 0.7623 | 0.273 | 1.252 | 0.0002305 |
Complex-Simple | 2.093 | 1.613 | 2.573 | 2.014e-10 |
Complex-Moderately complex | 1.331 | 0.7601 | 1.901 | 4.023e-09 |
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
31.51 | 516 | 2.475e-122 * * * | two.sided | 0.8112 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
3591376 | 2.359e-142 * * * | two.sided | 0.845 |
WPHON (rows) vs Dediu & Ladd (2007) (columns) | No | Yes |
---|---|---|
0 | 22 | 1 |
1 | 0 | 0 |
2 | 1 | 6 |
3 | 0 | 7 |
4 | 0 | 4 |
5 | 0 | 0 |
6 | 0 | 1 |
7 | 0 | 2 |
8 | 0 | 0 |
9 | 0 | 0 |
10 | 0 | 0 |
11 | 0 | 0 |
12 | 0 | 0 |
Test statistic | df | P value | Alternative hypothesis |
---|---|---|---|
-8.362 | 22.18 | 2.64e-08 * * * | two.sided |
mean in group No | mean in group Yes |
---|---|
0.08696 | 3.286 |
WPHON (rows) vs PHOIBLE (columns) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 918 | 1 | 8 | 6 | 8 | 3 | 1 | 0 | 0 | 0 | 0 |
1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 131 | 0 | 27 | 21 | 16 | 7 | 2 | 1 | 1 | 0 | 0 |
3 | 48 | 0 | 17 | 37 | 6 | 5 | 6 | 2 | 1 | 0 | 0 |
4 | 43 | 0 | 14 | 4 | 13 | 6 | 7 | 2 | 1 | 0 | 0 |
5 | 9 | 0 | 2 | 5 | 4 | 7 | 1 | 0 | 0 | 1 | 1 |
6 | 5 | 0 | 1 | 4 | 1 | 2 | 2 | 1 | 0 | 1 | 0 |
7 | 3 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
8 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 2 | 0 |
9 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
26.83 | 1428 | 9.157e-129 * * * | two.sided | 0.579 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
1.97e+08 | 3.554e-138 * * * | two.sided | 0.5959 |
The 5-level coding in LAPSyD is too fine-grained: in particular, “Marginal” is very rare and seems to behave much more like “Simple” than like “None” in the other data sets. On the other hand, “Moderately complex,” while quite similar to “Complex” (but not to “Simple”), seems to have an identity of its own. Thus, I collapsed “Marginal” into “Simple,” resulting in a 4-way classification: “None” < “Simple” < “Moderately complex” < “Complex.”
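The collapsing step can be illustrated with a small Python sketch (the script itself is written in R; the names `COLLAPSE` and `collapse_category` are hypothetical). It applies the mapping to the LAPSyD category counts reported earlier in this document.

```python
from collections import Counter

# Only "Marginal" is remapped; every other category is kept as-is.
COLLAPSE = {"Marginal": "Simple"}

def collapse_category(category):
    """Map a 5-level LAPSyD category onto the 4-level classification."""
    return COLLAPSE.get(category, category)

# Category counts for the 569 LAPSyD languages, as tabulated earlier.
lapsyd_5way = {
    "None": 386,
    "Marginal": 8,
    "Simple": 94,
    "Moderately complex": 39,
    "Complex": 42,
}

lapsyd_4way = Counter()
for category, n_languages in lapsyd_5way.items():
    lapsyd_4way[collapse_category(category)] += n_languages
```

After collapsing, “Simple” covers 94 + 8 = 102 languages, which matches the LAPSyD row totals in the later cross-tabulations against the agreement codings.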
With this (and as a reminder), the sources contain the following information:
I designed a set of rules for deciding on a set of two “agreement” categorical classifications, based on a precedence of the sources and the patterns of (dis)agreement between them:
More precisely, I preferred manually curated categorical classifications over count-based sources, resulting in the following (rough) ordering: LAPSyD > WALS > Dediu & Ladd (2007) > WPHON > PHOIBLE.
For the sources that give actual numbers (i.e., counts of tones or tone symbols), we observe that a count of 1 is very rare, probably signalling coding errors, marginal systems (“pitch-accent”) or theoretical arguments, so it can probably be safely collapsed into 2, after which everything is moved “one step down” (i.e., 2 → 1, 3 → 2, etc.) so that we have a continuum of counts from 0 onward. With this, the pairwise correlations between the count sources become:
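The recoding of the raw counts can be written as a one-line rule. The following Python sketch (not the script's R code; `shift_count` is a hypothetical name) applies it to the PHOIBLE distribution tabulated at the top of this document.

```python
def shift_count(n_tones):
    """Collapse a raw count of 1 into 2, then move every non-zero count one step down."""
    if n_tones == 0:
        return 0
    return max(n_tones, 2) - 1

# Raw PHOIBLE distribution (count -> number of languages), as tabulated earlier.
phoible_raw = {0: 1495, 1: 4, 2: 148, 3: 173, 4: 101, 5: 60,
               6: 25, 7: 11, 8: 4, 9: 6, 10: 3}

phoible_shifted = {}
for n_tones, n_languages in phoible_raw.items():
    new_count = shift_count(n_tones)
    phoible_shifted[new_count] = phoible_shifted.get(new_count, 0) + n_languages
```

The raw counts 1 and 2 both end up as 1 (4 + 148 = 152 languages), and the maximum drops from 10 to 9, so the recoded scale has no gap between 0 and 1.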
Thus, the main idea is to use LAPSyD wherever these data exist, followed by WPHON and finally PHOIBLE (thus with precedence LAPSyD > WPHON > PHOIBLE). Please note that the counts in WPHON and PHOIBLE are “corrected” to better map onto those in LAPSyD and to “predict” missing data, using quadratic regression (i.e., the “corrected” counts are computed as \(\mathrm{WPHON}_{corr} = 0.079 + 0.919\,\mathrm{WPHON} - 0.04\,\mathrm{WPHON}^2\) and \(\mathrm{PHOIBLE}_{corr} = 0.394 + 0.68\,\mathrm{PHOIBLE} - 0.037\,\mathrm{PHOIBLE}^2\), respectively).
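A minimal Python sketch of the two quadratic corrections, using the coefficients quoted above. The function names are hypothetical, and it is an assumption that the inputs are the already shifted counts (0 onward).

```python
def wphon_corrected(count):
    """Quadratic mapping of a WPHON count onto the LAPSyD scale."""
    return 0.079 + 0.919 * count - 0.04 * count ** 2

def phoible_corrected(count):
    """Quadratic mapping of a PHOIBLE count onto the LAPSyD scale."""
    return 0.394 + 0.68 * count - 0.037 * count ** 2

# Unrounded ("raw") and rounded versions for a few example counts.
examples = [0, 1, 2, 3]
wphon_raw = [wphon_corrected(c) for c in examples]
wphon_rounded = [round(v) for v in wphon_raw]
```

Under that assumption a WPHON count of 0 maps to 0.079, which rounds to 0, and a count of 1 maps to 0.958, which rounds to 1. The unrounded values stay off the integers, which is why both rounded and unrounded versions are reported.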
# languages with data: 3798:
No | Yes |
---|---|
2541 | 1257 |
# languages with data: 3785:
None | Simple | Complex |
---|---|---|
2538 | 936 | 311 |
Rounded
# languages with data: 3785:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 10 |
---|---|---|---|---|---|---|---|---|
2544 | 516 | 524 | 114 | 56 | 26 | 3 | 1 | 1 |
Unrounded
# languages with data: 3785:
WALS (rows) vs binary agreement coding (columns) | No | Yes |
---|---|---|
None | 297 | 4 |
Simple | 4 | 123 |
Complex | 0 | 85 |
Test statistic | df | P value |
---|---|---|
480.7 | 2 | 4.049e-105 * * * |
Test statistic | df | P value |
---|---|---|
480.7 | NA | 9.999e-05 * * * |
WALS (rows) vs 3-way agreement coding (columns) | None | Simple | Complex |
---|---|---|---|
None | 298 | 3 | 0 |
Simple | 4 | 119 | 4 |
Complex | 1 | 2 | 82 |
Test statistic | df | P value |
---|---|---|
921 | 4 | 4.723e-198 * * * |
Test statistic | df | P value |
---|---|---|
921 | NA | 9.999e-05 * * * |
LAPSyD (rows) vs binary agreement coding (columns) | No | Yes |
---|---|---|
None | 385 | 1 |
Simple | 0 | 102 |
Moderately complex | 0 | 39 |
Complex | 0 | 42 |
Test statistic | df | P value |
---|---|---|
564.4 | 3 | 5.148e-122 * * * |
Test statistic | df | P value |
---|---|---|
564.4 | NA | 9.999e-05 * * * |
LAPSyD (rows) vs 3-way agreement coding (columns) | None | Simple | Complex |
---|---|---|---|
None | 386 | 0 | 0 |
Simple | 0 | 102 | 0 |
Moderately complex | 0 | 12 | 27 |
Complex | 0 | 0 | 42 |
Test statistic | df | P value |
---|---|---|
1028 | 6 | 7.755e-219 * * * |
Test statistic | df | P value |
---|---|---|
1028 | NA | 9.999e-05 * * * |
Rounded
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
Inf | 567 | 0 * * * | two.sided | 1 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
0 | 0 * * * | two.sided | 1 |
Unrounded
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
Inf | 567 | 0 * * * | two.sided | 1 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
0 | 0 * * * | two.sided | 1 |
Dediu & Ladd (2007) (rows) vs binary agreement coding (columns) | No | Yes |
---|---|---|
No | 30 | 0 |
Yes | 0 | 30 |
Test statistic | df | P value |
---|---|---|
56.07 | 1 | 7.005e-14 * * * |
Test statistic | df | P value |
---|---|---|
60 | NA | 9.999e-05 * * * |
Dediu & Ladd (2007) (rows) vs 3-way agreement coding (columns) | None | Simple | Complex |
---|---|---|---|
No | 24 | 0 | 0 |
Yes | 2 | 8 | 13 |
Test statistic | df | P value |
---|---|---|
39.61 | 2 | 2.502e-09 * * * |
Test statistic | df | P value |
---|---|---|
39.61 | NA | 9.999e-05 * * * |
Rounded
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
41.42 | 2028 | 2.854e-272 * * * | two.sided | 0.677 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
358843117 | 0 * * * | two.sided | 0.7426 |
Unrounded
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
38.8 | 2028 | 9.067e-247 * * * | two.sided | 0.6527 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
488625970 | 1.276e-243 * * * | two.sided | 0.6495 |
Rounded
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
156 | 3158 | 0 * * * | two.sided | 0.9408 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
136411420 | 0 * * * | two.sided | 0.9741 |
Unrounded
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
168.4 | 3158 | 0 * * * | two.sided | 0.9486 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
689289938 | 0 * * * | two.sided | 0.8689 |
These three agreement tone codings were obtained using the full information from the 5 sources but, of course, the present study has data for far fewer languages, so we end up using only a subset of them here.
After this sub-setting, in summary, I used 5 primary sources:
For each of the three sources that give counts, the counts are used after collapsing the original 1 tone into the original 2 tones and moving all tones one step down (i.e., original 2 tones become 1 tone).

From these, I built 3 “agreement” combined and reconciled measures:
However, for the analyses reported here, I used the following variables:
For counts, I will also use the unrounded (i.e., raw) “counts,” n_tones_raw, varying between 0 and 10 tones (mean 0.67 and median 0.0793991), to avoid any biases induced by numerically rounding to integer counts.
# languages with data: 321:
No | Yes |
---|---|
251 | 70 |
# languages with data: 314:
None | Simple | Complex |
---|---|---|
248 | 39 | 27 |
# languages with data: 314:
0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
249 | 26 | 23 | 6 | 5 | 3 | 2 |
# languages with data: 314:
There are 314 languages with data for binary, 3-way and counts.
I will denote the “derived” alleles of ASPM and MCPH1 (Microcephalin) as ASPM-D and MCPH1-D, respectively.
ASPM-D was originally defined in relation to “haplotype 63” and two of its polymorphic nonsynonymous sites in exon 18 in an open reading frame (ORF), A44871G and C45126A, with the ancestral alleles A and C, respectively, and the derived ones G and A (Mekel-Bobrov et al., 2005, p. 1720). Later relevant publications (Patrick C. M. Wong, Chandrasekaran, & Zheng, 2012; Patrick C. M. Wong et al., 2020), however, use SNP rs41310927 with ancestral allele T and derived allele C. While most databases do contain info about this SNP, others do not, so I also collected data about SNPs in very tight LD with it: rs41308365, rs3762271, rs41304071, rs147068597 and rs61819087 (the LD data were obtained from LDlink’s “LDproxy Tool” using all populations in that database).
Thus, I collected the following data:
Locus/SNP | “derived” allele | Databases | Position and LD to target |
---|---|---|---|
“haplotype 63” | “haplogroup D” | MB2005 | the target |
rs41310927 | C | WONG2020, LDLink, gnomAD, dbSNP | the target |
rs41308365 | A | LDLink, gnomAD, dbSNP | chr1:197070707; D’=1.00, R2=1.00 |
rs3762271 | T | LDLink, gnomAD, dbSNP, ALFRED | chr1:197070442; D’=1.00, R2=1.00 |
rs41304071 | T | LDLink, dbSNP | chr1:197063352; D’=1.00, R2=1.00 |
rs147068597 | A | LDLink | chr1:197058136; D’=1.00, R2=1.00 |
rs61819087 | G | LDLink, dbSNP | chr1:197084857; D’=1.00, R2=1.00 |
where the databases are identified as:
Database | URL | Info | ID |
---|---|---|---|
Mekel-Bobrov et al. (2005) | https://science.sciencemag.org/content/309/5741/1720 | The original source; 59 populations | MB2005 |
Patrick C. M. Wong et al. (2020) | https://advances.sciencemag.org/content/6/22/eaba5090 | Massive experimental study in Cantonese speakers; 1 population | WONG2020 |
LDLink | https://ldlink.nci.nih.gov/?tab=home | “[…] a suite of web-based applications designed to easily and efficiently interrogate linkage disequilibrium in population groups”; 1000 genomes data in 32 individual and grouped populations | LDLink |
gnomAD | https://gnomad.broadinstitute.org/ | Genome Aggregation Database v2.1.1; very broad populations | gnomAD |
dbSNP | https://www.ncbi.nlm.nih.gov/snp/ | aggregation of info from multiple databases, mostly using very broad populations | dbSNP |
1000 genomes | https://www.internationalgenome.org/ | this info is included in other databases (gnomAD) so is not specifically used here | 1KG |
ALFRED | https://alfred.med.yale.edu/alfred/index.asp | The ALlele FREquency Database; lots of info in many populations; unfortunately, for ASPM only one SNP in strong LD with the target rs41310927 (rs3762271) is available | ALFRED |
I ended up with frequency data about these loci in 170 unique samples coming from 127 unique meta-populations (such as “Han Chinese,” “Italians” or “Finnish”). After making sure the frequencies of these SNPs are very highly correlated (in those samples where they do co-occur), I computed their weighted average frequency (weighted by the number of sampled individuals).
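The weighted average can be sketched as follows in Python (not the script's R code). The function name and the example numbers are hypothetical and are not taken from the data.

```python
def weighted_frequency(estimates):
    """Average allele frequencies, weighting each by its number of sampled individuals.

    `estimates` is a list of (frequency, n_individuals) pairs for one sample.
    """
    total_n = sum(n for _, n in estimates)
    if total_n == 0:
        raise ValueError("no sampled individuals")
    return sum(freq * n for freq, n in estimates) / total_n

# Hypothetical example: two SNPs typed in the same sample, with different sample sizes.
example = [(0.20, 100), (0.40, 300)]
combined = weighted_frequency(example)
```

Here the larger set of individuals dominates, giving (0.20 × 100 + 0.40 × 300) / 400 = 0.35 rather than the unweighted mean of 0.30.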
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
0 | 0.1012 | 0.2291 | 0.2416 | 0.3886 | 0.684 |
Of these 7 SNPs, 5 are “proxy” SNPs (rs147068597, rs3762271, rs41304071, rs41308365, rs61819087), representing 289 unique samples (and 233237 total alleles) out of 396 (73%) unique samples (and 367519 total alleles; 63.5%) available for ASPM-D.
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
0 | 0.0995 | 0.2073 | 0.2275 | 0.38 | 0.6 |
Due to this high proportion of the data being represented by “proxy” SNPs, I also conducted separate analyses excluding these SNPs.
Moreover, 111 are new samples from 84 unique (meta)populations, compared to the 59 samples in 56 (meta)populations in the original Mekel-Bobrov et al. (2005). These new samples are distributed as:
Africa | Eurasia | America | Papunesia |
---|---|---|---|
12 | 90 | 5 | 4 |
and the corresponding new (meta)populations as:
Africa | Eurasia | America | Papunesia |
---|---|---|---|
12 | 63 | 5 | 4 |
MCPH1-D was originally defined in relation to G37995C in exon 8 in an open reading frame (ORF), with the ancestral allele G and the derived one C (Evans et al., 2005, p. 1717). Later relevant publications (Patrick C. M. Wong et al., 2020), however, use SNP rs930557 with ancestral allele G and derived allele C. While most databases do contain info about this SNP, others do not, so I also collected info about the SNP rs1129706, which is in very tight LD with it (the linkage data were obtained from LDlink’s “LDproxy Tool” using all populations in that database).
Thus, I obtained the following data:
Locus/SNP | “derived” allele | Databases | Position and LD to target |
---|---|---|---|
G37995C | C | MB2005 | the target |
rs930557 | C | WONG2020, LDLink, dbSNP | the target |
rs1129706 | G | ALFRED | chr8:6304814; D’=0.995, R2=0.936 |
I ended up with frequency data about these loci in 166 unique samples coming from 128 unique meta-populations. After making sure the frequencies of these SNPs are very highly correlated (in those samples where they do co-occur), I computed their weighted average frequency (weighted by the number of sampled individuals).
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
0.0315 | 0.658 | 0.7986 | 0.7125 | 0.8652 | 1 |
Of these 3 SNPs, 1 is a “proxy” SNP (rs1129706), representing 141 unique samples (and 13028 total alleles) out of 245 (57.6%) unique samples (and 107258 total alleles; 12.1%) available for MCPH1-D.
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
0.033 | 0.5634 | 0.7737 | 0.6729 | 0.8357 | 1 |
Due to this high proportion of the data being represented by “proxy” SNPs, I also conducted separate analyses excluding these SNPs.
Moreover, 107 are new samples from 85 unique (meta)populations, compared to the 59 samples in 56 (meta)populations in the original Evans et al. (2005). These new samples are distributed as:
Africa | Eurasia | America | Papunesia |
---|---|---|---|
12 | 86 | 5 | 4 |
and the corresponding new (meta)populations as:
Africa | Eurasia | America | Papunesia |
---|---|---|---|
12 | 64 | 5 | 4 |
These are the same for ASPM-D and MCPH1-D:
Africa | Eurasia | America | Papunesia |
---|---|---|---|
15 | 37 | 5 | 2 |
and the corresponding new (meta)populations as:
Africa | Eurasia | America | Papunesia |
---|---|---|---|
14 | 35 | 5 | 2 |
When combining the linguistic and genetic data, we are left with 175 unique samples in 129 unique (meta)populations speaking 321 unique “languages” (i.e., Glottolog codes) (from now on, denoted as 175:129:321), of which:
Information for | Number of samples:(meta)pops:languages | Missing samples:(meta)pops:languages |
---|---|---|
tone binary | 175:129:321 | 0:0:0 = {} : {} : {} |
tone 3-way | 170:124:314 | 5:5:7 = {SA001471N, SA001477T, SA001487U, SA001491P, SA001681Q} : {Burunge, Hazara, Mozabite, Oroqen, Xibe} : {buru1320, efee1239, gyel1242, haza1239, oroq1238, tumz1238, xibe1242} |
tone counts | 170:124:314 | 5:5:7 = {SA001471N, SA001477T, SA001487U, SA001491P, SA001681Q} : {Burunge, Hazara, Mozabite, Oroqen, Xibe} : {buru1320, efee1239, gyel1242, haza1239, oroq1238, tumz1238, xibe1242} |
ASPM-D | 170:127:319 | 5:2:2 = {FINRISK, GenDan, GenNed5, KRGDB, Qatari} : {Dutch, Qatari} : {dutc1256, gulf1241} |
MCPH1-D | 166:128:320 | 9:1:1 = {gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish} : {Bulgarian} : {bulg1262} |
ASPM-D & MCPH1-D | 161:126:318 | 14:3:3 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari} : {Bulgarian, Dutch, Qatari} : {bulg1262, dutc1256, gulf1241} |
tone binary & ASPM-D & MCPH1-D | 161:126:318 | 14:3:3 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari} : {Bulgarian, Dutch, Qatari} : {bulg1262, dutc1256, gulf1241} |
tone 3-way & ASPM-D & MCPH1-D | 156:121:311 | 19:8:10 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari, SA001471N, SA001477T, SA001487U, SA001491P, SA001681Q} : {Bulgarian, Burunge, Dutch, Hazara, Mozabite, Oroqen, Qatari, Xibe} : {bulg1262, buru1320, dutc1256, efee1239, gulf1241, gyel1242, haza1239, oroq1238, tumz1238, xibe1242} |
tone counts & ASPM-D & MCPH1-D | 156:121:311 | 19:8:10 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari, SA001471N, SA001477T, SA001487U, SA001491P, SA001681Q} : {Bulgarian, Burunge, Dutch, Hazara, Mozabite, Oroqen, Qatari, Xibe} : {bulg1262, buru1320, dutc1256, efee1239, gulf1241, gyel1242, haza1239, oroq1238, tumz1238, xibe1242} |
Some pair-wise differences in terms of samples:(meta)populations:languages with data:
Present in… | … but absent from | samples:(meta)pops:languages |
---|---|---|
tone binary | tone 3-way (and counts) | 5:5:7 = {SA001471N, SA001477T, SA001487U, SA001491P, SA001681Q} : {Burunge, Hazara, Mozabite, Oroqen, Xibe} : {buru1320, efee1239, gyel1242, haza1239, oroq1238, tumz1238, xibe1242} |
tone binary | ASPM-D | 5:2:2 = {FINRISK, GenDan, GenNed5, KRGDB, Qatari} : {Dutch, Qatari} : {dutc1256, gulf1241} |
tone binary | MCPH1-D | 9:1:1 = {gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish} : {Bulgarian} : {bulg1262} |
tone binary | ASPM-D & MCPH1-D | 14:3:3 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari} : {Bulgarian, Dutch, Qatari} : {bulg1262, dutc1256, gulf1241} |
tone 3-way (and counts) | ASPM-D | 5:2:2 = {FINRISK, GenDan, GenNed5, KRGDB, Qatari} : {Dutch, Qatari} : {dutc1256, gulf1241} |
tone 3-way (and counts) | MCPH1-D | 9:1:1 = {gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish} : {Bulgarian} : {bulg1262} |
tone 3-way (and counts) | ASPM-D & MCPH1-D | 14:3:3 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari} : {Bulgarian, Dutch, Qatari} : {bulg1262, dutc1256, gulf1241} |
I kept only the entries with non-missing data for tone1, ASPM-D and MCPH1-D, and if there is more than one possible language or allele frequency for a given sample, I kept only those entries that have different tone or allele data. The resulting dataset has 181 observations, distributed among 119 unique Glottolog codes in 35 families (ranging from a minimum of 1 language per family to a maximum of 48, with a mean of 5.2 and a median of 2 languages per family) and 4 macroareas.
There are 161:126:119 unique samples:(meta)populations:languages retained, dropping 14:3:202 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari} : {Bulgarian, Dutch, Qatari} : {adze1240, ajie1238, amar1272, ambu1247, anei1239, apma1241, arak1252, arib1241, arop1243, aros1241, aulu1238, awtu1239, ayiw1239, baba1268, bahi1254, bann1247, bign1238, bili1260, boik1241, bulg1262, caro1242, cham1313, chek1238, chuu1238, dehu1237, dumb1241, dutc1256, east2443, east2447, fiji1243, futu1245, fwai1237, gapa1238, geez1241, gela1263, gilb1244, gulf1241, guma1254, hali1244, hang1263, hano1246, hmon1264, hoav1238, iaai1238, iatm1242, idak1243, idun1242, iris1253, iwam1256, juho1239, kaia1245, kair1263, kamb1297, kapi1249, kara1486, kaul1240, kela1255, kele1258, kiku1240, kili1267, kire1240, koko1269, kosr1238, kuan1247, kuan1248, kuma1276, kung1261, kwai1243, kwam1251, kwam1252, kwas1243, kwom1262, labu1248, lala1268, lame1260, lauu1247, lena1238, lese1243, lewo1242, long1395, loni1238, lonw1238, louu1245, lusi1240, maee1241, mais1250, male1289, malo1243, mana1295, mana1298, maor1246, mars1254, masa1299, matu1261, mbal1255, mbul1263, mehe1243, meke1243, mele1250, mina1269, ming1252, moch1256, moki1238, moks1248, mono1273, motl1237, motu1246, mudu1242, muri1260, muso1238, muss1246, muyu1244, naka1262, nali1244, nama1264, nami1256, natu1246, naur1243, ndon1254, neha1247, neng1238, ngan1300, niua1240, niue1239, nort2646, nort2836, nort2845, nuku1260, onto1237, paam1238, pate1247, patp1243, pile1238, ping1243, pohn1238, port1285, pulu1242, qima1242, raoo1244, rapa1244, renn1242, rotu1241, rovi1238, russ1264, saaa1240, saam1283, saka1289, sali1295, samo1305, sapo1253, scot1243, siar1238, siee1239, sina1266, sioo1240, sobe1238, sons1242, sout2642, sout2679, sout2807, sout2856, sout2866, sout2869, stan1318, sude1239, surs1246, tahi1242, tain1252, 
taki1248, tawa1275, tean1237, teop1238, tiga1245, tigr1271, tiri1258, toab1237, toba1266, toke1240, tong1325, tsot1241, tswa1253, tuam1242, tuml1238, tung1290, tuva1244, ulit1238, urav1235, urip1239, vinm1237, waim1251, wall1257, wata1253, west2500, west2519, woga1249, wole1240, xamt1239, xara1244, yabe1254, yess1239, yima1243, zulu1248}.
tone1 (rows) vs macroarea (columns) | Africa | Eurasia | America | Papunesia | Sum |
---|---|---|---|---|---|
No | 9 | 100 | 4 | 7 | 120 |
Yes | 27 | 26 | 6 | 2 | 61 |
Sum | 36 | 126 | 10 | 9 | 181 |
(glmer: “Model is nearly unidentifiable: very large eigenvalue”)

To better understand this overlap between family, macroarea and the two “derived” alleles, I regressed (separately) ASPM-D and MCPH1-D on the macroarea, using mixed-effects beta regression (after replacing all \(0.0\) values by \(10^{-7}\) and all \(1.0\) by \(1.0-10^{-7}\), respectively) with language family as random effect:
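Beta regression is defined only on the open interval (0, 1), which is why exact 0s and 1s are nudged inward before fitting. A minimal Python sketch of that replacement (the function name `squeeze_boundaries` is hypothetical; the script itself is in R):

```python
EPSILON = 1e-7  # the offset stated in the text

def squeeze_boundaries(frequency, eps=EPSILON):
    """Replace exact 0.0 and 1.0 so the value lies strictly inside (0, 1)."""
    if frequency == 0.0:
        return eps
    if frequency == 1.0:
        return 1.0 - eps
    return frequency

# Interior values are left untouched; only the two boundary values change.
adjusted = [squeeze_boundaries(f) for f in (0.0, 0.2416, 1.0)]
```

Only the boundary values are changed, so the many interior frequencies enter the model exactly as observed.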
For these randomization analyses there are several important parameters:
Parameter | Meaning | Values |
---|---|---|
permute | what to permute? | nothing = the original data; tone = permute the tone variable; alleles-together = permute the two alleles together; alleles-independent = permute the two alleles separately, i.e., each is independently permuted |
within | how are the permutations constrained? | unrestricted = all the observations are freely permuted (i.e., there are no constraints, no structure in the data is preserved); families = only observations within the same language family are permuted (i.e., the structure of the families is preserved); macroareas = only observations within the same macroarea are permuted (i.e., the structure of the macroareas is preserved) |
macroarea | how do we control for macroareas? | none = no control for macroareas at all; fixef = as fixed effects |
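The `within` constraint amounts to shuffling values only among observations that share a group label. The Python sketch below illustrates the idea; it is not the script's R code, and `permute_within` and its arguments are hypothetical names.

```python
import random
from collections import defaultdict

def permute_within(values, groups=None, seed=None):
    """Permute `values`; if `groups` is given, shuffle only within each group."""
    rng = random.Random(seed)
    if groups is None:  # "unrestricted": free permutation of all observations
        shuffled = list(values)
        rng.shuffle(shuffled)
        return shuffled
    positions = defaultdict(list)
    for index, group in enumerate(groups):
        positions[group].append(index)
    result = list(values)
    for indices in positions.values():
        group_values = [values[i] for i in indices]
        rng.shuffle(group_values)
        for i, value in zip(indices, group_values):
            result[i] = value
    return result

# Toy data: six observations in two language families.
tone = ["Yes", "No", "No", "Yes", "Yes", "No"]
family = ["A", "A", "A", "B", "B", "B"]
alleles = [(0.1, 0.8), (0.2, 0.7), (0.3, 0.6), (0.4, 0.5), (0.5, 0.4), (0.6, 0.3)]

permuted_tone = permute_within(tone, family, seed=1)         # permute = tone
permuted_together = permute_within(alleles, family, seed=2)  # alleles-together
```

In this sketch “alleles-together” corresponds to permuting the (ASPM-D, MCPH1-D) pairs as units, while “alleles-independent” would call the function once per allele so that the two columns are reshuffled separately.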
I performed 1000 independent replications of each of these parameter combinations, and below are the distributions of the permuted values versus the original ones (i.e., those obtained on the original, non-permuted data).
Permute within | Macroarea | Permute | AIC | Signif. | p(ASPM-D) | β(ASPM-D) | p(MCPH1-D) | β(MCPH1-D) |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 4% | 6% | 0% | 4% | 0% |
unrestricted | none | alleles-together | 0% | 5% | 4% | 0% | 5% | 2% |
unrestricted | none | alleles-independent | 1% | 6% | 6% | 0% | 6% | 1% |
unrestricted | fixef | tone | 0% | 5% | 5% | 8% | 5% | 25% |
unrestricted | fixef | alleles-together | 68% | 6% | 7% | 15% | 4% | 23% |
unrestricted | fixef | alleles-independent | 68% | 5% | 6% | 16% | 6% | 20% |
macroareas | none | tone | 0% | 95% | 42% | 4% | 86% | 28% |
macroareas | none | alleles-together | 26% | 76% | 10% | 7% | 59% | 73% |
macroareas | none | alleles-independent | 32% | 83% | 20% | 15% | 66% | 78% |
macroareas | fixef | tone | 0% | 7% | 7% | 11% | 6% | 29% |
macroareas | fixef | alleles-together | 65% | 5% | 5% | 19% | 5% | 35% |
macroareas | fixef | alleles-independent | 66% | 4% | 5% | 20% | 4% | 35% |
families | none | tone | 2% | 16% | 2% | 3% | 14% | 36% |
families | none | alleles-together | 2% | 11% | 3% | 5% | 5% | 16% |
families | none | alleles-independent | 2% | 16% | 13% | 10% | 12% | 20% |
families | fixef | tone | 1% | 8% | 3% | 16% | 11% | 74% |
families | fixef | alleles-together | 66% | 4% | 5% | 46% | 2% | 16% |
families | fixef | alleles-independent | 66% | 3% | 5% | 37% | 3% | 22% |
brms
I regressed tone1 on ASPM-D and MCPH1-D in a mixed-effects Bayesian framework (using brms) with macroarea, language family and (meta)population as (nested) random effects. The ROPE is the region of practical equivalence around 0.0, usually [-0.1, 0.1] but it may vary by regression type; the idea is that the HDI should have as small an intersection as possible with the ROPE. Another take is represented by the pROPE, which is the proportion of the whole posterior distribution (i.e., the 100% HDI) inside the ROPE; it can thus be interpreted like a “classic” p-value.
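To make the two summaries concrete, here is a minimal Python sketch that computes an HDI and the pROPE from a vector of posterior draws. It is an illustration only: the script uses brms in R, the function names are hypothetical, and the simulated “posterior” is made up.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]

def p_rope(samples, rope=(-0.1, 0.1)):
    """Proportion of all posterior draws that fall inside the ROPE."""
    low, high = rope
    return sum(low <= x <= high for x in samples) / len(samples)

# Made-up posterior draws for one regression coefficient.
rng = random.Random(0)
posterior = [rng.gauss(0.8, 0.2) for _ in range(4000)]

interval = hdi(posterior, mass=0.95)
inside = p_rope(posterior)
```

For draws centred well away from 0, the 95% HDI does not touch the ROPE and the pROPE is close to 0; for a coefficient centred on 0 most draws would fall inside the ROPE instead.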
Here, I try to disentangle the fact that macroarea is a very good predictor of tone1, but also of the frequency of the two alleles, from any effect that the alleles might have on tone1. For this, I conducted mediation analysis and path analysis, where I model the effect of macroarea on tone1 as partially mediated by the two alleles.
Please note that there are several technical issues with these approaches:
for mediation analysis, the method used (as implemented by function `mediate` in package `mediation`):
to address these issues, I also conducted Bayesian mediation analysis (using `brms`) with logistic regression for the outcome, beta regression for the “derived” allele frequencies, and family and (meta)population as random effects (the macroarea cannot be a random effect as it is the treatment, coded as Africa vs the rest of the world).
for path analysis, the method used (as implemented by function `sem` with robust estimators in package `lavaan`):
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.49 (0.33, 0.63), p=0, decomposed into:
average direct effect (ADE): 0.27 (0.08, 0.47), p=0.008, and
average indirect effect (ACME) mediated by ASPM-D: 0.22 (0.11, 0.34), p=0, mediating 44.9% (19.1%, 79.5%), p=0 of the effect, resulting from:
For MCPH1-D:
TE: 0.50 (0.34, 0.65), p=0, decomposed into:
ADE: 0.55 (0.19, 0.75), p=0.002, and
ACME: -0.05 (-0.22, 0.25), p=0.49, mediating -14.7% (-51.3%, 56.8%), p=0.49 of the effect, resulting from:
For ASPM-D:
TE: mean = 0.38, median = 0.38; 44.5% significant at α-level 0.05 and 72.8% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 134.2, p = 0;
ADE: mean = 0.28, median = 0.28; 8.2% significant at α-level 0.05 and 29.6% significant at α-level 0.10; 99.6% > 0.0; one-sample one-sided t-test vs 0: t(999) = 87.2, p = 0;
ACME: mean = 0.094, median = 0.091; 3.6% significant at α-level 0.05 and 20.1% significant at α-level 0.10; 99.5% > 0.0; one-sample one-sided t-test vs 0: t(999) = 61.4, p = 0;
β(Africa → allele): mean = -0.86, median = -0.87; 79.8% significant at α-level 0.05 and 96.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -211.9, p = 0;
β(allele → tone | Africa): mean = -0.61, median = -0.6; 10.2% significant at α-level 0.05 and 27.7% significant at α-level 0.10; 99.1% < 0.0; one-sample one-sided t-test vs 0: t(999) = -60.0, p = 0.
For MCPH1-D:
TE: mean = 0.38, median = 0.39; 44.3% significant at α-level 0.05 and 72.6% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 133.1, p = 0;
ADE: mean = 0.41, median = 0.44; 6.0% significant at α-level 0.05 and 18.9% significant at α-level 0.10; 96.9% > 0.0; one-sample one-sided t-test vs 0: t(999) = 74.0, p = 0;
ACME: mean = -0.029, median = -0.052; 0.1% significant at α-level 0.05 and 1.4% significant at α-level 0.10; 35.7% > 0.0; one-sample one-sided t-test vs 0: t(999) = -6.2, p = 1;
β(Africa → allele): mean = -2.5, median = -2.5; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -884.5, p = 0;
β(allele → tone | Africa): mean = 0.42, median = 0.42; 0.2% significant at α-level 0.05 and 1.5% significant at α-level 0.10; 25.5% < 0.0; one-sample one-sided t-test vs 0: t(999) = 21.4, p = 1.
Given the low sample size (N = 35 unique families), relatively few effect sizes are big enough to be significant in each individual analysis; however, the effect of the allele on tone, β(allele → tone | Africa), is significant much more often for ASPM-D than for MCPH1-D: 10.2% vs 0.2% (51.0 times) for α-level 0.05, and 27.7% vs 1.5% (18.5 times) for α-level 0.10.
brms
With Africa and tone1 coded numerically, the model fits the data very well (χ2(1)=0.22, p=0.64; CFI=1.00, TLI=1.01, NNFI=1.01 and RFI=1.00):
## lavaan 0.6-8 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 181
##
## Model Test User Model:
##
## Test statistic 0.225
## Degrees of freedom 1
## P-value (Chi-square) 0.635
##
## Model Test Baseline Model:
##
## Test statistic 371.522
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.013
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -448.206
## Loglikelihood unrestricted model (H1) -448.094
##
## Akaike (AIC) 912.413
## Bayesian (BIC) 938.000
## Sample-size adjusted Bayesian (BIC) 912.664
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.154
## P-value RMSEA <= 0.05 0.704
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.005
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## tone_bin_num ~
## Africa_num 0.390 0.166 2.351 0.019 0.065 0.716
## ASPM_z -0.144 0.035 -4.163 0.000 -0.212 -0.076
## MCPH1_z 0.025 0.061 0.410 0.682 -0.095 0.145
## ASPM_z ~
## Africa_num -1.249 0.111 -11.254 0.000 -1.467 -1.032
## MCPH1_z ~
## Africa_num -2.190 0.081 -27.039 0.000 -2.349 -2.031
## Std.lv Std.all
##
## 0.390 0.330
## -0.144 -0.305
## 0.025 0.053
##
## -1.249 -0.500
##
## -2.190 -0.877
##
## Variances:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## .tone_bin_num 0.165 0.015 10.844 0.000 0.135 0.195
## .ASPM_z 0.746 0.074 10.144 0.000 0.602 0.890
## .MCPH1_z 0.230 0.039 5.899 0.000 0.154 0.307
## Std.lv Std.all
## 0.165 0.740
## 0.746 0.750
## 0.230 0.232
##
## R-Square:
## Estimate
## tone_bin_num 0.260
## ASPM_z 0.250
## MCPH1_z 0.768
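The regression estimates in the output above can be combined into indirect and total effects using the usual product-of-coefficients rule for linear path models. The sketch below is only an illustration in Python: the coefficients are the unstandardized estimates copied from the output above, while the derived quantities are not reported by the original analysis and the variable names are hypothetical.

```python
# Unstandardized estimates copied from the lavaan output above.
a_aspm, b_aspm = -1.249, -0.144    # Africa -> ASPM-D, ASPM-D -> tone1
a_mcph1, b_mcph1 = -2.190, 0.025   # Africa -> MCPH1-D, MCPH1-D -> tone1
direct = 0.390                     # Africa -> tone1

# Indirect effect through a mediator = product of the two path coefficients.
indirect_aspm = a_aspm * b_aspm
indirect_mcph1 = a_mcph1 * b_mcph1

# Total effect = direct effect + sum of the indirect effects.
total = direct + indirect_aspm + indirect_mcph1

print(f"indirect via ASPM-D:  {indirect_aspm:.3f}")
print(f"indirect via MCPH1-D: {indirect_mcph1:.3f}")
print(f"total effect:         {total:.3f}")
```

Two negative paths (Africa → ASPM-D and ASPM-D → tone1) multiply into a positive indirect effect, which matches the sign of the ACME reported for ASPM-D in the mediation analysis.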
Likewise, with Africa and tone1 coded as ordered binary factors, the model also fits the data very well (χ2(1)=0.57, p=0.45; CFI=1.00, TLI=1.07, NNFI=1.07 and RFI=0.92):
Here I use only the numerical coding.
The model fits are:
Africa → ASPM-D: mean = -0.87, median = -0.89, sd = 0.12, IQR = 0.17, 100.0% < 0; 98.7% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -2.2e+02, p = 0;
Africa → MCPH1-D: mean = -2.5, median = -2.5, sd = 0.086, IQR = 0.12, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -9e+02, p = 0;
Africa → tone1: mean = 0.43, median = 0.44, sd = 0.32, IQR = 0.46, 89.5% > 0; 12.8% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 41, p = 3.2e-219;
ASPM-D → tone1: mean = -0.11, median = -0.11, sd = 0.058, IQR = 0.089, 98.6% < 0; 36.2% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -62, p = 0;
MCPH1-D → tone1: mean = 0.041, median = 0.041, sd = 0.11, IQR = 0.17, 36.7% < 0; 0.9% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 11, p = 1.
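The one-sample t statistics in the list above follow from the mean, the standard deviation and the number of replications. The Python sketch below shows the arithmetic for the Africa → ASPM-D path; it uses the rounded summary values printed above, so it only approximates the reported statistic (which was presumably computed from unrounded values), and the function name `one_sample_t` is hypothetical.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """t statistic of a one-sample t-test against mu0 (df = n - 1)."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


# Africa -> ASPM-D: mean = -0.87, sd = 0.12 over 1000 replications.
t_aspm = one_sample_t(mean=-0.87, sd=0.12, n=1000)
print(f"t(999) is approximately {t_aspm:.0f}")  # reported above as -2.2e+02
```

Because the standard error shrinks with the square root of the number of replications, the t statistic becomes very large even for means that are modest relative to the spread of the individual estimates.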
Here I apply various “machine learning” techniques to explore how well the macroarea and the two alleles predict tone1. For these techniques, in general I:
Thus, these techniques can:
Using the frequency of the two alleles and the macroarea as predictors, the fit to the data is: accuracy = 77.3%, sensitivity = 71.7%, specificity = 79.3%, precision = 54.1%, and recall = 71.7%.
On the 100 training/testing sets, the fit is: accuracy = 77.1% ±6.6%, sensitivity = 71.6% ±15.2%, specificity = 79.3% ±6.8%, precision = 52.9% ±12.4%, recall = 71.6% ±15.2%.
When using the frequency of the two alleles only as predictors, the fit to the data is: accuracy = 75.1%, sensitivity = 75.0%, specificity = 75.2%, precision = 39.3%, and recall = 75.0%:
On the 100 training/testing sets, the fit is: accuracy = 70.1% ±7.6%, sensitivity = 61.3% ±19.1%, specificity = 76.2% ±8.7%, precision = 44.0% ±23.0%, recall = 61.3% ±19.1%.
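For reference, the fit measures quoted for the decision trees can all be derived from a 2×2 confusion matrix. The Python sketch below uses the standard definitions, with “Yes” as the positive class; the four counts are hypothetical, chosen only because they reproduce the percentages reported above for the model with both alleles and the macroarea, and they are not taken from the original analysis.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is 0."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fn, tn, fp):
    """Standard fit measures from true/false positives and negatives."""
    return {
        "accuracy": safe_ratio(tp + tn, tp + fn + tn + fp),
        "sensitivity": safe_ratio(tp, tp + fn),  # identical to recall
        "specificity": safe_ratio(tn, tn + fp),
        "precision": safe_ratio(tp, tp + fp),
    }


# Hypothetical counts (assumption) consistent with the reported percentages.
metrics = classification_metrics(tp=33, fn=13, tn=107, fp=28)
for name, value in metrics.items():
    print(f"{name} = {100 * value:.1f}%")
```

Sensitivity and recall are the same quantity, which is why they always carry identical values in this report; a ratio becomes undefined (NaN) when its denominator is zero, for example precision when no case is predicted as positive.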
I use two methods: random forests as implemented by `randomForest()` in package `randomForest`, and conditional random forests as implemented by `cforest()` in package `partykit`. As (conditional) random forests do internal bootstrapping, there is no need for explicit repeated refitting on training/testing sets.
When using the frequency of the two alleles and the macroarea as predictors, the models' fit to the full data is:
When using the frequency of the two alleles only, the models' fit to the full data is:
Here I try various analyses that explicitly take into account the diachronic nature of the processes.
The families with more than 2 tips are:
It can be seen that, unfortunately, there are very few families with more than 2 languages with data (17), and even for those with relatively many languages, there is very little variation in tone1 and in the frequencies of the two “derived” alleles. Combined with the issues concerning branch lengths for language family trees, this precludes the use of correlated-evolution or phylogenetic regression methods.
I kept only the entries with non-missing data for tone2, ASPM-D and MCPH1-D, and if there is more than one possible language or allele frequency for a given sample, I kept only those entries that have different tone or allele data. The resulting dataset has 180 observations, distributed among 118 unique Glottolog codes in 35 families (ranging from a minimum of 1 language per family to a maximum of 47, with a mean of 5.1 and a median of 2 languages per family) and 4 macroareas.
There are 156:121:118 unique samples:(meta)populations:languages retained, dropping 19:8:203 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari, SA001471N, SA001477T, SA001487U, SA001491P, SA001681Q} : {Bulgarian, Burunge, Dutch, Hazara, Mozabite, Oroqen, Qatari, Xibe} : {adze1240, ajie1238, amar1272, ambu1247, anei1239, apma1241, arak1252, arib1241, arop1243, aros1241, aulu1238, awtu1239, ayiw1239, baba1268, bahi1254, bann1247, bign1238, bili1260, boik1241, bulg1262, buru1320, caro1242, cham1313, chek1238, chuu1238, dehu1237, dumb1241, dutc1256, east2443, east2447, efee1239, fiji1243, futu1245, gapa1238, geez1241, gela1263, gilb1244, gulf1241, guma1254, gyel1242, hali1244, hang1263, hano1246, haza1239, hmon1264, hoav1238, iaai1238, iatm1242, idak1243, idun1242, iris1253, iwam1256, juho1239, kaia1245, kair1263, kamb1297, kapi1249, kara1486, kaul1240, kela1255, kele1258, kili1267, kire1240, koko1269, kosr1238, kuan1247, kuan1248, kuma1276, kung1261, kwai1243, kwam1251, kwam1252, kwom1262, labu1248, lala1268, lame1260, lauu1247, lena1238, lewo1242, long1395, loni1238, lonw1238, louu1245, lusi1240, maee1241, mais1250, male1289, malo1243, mana1295, mana1298, maor1246, mars1254, masa1299, matu1261, mbal1255, mbul1263, mehe1243, meke1243, mele1250, mina1269, ming1252, moch1256, moki1238, moks1248, mono1273, motl1237, motu1246, mudu1242, muri1260, muso1238, muss1246, muyu1244, naka1262, nali1244, nami1256, natu1246, naur1243, ndon1254, neha1247, neng1238, ngan1300, niua1240, niue1239, nort2646, nort2836, nort2845, nuku1260, onto1237, oroq1238, paam1238, pate1247, patp1243, pile1238, ping1243, pohn1238, port1285, pulu1242, qima1242, raoo1244, rapa1244, renn1242, rotu1241, rovi1238, russ1264, saaa1240, saam1283, saka1289, sali1295, samo1305, sapo1253, scot1243, siar1238, siee1239, sina1266, sioo1240, sobe1238, sons1242, sout2642, sout2679, 
sout2807, sout2856, sout2866, sout2869, stan1318, sude1239, surs1246, tahi1242, tain1252, taki1248, tawa1275, tean1237, teop1238, tiga1245, tigr1271, tiri1258, toab1237, toba1266, toke1240, tong1325, tswa1253, tuam1242, tuml1238, tumz1238, tung1290, tuva1244, ulit1238, urav1235, urip1239, vinm1237, waim1251, wall1257, wata1253, west2500, west2519, woga1249, wole1240, xamt1239, xara1244, xibe1242, yabe1254, yess1239, yima1243, zulu1248}.
tone2 | Africa | Eurasia | America | Papunesia | Sum |
---|---|---|---|---|---|
No | 28 | 105 | 9 | 9 | 151 |
Yes | 9 | 18 | 1 | 1 | 29 |
Sum | 37 | 123 | 10 | 10 | 180 |
Please note that the distribution of this variable is very skewed, so the results might not be very solid…
glmer
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 4% | 4% | 1% | 6% | 0% |
unrestricted | none | alleles-together | 31% | 7% | 6% | 10% | 6% | 6% |
unrestricted | none | alleles-independent | 32% | 7% | 7% | 9% | 6% | 4% |
unrestricted | fixef | tone | 0% | 6% | 6% | 6% | 7% | 23% |
unrestricted | fixef | alleles-together | 84% | 7% | 7% | 17% | 7% | 25% |
unrestricted | fixef | alleles-independent | 83% | 9% | 8% | 16% | 7% | 21% |
macroareas | none | tone | 0% | 4% | 3% | 1% | 10% | 1% |
macroareas | none | alleles-together | 40% | 7% | 5% | 21% | 6% | 42% |
macroareas | none | alleles-independent | 44% | 8% | 6% | 28% | 10% | 45% |
macroareas | fixef | tone | 0% | 4% | 4% | 7% | 4% | 22% |
macroareas | fixef | alleles-together | 80% | 8% | 8% | 28% | 7% | 38% |
macroareas | fixef | alleles-independent | 80% | 9% | 8% | 25% | 8% | 37% |
families | none | tone | 31% | 4% | 4% | 28% | 1% | 16% |
families | none | alleles-together | 20% | 4% | 5% | 29% | 1% | 15% |
families | none | alleles-independent | 24% | 4% | 6% | 25% | 3% | 22% |
families | fixef | tone | 45% | 8% | 9% | 54% | 4% | 43% |
families | fixef | alleles-together | 80% | 4% | 6% | 43% | 3% | 18% |
families | fixef | alleles-independent | 80% | 5% | 7% | 34% | 4% | 24% |
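The “Permute within” column of the table above distinguishes unrestricted permutations from permutations restricted to macroareas or families. As a rough illustration of what a restricted permutation does, the Python sketch below shuffles a variable only among observations that share the same group label. This is an assumption about the general technique, not the report's actual R code; the toy data and the function name `permute_within` are hypothetical.

```python
import random
from collections import defaultdict


def permute_within(values, groups, rng):
    """Shuffle `values` only among positions sharing the same group label."""
    positions_by_group = defaultdict(list)
    for position, group in enumerate(groups):
        positions_by_group[group].append(position)

    permuted = list(values)
    for positions in positions_by_group.values():
        shuffled = [values[p] for p in positions]
        rng.shuffle(shuffled)
        for position, value in zip(positions, shuffled):
            permuted[position] = value
    return permuted


# Toy data (hypothetical): tone values for six languages in three families.
rng = random.Random(42)
tone = ["Yes", "No", "No", "Yes", "No", "No"]
family = ["A", "A", "A", "B", "B", "C"]
permuted = permute_within(tone, family, rng)
print(permuted)
```

An unrestricted permutation is the special case where every observation carries the same group label. Restricting the shuffle keeps the per-group distribution of the permuted variable intact, so only the within-group association with the other variables is broken.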
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.14 (-0.01, 0.30), p=0.078, decomposed into:
average direct effect (ADE): -0.05 (-0.20, 0.10), p=0.43, and
average indirect effect (ACME) mediated by ASPM-D: 0.19 (0.08, 0.31), p=0.004, mediating 133.8% (-419.6%, 802.0%), p=0.082 of the effect, resulting from:
For MCPH1-D:
TE: 0.11 (-0.02, 0.27), p=0.12, decomposed into:
ADE: 0.12 (-0.21, 0.45), p=0.47, and
ACME: -0.01 (-0.29, 0.29), p=0.9, mediating -11.4% (-804.5%, 1112.4%), p=0.93 of the effect, resulting from:
For ASPM-D:
TE: mean = 0.11, median = 0.12; 0.4% significant at α-level 0.05 and 2.3% significant at α-level 0.10; 89.6% > 0.0; one-sample one-sided t-test vs 0: t(999) = 39.7, p = 5.9e-208;
ADE: mean = 0.04, median = 0.039; 0.0% significant at α-level 0.05 and 0.0% significant at α-level 0.10; 67.3% > 0.0; one-sample one-sided t-test vs 0: t(999) = 16.0, p = 7.9e-52;
ACME: mean = 0.072, median = 0.07; 0.0% significant at α-level 0.05 and 0.1% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 78.5, p = 0;
β(Africa → allele): mean = -0.88, median = -0.89; 88.6% significant at α-level 0.05 and 98.7% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -249.8, p = 0;
β(allele → tone | Africa): mean = -0.58, median = -0.57; 0.0% significant at α-level 0.05 and 0.4% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -79.8, p = 0.
For MCPH1-D:
TE: mean = 0.11, median = 0.11; 0.2% significant at α-level 0.05 and 2.3% significant at α-level 0.10; 82.1% > 0.0; one-sample one-sided t-test vs 0: t(999) = 37.1, p = 1.3e-190;
ADE: mean = 0.13, median = 0.14; 0.0% significant at α-level 0.05 and 0.7% significant at α-level 0.10; 75.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 24.8, p = 2.3e-106;
ACME: mean = -0.028, median = -0.034; 0.0% significant at α-level 0.05 and 0.3% significant at α-level 0.10; 43.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = -5.4, p = 1;
β(Africa → allele): mean = -2.4, median = -2.4; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -927.9, p = 0;
β(allele → tone | Africa): mean = 0.26, median = 0.21; 0.0% significant at α-level 0.05 and 0.1% significant at α-level 0.10; 38.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = 11.3, p = 1.
brms
Coding Africa and tone2 numerically, the model fit is: χ2(1)=0.36, p=0.55; CFI=1.00, TLI=1.01, NNFI=1.01 and RFI=0.99.
## lavaan 0.6-8 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 180
##
## Model Test User Model:
##
## Test statistic 0.361
## Degrees of freedom 1
## P-value (Chi-square) 0.548
##
## Model Test Baseline Model:
##
## Test statistic 354.897
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.011
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -407.836
## Loglikelihood unrestricted model (H1) -407.655
##
## Akaike (AIC) 831.671
## Bayesian (BIC) 857.215
## Sample-size adjusted Bayesian (BIC) 831.879
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.166
## P-value RMSEA <= 0.05 0.630
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.006
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## tone_complex_num ~
## Africa_num -0.051 0.139 -0.366 0.714 -0.322 0.221
## ASPM_z -0.108 0.026 -4.124 0.000 -0.159 -0.056
## MCPH1_z -0.005 0.050 -0.093 0.926 -0.102 0.093
## ASPM_z ~
## Africa_num -1.338 0.101 -13.315 0.000 -1.535 -1.141
## MCPH1_z ~
## Africa_num -2.189 0.072 -30.443 0.000 -2.330 -2.048
## Std.lv Std.all
##
## -0.051 -0.056
## -0.108 -0.292
## -0.005 -0.013
##
## -1.338 -0.542
##
## -2.189 -0.887
##
## Variances:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## .tone_complx_nm 0.125 0.016 7.848 0.000 0.094 0.157
## .ASPM_z 0.702 0.072 9.767 0.000 0.561 0.843
## .MCPH1_z 0.212 0.037 5.748 0.000 0.140 0.284
## Std.lv Std.all
## 0.125 0.927
## 0.702 0.706
## 0.212 0.213
##
## R-Square:
## Estimate
## tone_complx_nm 0.073
## ASPM_z 0.294
## MCPH1_z 0.787
Coding Africa and tone2 as ordered binary factors, the model fit is: χ2(1)=0.98, p=0.32; CFI=1.00, TLI=1.01, NNFI=1.01 and RFI=0.79.
Here I use only the numerically coded model.
It can be seen that:
The model fits are:
Africa → ASPM-D: mean = -0.89, median = -0.9, sd = 0.11, IQR = 0.15, 100.0% < 0; 99.8% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -2.5e+02, p = 0;
Africa → MCPH1-D: mean = -2.4, median = -2.4, sd = 0.079, IQR = 0.11, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -9.6e+02, p = 0;
Africa → tone2: mean = 0.039, median = 0.021, sd = 0.26, IQR = 0.37, 53.6% > 0; 0.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 4.7, p = 1.2e-06;
ASPM-D → tone2: mean = -0.071, median = -0.071, sd = 0.027, IQR = 0.037, 99.8% < 0; 5.9% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -83, p = 0;
MCPH1-D → tone2: mean = -0.00016, median = -0.0016, sd = 0.1, IQR = 0.14, 50.8% < 0; 0.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -0.051, p = 0.48.
When using the frequency of the two alleles and the macroarea as predictors, the decision tree is trivial: it uniformly predicts just the majority value “No.”
The fit to the data is: accuracy = 83.9%, sensitivity = NA, specificity = 83.9%, precision = 0.0%, and recall = NA.
On the 100 training/testing sets: accuracy = 82.9% ±5.8%, sensitivity = 15.8% ±9.0%, specificity = 83.5% ±5.3%, precision = 0.8% ±4.4%, recall = 15.8% ±9.0%.
Here the imputed counts are rounded to the nearest integer; please see below for the analysis using the actual predicted values.
I kept only the entries with non-missing data for the tone counts, ASPM-D and MCPH1-D, and if there is more than one possible language or allele frequency for a given sample, I kept only those entries that have different tone or allele data. The resulting dataset has 184 observations, distributed among 121 unique Glottolog codes in 35 families (ranging from a minimum of 1 language per family to a maximum of 47, with a mean of 5.3 and a median of 2 languages per family) and 4 macroareas.
There are 156:121:121 unique samples:(meta)populations:languages retained, dropping 19:8:200 = {FINRISK, GenDan, GenNed5, gnomAD_asj, gnomAD_bgr, gnomAD_est, gnomAD_fin, gnomAD_jpn, gnomAD_kor, gnomAD_swe, gnomADexomes_AshkenaziJewish, gnomADgenomes_AshkenaziJewish, KRGDB, Qatari, SA001471N, SA001477T, SA001487U, SA001491P, SA001681Q} : {Bulgarian, Burunge, Dutch, Hazara, Mozabite, Oroqen, Qatari, Xibe} : {adze1240, ajie1238, amar1272, ambu1247, anei1239, apma1241, arak1252, arib1241, arop1243, aros1241, aulu1238, awtu1239, ayiw1239, baba1268, bahi1254, bann1247, bign1238, bili1260, boik1241, bulg1262, buru1320, caro1242, cham1313, chek1238, chuu1238, dehu1237, dumb1241, dutc1256, east2443, east2447, efee1239, fiji1243, futu1245, gapa1238, geez1241, gela1263, gilb1244, gulf1241, guma1254, gyel1242, hali1244, hang1263, hano1246, haza1239, hoav1238, iaai1238, iatm1242, idak1243, idun1242, iris1253, iwam1256, juho1239, kaia1245, kair1263, kamb1297, kapi1249, kara1486, kaul1240, kela1255, kele1258, kili1267, kire1240, koko1269, kosr1238, kuan1248, kuma1276, kung1261, kwai1243, kwam1251, kwam1252, kwom1262, labu1248, lala1268, lame1260, lauu1247, lena1238, lewo1242, long1395, loni1238, lonw1238, louu1245, lusi1240, maee1241, mais1250, male1289, malo1243, mana1295, mana1298, maor1246, mars1254, masa1299, matu1261, mbal1255, mbul1263, mehe1243, meke1243, mele1250, mina1269, ming1252, moch1256, moki1238, moks1248, mono1273, motl1237, motu1246, mudu1242, muri1260, muso1238, muss1246, muyu1244, naka1262, nali1244, nami1256, natu1246, naur1243, ndon1254, neha1247, neng1238, ngan1300, niua1240, niue1239, nort2646, nort2836, nort2845, nuku1260, onto1237, oroq1238, paam1238, pate1247, patp1243, pile1238, ping1243, pohn1238, port1285, pulu1242, qima1242, raoo1244, rapa1244, renn1242, rotu1241, rovi1238, russ1264, saaa1240, saam1283, saka1289, sali1295, samo1305, sapo1253, scot1243, siar1238, siee1239, sina1266, sioo1240, sobe1238, sons1242, sout2642, sout2679, sout2807, sout2856, 
sout2866, sout2869, stan1318, sude1239, surs1246, tahi1242, taki1248, tawa1275, tean1237, teop1238, tiga1245, tigr1271, tiri1258, toab1237, toba1266, toke1240, tong1325, tswa1253, tuam1242, tuml1238, tumz1238, tung1290, tuva1244, ulit1238, urav1235, urip1239, vinm1237, waim1251, wall1257, wata1253, west2500, west2519, woga1249, wole1240, xamt1239, xara1244, xibe1242, yabe1254, yess1239, yima1243, zulu1248}.
Tone count | Africa | Eurasia | America | Papunesia | Sum |
---|---|---|---|---|---|
0 | 9 | 98 | 4 | 7 | 118 |
1 | 10 | 6 | 5 | 1 | 22 |
2 | 16 | 3 | 0 | 2 | 21 |
3 | 2 | 5 | 0 | 0 | 7 |
4 | 0 | 8 | 1 | 0 | 9 |
5 | 1 | 4 | 0 | 0 | 5 |
6 | 0 | 2 | 0 | 0 | 2 |
Sum | 38 | 126 | 10 | 10 | 184 |
I used a mixed-effects Poisson model.
glmer
I performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 18% | 14% | 4% | 13% | 2% |
unrestricted | none | alleles-together | 1% | 2% | 3% | 0% | 3% | 0% |
unrestricted | none | alleles-independent | 1% | 3% | 4% | 0% | 3% | 0% |
unrestricted | fixef | tone | 0% | 23% | 16% | 33% | 19% | 30% |
unrestricted | fixef | alleles-together | 81% | 2% | 3% | 16% | 3% | 12% |
unrestricted | fixef | alleles-independent | 81% | 4% | 3% | 13% | 4% | 6% |
macroareas | none | tone | 0% | 44% | 19% | 12% | 31% | 7% |
macroareas | none | alleles-together | 18% | 36% | 8% | 6% | 34% | 20% |
macroareas | none | alleles-independent | 20% | 34% | 12% | 7% | 37% | 19% |
macroareas | fixef | tone | 0% | 31% | 23% | 33% | 21% | 35% |
macroareas | fixef | alleles-together | 79% | 3% | 4% | 23% | 4% | 26% |
macroareas | fixef | alleles-independent | 81% | 4% | 4% | 24% | 4% | 26% |
families | none | tone | 24% | 19% | 14% | 28% | 8% | 8% |
families | none | alleles-together | 9% | 16% | 12% | 32% | 4% | 4% |
families | none | alleles-independent | 10% | 20% | 20% | 40% | 8% | 8% |
families | fixef | tone | 18% | 7% | 9% | 63% | 2% | 54% |
families | fixef | alleles-together | 83% | 4% | 8% | 61% | 2% | 5% |
families | fixef | alleles-independent | 82% | 5% | 7% | 59% | 2% | 11% |
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.94 (0.40, 1.72), p=0, decomposed into:
average direct effect (ADE): -0.16 (-0.69, 0.30), p=0.48, and
average indirect effect (ACME) mediated by ASPM-D: 1.11 (0.63, 1.79), p=0, mediating 117.0% (76.3%, 223.9%), p=0 of the effect, resulting from:
For MCPH1-D:
TE: 0.69 (0.32, 1.13), p=0, decomposed into:
ADE: 0.44 (-0.44, 1.38), p=0.32, and
ACME: 0.25 (-0.57, 1.06), p=0.53, mediating 36.6% (-95.5%, 197.0%), p=0.53 of the effect, resulting from:
For ASPM-D:
TE: mean = 1.3, median = 1.3; 59.2% significant at α-level 0.05 and 72.5% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 79.3, p = 0;
ADE: mean = 0.73, median = 0.74; 19.4% significant at α-level 0.05 and 33.9% significant at α-level 0.10; 97.4% > 0.0; one-sample one-sided t-test vs 0: t(999) = 58.4, p = 0;
ACME: mean = 0.54, median = 0.5; 21.5% significant at α-level 0.05 and 44.6% significant at α-level 0.10; 98.4% > 0.0; one-sample one-sided t-test vs 0: t(999) = 53.7, p = 2.6e-297;
β(Africa → allele): mean = -0.89, median = -0.9; 87.0% significant at α-level 0.05 and 98.9% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -237.7, p = 0;
β(allele → tone | Africa): mean = -0.38, median = -0.36; 34.1% significant at α-level 0.05 and 51.0% significant at α-level 0.10; 98.2% < 0.0; one-sample one-sided t-test vs 0: t(999) = -66.8, p = 0.
For MCPH1-D:
TE: mean = 1.1, median = 1.1; 57.6% significant at α-level 0.05 and 70.0% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 83.9, p = 0;
ADE: mean = 3.9, median = 2; 16.4% significant at α-level 0.05 and 26.1% significant at α-level 0.10; 83.2% > 0.0; one-sample one-sided t-test vs 0: t(999) = 17.2, p = 1.3e-58;
ACME: mean = -2.8, median = -0.89; 5.2% significant at α-level 0.05 and 11.9% significant at α-level 0.10; 35.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = -12.3, p = 1;
β(Africa → allele): mean = -2.4, median = -2.4; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -915.3, p = 0;
β(allele → tone | Africa): mean = 0.13, median = 0.1; 5.7% significant at α-level 0.05 and 11.9% significant at α-level 0.10; 40.4% < 0.0; one-sample one-sided t-test vs 0: t(999) = 10.3, p = 1.
Given the low sample size (N = 35 unique families), relatively few effect sizes are big enough to be significant; however, the effect of the allele on tone, β(allele → tone | Africa), is significant much more often for ASPM-D than for MCPH1-D: 34.1% vs 5.7% (6.0 times) for α-level 0.05, and 51.0% vs 11.9% (4.3 times) for α-level 0.10.
brms
Please note that path analysis uses a linear model (so not a Poisson one) for the tone counts; also I only use the numeric coding for Africa.
Coding Africa numerically, the model fits the data very well (χ2(1)=0.29, p=0.59; CFI=1.00, TLI=1.01, NNFI=1.01 and RFI=1.00):
## lavaan 0.6-8 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 184
##
## Model Test User Model:
##
## Test statistic 0.292
## Degrees of freedom 1
## P-value (Chi-square) 0.589
##
## Model Test Baseline Model:
##
## Test statistic 369.194
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.012
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -663.138
## Loglikelihood unrestricted model (H1) -662.992
##
## Akaike (AIC) 1342.276
## Bayesian (BIC) 1367.995
## Sample-size adjusted Bayesian (BIC) 1342.657
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.159
## P-value RMSEA <= 0.05 0.666
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.005
##
## Parameter Estimates:
##
## Standard errors Robust.sem
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## n_tones ~
## Africa_num -0.265 0.601 -0.441 0.659 -1.443 0.913
## ASPM_z -0.490 0.114 -4.291 0.000 -0.713 -0.266
## MCPH1_z -0.131 0.229 -0.571 0.568 -0.580 0.319
## ASPM_z ~
## Africa_num -1.338 0.099 -13.477 0.000 -1.533 -1.144
## MCPH1_z ~
## Africa_num -2.180 0.071 -30.517 0.000 -2.320 -2.040
## Std.lv Std.all
##
## -0.265 -0.075
## -0.490 -0.342
## -0.131 -0.091
##
## -1.338 -0.543
##
## -2.180 -0.885
##
## Variances:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## .n_tones 1.790 0.273 6.566 0.000 1.256 2.324
## .ASPM_z 0.701 0.071 9.837 0.000 0.561 0.841
## .MCPH1_z 0.216 0.036 5.943 0.000 0.145 0.287
## Std.lv Std.all
## 1.790 0.879
## 0.701 0.705
## 0.216 0.217
##
## R-Square:
## Estimate
## n_tones 0.121
## ASPM_z 0.295
## MCPH1_z 0.783
It can be seen that:
The model fits are:
Africa → ASPM-D: mean = -0.88, median = -0.89, sd = 0.12, IQR = 0.16, 100.0% < 0; 99.9% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -2.3e+02, p = 0;
Africa → MCPH1-D: mean = -2.4, median = -2.4, sd = 0.083, IQR = 0.12, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -9.1e+02, p = 0;
Africa → tone counts: mean = 0.77, median = 0.83, sd = 1.1, IQR = 1.6, 74.9% > 0; 10.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 22, p = 2e-86;
ASPM-D → tone counts: mean = -0.32, median = -0.31, sd = 0.16, IQR = 0.22, 98.7% < 0; 22.5% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -63, p = 0;
MCPH1-D → tone counts: mean = 0.019, median = 0.034, sd = 0.45, IQR = 0.65, 47.4% < 0; 3.3% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 1.3, p = 0.91.
I kept only the entries with non-missing data for the tone counts, ASPM-D and MCPH1-D, and if there is more than one possible language or allele frequency for a given sample, I kept only those entries that have different tone or allele data. The resulting dataset has 200 observations, distributed among 136 unique Glottolog codes in 37 families (ranging from a minimum of 1 language per family to a maximum of 51, with a mean of 5.4 and a median of 2 languages per family) and 4 macroareas.
I use simulations for power analysis (as implemented by package `simr`), focusing on the effect of ASPM-D on tone1 using `glmer`, i.e., logistic regression with ASPM-D as fixed effect, controlling for language family (as random effect) and macroarea (as fixed effect).
The observed effect size of ASPM-D is βASPM-D = -0.4, pASPM-D = 0.41, with an ICC = 68.4% on 35 level-2 groups (families) and 181 observations (languages/samples). The observed (post-hoc) power 1 - β = %, 95%CI = .
If we keep the families but change the number of languages per family:
If we change the number of families:
If we change the number of families and the number of languages per family:
Here I model language contact with a 2D Gaussian Process as suggested in, for example, McElreath (2020), using `brms`’s `gp()`. Tone is regressed on ASPM-D and MCPH1-D with language family and (meta)population as (nested) random effects, and a 2D Gaussian process separately for each macroarea.
Here I explore the sensitivity to the prior of the `brms` models, focusing on each “derived” allele independently.
All models show good mixing and convergence (not shown).
Prior name | Prior distribution | β | HDI | p(β<0) | p(β=0) | %HDI in ROPE | pROPE |
---|---|---|---|---|---|---|---|
default | student_t(3, 0, 3) | -0.70 | [-1.52, 0.33] | 0.89 (8.0) | 0.73 (2.7) | 13.3% | 0.12 |
flat | normal(0, 10) | -0.74 | [-1.63, 0.24] | 0.9 (9.0) | 0.89 (8.0) | 13.1% | 0.12 |
default_normal | normal(0, 5) | -0.70 | [-1.60, 0.24] | 0.89 (8.5) | 0.8 (4.0) | 13.8% | 0.12 |
narrow_0 | student_t(3, 0, 1) | -0.55 | [-1.32, 0.23] | 0.87 (7.0) | 0.55 (1.2) | 18.7% | 0.17 |
verynarrow_0 | student_t(3, 0, 0.1) | -0.05 | [-0.28, 0.16] | 0.62 (1.6) | 0.49 (1.0) | 91.1% | 0.82 |
negative_default | student_t(3, -1, 3) | -0.75 | [-1.71, 0.13] | 0.91 (10.0) | 0.73 (2.7) | 11.4% | 0.11 |
negative_narrow | student_t(3, -1, 1) | -0.82 | [-1.60, -0.06] | 0.96 (25.1) | 0.47 (0.9) | 4.0% | 0.07 |
verynegative_default | student_t(3, -3, 3) | -0.87 | [-1.83, 0.10] | 0.94 (14.5) | 0.79 (3.7) | 8.3% | 0.08 |
verynegative_narrow | student_t(3, -3, 1) | -1.33 | [-2.44, -0.21] | 0.98 (51.6) | 0.8 (3.9) | 0.0% | 0.03 |
positive_default | student_t(3, 1, 3) | -0.62 | [-1.52, 0.28] | 0.87 (6.8) | 0.76 (3.2) | 14.0% | 0.12 |
positive_narrow | student_t(3, 1, 1) | -0.34 | [-1.13, 0.52] | 0.74 (2.9) | 0.74 (2.9) | 25.7% | 0.23 |
verypositive_default | student_t(3, 3, 3) | -0.55 | [-1.39, 0.44] | 0.84 (5.2) | 0.86 (6.4) | 17.8% | 0.16 |
verypositive_narrow | student_t(3, 3, 1) | -0.23 | [-1.12, 0.78] | 0.66 (1.9) | 0.97 (27.8) | 25.0% | 0.22 |
informative | student_t(3, -0.7, 3) | -0.72 | [-1.64, 0.21] | 0.9 (9.2) | 0.73 (2.7) | 13.0% | 0.12 |
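The posterior summaries in these tables can be sketched from a vector of posterior draws as shown below. The value in parentheses after p(β<0) is consistent with the posterior odds p/(1-p) (e.g., 0.89/0.11 ≈ 8). The draws, the ROPE limits (±0.1) and the 95% HDI mass are assumptions chosen for illustration, and the reading of "%HDI in ROPE" (share of the HDI inside the ROPE) versus "pROPE" (share of the full posterior inside the ROPE) is plausible but not confirmed by the text.

```r
set.seed(1)
# Hypothetical posterior draws for beta (stand-in for draws from a fitted model).
draws <- rnorm(4000, mean = -0.7, sd = 0.45)

# Posterior probability that beta is negative, and the corresponding odds.
p_neg    <- mean(draws < 0)
odds_neg <- p_neg / (1 - p_neg)

# Highest-density interval: the narrowest interval containing `mass` of the draws.
hdi <- function(x, mass = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(mass * n)                    # number of draws inside the interval
  widths <- x[k:n] - x[1:(n - k + 1)]       # widths of all candidate intervals
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + k - 1])
}
h <- hdi(draws)

# Region of practical equivalence (assumed limits).
rope   <- c(-0.1, 0.1)
inside <- draws >= h["lower"] & draws <= h["upper"]
pct_hdi_in_rope <- mean(draws[inside] >= rope[1] & draws[inside] <= rope[2])
p_rope          <- mean(draws >= rope[1] & draws <= rope[2])

print(c(p_neg = p_neg, odds_neg = odds_neg, h,
        pct_hdi_in_rope = pct_hdi_in_rope, p_rope = p_rope))
```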
All models show good mixing and convergence (not shown).
Prior name | Prior distribution | β | HDI | p(β<0) | p(β=0) | %HDI in ROPE | pROPE |
---|---|---|---|---|---|---|---|
default | student_t(3, 0, 3) | -0.65 | [-1.69, 0.47] | 0.84 (5.2) | 0.75 (3.0) | 14.3% | 0.13 |
flat | normal(0, 10) | -0.68 | [-1.76, 0.48] | 0.85 (5.5) | 0.9 (8.6) | 13.6% | 0.12 |
default_normal | normal(0, 5) | -0.67 | [-1.76, 0.47] | 0.84 (5.1) | 0.83 (4.8) | 14.9% | 0.13 |
narrow_0 | student_t(3, 0, 1) | -0.49 | [-1.35, 0.46] | 0.82 (4.4) | 0.56 (1.3) | 18.6% | 0.17 |
verynarrow_0 | student_t(3, 0, 0.1) | -0.04 | [-0.25, 0.17] | 0.59 (1.4) | 0.52 (1.1) | 94.0% | 0.84 |
negative_default | student_t(3, -1, 3) | -0.73 | [-1.79, 0.35] | 0.87 (6.6) | 0.75 (3.0) | 12.6% | 0.11 |
negative_narrow | student_t(3, -1, 1) | -0.80 | [-1.74, 0.03] | 0.93 (12.7) | 0.57 (1.3) | 7.5% | 0.09 |
verynegative_default | student_t(3, -3, 3) | -0.85 | [-1.93, 0.27] | 0.9 (8.7) | 0.8 (3.9) | 11.5% | 0.10 |
verynegative_narrow | student_t(3, -3, 1) | -1.31 | [-2.51, -0.09] | 0.97 (27.8) | 0.84 (5.3) | 2.1% | 0.04 |
positive_default | student_t(3, 1, 3) | -0.61 | [-1.64, 0.52] | 0.81 (4.3) | 0.77 (3.3) | 15.6% | 0.14 |
positive_narrow | student_t(3, 1, 1) | -0.22 | [-1.26, 0.80] | 0.63 (1.7) | 0.75 (2.9) | 23.6% | 0.21 |
verypositive_default | student_t(3, 3, 3) | -0.47 | [-1.56, 0.59] | 0.77 (3.3) | 0.87 (6.4) | 18.1% | 0.16 |
verypositive_narrow | student_t(3, 3, 1) | -0.06 | [-1.39, 1.16] | 0.55 (1.2) | 0.96 (22.6) | 19.9% | 0.18 |
informative | student_t(3, -0.6, 3) | -0.66 | [-1.83, 0.42] | 0.83 (4.9) | 0.74 (2.8) | 13.0% | 0.12 |
All models show good mixing and convergence (not shown).
Prior name | Prior distribution | β | HDI | p(β<0) | p(β=0) | %HDI in ROPE | pROPE |
---|---|---|---|---|---|---|---|
default | student_t(3, 0, 3) | -1.30 | [-2.72, 0.17] | 0.93 (13.7) | 0.56 (1.3) | 6.3% | 0.06 |
flat | normal(0, 10) | -1.76 | [-3.44, 0.22] | 0.95 (20.4) | 0.71 (2.5) | 3.8% | 0.03 |
default_normal | normal(0, 5) | -1.50 | [-3.04, 0.22] | 0.94 (17.0) | 0.63 (1.7) | 5.3% | 0.05 |
narrow_0 | student_t(3, 0, 1) | -0.67 | [-1.73, 0.40] | 0.84 (5.4) | 0.52 (1.1) | 15.6% | 0.14 |
verynarrow_0 | student_t(3, 0, 0.1) | -0.03 | [-0.23, 0.20] | 0.55 (1.2) | 0.5 (1.0) | 94.4% | 0.84 |
negative_default | student_t(3, -1, 3) | -1.42 | [-2.85, 0.07] | 0.95 (17.7) | 0.55 (1.2) | 4.9% | 0.05 |
negative_narrow | student_t(3, -1, 1) | -1.05 | [-1.98, 0.02] | 0.96 (23.1) | 0.44 (0.8) | 4.3% | 0.06 |
verynegative_default | student_t(3, -3, 3) | -1.76 | [-3.33, -0.17] | 0.98 (40.2) | 0.54 (1.2) | 0.2% | 0.03 |
verynegative_narrow | student_t(3, -3, 1) | -1.89 | [-3.12, -0.61] | 0.99 (120.2) | 0.57 (1.4) | 0.0% | 0.01 |
positive_default | student_t(3, 1, 3) | -1.21 | [-2.58, 0.36] | 0.91 (10.1) | 0.64 (1.8) | 7.9% | 0.07 |
positive_narrow | student_t(3, 1, 1) | -0.45 | [-1.81, 0.77] | 0.7 (2.3) | 0.7 (2.3) | 19.6% | 0.17 |
verypositive_default | student_t(3, 3, 3) | -1.14 | [-2.60, 0.44] | 0.89 (7.8) | 0.77 (3.3) | 9.1% | 0.08 |
verypositive_narrow | student_t(3, 3, 1) | -0.40 | [-1.94, 1.51] | 0.68 (2.1) | 0.94 (16.3) | 14.2% | 0.13 |
informative | student_t(3, -1.3, 3) | -1.48 | [-2.94, 0.10] | 0.95 (20.5) | 0.53 (1.1) | 3.9% | 0.04 |
All models show good mixing and convergence (not shown).
Prior name | Prior distribution | β | HDI | p(β<0) | p(β=0) | %HDI in ROPE | pROPE |
---|---|---|---|---|---|---|---|
default | student_t(3, 0, 3) | -0.93 | [-2.41, 0.53] | 0.85 (5.6) | 0.68 (2.2) | 10.6% | 0.09 |
flat | normal(0, 10) | -1.23 | [-3.11, 0.51] | 0.88 (7.0) | 0.83 (5.0) | 8.2% | 0.07 |
default_normal | normal(0, 5) | -1.07 | [-2.60, 0.66] | 0.86 (6.2) | 0.75 (2.9) | 9.2% | 0.08 |
narrow_0 | student_t(3, 0, 1) | -0.47 | [-1.61, 0.68] | 0.75 (3.0) | 0.57 (1.3) | 18.7% | 0.17 |
verynarrow_0 | student_t(3, 0, 0.1) | -0.01 | [-0.25, 0.19] | 0.53 (1.1) | 0.5 (1.0) | 93.8% | 0.84 |
negative_default | student_t(3, -1, 3) | -1.05 | [-2.58, 0.45] | 0.88 (7.0) | 0.66 (2.0) | 8.8% | 0.08 |
negative_narrow | student_t(3, -1, 1) | -0.89 | [-1.98, 0.15] | 0.92 (10.9) | 0.5 (1.0) | 8.1% | 0.07 |
verynegative_default | student_t(3, -3, 3) | -1.31 | [-2.88, 0.30] | 0.92 (10.8) | 0.73 (2.7) | 6.7% | 0.06 |
verynegative_narrow | student_t(3, -3, 1) | -1.73 | [-3.11, -0.32] | 0.98 (39.4) | 0.75 (2.9) | 0.0% | 0.02 |
positive_default | student_t(3, 1, 3) | -0.81 | [-2.25, 0.84] | 0.81 (4.4) | 0.72 (2.6) | 12.2% | 0.11 |
positive_narrow | student_t(3, 1, 1) | -0.08 | [-1.36, 1.31] | 0.53 (1.1) | 0.69 (2.2) | 19.5% | 0.17 |
verypositive_default | student_t(3, 3, 3) | -0.67 | [-2.30, 0.85] | 0.77 (3.4) | 0.82 (4.5) | 13.8% | 0.12 |
verypositive_narrow | student_t(3, 3, 1) | 0.40 | [-1.51, 2.62] | 0.42 (0.7) | 0.94 (14.7) | 11.8% | 0.10 |
informative | student_t(3, -0.9, 3) | -1.01 | [-2.44, 0.46] | 0.88 (7.5) | 0.69 (2.2) | 10.1% | 0.09 |
All models show good mixing and convergence (not shown).
Prior name | Prior distribution | β | HDI | p(β<0) | p(β=0) | %HDI in ROPE | pROPE |
---|---|---|---|---|---|---|---|
default | student_t(3, 0, 3) | -0.24 | [-0.65, 0.16] | 0.82 (4.7) | 0.89 (8.1) | 22.2% | 0.20 |
flat | normal(0, 10) | -0.25 | [-0.67, 0.16] | 0.83 (5.0) | 0.96 (23.4) | 21.1% | 0.19 |
default_normal | normal(0, 5) | -0.24 | [-0.66, 0.16] | 0.82 (4.7) | 0.92 (12.1) | 21.8% | 0.19 |
narrow_0 | student_t(3, 0, 1) | -0.22 | [-0.63, 0.18] | 0.81 (4.4) | 0.75 (2.9) | 23.4% | 0.21 |
verynarrow_0 | student_t(3, 0, 0.1) | -0.04 | [-0.22, 0.14] | 0.63 (1.7) | 0.53 (1.1) | 72.9% | 0.65 |
negative_default | student_t(3, -1, 3) | -0.26 | [-0.69, 0.15] | 0.83 (5.0) | 0.9 (8.5) | 21.6% | 0.19 |
negative_narrow | student_t(3, -1, 1) | -0.32 | [-0.74, 0.07] | 0.9 (8.9) | 0.76 (3.2) | 14.3% | 0.14 |
verynegative_default | student_t(3, -3, 3) | -0.28 | [-0.69, 0.14] | 0.86 (6.2) | 0.93 (12.3) | 18.6% | 0.17 |
verynegative_narrow | student_t(3, -3, 1) | -0.34 | [-0.77, 0.09] | 0.89 (8.4) | 0.96 (27.0) | 14.3% | 0.13 |
positive_default | student_t(3, 1, 3) | -0.23 | [-0.67, 0.16] | 0.82 (4.6) | 0.9 (9.1) | 22.4% | 0.20 |
positive_narrow | student_t(3, 1, 1) | -0.15 | [-0.55, 0.27] | 0.72 (2.6) | 0.87 (6.5) | 28.6% | 0.25 |
verypositive_default | student_t(3, 3, 3) | -0.22 | [-0.62, 0.21] | 0.79 (3.9) | 0.94 (15.3) | 23.0% | 0.20 |
verypositive_narrow | student_t(3, 3, 1) | -0.14 | [-0.57, 0.27] | 0.7 (2.3) | 0.98 (58.0) | 30.0% | 0.27 |
informative | student_t(3, -0.2, 3) | -0.26 | [-0.66, 0.14] | 0.85 (5.6) | 0.89 (7.8) | 20.2% | 0.18 |
All models show good mixing and convergence (not shown).
Prior name | Prior distribution | β | HDI | p(β<0) | p(β=0) | %HDI in ROPE | pROPE |
---|---|---|---|---|---|---|---|
default | student_t(3, 0, 3) | -0.23 | [-0.70, 0.20] | 0.79 (3.7) | 0.89 (7.9) | 21.8% | 0.19 |
flat | normal(0, 10) | -0.24 | [-0.66, 0.24] | 0.81 (4.3) | 0.96 (22.7) | 20.4% | 0.18 |
default_normal | normal(0, 5) | -0.24 | [-0.68, 0.24] | 0.8 (4.0) | 0.92 (11.4) | 19.6% | 0.17 |
narrow_0 | student_t(3, 0, 1) | -0.21 | [-0.65, 0.21] | 0.79 (3.7) | 0.72 (2.6) | 21.9% | 0.20 |
verynarrow_0 | student_t(3, 0, 0.1) | -0.04 | [-0.22, 0.16] | 0.62 (1.6) | 0.53 (1.1) | 70.3% | 0.63 |
negative_default | student_t(3, -1, 3) | -0.24 | [-0.69, 0.22] | 0.81 (4.3) | 0.89 (7.8) | 19.5% | 0.17 |
negative_narrow | student_t(3, -1, 1) | -0.30 | [-0.73, 0.11] | 0.87 (6.8) | 0.79 (3.8) | 16.6% | 0.15 |
verynegative_default | student_t(3, -3, 3) | -0.25 | [-0.71, 0.19] | 0.81 (4.4) | 0.93 (12.7) | 20.2% | 0.18 |
verynegative_narrow | student_t(3, -3, 1) | -0.31 | [-0.77, 0.14] | 0.86 (5.9) | 0.97 (31.5) | 15.4% | 0.14 |
positive_default | student_t(3, 1, 3) | -0.22 | [-0.66, 0.23] | 0.79 (3.8) | 0.9 (8.8) | 21.3% | 0.19 |
positive_narrow | student_t(3, 1, 1) | -0.13 | [-0.59, 0.32] | 0.67 (2.1) | 0.85 (5.6) | 27.5% | 0.24 |
verypositive_default | student_t(3, 3, 3) | -0.20 | [-0.65, 0.25] | 0.77 (3.3) | 0.93 (13.8) | 22.3% | 0.20 |
verypositive_narrow | student_t(3, 3, 1) | -0.12 | [-0.59, 0.37] | 0.66 (1.9) | 0.98 (52.4) | 25.9% | 0.23 |
informative | student_t(3, -0.2, 3) | -0.24 | [-0.71, 0.17] | 0.82 (4.5) | 0.89 (7.9) | 20.6% | 0.18 |
Here I conduct some of the analyses using only the actual “derived” loci for the two genes (i.e., excluding all the “proxy” SNPs used in the full analysis).
There are 108 observations, distributed among 75 unique Glottolog codes in 29 families (ranging from a minimum of 1 language per family to a maximum of 38, with a mean of 3.7 and a median of 2 languages per family) and 4 macroareas.
There are 98:83:75 unique samples:(meta)populations:languages retained.
| | Africa | Eurasia | America | Papunesia | Sum |
---|---|---|---|---|---|
No | 6 | 53 | 2 | 3 | 64 |
Yes | 19 | 21 | 3 | 1 | 44 |
Sum | 25 | 74 | 5 | 4 | 108 |
glmer
To better understand this overlap between family, macroarea and the two “derived” alleles, I regressed (separately) the ASPM-D and MCPH1-D on the macroarea, using mixed-effects beta regression (after replacing all \(0.0\) values by \(10^{-7}\) and all \(1.0\) by \(1.0-10^{-7}\), respectively) with language family as random effect:
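The boundary adjustment mentioned above is needed because beta regression is only defined on the open interval (0, 1). A minimal sketch of that replacement step, with an illustrative function name and toy frequencies, is:

```r
# Replace exact 0s and 1s by values just inside (0, 1), as described in the
# text (eps = 1e-7); all other values are left untouched.
squeeze01 <- function(p, eps = 1e-7) {
  stopifnot(all(p >= 0 & p <= 1, na.rm = TRUE))
  p[!is.na(p) & p == 0] <- eps
  p[!is.na(p) & p == 1] <- 1 - eps
  p
}

freq     <- c(0, 0.25, 1, 0.8)     # hypothetical allele frequencies
freq_adj <- squeeze01(freq)
print(freq_adj)
print(qlogis(freq_adj))            # the logit is now finite for every value
```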
I performed 1000 independent replications of each of these parameter combinations, and below are the distributions of the permuted values versus the original ones (i.e., those obtained on the original, non-permuted data).
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 5% | 4% | 3% | 5% | 1% |
unrestricted | none | alleles-together | 8% | 6% | 6% | 9% | 6% | 5% |
unrestricted | none | alleles-independent | 8% | 5% | 6% | 9% | 5% | 2% |
unrestricted | fixef | tone | 0% | 7% | 7% | 29% | 5% | 33% |
unrestricted | fixef | alleles-together | 89% | 6% | 6% | 34% | 6% | 28% |
unrestricted | fixef | alleles-independent | 86% | 5% | 5% | 30% | 5% | 28% |
macroareas | none | tone | 0% | 82% | 14% | 19% | 71% | 51% |
macroareas | none | alleles-together | 39% | 30% | 8% | 27% | 22% | 56% |
macroareas | none | alleles-independent | 43% | 35% | 12% | 34% | 25% | 59% |
macroareas | fixef | tone | 0% | 5% | 4% | 26% | 5% | 33% |
macroareas | fixef | alleles-together | 86% | 5% | 5% | 32% | 5% | 38% |
macroareas | fixef | alleles-independent | 86% | 4% | 5% | 36% | 4% | 38% |
families | none | tone | 20% | 13% | 2% | 10% | 17% | 46% |
families | none | alleles-together | 9% | 6% | 2% | 16% | 4% | 21% |
families | none | alleles-independent | 13% | 9% | 7% | 25% | 8% | 26% |
families | fixef | tone | 20% | 6% | 2% | 33% | 7% | 65% |
families | fixef | alleles-together | 79% | 1% | 2% | 35% | 1% | 19% |
families | fixef | alleles-independent | 84% | 2% | 3% | 32% | 1% | 19% |
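The permutation schemes in the table above can be sketched as follows. This is one reading of the labels, not the script's own code: "unrestricted" shuffles values across all observations, whereas "macroareas" and "families" shuffle only within each group; "alleles-together" applies the same permutation to both alleles (keeping each ASPM-D/MCPH1-D pair intact), and "alleles-independent" permutes each allele separately. The data and names are hypothetical.

```r
# Permute x, optionally only within the groups defined by `strata`.
permute_within <- function(x, strata = NULL) {
  if (is.null(strata)) {
    return(x[sample.int(length(x))])        # unrestricted permutation
  }
  idx <- seq_along(x)
  # sample.int() avoids the sample(x) pitfall for groups of size 1
  new_idx <- ave(idx, strata, FUN = function(i) i[sample.int(length(i))])
  x[new_idx]
}

set.seed(42)
d <- data.frame(
  family = rep(c("A", "B", "C"), times = c(4, 3, 1)),
  tone   = c(1, 0, 1, 1, 0, 0, 1, 1),
  aspm   = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80),
  mcph1  = c(0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98)
)

# Permute tone within families.
tone_perm <- permute_within(d$tone, d$family)

# Permute the alleles together: one row permutation applied to both columns.
rows <- permute_within(seq_len(nrow(d)), d$family)
d_together <- d
d_together[, c("aspm", "mcph1")] <- d[rows, c("aspm", "mcph1")]

# Permute the alleles independently: a separate permutation per column.
d_indep <- d
d_indep$aspm  <- permute_within(d$aspm,  d$family)
d_indep$mcph1 <- permute_within(d$mcph1, d$family)
```

Refitting the model on each permuted dataset and comparing the resulting AIC, p-values and estimates with those from the original data gives percentages of the kind shown in the table.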
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.44 (0.24, 0.61), p=0, decomposed into:
average direct effect (ADE): 0.28 (0.04, 0.51), p=0.02, and
average indirect effect (ACME) mediated by ASPM-D: 0.16 (0.03, 0.30), p=0.022, mediating 35.7% (5.6%, 85.0%), p=0.022 of the effect, resulting from:
For MCPH1-D:
TE: 0.44 (0.24, 0.61), p=0, decomposed into:
ADE: 0.53 (0.10, 0.73), p=0.012, and
ACME: -0.09 (-0.29, 0.28), p=0.43, mediating -25.9% (-104.5%, 70.5%), p=0.43 of the effect, resulting from:
For ASPM-D:
TE: mean = 0.32, median = 0.33; 15.6% significant at α-level 0.05 and 49.1% significant at α-level 0.10; 99.9% > 0.0; one-sample one-sided t-test vs 0: t(999) = 115.5, p = 0;
ADE: mean = 0.28, median = 0.28; 6.4% significant at α-level 0.05 and 24.7% significant at α-level 0.10; 99.7% > 0.0; one-sample one-sided t-test vs 0: t(999) = 87.8, p = 0;
ACME: mean = 0.046, median = 0.044; 0.0% significant at α-level 0.05 and 1.8% significant at α-level 0.10; 79.3% > 0.0; one-sample one-sided t-test vs 0: t(999) = 30.0, p = 1.1e-141;
β(Africa → allele): mean = -0.82, median = -0.84; 69.4% significant at α-level 0.05 and 89.8% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -228.0, p = 0;
β(allele → tone | Africa): mean = -0.31, median = -0.29; 0.1% significant at α-level 0.05 and 4.3% significant at α-level 0.10; 78.7% < 0.0; one-sample one-sided t-test vs 0: t(999) = -28.3, p = 5.8e-130.
For MCPH1-D:
TE: mean = 0.33, median = 0.33; 15.8% significant at α-level 0.05 and 50.0% significant at α-level 0.10; 99.9% > 0.0; one-sample one-sided t-test vs 0: t(999) = 116.1, p = 0;
ADE: mean = 0.35, median = 0.39; 2.1% significant at α-level 0.05 and 12.3% significant at α-level 0.10; 95.6% > 0.0; one-sample one-sided t-test vs 0: t(999) = 65.5, p = 0;
ACME: mean = -0.029, median = -0.048; 0.1% significant at α-level 0.05 and 0.8% significant at α-level 0.10; 38.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = -6.5, p = 1;
β(Africa → allele): mean = -2.3, median = -2.3; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -953.8, p = 0;
β(allele → tone | Africa): mean = 0.34, median = 0.33; 0.2% significant at α-level 0.05 and 0.5% significant at α-level 0.10; 27.6% < 0.0; one-sample one-sided t-test vs 0: t(999) = 18.9, p = 1.
Given the low sample size N = 29 unique families, relatively few effect sizes are big enough to be significant for each individual analysis; the proportion of significant ACMEs for ASPM-D relative to MCPH1-D is 0.1% vs 0.2% (0.5 times) for α-level 0.05, and 4.3% vs 0.5% (8.6 times) for α-level 0.10, so ASPM-D shows more significant effects than MCPH1-D only at the more lenient α-level.
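The decomposition reported above follows TE = ADE + ACME (e.g., 0.28 + 0.16 = 0.44 for ASPM-D, with 0.16/0.44 close to the reported proportion mediated). A minimal sketch of this logic for the purely linear case is given below, where the ACME is the product of the Africa → allele and allele → tone | Africa paths. The data are simulated and the variable names are assumptions; the actual analysis may rely on (g)lm models and simulation-based estimates, for which the identity holds only approximately.

```r
set.seed(123)
n <- 500
# Hypothetical data: Africa lowers the allele frequency, and the allele
# lowers (a continuous stand-in for) tone.
africa <- rbinom(n, 1, 0.3)
allele <- 0.6 - 0.4 * africa + rnorm(n, sd = 0.15)
tone   <- 0.5 + 0.3 * africa - 0.5 * allele + rnorm(n, sd = 0.3)

m_med <- lm(allele ~ africa)          # mediator model: Africa -> allele
m_out <- lm(tone ~ africa + allele)   # outcome model: tone | Africa, allele
m_tot <- lm(tone ~ africa)            # total-effect model

a    <- unname(coef(m_med)["africa"])   # beta(Africa -> allele)
b    <- unname(coef(m_out)["allele"])   # beta(allele -> tone | Africa)
ade  <- unname(coef(m_out)["africa"])   # average direct effect
acme <- a * b                           # average indirect (mediated) effect
te   <- unname(coef(m_tot)["africa"])   # total effect

prop_mediated <- acme / te
print(c(TE = te, ADE = ade, ACME = acme, prop_mediated = prop_mediated))
```

Two negative paths (Africa lowering the allele frequency, and the allele lowering tone) multiply into a positive indirect effect, which is the pattern reported for ASPM-D.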
brms
With Africa and tone1 coded numerically, the model fit is: χ2(1)=0.00, p=0.96; CFI=1.00, TLI=1.03, NNFI=1.03 and RFI=1.00:
Likewise, with Africa and tone1 coded as ordered binary factors, the model fit is: χ2(1)=0.01, p=0.94; CFI=1.00, TLI=1.66, NNFI=1.66 and RFI=1.00:
model fits:
Africa → ASPM-D: mean = -0.83, median = -0.85, sd = 0.11, IQR = 0.14, 100.0% < 0; 95.6% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -2.4e+02, p = 0;
Africa → MCPH1-D: mean = -2.3, median = -2.3, sd = 0.077, IQR = 0.099, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -9.5e+02, p = 0;
Africa → tone1: mean = 0.45, median = 0.47, sd = 0.3, IQR = 0.44, 92.2% > 0; 12.4% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 47, p = 5.6e-253;
ASPM-D → tone1: mean = -0.057, median = -0.057, sd = 0.071, IQR = 0.11, 76.0% < 0; 10.8% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -25, p = 2.3e-110;
MCPH1-D → tone1: mean = 0.057, median = 0.064, sd = 0.11, IQR = 0.16, 30.4% < 0; 0.9% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 16, p = 1.
The resulting dataset has 106 observations, distributed among 73 unique Glottolog codes in 29 families (ranging from a minimum of 1 language per family to a maximum of 37, with a mean of 3.7 and a median of 2 languages per family) and 4 macroareas.
There are 93:78:73 unique samples:(meta)populations:languages retained.
| | Africa | Eurasia | America | Papunesia | Sum |
---|---|---|---|---|---|
No | 17 | 58 | 5 | 4 | 84 |
Yes | 9 | 13 | 0 | 0 | 22 |
Sum | 26 | 71 | 5 | 4 | 106 |
glmer
I performed 1000 independent replications of each of these parameter combinations, and below are the distributions of the permuted values versus the original ones (i.e., those obtained on the original, non-permuted data).
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 5% | 6% | 0% | 6% | 1% |
unrestricted | none | alleles-together | 20% | 9% | 8% | 3% | 9% | 13% |
unrestricted | none | alleles-independent | 23% | 9% | 8% | 2% | 8% | 9% |
unrestricted | fixef | tone | 0% | 6% | 6% | 0% | 6% | 91% |
unrestricted | fixef | alleles-together | 60% | 9% | 9% | 5% | 8% | 92% |
unrestricted | fixef | alleles-independent | 62% | 9% | 7% | 3% | 9% | 94% |
macroareas | none | tone | 0% | 19% | 3% | 0% | 28% | 13% |
macroareas | none | alleles-together | 46% | 19% | 4% | 10% | 24% | 78% |
macroareas | none | alleles-independent | 48% | 23% | 8% | 15% | 25% | 79% |
macroareas | fixef | tone | 0% | 6% | 6% | 0% | 5% | 90% |
macroareas | fixef | alleles-together | 54% | 6% | 6% | 13% | 7% | 73% |
macroareas | fixef | alleles-independent | 55% | 7% | 8% | 15% | 7% | 79% |
families | none | tone | 6% | 10% | 5% | 13% | 3% | 26% |
families | none | alleles-together | 18% | 8% | 8% | 18% | 0% | 18% |
families | none | alleles-independent | 18% | 8% | 12% | 19% | 3% | 23% |
families | fixef | tone | 6% | 6% | 9% | 28% | 3% | 90% |
families | fixef | alleles-together | 70% | 8% | 6% | 25% | 10% | 31% |
families | fixef | alleles-independent | 66% | 10% | 8% | 17% | 11% | 35% |
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.20 (0.01, 0.40), p=0.036, decomposed into:
average direct effect (ADE): 0.01 (-0.18, 0.25), p=0.96, and
average indirect effect (ACME) mediated by ASPM-D: 0.19 (0.02, 0.34), p=0.024, mediating 95.9% (-1.7%, 470.4%), p=0.052 of the effect, resulting from:
For MCPH1-D:
TE: 0.18 (-0.01, 0.39), p=0.064, decomposed into:
ADE: 0.30 (-0.15, 0.63), p=0.17, and
ACME: -0.12 (-0.39, 0.28), p=0.46, mediating -72.3% (-674.6%, 386.6%), p=0.5 of the effect, resulting from:
For ASPM-D:
TE: mean = 0.15, median = 0.15; 0.7% significant at α-level 0.05 and 4.8% significant at α-level 0.10; 90.1% > 0.0; one-sample one-sided t-test vs 0: t(999) = 50.2, p = 7.1e-276;
ADE: mean = 0.13, median = 0.13; 0.1% significant at α-level 0.05 and 2.4% significant at α-level 0.10; 90.7% > 0.0; one-sample one-sided t-test vs 0: t(999) = 46.1, p = 1.3e-249;
ACME: mean = 0.016, median = 0.016; 0.0% significant at α-level 0.05 and 0.0% significant at α-level 0.10; 71.4% > 0.0; one-sample one-sided t-test vs 0: t(999) = 19.0, p = 5e-69;
β(Africa → allele): mean = -0.88, median = -0.88; 93.2% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -433.7, p = 0;
β(allele → tone | Africa): mean = -0.097, median = -0.11; 0.0% significant at α-level 0.05 and 0.0% significant at α-level 0.10; 69.5% < 0.0; one-sample one-sided t-test vs 0: t(999) = -14.6, p = 3.1e-44.
For MCPH1-D:
TE: mean = 0.15, median = 0.15; 0.9% significant at α-level 0.05 and 4.8% significant at α-level 0.10; 90.1% > 0.0; one-sample one-sided t-test vs 0: t(999) = 49.6, p = 3.8e-272;
ADE: mean = 0.12, median = 0.12; 0.0% significant at α-level 0.05 and 0.1% significant at α-level 0.10; 81.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 28.7, p = 7e-133;
ACME: mean = 0.024, median = 0.023; 0.0% significant at α-level 0.05 and 0.0% significant at α-level 0.10; 55.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 5.7, p = 7.3e-09;
β(Africa → allele): mean = -2.3, median = -2.3; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -1219.2, p = 0;
β(allele → tone | Africa): mean = 0.033, median = 0.021; 0.0% significant at α-level 0.05 and 0.0% significant at α-level 0.10; 48.5% < 0.0; one-sample one-sided t-test vs 0: t(999) = 1.9, p = 0.97.
Given the low sample size N = 29 unique families, relatively few effect sizes are big enough to be significant for each individual analysis; here, none of the ACMEs is significant for either ASPM-D or MCPH1-D (0.0% vs 0.0% at both α-level 0.05 and α-level 0.10), so the ratio between the two is undefined.
brms
With Africa and tone1 coded numerically, the model fit is: χ2(1)=0.03, p=0.86; CFI=1.00, TLI=1.03, NNFI=1.03 and RFI=1.00:
Likewise, with Africa and tone1 coded as ordered binary factors, the model fit is: χ2(1)=0.07, p=0.79; CFI=1.00, TLI=1.61, NNFI=1.61 and RFI=0.97:
model fits:
Africa → ASPM-D: mean = -0.88, median = -0.89, sd = 0.062, IQR = 0.084, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -4.5e+02, p = 0;
Africa → MCPH1-D: mean = -2.3, median = -2.3, sd = 0.056, IQR = 0.074, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -1.3e+03, p = 0;
Africa → tone1: mean = 0.12, median = 0.11, sd = 0.21, IQR = 0.3, 69.9% > 0; 0.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 18, p = 5.9e-64;
ASPM-D → tone1: mean = -0.013, median = -0.013, sd = 0.028, IQR = 0.039, 67.2% < 0; 0.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -14, p = 3.9e-41;
MCPH1-D → tone1: mean = -0.004, median = -0.0039, sd = 0.09, IQR = 0.13, 51.1% < 0; 0.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -1.4, p = 0.08.
The resulting dataset has 110 observations, distributed among 76 unique Glottolog codes in 29 families (ranging from a minimum of 1 language per family to a maximum of 37, with a mean of 3.8 and a median of 2 languages per family) and 4 macroareas.
There are 93:78:76 unique samples:(meta)populations:languages retained.
| | Africa | Eurasia | America | Papunesia | Sum |
---|---|---|---|---|---|
0 | 6 | 52 | 2 | 3 | 63 |
1 | 6 | 6 | 3 | 0 | 15 |
2 | 12 | 2 | 0 | 1 | 15 |
3 | 2 | 3 | 0 | 0 | 5 |
4 | 0 | 6 | 0 | 0 | 6 |
5 | 1 | 3 | 0 | 0 | 4 |
6 | 0 | 2 | 0 | 0 | 2 |
Sum | 27 | 74 | 5 | 4 | 110 |
glmer
I performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 18% | 16% | 15% | 13% | 4% |
unrestricted | none | alleles-together | 11% | 4% | 3% | 6% | 4% | 0% |
unrestricted | none | alleles-independent | 10% | 3% | 4% | 5% | 3% | 0% |
unrestricted | fixef | tone | 0% | 24% | 18% | 26% | 18% | 34% |
unrestricted | fixef | alleles-together | 86% | 3% | 3% | 15% | 4% | 19% |
unrestricted | fixef | alleles-independent | 84% | 3% | 4% | 12% | 3% | 15% |
macroareas | none | tone | 0% | 40% | 20% | 18% | 33% | 22% |
macroareas | none | alleles-together | 32% | 15% | 4% | 16% | 17% | 25% |
macroareas | none | alleles-independent | 30% | 12% | 4% | 18% | 16% | 23% |
macroareas | fixef | tone | 0% | 34% | 24% | 33% | 22% | 38% |
macroareas | fixef | alleles-together | 85% | 4% | 5% | 21% | 4% | 35% |
macroareas | fixef | alleles-independent | 86% | 3% | 4% | 19% | 4% | 33% |
families | none | tone | 11% | 6% | 8% | 49% | 2% | 6% |
families | none | alleles-together | 24% | 10% | 14% | 66% | 0% | 3% |
families | none | alleles-independent | 26% | 9% | 15% | 65% | 0% | 5% |
families | fixef | tone | 17% | 7% | 12% | 67% | 1% | 49% |
families | fixef | alleles-together | 86% | 2% | 6% | 58% | 1% | 3% |
families | fixef | alleles-independent | 83% | 2% | 6% | 53% | 1% | 6% |
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.81 (0.19, 1.64), p=0.006, decomposed into:
average direct effect (ADE): -0.20 (-0.91, 0.42), p=0.53, and
average indirect effect (ACME) mediated by ASPM-D: 1.01 (0.49, 1.89), p=0, mediating 124.1% (61.9%, 380.1%), p=0.006 of the effect, resulting from:
For MCPH1-D:
TE: 0.62 (0.13, 1.17), p=0.018, decomposed into:
ADE: 0.62 (-0.69, 2.13), p=0.35, and
ACME: 0.00 (-1.26, 1.27), p=1, mediating 0.3% (-277.8%, 294.1%), p=1 of the effect, resulting from:
For ASPM-D:
TE: mean = 1.2, median = 1.1; 55.1% significant at α-level 0.05 and 65.0% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 72.9, p = 0;
ADE: mean = 0.91, median = 0.9; 32.4% significant at α-level 0.05 and 46.1% significant at α-level 0.10; 98.3% > 0.0; one-sample one-sided t-test vs 0: t(999) = 60.8, p = 0;
ACME: mean = 0.26, median = 0.22; 3.0% significant at α-level 0.05 and 11.2% significant at α-level 0.10; 78.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 25.7, p = 8.1e-113;
β(Africa → allele): mean = -0.89, median = -0.89; 92.1% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -447.7, p = 0;
β(allele → tone | Africa): mean = -0.18, median = -0.17; 6.5% significant at α-level 0.05 and 14.2% significant at α-level 0.10; 78.7% < 0.0; one-sample one-sided t-test vs 0: t(999) = -27.1, p = 3.3e-122.
For MCPH1-D:
TE: mean = 1.1, median = 1.1; 55.6% significant at α-level 0.05 and 64.9% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 74.9, p = 0;
ADE: mean = 0.79, median = 0.74; 1.1% significant at α-level 0.05 and 3.7% significant at α-level 0.10; 70.7% > 0.0; one-sample one-sided t-test vs 0: t(999) = 15.0, p = 1.6e-46;
ACME: mean = 0.31, median = 0.36; 0.2% significant at α-level 0.05 and 2.9% significant at α-level 0.10; 60.7% > 0.0; one-sample one-sided t-test vs 0: t(999) = 5.7, p = 8.9e-09;
β(Africa → allele): mean = -2.3, median = -2.3; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -1254.2, p = 0;
β(allele → tone | Africa): mean = -0.12, median = -0.13; 0.4% significant at α-level 0.05 and 2.7% significant at α-level 0.10; 67.5% < 0.0; one-sample one-sided t-test vs 0: t(999) = -13.5, p = 1.3e-38.
Given the low sample size N = 35 unique families, relatively few effect sizes are big enough to be significant; however, there are many more significant indirect effects (ACME) for ASPM-D than for MCPH1-D: 6.5% vs 0.4% (16.2 times) for α-level 0.05, and 14.2% vs 2.7% (5.3 times) for α-level 0.10.
brms
Please note that path analysis uses a linear model (so not a Poisson one) for the tone counts; also I only use the numeric coding for Africa.
Coding Africa numerically, the model fits the data very well (χ2(1)=0.02, p=0.88; CFI=1.00, TLI=1.03, NNFI=1.03 and RFI=1.00):
It can be seen that:
the model fits:
Africa → ASPM-D: mean = -0.89, median = -0.89, sd = 0.064, IQR = 0.084, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -4.4e+02, p = 0
Africa → MCPH1-D: mean = -2.3, median = -2.3, sd = 0.057, IQR = 0.078, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -1.3e+03, p = 0
Africa → tone counts: mean = 0.45, median = 0.49, sd = 0.89, IQR = 1.2, 70.2% > 0; 3.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 16, p = 1.1e-51
ASPM-D → tone counts: mean = -0.16, median = -0.16, sd = 0.19, IQR = 0.26, 79.9% < 0; 7.3% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -27, p = 2.7e-122
MCPH1-D → tone counts: mean = -0.18, median = -0.18, sd = 0.37, IQR = 0.51, 69.2% < 0; 0.1% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -16, p = 4.7e-49
The correspondence between genetic samples and languages is given below (excluding the 139 samples that unambiguously correspond to a single language):
pop_ID | n_languages | n_families | languages | families |
---|---|---|---|---|
SA004382R | 144 | 1 | ’Are’are, Adzera, Äiwoo, Ajië, Amara, Aneityum, Araki, Aribwatsa, Arop-Lokep, Arosi, Aulua, Babatana, Bannoni, Big Nambas, Carolinian, Cemuhî, Cheke Holo, Chuukese, Dehu, Dumbea, East Ambae, East Futuna, East Uvean, Fijian, Futuna-Aniwa, Fwâi, Gapapaiwa, Gela, Gilbertese, Gumawana, Halia, Hano, Hawaiian, Hoava, Iaai, Iduna, Kairiru, Kapingamarangi, Kara (Papua New Guinea), Kaulong, Kela (Papua New Guinea), Kele (Papua New Guinea), Kilivila, Kokota, Kosraean, Kuanua, Kwaio, Kwamera, Labu, Lala, Lamenu, Lau, Lenakel, Lewo, Longgu, Loniu, Lonwolwol, Lou, Luangiua, Lusi, Maisin, Maleu-Kilenge, Manam, Maori, Marshallese, Matukar, Mbula, Mekeo, Mele-Fila, Minaveha, Mokilese, Mono-Alu, Motu, Muduapa, Musom, Mussau-Emira, Muyuw, Mwotlap, Nakanai, Nalik, Natügu, Nauru, Nehan, Nêlêmwa-Nixumwak, Nengone, Neve’ei, Niuafo’ou, Niuean, North Efate, North Marquesan, Nukuoro, Paama, Patep, Patpatar, Pingelapese, Pohnpeian, Port Sandwich, Puluwatese, Rapanui, Rennell-Bellona, Rotuman, Roviana, Sa’a, Saliba, Samoan, Saposa, Siar-Lak, Sie, Sinaugoro, Sio, Sobei, Sonsorol, South Efate, South Marquesan, Southwest Tanna, Sudest, Sursurunga, Tahitian, Takia, Tamambo, Tawala, Teanu, Teop, Tigak, Tirax, Tiri-Mea, To’abaita, Tobati, Tokelau, Tonga (Tonga Islands), Tuamotuan, Tumleo, Tungag, Tuvalu, Ulithian, Ura (Vanuatu), Uripiv-Wala-Rano-Atchin, Vaeakau-Taumako, Waima, Wanohe, Western Fijian, Woleaian, Xârâcùù, Yabem | Austronesian |
SA001501H | 23 | 4 | Abau, Alamblak, Ambulas, Ap Ma, Awtuw, Bahinemo, Boikin, Chambri, Hanga Hundi, Iatmul, Iwam, Kaian, Kire, Kwoma, Manambu, Mehek, Murik (Papua New Guinea), Namia, Rao, Watam, Wogamusin, Yessan-Mayo, Yimas | Ap Ma, Lower Sepik-Ramu, Ndu, Sepik |
SA001818S | 9 | 1 | Herero, Kuanyama, Kwambi, Mbalanhu, Ndonga, Ngandyera, Southern Sotho, Tswana, Zulu | Atlantic-Congo |
SA004368V | 7 | 1 | Amharic, Awngi, Bilin, Geez, Qimant, Tigrinya, Xamtanga | Afro-Asiatic |
SA004046O | 6 | 1 | Bukusu, Idakho-Isukha-Tiriki, Kisa, Masaaba, Saamia, Tsotso | Atlantic-Congo |
SA001469U | 5 | 3 | East Taa, Hai//om-Akhoe, Nama (Namibia), North-Central Ju, South-Eastern Ju | Khoe-Kwadi, Kxa, Tuu |
SA001467S | 4 | 1 | Eastern Maninkakan, Kita Maninkakan, Mandinka, Western Maninkakan | Mande |
SA001819T | 4 | 1 | Gusii, Kamba (Kenya), Kikuyu, Meru | Atlantic-Congo |
SA003646T | 4 | 1 | Eastern Maninkakan, Kita Maninkakan, Mandinka, Western Maninkakan | Mande |
SA001476S | 3 | 1 | Eastern Balochi, Southern Balochi, Western Balochi | Indo-European |
SA001478U | 3 | 1 | Eastern Balochi, Southern Balochi, Western Balochi | Indo-European |
SA004365S | 3 | 1 | Kahe, Machame, Mochi | Atlantic-Congo |
SA004371P | 3 | 1 | Modern Hebrew, South Levantine Arabic, Standard Arabic | Afro-Asiatic |
ESTONIAN_VAR | 2 | 1 | Estonian, South Estonian | Uralic |
MB2005_BakolaPygmy | 2 | 1 | Gyele, Kwasio | Atlantic-Congo |
Qatari | 2 | 1 | Gulf Arabic, Standard Arabic | Afro-Asiatic |
SA001466R | 2 | 1 | Efe, Lese | Central Sudanic |
SA001474Q | 2 | 1 | South Levantine Arabic, Standard Arabic | Afro-Asiatic |
SA001483Q | 2 | 1 | Mandarin Chinese, Yue Chinese | Sino-Tibetan |
SA001486T | 2 | 1 | Central Mashan Hmong, Hmong Njua | Hmong-Mien |
SA001493R | 2 | 1 | Lü, Tai Nüa | Tai-Kadai |
SA001508O | 2 | 1 | English, Scots | Indo-European |
SA002254N | 2 | 1 | South Levantine Arabic, Standard Arabic | Afro-Asiatic |
SA002257Q | 2 | 1 | South Levantine Arabic, Standard Arabic | Afro-Asiatic |
SA002262M | 2 | 1 | Central Pashto, Northern Pashto | Indo-European |
SA003028N | 2 | 1 | Estonian, South Estonian | Uralic |
SA004111H | 2 | 1 | English, Spanish | Indo-European |
SA004238R | 2 | 1 | Lü, Tai Nüa | Tai-Kadai |
SA004361O | 2 | 1 | Efe, Lese | Central Sudanic |
SA004370O | 2 | 1 | Judeo-Yemeni Arabic, Modern Hebrew | Afro-Asiatic |
SA004378W | 2 | 1 | English, Irish | Indo-European |
SA004587Y | 2 | 1 | Mongolia Buriat, Russia Buriat | Mongolic-Khitan |
SA004592U | 2 | 1 | Mandarin Chinese, Yue Chinese | Sino-Tibetan |
SA004599B | 2 | 1 | Erzya, Moksha | Uralic |
SA004603N | 2 | 1 | Church Slavic, Russian | Indo-European |
SA004623P | 2 | 1 | Georgian, Mingrelian | Kartvelian |
Even if it is very conservative (e.g., Church Slavic and Russian corresponding to `SA004603N` are very similar for tone, as are Modern Hebrew, South Levantine Arabic and Standard Arabic corresponding to `SA004371P`), I removed from this analysis all genetic samples that map to more than 1 language.
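The exclusion rule just described (drop every genetic sample that maps to more than one language) can be sketched as below. The two `SA…` identifiers and their languages are taken from the table above; the `HYP_…` samples, "Language Z" and the column names are hypothetical and serve only to show that several samples may still map to the same single language.

```r
# Hypothetical sample-to-language mapping (one row per sample-language pair).
map <- data.frame(
  pop_ID   = c("SA004603N", "SA004603N", "SA004111H", "SA004111H",
               "HYP_A", "HYP_B"),
  language = c("Church Slavic", "Russian", "English", "Spanish",
               "Language Z", "Language Z"),
  stringsAsFactors = FALSE
)

# Number of distinct languages each sample maps to.
n_lang <- tapply(map$language, map$pop_ID, function(x) length(unique(x)))

# Keep only the samples that map unambiguously to a single language.
keep_ids        <- names(n_lang)[n_lang == 1]
map_unambiguous <- map[map$pop_ID %in% keep_ids, ]
print(map_unambiguous)
```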
This results in 139 unique genetic samples corresponding to 103 (meta)populations, each mapping to a single language, distributed across 91 unique languages (i.e., it is still the case that more than one sample maps to the same language) in 4 macroareas:
Africa | Eurasia | America | Papunesia |
---|---|---|---|
16 | 109 | 10 | 4 |
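The filtering step described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in the script; the dictionary `sample_to_languages` and the helper `keep_single_language_samples` are hypothetical names, the first two entries are taken from the table above, and the third entry is invented purely to show a sample that survives the filter.

```python
# Hypothetical mapping from genetic sample ID to the languages it maps to.
# The first two entries come from the table above; the last one is invented
# for illustration only.
sample_to_languages = {
    "SA004603N": ["Church Slavic", "Russian"],
    "SA004371P": ["Modern Hebrew", "South Levantine Arabic", "Standard Arabic"],
    "EXAMPLE_SINGLE": ["Example language"],
}


def keep_single_language_samples(mapping):
    """Keep only the samples that map to exactly one language."""
    return {sid: langs[0] for sid, langs in mapping.items() if len(langs) == 1}


retained = keep_single_language_samples(sample_to_languages)
```

Note that several retained samples can still share the same language, which is why the number of samples exceeds the number of languages.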
There are 126 observations, distributed among 88 unique Glottolog codes in 30 families (ranging from a minimum of 1 language per family to a maximum of 37, with a mean of 4.2 and a median of 2 languages per family) and 4 macroareas.
There are 126:100:88 unique samples:(meta)populations:languages retained.
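The per-family summary quoted above (minimum, maximum, mean and median number of languages per family) could be computed along these lines. This is only a sketch in Python, not the script's own R code; `family_size_summary` is a hypothetical helper, and the small mapping uses a handful of Glottolog code/family pairs that appear elsewhere in this document, not the full dataset.

```python
from collections import Counter
from statistics import mean, median

# Toy subset (Glottolog code -> family), for illustration only.
language_to_family = {
    "naas1242": "South Bougainville",
    "buru1296": "Burushaski",
    "bamu1253": "Atlantic-Congo",
    "yaka1272": "Atlantic-Congo",
    "sand1273": "Sandawe",
    "nort2732": "Sino-Tibetan",
}


def family_size_summary(mapping):
    """Summarize how many unique languages fall in each family."""
    sizes = list(Counter(mapping.values()).values())
    return {
        "n_languages": len(mapping),
        "n_families": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": mean(sizes),
        "median": median(sizes),
    }


summary = family_size_summary(language_to_family)
```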
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 2 | 79 | 4 | 4 | 89 |
Yes | 14 | 17 | 6 | 0 | 37 |
Sum | 16 | 96 | 10 | 4 | 126 |
glmer
To better understand this overlap between family, macroarea and the two “derived” alleles, I regressed (separately) the ASPM-D and MCPH1-D on the macroarea, using mixed-effects beta regression (after replacing all \(0.0\) values by \(10^{-7}\) and all \(1.0\) by \(1.0-10^{-7}\), respectively) with language family as random effect:
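The boundary replacement mentioned above is needed because beta regression is only defined on the open interval (0, 1). A minimal sketch of that step, written in Python and not the R code actually used, with `squeeze_proportion` as a hypothetical helper name:

```python
EPS = 1e-7  # the offset stated in the text


def squeeze_proportion(x, eps=EPS):
    """Replace exact 0.0 by eps and exact 1.0 by 1 - eps; leave other values unchanged."""
    if x == 0.0:
        return eps
    if x == 1.0:
        return 1.0 - eps
    return x


# Example allele frequencies (illustrative values, not taken from the data).
frequencies = [0.0, 0.25, 1.0]
squeezed = [squeeze_proportion(f) for f in frequencies]
```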
I performed 1000 independent replications of each of these parameter combinations, and below are the distributions of the permuted values versus the original ones (i.e., those obtained on the original, non-permuted data).
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 4% | 4% | 0% | 5% | 0% |
unrestricted | none | alleles-together | 6% | 5% | 5% | 7% | 6% | 4% |
unrestricted | none | alleles-independent | 7% | 6% | 5% | 7% | 5% | 2% |
unrestricted | fixef | tone | 0% | 6% | 5% | 13% | 7% | 15% |
unrestricted | fixef | alleles-together | 70% | 6% | 5% | 25% | 6% | 19% |
unrestricted | fixef | alleles-independent | 67% | 6% | 6% | 24% | 5% | 16% |
macroareas | none | tone | 0% | 72% | 28% | 24% | 67% | 7% |
macroareas | none | alleles-together | 30% | 24% | 8% | 28% | 20% | 40% |
macroareas | none | alleles-independent | 35% | 29% | 14% | 37% | 25% | 46% |
macroareas | fixef | tone | 0% | 5% | 5% | 14% | 6% | 21% |
macroareas | fixef | alleles-together | 64% | 4% | 4% | 24% | 5% | 28% |
macroareas | fixef | alleles-independent | 64% | 5% | 4% | 27% | 4% | 26% |
families | none | tone | 44% | 21% | 5% | 24% | 16% | 38% |
families | none | alleles-together | 18% | 15% | 6% | 32% | 6% | 17% |
families | none | alleles-independent | 30% | 27% | 20% | 49% | 19% | 27% |
families | fixef | tone | 27% | 6% | 4% | 37% | 4% | 38% |
families | fixef | alleles-together | 70% | 8% | 6% | 50% | 3% | 24% |
families | fixef | alleles-independent | 74% | 8% | 10% | 52% | 5% | 35% |
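The "Permute within" column above distinguishes unrestricted permutations from permutations restricted to macroareas or families. A restricted permutation of this kind can be sketched as below. This is an illustrative Python sketch under my reading of the table, not the script's R implementation; `permute_within` and the toy vectors are hypothetical.

```python
import random
from collections import defaultdict


def permute_within(values, groups=None, rng=None):
    """Shuffle `values`; if `groups` is given, shuffle only within each group."""
    if rng is None:
        rng = random.Random()
    result = list(values)
    if groups is None:
        rng.shuffle(result)  # unrestricted permutation
        return result
    index_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        index_by_group[g].append(i)
    for indices in index_by_group.values():
        block = [values[i] for i in indices]
        rng.shuffle(block)  # values never leave their own group
        for i, v in zip(indices, block):
            result[i] = v
    return result


# Toy example: tone (Yes/No) permuted within macroareas.
tone = ["Yes", "No", "No", "Yes", "No", "Yes"]
area = ["Africa", "Africa", "Africa", "Eurasia", "Eurasia", "Eurasia"]
permuted = permute_within(tone, area, random.Random(42))
```

Restricting the shuffle preserves the per-group composition of the permuted variable, so any association that survives is not simply due to differences between groups.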
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.63 (0.37, 0.78), p=0, decomposed into:
average direct effect (ADE): 0.44 (0.15, 0.67), p=0, and
average indirect effect (ACME) mediated by ASPM-D: 0.19 (0.07, 0.34), p=0, mediating 30.0% (9.7%, 62.3%), p=0 of the effect, resulting from:
For MCPH1-D:
TE: 0.64 (0.40, 0.79), p=0, decomposed into:
ADE: 0.71 (0.38, 0.86), p=0, and
ACME: -0.07 (-0.22, 0.22), p=0.37, mediating -12.3% (-51.6%, 35.7%), p=0.37 of the effect, resulting from:
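As a quick consistency check on the decomposition above, the total effect should be approximately the sum of the direct and indirect effects, and the proportion mediated approximately ACME divided by TE. The sketch below (Python, with the hypothetical helper `decompose`) uses the point estimates quoted above; the reported percentages are presumably estimated from simulation draws, so they need not equal the ratio of the point estimates exactly.

```python
def decompose(ade, acme):
    """Return (total effect, proportion mediated) from point estimates."""
    total = ade + acme
    return total, acme / total


# Point estimates quoted above.
te_aspm, prop_aspm = decompose(ade=0.44, acme=0.19)     # reported TE = 0.63
te_mcph1, prop_mcph1 = decompose(ade=0.71, acme=-0.07)  # reported TE = 0.64
```

A negative proportion, as for MCPH1-D, simply reflects an indirect effect whose sign is opposite to that of the total effect.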
For ASPM-D:
TE: mean = 0.2, median = 0.15; 0.0% significant at α-level 0.05 and 10.4% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 59.0, p = 0;
ADE: mean = 0.15, median = 0.11; 0.0% significant at α-level 0.05 and 2.6% significant at α-level 0.10; 98.8% > 0.0; one-sample one-sided t-test vs 0: t(999) = 42.9, p = 5.1e-229;
ACME: mean = 0.057, median = 0.056; 0.0% significant at α-level 0.05 and 2.4% significant at α-level 0.10; 99.9% > 0.0; one-sample one-sided t-test vs 0: t(999) = 86.9, p = 0;
β(Africa → allele): mean = -0.63, median = -0.67; 5.0% significant at α-level 0.05 and 38.4% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -123.7, p = 0;
β(allele → tone | Africa): mean = -0.81, median = -0.8; 8.1% significant at α-level 0.05 and 31.0% significant at α-level 0.10; 99.9% < 0.0; one-sample one-sided t-test vs 0: t(999) = -100.0, p = 0.
For MCPH1-D:
TE: mean = 0.2, median = 0.15; 0.0% significant at α-level 0.05 and 10.4% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 57.6, p = 3.9e-320;
ADE: mean = 0.22, median = 0.2; 0.1% significant at α-level 0.05 and 2.1% significant at α-level 0.10; 98.9% > 0.0; one-sample one-sided t-test vs 0: t(999) = 58.3, p = 0;
ACME: mean = -0.017, median = -0.04; 0.0% significant at α-level 0.05 and 0.5% significant at α-level 0.10; 34.7% > 0.0; one-sample one-sided t-test vs 0: t(999) = -4.9, p = 1;
β(Africa → allele): mean = -2.5, median = -2.5; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -487.0, p = 0;
β(allele → tone | Africa): mean = 0.37, median = 0.39; 0.0% significant at α-level 0.05 and 0.4% significant at α-level 0.10; 26.3% < 0.0; one-sample one-sided t-test vs 0: t(999) = 21.3, p = 1.
Given the low sample size N = 30 unique families, relatively few effect sizes are big enough to be significant for each individual analysis; however, there are many more significant ACMEs for ASPM-D than for MCPH1-D: 8.1% vs 0.0% (Inf times) for α-level 0.05, and 31.0% vs 0.4% (77.5 times) for α-level 0.10.
brms
With Africa and tone1 coded numerically, the model fit is: χ2(1)=0.03, p=0.86; CFI=1.00, TLI=1.03, NNFI=1.03 and RFI=1.00:
Likewise, with Africa and tone1 coded as ordered binary factors, the model fit is: χ2(1)=0.08, p=0.78; CFI=1.00, TLI=1.21, NNFI=1.21 and RFI=0.99:
model fits:
Africa → ASPM-D: mean = -0.62, median = -0.66, sd = 0.16, IQR = 0.21, 100.0% < 0; 79.4% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -1.2e+02, p = 0;
Africa → MCPH1-D: mean = -2.5, median = -2.5, sd = 0.16, IQR = 0.23, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -5.1e+02, p = 0;
Africa → tone1: mean = 0.49, median = 0.53, sd = 0.36, IQR = 0.52, 88.1% > 0; 20.3% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 43, p = 4.3e-232;
ASPM-D → tone1: mean = -0.16, median = -0.16, sd = 0.051, IQR = 0.069, 99.9% < 0; 52.2% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -1e+02, p = 0;
MCPH1-D → tone1: mean = 0.0056, median = 0.02, sd = 0.12, IQR = 0.17, 43.1% < 0; 1.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 1.4, p = 0.92.
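The summaries above (mean, percentage on one side of zero, and a one-sample t statistic over the 1000 replications) can be sketched as follows. This is an illustrative Python sketch, not the script's R code; the function names are hypothetical. For instance, the rounded values for Africa → ASPM-D (mean -0.62, sd 0.16, 1000 replications) give t ≈ -123, roughly in line with the reported -1.2e+02.

```python
from math import sqrt
from statistics import mean, median, stdev


def t_from_summary(m, sd, n):
    """One-sample t statistic against 0 from a mean, standard deviation and sample size."""
    return m / (sd / sqrt(n))


def summarize_replicates(estimates):
    """Summarize a list of replicated estimates (df of the t-test is n - 1)."""
    n = len(estimates)
    return {
        "mean": mean(estimates),
        "median": median(estimates),
        "pct_positive": 100.0 * sum(e > 0 for e in estimates) / n,
        "t": t_from_summary(mean(estimates), stdev(estimates), n),
        "df": n - 1,
    }


t_check = t_from_summary(-0.62, 0.16, 1000)
toy = summarize_replicates([0.1, 0.2, 0.3, 0.2, 0.2])  # toy values, not real output
```

With 1000 replications even small mean effects yield very large t statistics, which is why the t-tests are reported alongside the percentage of individually significant replications.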
The resulting dataset has 121 observations, distributed among 83 unique Glottolog codes in 30 families (ranging from a minimum of 1 language per family to a maximum of 36, with a mean of 4 and a median of 1.5 languages per family) and 4 macroareas.
There are 121:95:83 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 8 | 82 | 9 | 4 | 103 |
Yes | 6 | 11 | 1 | 0 | 18 |
Sum | 14 | 93 | 10 | 4 | 121 |
glmer
I performed 1000 independent replications of each of these parameter combinations, and below are the distributions of the permuted values versus the original ones (i.e., those obtained on the original, non-permuted data).
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 6% | 5% | 7% | 6% | 0% |
unrestricted | none | alleles-together | 42% | 11% | 10% | 25% | 10% | 6% |
unrestricted | none | alleles-independent | 42% | 9% | 9% | 28% | 7% | 4% |
unrestricted | fixef | tone | 0% | 6% | 5% | 13% | 6% | 1% |
unrestricted | fixef | alleles-together | 81% | 10% | 10% | 30% | 10% | 8% |
unrestricted | fixef | alleles-independent | 80% | 11% | 8% | 27% | 11% | 9% |
macroareas | none | tone | 0% | 43% | 2% | 7% | 52% | 0% |
macroareas | none | alleles-together | 46% | 8% | 7% | 41% | 8% | 28% |
macroareas | none | alleles-independent | 46% | 6% | 7% | 41% | 7% | 28% |
macroareas | fixef | tone | 0% | 6% | 5% | 15% | 5% | 1% |
macroareas | fixef | alleles-together | 79% | 9% | 7% | 37% | 9% | 21% |
macroareas | fixef | alleles-independent | 79% | 10% | 10% | 35% | 9% | 21% |
families | none | tone | 82% | 9% | 5% | 44% | 10% | 24% |
families | none | alleles-together | 49% | 6% | 6% | 42% | 6% | 16% |
families | none | alleles-independent | 52% | 7% | 7% | 48% | 6% | 20% |
families | fixef | tone | 87% | 6% | 7% | 53% | 5% | 25% |
families | fixef | alleles-together | 87% | 12% | 7% | 38% | 13% | 18% |
families | fixef | alleles-independent | 88% | 13% | 9% | 42% | 12% | 21% |
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 0.32 (0.09, 0.57), p=0.004, decomposed into:
average direct effect (ADE): 0.14 (-0.07, 0.40), p=0.23, and
average indirect effect (ACME) mediated by ASPM-D: 0.19 (0.03, 0.37), p=0.004, mediating 58.1% (12.2%, 158.4%), p=0.008 of the effect, resulting from:
For MCPH1-D:
TE: 0.32 (0.09, 0.57), p=0, decomposed into:
ADE: 0.36 (-0.11, 0.70), p=0.13, and
ACME: -0.04 (-0.32, 0.36), p=0.74, mediating -21.2% (-179.3%, 161.8%), p=0.74 of the effect, resulting from:
For ASPM-D:
TE: mean = 0.22, median = 0.23; 2.9% significant at α-level 0.05 and 15.2% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 61.3, p = 0;
ADE: mean = 0.17, median = 0.18; 0.1% significant at α-level 0.05 and 3.7% significant at α-level 0.10; 95.8% > 0.0; one-sample one-sided t-test vs 0: t(999) = 52.2, p = 9e-288;
ACME: mean = 0.043, median = 0.042; 0.0% significant at α-level 0.05 and 0.0% significant at α-level 0.10; 99.8% > 0.0; one-sample one-sided t-test vs 0: t(999) = 72.0, p = 0;
β(Africa → allele): mean = -0.74, median = -0.74; 9.7% significant at α-level 0.05 and 65.4% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -319.7, p = 0;
β(allele → tone | Africa): mean = -0.35, median = -0.35; 0.0% significant at α-level 0.05 and 0.0% significant at α-level 0.10; 99.6% < 0.0; one-sample one-sided t-test vs 0: t(999) = -81.0, p = 0.
For MCPH1-D:
TE: mean = 0.22, median = 0.23; 2.8% significant at α-level 0.05 and 14.9% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 61.3, p = 0;
ADE: mean = 0.33, median = 0.34; 0.0% significant at α-level 0.05 and 2.5% significant at α-level 0.10; 99.3% > 0.0; one-sample one-sided t-test vs 0: t(999) = 82.5, p = 0;
ACME: mean = -0.11, median = -0.12; 0.0% significant at α-level 0.05 and 0.9% significant at α-level 0.10; 19.1% > 0.0; one-sample one-sided t-test vs 0: t(999) = -28.4, p = 1;
β(Africa → allele): mean = -2.6, median = -2.6; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -725.0, p = 0;
β(allele → tone | Africa): mean = 0.7, median = 0.66; 0.0% significant at α-level 0.05 and 0.6% significant at α-level 0.10; 10.7% < 0.0; one-sample one-sided t-test vs 0: t(999) = 37.7, p = 1.
Given the low sample size (N = 30 unique families), relatively few effect sizes are big enough to be significant in each individual analysis; in this case, the percentage of significant ACMEs is not higher for ASPM-D than for MCPH1-D: 0.0% vs 0.0% (ratio undefined) for α-level 0.05, and 0.0% vs 0.6% (0.0 times) for α-level 0.10.
brms
With Africa and tone1 coded numerically, the model fit is: χ2(1)=0.14, p=0.71; CFI=1.00, TLI=1.03, NNFI=1.03 and RFI=1.00:
Likewise, with Africa and tone1 coded as ordered binary factors, the model fit is: χ2(1)=0.43, p=0.51; CFI=1.00, TLI=1.69, NNFI=1.69 and RFI=0.77:
model fits:
Africa → ASPM-D: mean = -0.73, median = -0.74, sd = 0.072, IQR = 0.096, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -3.2e+02, p = 0;
Africa → MCPH1-D: mean = -2.6, median = -2.6, sd = 0.11, IQR = 0.16, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -7.3e+02, p = 0;
Africa → tone1: mean = 0.36, median = 0.37, sd = 0.22, IQR = 0.29, 95.1% > 0; 3.7% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 53, p = 1.7e-293;
ASPM-D → tone1: mean = -0.041, median = -0.04, sd = 0.027, IQR = 0.036, 93.9% < 0; 0.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -49, p = 6.9e-266;
MCPH1-D → tone1: mean = 0.067, median = 0.073, sd = 0.079, IQR = 0.093, 18.4% < 0; 0.1% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 27, p = 1.
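The fit indices reported above for the path models include TLI/NNFI values above 1 (e.g., 1.03 and 1.69). Assuming the standard definitions of these indices (the text does not spell them out), with \(M\) denoting the fitted model and \(B\) the baseline model, this is expected whenever \(\chi^2_M/df_M < 1\), as is the case here (0.14 and 0.43 on 1 degree of freedom):

```latex
\[
\mathrm{TLI} = \mathrm{NNFI}
  = \frac{\chi^2_B/df_B - \chi^2_M/df_M}{\chi^2_B/df_B - 1},
\qquad
\mathrm{CFI}
  = 1 - \frac{\max(\chi^2_M - df_M,\, 0)}{\max(\chi^2_M - df_M,\, \chi^2_B - df_B,\, 0)}
\]
```

Under these definitions, TLI and NNFI are two names for the same index, which is consistent with their identical values above, and CFI equals 1 whenever \(\chi^2_M \le df_M\).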
The resulting dataset has 121 observations, distributed among 83 unique Glottolog codes in 30 families (ranging from a minimum of 1 language per family to a maximum of 36, with a mean of 4 and a median of 1.5 languages per family) and 4 macroareas.
There are 121:95:83 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
0 | 2 | 77 | 4 | 4 | 87 |
1 | 2 | 4 | 5 | 0 | 11 |
2 | 9 | 3 | 0 | 0 | 12 |
3 | 1 | 2 | 0 | 0 | 3 |
4 | 0 | 4 | 1 | 0 | 5 |
5 | 0 | 2 | 0 | 0 | 2 |
6 | 0 | 1 | 0 | 0 | 1 |
Sum | 14 | 93 | 10 | 4 | 121 |
glmer
I performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 22% | 18% | 15% | 15% | 7% |
unrestricted | none | alleles-together | 27% | 6% | 6% | 6% | 6% | 2% |
unrestricted | none | alleles-independent | 26% | 6% | 6% | 5% | 7% | 0% |
unrestricted | fixef | tone | 0% | 28% | 20% | 34% | 19% | 46% |
unrestricted | fixef | alleles-together | 98% | 8% | 6% | 30% | 7% | 37% |
unrestricted | fixef | alleles-independent | 98% | 8% | 6% | 30% | 8% | 39% |
macroareas | none | tone | 0% | 41% | 22% | 25% | 30% | 25% |
macroareas | none | alleles-together | 50% | 18% | 13% | 26% | 12% | 28% |
macroareas | none | alleles-independent | 53% | 17% | 12% | 26% | 16% | 31% |
macroareas | fixef | tone | 0% | 35% | 25% | 39% | 24% | 42% |
macroareas | fixef | alleles-together | 98% | 11% | 11% | 34% | 8% | 45% |
macroareas | fixef | alleles-independent | 98% | 9% | 11% | 39% | 8% | 45% |
families | none | tone | 89% | 34% | 23% | 62% | 17% | 41% |
families | none | alleles-together | 50% | 22% | 20% | 54% | 5% | 19% |
families | none | alleles-independent | 58% | 27% | 28% | 62% | 10% | 25% |
families | fixef | tone | 86% | 11% | 16% | 70% | 2% | 48% |
families | fixef | alleles-together | 99% | 11% | 15% | 65% | 3% | 37% |
families | fixef | alleles-independent | 97% | 10% | 14% | 63% | 5% | 45% |
brms
(g)lm
For ASPM-D:
total effect (TE) of being in Africa on tone: 1.63 (0.64, 3.36), p=0, decomposed into:
average direct effect (ADE): 0.43 (-0.21, 1.26), p=0.19, and
average indirect effect (ACME) mediated by ASPM-D: 1.20 (0.47, 2.56), p=0, mediating 73.4% (44.5%, 123.6%), p=0 of the effect, resulting from:
For MCPH1-D:
TE: 1.19 (0.53, 2.06), p=0, decomposed into:
ADE: 3.69 (0.97, 9.95), p=0, and
ACME: -2.50 (-8.35, -0.01), p=0.05, mediating -165.7% (-811.6%, -0.5%), p=0.05 of the effect, resulting from:
For ASPM-D:
TE: mean = 1.2, median = 1.3; 44.2% significant at α-level 0.05 and 67.0% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 116.8, p = 0;
ADE: mean = 0.89, median = 0.9; 14.8% significant at α-level 0.05 and 32.0% significant at α-level 0.10; 99.8% > 0.0; one-sample one-sided t-test vs 0: t(999) = 86.8, p = 0;
ACME: mean = 0.35, median = 0.34; 0.0% significant at α-level 0.05 and 1.2% significant at α-level 0.10; 99.7% > 0.0; one-sample one-sided t-test vs 0: t(999) = 72.3, p = 0;
β(Africa → allele): mean = -0.73, median = -0.73; 10.3% significant at α-level 0.05 and 64.3% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -310.5, p = 0;
β(allele → tone | Africa): mean = -0.29, median = -0.3; 2.8% significant at α-level 0.05 and 10.2% significant at α-level 0.10; 99.8% < 0.0; one-sample one-sided t-test vs 0: t(999) = -93.0, p = 0.
For MCPH1-D:
TE: mean = 1.3, median = 1.3; 37.1% significant at α-level 0.05 and 61.0% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 101.0, p = 0;
ADE: mean = 43, median = 22; 73.6% significant at α-level 0.05 and 87.0% significant at α-level 0.10; 100.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = 23.8, p = 8.4e-100;
ACME: mean = -42, median = -21; 50.6% significant at α-level 0.05 and 66.2% significant at α-level 0.10; 0.0% > 0.0; one-sample one-sided t-test vs 0: t(999) = -23.1, p = 1;
β(Africa → allele): mean = -2.6, median = -2.6; 100.0% significant at α-level 0.05 and 100.0% significant at α-level 0.10; 100.0% < 0.0; one-sample one-sided t-test vs 0: t(999) = -720.1, p = 0;
β(allele → tone | Africa): mean = 0.88, median = 0.87; 50.8% significant at α-level 0.05 and 67.8% significant at α-level 0.10; 0.2% < 0.0; one-sample one-sided t-test vs 0: t(999) = 78.3, p = 1.
Given the low sample size (N = 35 unique families), relatively few effect sizes are big enough to be significant; in this case, there are fewer significant indirect effects (ACME) for ASPM-D than for MCPH1-D: 2.8% vs 50.8% (0.1 times) for α-level 0.05, and 10.2% vs 67.8% (0.2 times) for α-level 0.10.
brms
Please note that path analysis uses a linear model (so not a Poisson one) for the tone counts; also I only use the numeric coding for Africa.
Coding Africa numerically, the model fits the data very well (χ2(1)=0.14, p=0.71; CFI=1.00, TLI=1.03, NNFI=1.03 and RFI=1.00):
It can be seen that:
the model fits:
Africa → ASPM-D: mean = -0.73, median = -0.73, sd = 0.07, IQR = 0.095, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -3.3e+02, p = 0
Africa → MCPH1-D: mean = -2.6, median = -2.6, sd = 0.12, IQR = 0.15, 100.0% < 0; 100.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -7.1e+02, p = 0
Africa → tone counts: mean = 2.7, median = 2.7, sd = 0.84, IQR = 1.3, 100.0% > 0; 65.4% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 1e+02, p = 0
ASPM-D → tone counts: mean = -0.17, median = -0.17, sd = 0.12, IQR = 0.16, 92.2% < 0; 0.0% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = -46, p = 2.5e-251
MCPH1-D → tone counts: mean = 0.68, median = 0.68, sd = 0.28, IQR = 0.38, 0.8% < 0; 36.1% significant at α-level 0.05; one-sample one-sided t-test vs 0: t(999) = 76, p = 1
The explicit hierarchy of the sources for tone as used in the paper is:
but there can be other justified choices; among these choices, I test here the alternative hierarchy:
The “corrected” counts are computed as \(\mathrm{LAPSyD}_{\mathrm{corr}} = 0.074 + 1.256\,\mathrm{LAPSyD} - 0.11\,\mathrm{LAPSyD}^2\) and \(\mathrm{PHOIBLE}_{\mathrm{corr}} = 0.415 + 0.815\,\mathrm{PHOIBLE} - 0.049\,\mathrm{PHOIBLE}^2\), respectively.
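Reading the trailing "2" in each formula as a square (an assumption about the original notation), the two corrections are quadratic functions of the raw tone counts. A minimal Python sketch, with hypothetical function names and the coefficients quoted above:

```python
def correct_lapsyd(n_tones):
    """Quadratic correction of a LAPSyD tone count (coefficients from the text)."""
    return 0.074 + 1.256 * n_tones - 0.11 * n_tones ** 2


def correct_phoible(n_tones):
    """Quadratic correction of a PHOIBLE tone count (coefficients from the text)."""
    return 0.415 + 0.815 * n_tones - 0.049 * n_tones ** 2


# Corrected values for raw counts 0..4 (illustrative inputs).
lapsyd_corrected = [correct_lapsyd(n) for n in range(5)]
phoible_corrected = [correct_phoible(n) for n in range(5)]
```

The text does not say whether the corrected values are subsequently rounded, so the sketch leaves them as real numbers.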
No | Yes | |
---|---|---|
No | 2527 | 14 |
Yes | 13 | 1244 |
Test statistic | df | P value |
---|---|---|
3673 | 1 | 0 * * * |
Test statistic | df | P value |
---|---|---|
3677 | NA | 9.999e-05 * * * |
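The two test statistics above can be reproduced by hand from the 2×2 agreement table. The sketch below (Python, hypothetical function name) computes Pearson's χ² with and without the Yates continuity correction; the assumption that the first table used the correction and the second (with df = NA, presumably based on simulated p-values) did not is mine, but it matches the reported 3673 and 3677.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]], optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Agreement table between the original and alternative binary codings.
chi2_plain = chi2_2x2(2527, 14, 13, 1244)
chi2_yates = chi2_2x2(2527, 14, 13, 1244, yates=True)
```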
The disagreements are:
glottocode | PHOIBLE | WALS | LAPSyD | LAPSyD (#) | DL2007 | WPHON | agreement (orig) | decision (orig) | agreement (alt) | decision (alt) |
---|---|---|---|---|---|---|---|---|---|---|
amah1246 | 0 | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
beja1238 | 0 | Simple | None | 0 | NA | 1 | No | LAPSyD + WALS, LAPSyD wins except when WALS says Complex | Yes | WALS |
broo1239 | NA | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
chua1250 | 0 | NA | None | 0 | NA | 3 | No | LAPSyD | Yes | WPHON |
cofa1242 | 0 | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
fuln1247 | 0 | Simple | None | 0 | NA | 0 | No | LAPSyD + WALS, LAPSyD wins except when WALS says Complex | Yes | WALS |
gras1249 | 0 | Simple | None | 0 | NA | 1 | No | LAPSyD + WALS, LAPSyD wins except when WALS says Complex | Yes | WALS |
hopi1249 | 0 | Simple | None | 0 | NA | 1 | No | LAPSyD + WALS, LAPSyD wins except when WALS says Complex | Yes | WALS |
mand1446 | NA | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
meri1244 | NA | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
naas1242 | 0 | NA | None | 0 | No | 1 | No | LAPSyD + Dediu & Ladd, Dediu & Ladd wins except when LAPSyD says Moderately complex or Complex | Yes | WPHON |
sapu1248 | NA | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
sout2982 | 0 | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
wano1243 | NA | NA | None | 0 | NA | 1 | No | LAPSyD | Yes | WPHON |
bora1263 | 0 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
brib1243 | 0 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
buru1296 | 2 | None | NA | NA | Yes | 0 | Yes | WALS + Dediu & Ladd, Dediu & Ladd wins except when WALS says Complex | No | WALS |
chim1309 | 0 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
darf1239 | 0 | None | Simple | 1 | NA | 2 | Yes | LAPSyD + WALS, LAPSyD wins except when WALS says Complex | No | WALS |
lepc1244 | 0 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
lith1251 | 0 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
mund1330 | 1 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
scot1245 | 0 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
sene1264 | 0 | None | Simple | 1 | NA | 1 | Yes | LAPSyD + WALS, LAPSyD wins except when WALS says Complex | No | WALS |
shek1245 | NA | NA | Complex | 8 | NA | 0 | Yes | LAPSyD | No | WPHON |
wich1260 | 0 | None | Simple | 1 | NA | 1 | Yes | LAPSyD + WALS, LAPSyD wins except when WALS says Complex | No | WALS |
yuru1263 | 0 | NA | Simple | 1 | NA | 0 | Yes | LAPSyD | No | WPHON |
None | Simple | Complex | |
---|---|---|---|
None | 2523 | 14 | 1 |
Simple | 11 | 922 | 3 |
Complex | 1 | 21 | 289 |
Test statistic | df | P value |
---|---|---|
7027 | 4 | 0 * * * |
Test statistic | df | P value |
---|---|---|
7027 | NA | 9.999e-05 * * * |
The disagreements are:
glottocode | PHOIBLE | WALS | LAPSyD | LAPSyD (#) | DL2007 | WPHON | agreement (orig) | decision (orig) | agreement (alt) | decision (alt) |
---|---|---|---|---|---|---|---|---|---|---|
amah1246 | 0 | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
beja1238 | 0 | Simple | None | 0 | NA | 1 | None | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Simple | WALS |
broo1239 | NA | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
chua1250 | 0 | NA | None | 0 | NA | 3 | None | LAPSyD | Simple | WPHON |
cofa1242 | 0 | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
fuln1247 | 0 | Simple | None | 0 | NA | 0 | None | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Simple | WALS |
gras1249 | 0 | Simple | None | 0 | NA | 1 | None | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Simple | WALS |
hopi1249 | 0 | Simple | None | 0 | NA | 1 | None | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Simple | WALS |
mand1446 | NA | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
meri1244 | NA | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
naas1242 | 0 | NA | None | 0 | No | 1 | None | LAPSyD wins | Simple | WPHON |
sapu1248 | NA | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
sout2982 | 0 | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
wano1243 | NA | NA | None | 0 | NA | 1 | None | LAPSyD | Simple | WPHON |
ndut1239 | 0 | Complex | None | 0 | NA | 0 | None | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Complex | WALS |
bora1263 | 0 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
brib1243 | 0 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
chim1309 | 0 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
darf1239 | 0 | None | Simple | 1 | NA | 2 | Simple | LAPSyD + WALS, LAPSyD wins except for Moderately complex | None | WALS |
lepc1244 | 0 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
lith1251 | 0 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
mund1330 | 1 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
scot1245 | 0 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
sene1264 | 0 | None | Simple | 1 | NA | 1 | Simple | LAPSyD + WALS, LAPSyD wins except for Moderately complex | None | WALS |
wich1260 | 0 | None | Simple | 1 | NA | 1 | Simple | LAPSyD + WALS, LAPSyD wins except for Moderately complex | None | WALS |
yuru1263 | 0 | NA | Simple | 1 | NA | 0 | Simple | LAPSyD | None | WPHON |
east2652 | 1 | Complex | Simple | 1 | NA | 1 | Simple | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Complex | WALS |
nort2740 | 0 | NA | Simple | 1 | NA | 4 | Simple | LAPSyD | Complex | WPHON |
vani1248 | 0 | Complex | Simple | 1 | NA | 2 | Simple | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Complex | WALS |
shek1245 | NA | NA | Complex | 8 | NA | 0 | Complex | LAPSyD | None | WPHON |
abun1252 | NA | NA | Moderately complex | 2 | NA | 1 | Complex | LAPSyD | Simple | WPHON |
bamu1253 | 0 | NA | NA | NA | Yes | 3 | Complex | From n_tones | Simple | WPHON |
bass1258 | NA | NA | Moderately complex | 3 | NA | 2 | Complex | LAPSyD | Simple | WPHON |
cacu1241 | 0 | Simple | Complex | 3 | NA | 3 | Complex | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Simple | WALS |
cent2144 | 2 | NA | Complex | 3 | NA | 2 | Complex | LAPSyD | Simple | WPHON |
diga1241 | NA | NA | Complex | 3 | NA | 3 | Complex | LAPSyD | Simple | WPHON |
gaam1241 | 2 | Simple | Complex | 3 | NA | 1 | Complex | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Simple | WALS |
hlai1239 | NA | NA | Moderately complex | 2 | NA | 3 | Complex | LAPSyD | Simple | WPHON |
jeme1245 | 0 | NA | Moderately complex | 2 | NA | 3 | Complex | LAPSyD | Simple | WPHON |
jica1244 | 2 | NA | Moderately complex | 2 | NA | 1 | Complex | LAPSyD | Simple | WPHON |
kala1373 | NA | Simple | Complex | 4 | NA | 1 | Complex | LAPSyD + WALS, LAPSyD wins except for Moderately complex | Simple | WALS |
kris1246 | 3 | NA | Complex | 3 | NA | 3 | Complex | LAPSyD | Simple | WPHON |
lele1276 | 2 | NA | Moderately complex | 2 | NA | 2 | Complex | LAPSyD | Simple | WPHON |
madi1260 | 3 | NA | Moderately complex | 2 | Yes | 2 | Complex | LAPSyD wins | Simple | WPHON |
nort2732 | NA | NA | NA | NA | Yes | 3 | Complex | From n_tones | Simple | WPHON |
nucl1620 | 0 | NA | Moderately complex | 2 | NA | 2 | Complex | LAPSyD | Simple | WPHON |
puin1248 | 0 | NA | Complex | 3 | NA | 1 | Complex | LAPSyD | Simple | WPHON |
sand1273 | 4 | Simple | Complex | 3 | Yes | 2 | Complex | LAPSyD wins | Simple | WALS |
xhos1239 | 1 | NA | NA | NA | Yes | 2 | Complex | From n_tones | Simple | WPHON |
yaka1272 | NA | NA | NA | NA | Yes | 2 | Complex | From n_tones | Simple | WPHON |
yuhu1238 | 0 | NA | Complex | 3 | NA | 1 | Complex | LAPSyD | Simple | WPHON |
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
164.8 | 3783 | 0 * * * | two.sided | 0.9369 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
193189299 | 0 * * * | two.sided | 0.9786 |
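For reference, the Pearson test statistic in the first table follows from the correlation and the degrees of freedom as \(t = r\sqrt{df/(1-r^2)}\), and Spearman's rho is the Pearson correlation of the (average) ranks. The Python sketch below illustrates both; it is not the script's R code, the function names are hypothetical, and it uses only the rounded values printed above.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(xs):
    """Ranks starting at 1, with ties receiving the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def pearson_t(r, df):
    """t statistic for testing a Pearson correlation against 0."""
    return r * sqrt(df / (1.0 - r ** 2))


t_reported = pearson_t(0.9369, 3783)  # close to the reported 164.8
```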
The serious disagreements (i.e., those where the two counts differ by more than 1) are:
glottocode | PHOIBLE | WALS | LAPSyD | LAPSyD (#) | DL2007 | WPHON | agreement (orig) | agreement (alt) | difference (abs) |
---|---|---|---|---|---|---|---|---|---|
achu1247 | 0 | Simple | Simple | 1 | NA | 3 | 1 | 3 | 2 |
anga1290 | 0 | Simple | Moderately complex | 2 | NA | 0 | 2 | 0 | 2 |
awng1244 | 6 | Simple | Simple | 1 | NA | 3 | 1 | 3 | 2 |
cent2050 | 3 | Simple | Simple | 1 | NA | 3 | 1 | 3 | 2 |
efik1245 | 0 | Simple | Moderately complex | 2 | NA | 4 | 2 | 4 | 2 |
gaam1241 | 2 | Simple | Complex | 3 | NA | 1 | 3 | 1 | 2 |
gads1258 | 3 | Complex | Complex | 3 | NA | 1 | 3 | 1 | 2 |
hmon1333 | NA | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
iumi1238 | 3 | Complex | NA | NA | NA | 7 | 5 | 7 | 2 |
kera1255 | 0 | Complex | Moderately complex | 2 | NA | 0 | 2 | 0 | 2 |
komc1235 | 6 | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
koro1298 | NA | None | None | 0 | NA | 2 | 0 | 2 | 2 |
koyr1240 | 5 | None | None | 0 | NA | 2 | 0 | 2 | 2 |
kuta1241 | NA | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
lamn1239 | 6 | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
larg1235 | NA | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
mind1253 | 0 | Complex | NA | NA | NA | 7 | 5 | 7 | 2 |
mruu1242 | NA | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
murl1244 | 0 | Simple | Simple | 2 | NA | 4 | 2 | 4 | 2 |
nama1264 | 0 | Complex | Complex | 5 | Yes | 3 | 5 | 3 | 2 |
ncan1245 | 5 | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
ngba1285 | 0 | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
nort2747 | NA | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
nort2819 | 2 | Complex | NA | NA | NA | 6 | 4 | 6 | 2 |
nucl1649 | 0 | None | None | 0 | NA | 2 | 0 | 2 | 2 |
nucl1770 | NA | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
nung1283 | 0 | Complex | Complex | 5 | NA | 3 | 5 | 3 | 2 |
pira1253 | 0 | Simple | Simple | 1 | NA | 3 | 1 | 3 | 2 |
puin1248 | 0 | NA | Complex | 3 | NA | 1 | 3 | 1 | 2 |
puxi1243 | NA | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
pwon1235 | 4 | Complex | Complex | 3 | NA | 5 | 3 | 5 | 2 |
smal1236 | NA | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
sout2741 | NA | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
sout2754 | NA | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
sout2844 | 0 | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
tain1252 | 0 | NA | NA | NA | Yes | 6 | 4 | 6 | 2 |
thak1245 | 0 | NA | Simple | 1 | NA | 3 | 1 | 3 | 2 |
timn1235 | 3 | Simple | Simple | 1 | NA | 3 | 1 | 3 | 2 |
veng1238 | 8 | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
yako1252 | NA | NA | NA | NA | NA | 6 | 4 | 6 | 2 |
youn1235 | NA | NA | NA | NA | NA | 7 | 5 | 7 | 2 |
yuhu1238 | 0 | NA | Complex | 3 | NA | 1 | 3 | 1 | 2 |
bero1242 | 0 | Complex | Complex | 6 | NA | 3 | 6 | 3 | 3 |
chua1250 | 0 | NA | None | 0 | NA | 3 | 0 | 3 | 3 |
kala1373 | NA | Simple | Complex | 4 | NA | 1 | 4 | 1 | 3 |
lada1244 | 0 | None | None | 0 | NA | 3 | 0 | 3 | 3 |
mmen1238 | 4 | NA | NA | NA | NA | 8 | 5 | 8 | 3 |
nige1255 | 1 | Complex | NA | NA | NA | 8 | 5 | 8 | 3 |
nort2740 | 0 | NA | Simple | 1 | NA | 4 | 1 | 4 | 3 |
ticu1245 | 8 | Complex | Complex | 4 | NA | 7 | 4 | 7 | 3 |
aghe1239 | 1 | Simple | Simple | 1 | NA | 5 | 1 | 5 | 4 |
cent1394 | NA | NA | Complex | 3 | NA | 7 | 3 | 7 | 4 |
ejag1239 | 4 | Complex | Complex | 5 | NA | 1 | 5 | 1 | 4 |
monz1249 | NA | NA | NA | NA | NA | 9 | 5 | 9 | 4 |
gban1258 | NA | NA | NA | NA | NA | 10 | 5 | 10 | 5 |
vute1244 | 4 | NA | NA | NA | NA | 10 | 5 | 10 | 5 |
niel1243 | 2 | Complex | NA | NA | NA | 11 | 5 | 11 | 6 |
mali1285 | NA | NA | Complex | 10 | NA | NA | 10 | 2 | 8 |
shek1245 | NA | NA | Complex | 8 | NA | 0 | 8 | 0 | 8 |
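The selection of "serious" disagreements in the table above amounts to keeping the languages whose original and alternative tone counts differ by more than 1 in absolute value. A minimal Python sketch of that filter follows; the helper name is hypothetical, the first two rows are taken from the table, and the third row is invented to show a case that is filtered out.

```python
# (glottocode, original count, alternative count)
rows = [
    ("shek1245", 8, 0),         # from the table above
    ("achu1247", 1, 3),         # from the table above
    ("example_row", 2, 3),      # invented: differs by 1 only, so it is dropped
]


def serious_disagreements(table, threshold=1):
    """Keep rows whose absolute difference exceeds the threshold, sorted by difference."""
    kept = [(g, o, a, abs(o - a)) for g, o, a in table if abs(o - a) > threshold]
    return sorted(kept, key=lambda row: (row[3], row[0]))


serious = serious_disagreements(rows)
```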
So, the “original” and the “alternative” codings agree rather well…
Here, I redo the statistics using the “alternative” coding.
There are 181 observations, distributed among 119 unique Glottolog codes in 35 families (ranging from a minimum of 1 language per family to a maximum of 48, with a mean of 5.2 and a median of 2 languages per family) and 4 macroareas.
There are 161:126:119 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 9 | 101 | 4 | 6 | 120 |
Yes | 27 | 25 | 6 | 3 | 61 |
Sum | 36 | 126 | 10 | 9 | 181 |
The agreement with the original tone1 coding is extremely high:
No | Yes | |
---|---|---|
No | 119 | 1 |
Yes | 1 | 60 |
Test statistic | df | P value |
---|---|---|
167.8 | 1 | 2.212e-38 * * * |
Test statistic | df | P value |
---|---|---|
172.2 | NA | 9.999e-05 * * * |
The disagreements are:
glottocode | Pop_ID | metapopulation | family | macroarea | original | alternative |
---|---|---|---|---|---|---|
naas1242 | SA002261L | Melanesian_Nasioi | South Bougainville | Papunesia | No | Yes |
buru1296 | SA001482P | Burusho | Burushaski | Eurasia | Yes | No |
so I expect the results of the analysis to be virtually identical…
The resulting dataset has 180 observations, distributed among 118 unique Glottolog codes in 35 families (ranging from a minimum of 1 language per family to a maximum of 47, with a mean of 5.1 and a median of 2 languages per family) and 4 macroareas.
There are 156:121:118 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 31 | 106 | 9 | 9 | 155 |
Yes | 6 | 17 | 1 | 1 | 25 |
Sum | 37 | 123 | 10 | 10 | 180 |
The agreement with the original tone2 coding is extremely high:
No | Yes | |
---|---|---|
No | 151 | 0 |
Yes | 4 | 25 |
Test statistic | df | P value |
---|---|---|
144 | 1 | 3.472e-33 * * * |
Test statistic | df | P value |
---|---|---|
151.2 | NA | 9.999e-05 * * * |
The disagreements are:
glottocode | Pop_ID | metapopulation | family | macroarea | original | alternative |
---|---|---|---|---|---|---|
bamu1253 | MB2005_Bamoun | Bamoun | Atlantic-Congo | Africa | Yes | No |
sand1273 | SA004366T | Sandawe | Sandawe | Africa | Yes | No |
nort2732 | SA001484R | Tujia | Sino-Tibetan | Eurasia | Yes | No |
yaka1272 | SA002256P | Biaka | Atlantic-Congo | Africa | Yes | No |
so I expect the results of the analysis to be very similar…
The resulting dataset has 183 observations, distributed among 120 unique Glottolog codes in 35 families (ranging from a minimum of 1 observation per family to a maximum of 47, with a mean of 5.2 and a median of 2 observations per family) and 4 macroareas.
There are 156:121:120 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
0 | 9 | 97 | 4 | 6 | 116 |
1 | 9 | 7 | 5 | 2 | 23 |
2 | 15 | 1 | 0 | 1 | 17 |
3 | 4 | 5 | 0 | 1 | 10 |
4 | 1 | 3 | 0 | 0 | 4 |
5 | 0 | 8 | 0 | 0 | 8 |
6 | 0 | 3 | 0 | 0 | 3 |
7 | 0 | 1 | 1 | 0 | 2 |
Sum | 38 | 125 | 10 | 10 | 183 |
The original and alternative tone counts are very similar:
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
38.96 | 181 | 6.161e-90 * * * | two.sided | 0.9452 |
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
17364 | 3.499e-135 * * * | two.sided | 0.983 |
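As a quick consistency check on the Pearson test above, the t statistic follows from the correlation and the degrees of freedom as t = r · sqrt(df / (1 − r²)). The sketch below uses the rounded values from the table, so it agrees with the reported statistic only up to rounding.

```r
r  <- 0.9452  # Pearson correlation as reported above (rounded)
df <- 181     # degrees of freedom: n - 2, with n = 183 observations

t_stat  <- r * sqrt(df / (1 - r^2))   # t statistic implied by r and df
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

t_stat
p_value
```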
The serious disagreements (i.e., those where the two counts differ by more than 1) are:
glottocode | Pop_ID | metapopulation | family | macroarea | original | alternative | difference |
---|---|---|---|---|---|---|---|
awng1244 | SA004368V | Jews_Ethiopian | Afro-Asiatic | Africa | 1 | 3 | 2 |
nama1264 | SA001469U | San | Khoe-Kwadi | Africa | 5 | 3 | 2 |
tain1252 | SA001493R | Dai | Tai-Kadai | Eurasia | 4 | 6 | 2 |
tain1252 | SA004238R | Dai | Tai-Kadai | Eurasia | 4 | 6 | 2 |
ticu1245 | SA004389Y | Ticuna | Ticuna-Yuri | America | 4 | 7 | 3 |
cent1394 | SA001486T | Miao | Hmong-Mien | Eurasia | 3 | 7 | 4 |
so I expect the results of the analysis to be very similar…
Here I exclude from the analysis all the African data points.
There are 145 observations, distributed among 89 unique Glottolog codes in 28 families (ranging from a minimum of 1 observation per family to a maximum of 48, with a mean of 5.2 and a median of 2 observations per family) and 3 macroareas.
There are 134:102:89 unique samples:(meta)populations:languages retained.
America | Eurasia | Papunesia | Sum | |
---|---|---|---|---|
No | 4 | 100 | 7 | 111 |
Yes | 6 | 26 | 2 | 34 |
Sum | 10 | 126 | 9 | 145 |
glmer
I performed 1000 independent replications of each of these parameter combinations, and below are the distributions of the permuted values versus the original ones (i.e., those obtained on the original, non-permuted data).
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 4% | 4% | 6% | 5% | 79% |
unrestricted | none | alleles-together | 85% | 7% | 7% | 22% | 8% | 64% |
unrestricted | none | alleles-independent | 84% | 7% | 7% | 22% | 6% | 64% |
unrestricted | fixef | tone | 0% | 5% | 4% | 84% | 5% | 45% |
unrestricted | fixef | alleles-together | 93% | 9% | 9% | 66% | 7% | 46% |
unrestricted | fixef | alleles-independent | 94% | 6% | 6% | 70% | 6% | 46% |
macroareas | none | tone | 0% | 19% | 15% | 33% | 13% | 47% |
macroareas | none | alleles-together | 87% | 7% | 7% | 36% | 7% | 52% |
macroareas | none | alleles-independent | 87% | 8% | 7% | 36% | 9% | 51% |
macroareas | fixef | tone | 0% | 5% | 6% | 82% | 6% | 47% |
macroareas | fixef | alleles-together | 94% | 7% | 7% | 65% | 6% | 46% |
macroareas | fixef | alleles-independent | 93% | 7% | 7% | 69% | 6% | 48% |
families | none | tone | 88% | 6% | 7% | 49% | 3% | 53% |
families | none | alleles-together | 91% | 9% | 10% | 62% | 5% | 44% |
families | none | alleles-independent | 90% | 9% | 10% | 57% | 6% | 55% |
families | fixef | tone | 81% | 5% | 5% | 73% | 2% | 38% |
families | fixef | alleles-together | 93% | 6% | 7% | 78% | 3% | 37% |
families | fixef | alleles-independent | 92% | 5% | 6% | 75% | 5% | 47% |
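The restricted permutations summarised above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual implementation: `permute_within()` is a hypothetical helper, and the toy `tone` and `family` vectors are made up. Shuffling within a grouping factor (macroarea or family) keeps the composition of each group fixed, while the unrestricted case shuffles across all observations.

```r
# Hypothetical helper: permute the values of x, optionally only within groups.
permute_within <- function(x, groups = NULL) {
  idx <- seq_along(x)
  if (is.null(groups)) {
    perm <- sample.int(length(x))  # unrestricted permutation
  } else {
    # shuffle the positions separately inside each group
    perm <- ave(idx, groups, FUN = function(i) i[sample.int(length(i))])
  }
  x[perm]
}

set.seed(42)  # arbitrary seed, only to make the sketch reproducible
tone   <- c("Yes", "Yes", "No", "No", "No", "Yes", "No")  # made-up data
family <- c("A",   "A",   "A",  "B",  "B",  "B",   "C")   # made-up grouping

# 1000 replications, each permuting tone within families
replicates <- replicate(1000, permute_within(tone, family), simplify = FALSE)

# one unrestricted permutation, for comparison
unrestricted <- permute_within(tone)
```

Permuting indices rather than values avoids the pitfall of `sample()` treating a single number as a range, which matters for groups with only one observation.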
brms
The resulting dataset has 143 observations, distributed among 87 unique Glottolog codes in 28 families (ranging from a minimum of 1 observation per family to a maximum of 47, with a mean of 5.1 and a median of 2 observations per family) and 3 macroareas.
There are 131:99:87 unique samples:(meta)populations:languages retained.
America | Eurasia | Papunesia | Sum | |
---|---|---|---|---|
No | 9 | 105 | 9 | 123 |
Yes | 1 | 18 | 1 | 20 |
Sum | 10 | 123 | 10 | 143 |
glmer
I performed 1000 independent replications of each of these parameter combinations, and below are the distributions of the permuted values versus the original ones (i.e., those obtained on the original, non-permuted data).
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 6% | 6% | 87% | 6% | 72% |
unrestricted | none | alleles-together | 94% | 8% | 8% | 65% | 8% | 60% |
unrestricted | none | alleles-independent | 95% | 8% | 8% | 65% | 7% | 58% |
unrestricted | fixef | tone | 0% | 6% | 5% | 96% | 6% | 56% |
unrestricted | fixef | alleles-together | 94% | 8% | 8% | 74% | 8% | 54% |
unrestricted | fixef | alleles-independent | 94% | 10% | 8% | 74% | 9% | 54% |
macroareas | none | tone | 0% | 3% | 4% | 87% | 5% | 74% |
macroareas | none | alleles-together | 95% | 9% | 8% | 70% | 8% | 54% |
macroareas | none | alleles-independent | 95% | 10% | 10% | 68% | 9% | 58% |
macroareas | fixef | tone | 0% | 6% | 6% | 95% | 6% | 53% |
macroareas | fixef | alleles-together | 93% | 12% | 10% | 74% | 11% | 52% |
macroareas | fixef | alleles-independent | 92% | 12% | 10% | 74% | 10% | 51% |
families | none | tone | 63% | 8% | 7% | 77% | 4% | 45% |
families | none | alleles-together | 90% | 6% | 5% | 75% | 3% | 45% |
families | none | alleles-independent | 89% | 7% | 7% | 74% | 6% | 54% |
families | fixef | tone | 63% | 8% | 8% | 82% | 4% | 34% |
families | fixef | alleles-together | 86% | 6% | 5% | 76% | 4% | 40% |
families | fixef | alleles-independent | 89% | 7% | 9% | 78% | 7% | 48% |
brms
The resulting dataset has 146 observations, distributed among 89 unique Glottolog codes in 28 families (ranging from a minimum of 1 observation per family to a maximum of 47, with a mean of 5.2 and a median of 2 observations per family) and 3 macroareas.
There are 131:99:89 unique samples:(meta)populations:languages retained.
America | Eurasia | Papunesia | Sum | |
---|---|---|---|---|
0 | 4 | 98 | 7 | 109 |
1 | 5 | 6 | 1 | 12 |
2 | 0 | 3 | 2 | 5 |
3 | 0 | 5 | 0 | 5 |
4 | 1 | 8 | 0 | 9 |
5 | 0 | 4 | 0 | 4 |
6 | 0 | 2 | 0 | 2 |
Sum | 10 | 126 | 10 | 146 |
glmer
We performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 35% | 24% | 40% | 25% | 35% |
unrestricted | none | alleles-together | 89% | 3% | 3% | 43% | 4% | 24% |
unrestricted | none | alleles-independent | 88% | 3% | 4% | 44% | 4% | 22% |
unrestricted | fixef | tone | 0% | 36% | 25% | 56% | 24% | 28% |
unrestricted | fixef | alleles-together | 78% | 4% | 4% | 68% | 3% | 15% |
unrestricted | fixef | alleles-independent | 74% | 2% | 3% | 71% | 3% | 12% |
macroareas | none | tone | 0% | 33% | 20% | 50% | 24% | 28% |
macroareas | none | alleles-together | 88% | 3% | 3% | 54% | 4% | 17% |
macroareas | none | alleles-independent | 88% | 3% | 4% | 55% | 4% | 17% |
macroareas | fixef | tone | 0% | 35% | 24% | 54% | 25% | 28% |
macroareas | fixef | alleles-together | 77% | 3% | 4% | 69% | 3% | 15% |
macroareas | fixef | alleles-independent | 76% | 3% | 4% | 68% | 4% | 18% |
families | none | tone | 69% | 5% | 11% | 79% | 1% | 33% |
families | none | alleles-together | 90% | 6% | 10% | 80% | 0% | 23% |
families | none | alleles-independent | 88% | 6% | 13% | 80% | 1% | 23% |
families | fixef | tone | 67% | 5% | 10% | 84% | 1% | 23% |
families | fixef | alleles-together | 77% | 7% | 10% | 86% | 1% | 22% |
families | fixef | alleles-independent | 77% | 6% | 11% | 86% | 1% | 19% |
brms
Here I use only the African data points.
There are 36 observations, distributed among 30 unique Glottolog codes in 8 families (ranging from a minimum of 1 observation per family to a maximum of 16, with a mean of 4.5 and a median of 1.5 observations per family) and 1 macroarea.
There are 27:24:30 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 9 | 0 | 0 | 0 | 9 |
Yes | 27 | 0 | 0 | 0 | 27 |
Sum | 36 | 0 | 0 | 0 | 36 |
glmer
Trying to fit a random-effects structure with language family as the random effect results in convergence problems (`boundary (singular) fit: see ?isSingular`), and the random effects seem not to matter at all (`Can't compute random effect variances. Some variance components equal zero. Your model may suffer from singulariy. Solution: Respecify random structure!`), so I reverted to a “flat” model without random effects (using `glm()` instead of `glmer()`). As expected, the `anova()` comparisons produce the same p-values for this “flat” `glm` approach as for the `glmer` with family as random effects, but without the convergence issues…
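The “flat” model comparison described above can be sketched as follows. This is a minimal illustration on simulated data, not the actual analysis: the variable names (`tone`, `aspm`, `mcph1`) and the effect sizes are made up, and the likelihood-ratio test via `anova(..., test = "Chisq")` is an assumption about how the models are compared.

```r
set.seed(123)  # arbitrary seed, only to make the sketch reproducible
n <- 36
toy <- data.frame(aspm = runif(n), mcph1 = runif(n))  # made-up predictors
# simulate a binary outcome weakly related to one predictor
toy$tone <- rbinom(n, size = 1, prob = plogis(0.5 - 1.5 * toy$aspm))

# "flat" logistic regressions, without random effects
m_null <- glm(tone ~ 1,            family = binomial, data = toy)
m_full <- glm(tone ~ aspm + mcph1, family = binomial, data = toy)

# likelihood-ratio comparison of the two nested models
cmp   <- anova(m_null, m_full, test = "Chisq")
p_lrt <- cmp[2, "Pr(>Chi)"]
p_lrt
```

The p-value is the upper tail of a chi-squared distribution evaluated at the drop in deviance, with degrees of freedom equal to the number of added parameters (here 2).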
I performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 64% | 6% | 7% | 31% | 8% | 38% |
unrestricted | none | alleles-together | 61% | 7% | 8% | 28% | 7% | 40% |
unrestricted | none | alleles-independent | 61% | 8% | 8% | 22% | 6% | 32% |
families | none | tone | 55% | 5% | 3% | 10% | 5% | 60% |
families | none | alleles-together | 51% | 5% | 4% | 10% | 4% | 59% |
families | none | alleles-independent | 55% | 5% | 5% | 19% | 6% | 51% |
brms
The resulting dataset has 37 observations, distributed among 31 unique Glottolog codes in 8 families (ranging from a minimum of 1 observation per family to a maximum of 18, with a mean of 4.6 and a median of 2 observations per family) and 1 macroarea.
There are 25:22:31 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 28 | 0 | 0 | 0 | 28 |
Yes | 9 | 0 | 0 | 0 | 9 |
Sum | 37 | 0 | 0 | 0 | 37 |
glmer
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 86% | 7% | 8% | 34% | 5% | 61% |
unrestricted | none | alleles-together | 88% | 9% | 9% | 40% | 6% | 60% |
unrestricted | none | alleles-independent | 87% | 8% | 7% | 32% | 7% | 60% |
families | none | tone | 90% | 7% | 10% | 55% | 6% | 39% |
families | none | alleles-together | 92% | 7% | 10% | 54% | 5% | 43% |
families | none | alleles-independent | 87% | 9% | 11% | 43% | 6% | 50% |
brms
The resulting dataset has 38 observations, distributed among 32 unique Glottolog codes in 8 families (ranging from a minimum of 1 observation per family to a maximum of 19, with a mean of 4.8 and a median of 2 observations per family) and 1 macroarea.
There are 25:22:32 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
0 | 9 | 0 | 0 | 0 | 9 |
1 | 10 | 0 | 0 | 0 | 10 |
2 | 16 | 0 | 0 | 0 | 16 |
3 | 2 | 0 | 0 | 0 | 2 |
5 | 1 | 0 | 0 | 0 | 1 |
Sum | 38 | 0 | 0 | 0 | 38 |
glmer
Trying to fit a random-effects structure with language family as the random effect results in convergence problems (`boundary (singular) fit: see ?isSingular`), and the random effects seem not to matter at all (`Can't compute random effect variances. Some variance components equal zero. Your model may suffer from singulariy. Solution: Respecify random structure!`), so I reverted to a “flat” model without random effects (using `glm()` instead of `glmer()`). As expected, the `anova()` comparisons produce the same p-values for this “flat” `glm` approach as for the `glmer` with family as random effects, but without the convergence issues…
We performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 87% | 2% | 3% | 41% | 3% | 40% |
unrestricted | none | alleles-together | 87% | 4% | 4% | 42% | 2% | 42% |
unrestricted | none | alleles-independent | 86% | 3% | 4% | 36% | 2% | 39% |
families | none | tone | 96% | 1% | 0% | 33% | 2% | 76% |
families | none | alleles-together | 80% | 0% | 0% | 36% | 2% | 74% |
families | none | alleles-independent | 86% | 1% | 0% | 52% | 3% | 74% |
brms
Here I use only the Eurasian data points.
There are 126 observations, distributed among 74 unique Glottolog codes in 19 families (ranging from a minimum of 1 observation per family to a maximum of 48, with a mean of 6.6 and a median of 3 observations per family) and 1 macroarea.
There are 118:89:74 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 0 | 100 | 0 | 0 | 100 |
Yes | 0 | 26 | 0 | 0 | 26 |
Sum | 0 | 126 | 0 | 0 | 126 |
glmer
I performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 4% | 5% | 74% | 6% | 28% |
unrestricted | none | alleles-together | 95% | 4% | 6% | 60% | 4% | 34% |
unrestricted | none | alleles-independent | 96% | 5% | 6% | 62% | 4% | 35% |
families | none | tone | 74% | 4% | 5% | 66% | 3% | 30% |
families | none | alleles-together | 96% | 5% | 4% | 66% | 4% | 31% |
families | none | alleles-independent | 97% | 6% | 5% | 68% | 6% | 37% |
brms
The resulting dataset has 123 observations, distributed among 71 unique Glottolog codes in 19 families (ranging from a minimum of 1 observation per family to a maximum of 47, with a mean of 6.5 and a median of 3 observations per family) and 1 macroarea.
There are 115:86:71 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
No | 0 | 105 | 0 | 0 | 105 |
Yes | 0 | 18 | 0 | 0 | 18 |
Sum | 0 | 123 | 0 | 0 | 123 |
glmer
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 5% | 6% | 44% | 5% | 0% |
unrestricted | none | alleles-together | 90% | 12% | 12% | 48% | 10% | 27% |
unrestricted | none | alleles-independent | 88% | 12% | 12% | 49% | 10% | 28% |
families | none | tone | 49% | 9% | 10% | 62% | 4% | 19% |
families | none | alleles-together | 80% | 8% | 10% | 60% | 4% | 16% |
families | none | alleles-independent | 84% | 10% | 12% | 68% | 6% | 19% |
brms
The resulting dataset has 126 observations, distributed among 73 unique Glottolog codes in 19 families (ranging from a minimum of 1 observation per family to a maximum of 47, with a mean of 6.6 and a median of 3 observations per family) and 1 macroarea.
There are 115:86:73 unique samples:(meta)populations:languages retained.
Africa | Eurasia | America | Papunesia | Sum | |
---|---|---|---|---|---|
0 | 0 | 98 | 0 | 0 | 98 |
1 | 0 | 6 | 0 | 0 | 6 |
2 | 0 | 3 | 0 | 0 | 3 |
3 | 0 | 5 | 0 | 0 | 5 |
4 | 0 | 8 | 0 | 0 | 8 |
5 | 0 | 4 | 0 | 0 | 4 |
6 | 0 | 2 | 0 | 0 | 2 |
Sum | 0 | 126 | 0 | 0 | 126 |
glmer
We performed 1000 independent replications:
Permute within | Macroarea | Permute | AIC | Signif. | pASPM-D | βASPM-D | pMCPH1-D | βMCPH1-D |
---|---|---|---|---|---|---|---|---|
unrestricted | none | tone | 0% | 37% | 25% | 47% | 26% | 16% |
unrestricted | none | alleles-together | 46% | 3% | 3% | 50% | 3% | 4% |
unrestricted | none | alleles-independent | 48% | 2% | 3% | 52% | 3% | 4% |
families | none | tone | 47% | 5% | 11% | 74% | 1% | 10% |
families | none | alleles-together | 49% | 5% | 10% | 74% | 1% | 9% |
families | none | alleles-independent | 47% | 4% | 8% | 70% | 1% | 7% |
brms
Too little data…
Too little data…
CPU: AMD Ryzen 7 3700X 8-Core Processor (16 threads)
RAM (memory): 67.5 GB
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: benchmarkme(v.1.0.7), magick(v.2.7.2), pdftools(v.3.0.1), rsvg(v.2.1.2), DiagrammeRsvg(v.0.1), simr(v.1.0.5), phytools(v.0.7-70), ape(v.5.5), tidybayes(v.2.3.1), bayestestR(v.0.9.0), brms(v.2.15.0), Rcpp(v.1.0.6), dagitty(v.0.3-1), e1071(v.1.7-6), cowplot(v.1.1.1), maps(v.3.3.0), lavaanPlot(v.0.5.1), lavaan(v.0.6-8), randomForest(v.4.6-14), caret(v.6.0-86), lattice(v.0.20-44), partykit(v.1.2-13), libcoin(v.1.0-8), rsample(v.0.1.0), mediation(v.4.5.0), sandwich(v.3.0-0), mvtnorm(v.1.1-1), MASS(v.7.3-54), DiagrammeR(v.1.0.6.1), pbapply(v.1.4-3), sjPlot(v.2.8.7), ggnewscale(v.0.4.5), glmmTMB(v.1.0.2.1), data.table(v.1.14.0), reshape2(v.1.4.4), dplyr(v.1.0.6), lmerTest(v.3.1-3), lme4(v.1.1-26), Matrix(v.1.3-3), performance(v.0.7.1), png(v.0.1-7), jpeg(v.0.1-8.1), tiff(v.0.1-8), gridExtra(v.2.3), ggplot2(v.3.3.3), stringr(v.1.4.0), pander(v.0.6.3), knitr(v.1.33) and RhpcBLASctl(v.0.20-137)
loaded via a namespace (and not attached): estimability(v.1.3), ModelMetrics(v.1.2.2.2), coda(v.0.19-4), tidyr(v.1.1.3), clusterGeneration(v.1.3.7), dygraphs(v.1.1.1.6), rpart(v.4.1-15), inline(v.0.3.17), doParallel(v.1.0.16), generics(v.0.1.0), callr(v.3.7.0), combinat(v.0.0-8), proxy(v.0.4-25), future(v.1.21.0), RLRsim(v.3.1-6), lubridate(v.1.7.10), httpuv(v.1.6.1), StanHeaders(v.2.21.0-7), assertthat(v.0.2.1), gower(v.0.2.2), xfun(v.0.22), hms(v.1.0.0), ggdist(v.2.4.0), jquerylib(v.0.1.4), bayesplot(v.1.8.0), evaluate(v.0.14), promises(v.1.2.0.1), fansi(v.0.4.2), readxl(v.1.3.1), igraph(v.1.2.6), DBI(v.1.1.1), tmvnsim(v.1.0-2), htmlwidgets(v.1.5.3), stats4(v.4.0.5), benchmarkmeData(v.1.0.4), purrr(v.0.3.4), ellipsis(v.0.3.2), crosstalk(v.1.1.1), backports(v.1.2.1), binom(v.1.1-1), V8(v.3.4.2), pbivnorm(v.0.6.0), insight(v.0.14.0), markdown(v.1.1), RcppParallel(v.5.1.4), vctrs(v.0.3.8), sjlabelled(v.1.1.8), abind(v.1.4-5), withr(v.2.4.2), checkmate(v.2.0.0), emmeans(v.1.6.0), xts(v.0.12.1), prettyunits(v.1.1.1), mnormt(v.2.0.2), cluster(v.2.1.2), crayon(v.1.4.1), recipes(v.0.1.16), pkgconfig(v.2.0.3), nlme(v.3.1-152), nnet(v.7.3-16), rlang(v.0.4.11), globals(v.0.14.0), lifecycle(v.1.0.0), miniUI(v.0.1.1.1), colourpicker(v.1.1.0), modelr(v.0.1.8), cellranger(v.1.1.0), distributional(v.0.2.2), matrixStats(v.0.58.0), phangorn(v.2.7.0), loo(v.2.4.1), carData(v.3.0-4), boot(v.1.3-28), zoo(v.1.8-9), base64enc(v.0.1-3), gamm4(v.0.2-6), ggridges(v.0.5.3), processx(v.3.5.2), parameters(v.0.13.0), visNetwork(v.2.0.9), pROC(v.1.17.0.1), parallelly(v.1.25.0), qpdf(v.1.1), shinystan(v.2.5.0), ggeffects(v.1.1.0), scales(v.1.1.1), lpSolve(v.5.6.15), magrittr(v.2.0.1), plyr(v.1.8.6), threejs(v.0.3.3), compiler(v.4.0.5), rstantools(v.2.1.1), RColorBrewer(v.1.1-2), plotrix(v.3.8-1), cli(v.2.5.0), listenv(v.0.8.0), ps(v.1.6.0), TMB(v.1.7.20), Brobdingnag(v.1.2-6), htmlTable(v.2.1.0), Formula(v.1.2-4), mgcv(v.1.8-35), tidyselect(v.1.1.1), stringi(v.1.6.1), forcats(v.0.5.1), 
projpred(v.2.0.2), yaml(v.2.2.1), askpass(v.1.1), svUnit(v.1.0.6), latticeExtra(v.0.6-29), bridgesampling(v.1.1-2), sass(v.0.4.0), fastmatch(v.1.1-0), tools(v.4.0.5), rio(v.0.5.26), parallel(v.4.0.5), rstudioapi(v.0.13), foreach(v.1.5.1), foreign(v.0.8-81), inum(v.1.0-4), prodlim(v.2019.11.13), scatterplot3d(v.0.3-41), farver(v.2.1.0), digest(v.0.6.27), shiny(v.1.6.0), lava(v.1.6.9), quadprog(v.1.5-8), car(v.3.0-10), broom(v.0.7.6), later(v.1.2.0), httr(v.1.4.2), rsconnect(v.0.8.17), effectsize(v.0.4.4-1), sjstats(v.0.18.1), colorspace(v.2.0-1), splines(v.4.0.5), statmod(v.1.4.36), expm(v.0.999-6), shinythemes(v.1.2.0), xtable(v.1.8-4), jsonlite(v.1.7.2), nloptr(v.1.2.2.2), timeDate(v.3043.102), rstan(v.2.21.2), ipred(v.0.9-11), R6(v.2.5.0), Hmisc(v.4.5-0), pillar(v.1.6.0), htmltools(v.0.5.1.1), mime(v.0.10), glue(v.1.4.2), fastmap(v.1.1.0), minqa(v.1.2.4), DT(v.0.18), class(v.7.3-19), codetools(v.0.2-18), pkgbuild(v.1.2.0), furrr(v.0.2.2), utf8(v.1.2.1), bslib(v.0.2.5), tibble(v.3.1.1), pbkrtest(v.0.5.1), numDeriv(v.2016.8-1.1), arrayhelpers(v.1.1-0), curl(v.4.3.1), gtools(v.3.8.2), zip(v.2.1.1), openxlsx(v.4.2.3), shinyjs(v.2.0.0), survival(v.3.2-11), rmarkdown(v.2.8), munsell(v.0.5.0), iterators(v.1.0.13), haven(v.2.4.1), sjmisc(v.2.8.7) and gtable(v.0.3.0)
For mixed-effects models, this is Nakagawa’s R2 estimate, where the marginal estimate considers only the fixed effects, while the conditional estimate also considers the random effects. Here, we show only the marginal R2, as we are interested in the fixed effects. See `?performance::r2` for more details.
ICC represents the proportion of the variance explained by the grouping due to the random effects, and varies between 0% (the grouping contains no information) and 100% (basically all individual observations in a given group are identical); the adjusted ICC considers only the random effects, while the conditional ICC also considers the fixed effects; they are equal when there are no fixed effects (i.e., for the null models). Here, we show only the adjusted ICC, as we are interested in the random effects. See `?performance::icc` for more details.
Here I use model comparisons to estimate the p-value of adding (or removing) a predictor, v, by comparing the model without the predictor (m) with the model with the predictor (m_v) using `anova(m, m_v)`, and report the p-value, denoted pv/m, to make clear which predictor is added to which model.
The ROPE (region of practical equivalence) is a small interval around 0.0 (usually [-0.1, 0.1], but it can vary depending on the particular model). It can be used to estimate either the percent of the HDI that falls within this interval, or the proportion of the whole posterior distribution that does so. It can be used in a manner similar to that of frequentist p-values to judge whether 0.0 can be ruled out as a probable value of the parameter of interest.
This compares two `brms` models, m1 and m2, using Bayes Factors (BF), LOO, WAIC and KFOLD. For the latter three, I show the difference between m1 and m2 (in this order) and the SE of this difference; if the difference is negative (<0) then m1 is worse, while if it is positive (>0) m1 is better, but the “significance” of this difference can be interpreted only in the context of the SE. These results are summarized using the [B? L? W?(x%:y%) K?] notation, where the symbol ? can be “=” when the models are pretty much equivalent, “<” if m1 is worse than m2 (and “<<” if this difference is really big), or “>” if m1 is better than m2 (and “>>” if this difference is really big); for WAIC (“W”) I also give the relative weights of the two models as (x%:y%).
Please note that for path analyses/SEM models, we want the goodness-of-fit χ2 test to be non-significant, meaning that there is no reason to reject the hypothesis that the model fits the data. On the other hand, there is a plethora of goodness-of-fit indices (we show a few), where the idea is that the closer they are to 1.00, the better the model fits the data.