3|Baxter2000c|The relation 'is phonologically derivable from' has the mathematical properties of a partial ordering. If we use the symbol '≤' to represent 'is phonologically derivable from', then it has the properties of transitivity, reflexivity, and antisymmetry. Letting *A*, *B*, and *C* represent phonologies, we have [...].|103|Chinese, etymological relations, transitivity, reflexivity, antisymmetry 7|Bopp1816|Der Zweck dieses Versuchs ist, zu zeigen, wie in der Conjugation der altindischen Zeitwörter die Verhältnisbestimmungen durch entsprechende Modifikationen der Wurzel ausgedrückt werden, wie aber zuweilen das {\em verbum abstractum} mit der Stammsylbe zu einem Worte verschmolzen wird, und Stammsylbe und Hilfszeitwort sich in die grammatischen Funktionen des {\em verbum} theilen; zu zei- [pb] gen, wie dasselbe in der griechischen Sprache der Fall sey, wie im Lateinischen das System der Verbindung der Wurzel mit einem Hilfszeitworte herrschend geworden, und wie nur dadurch die scheinbare Verschiedenheit der lateinischen Conjugation von der des Sanskrits und des Griechischen entstanden sey; zu beweisen endlich, daß an allen den Sprachen, die von dem Sanskrit, oder mit ihm von einer gemeinschaftlichen Mutter abstammen, keine Verhältnisbestimmung durch eine Flexion ausgedrückt werde, die ihnen nicht mit jener Ursprache gemein sey, und scheinbare Eigenheiten nur daraus entstehen, daß entweder die Stammsylbe mit Hilfszeitwörtern zu einem Worte verschmolzen werden, oder daß aus Partizipien, die schon im Sanskrit gebräuchliche {\em tempora derivativa} abgeleitet werden, nach Art, wie man im Sanskrit, Griechischen und vielen andern Sprachen aus Substantiven {\em verba derivativa} bilden kann.|8f|- 8|Bopp1816|Verwunderungswürdig ist es, daß [pb] das Bengalische, welches doch unter den neu-indischen Mundarten am wenigsten fremde Einmischungen erlitten, in der Grammatik bey weitem nicht so sehr mit dem Sanskrit übereinstimmt, als die oben erwähnten Sprachen, während 
es doch eine weit größere Anzahl altindischer Wörter aufzuweisen hat. Neue organische Modifikationen sind aber nicht an die Stelle der altindischen Flexionen getreten, sondern nach dem deren Sinn und Geist nach und nach erstorben, fiel auch ihr Gebrauch weg, und es ersetzten {\em tempora participalia} [...] die Zeiten, die im Sanskrit durch innere Veränderung der Stammsylbe gebildet wurden.|9f|language mixture 9|Gabelentz1891|Es handelt sich hier um das, was man im Detectivwesen entfernte Indicien nennt. Der Richtungen, in denen nachgeforscht werden könnte, sind unzählig viele; aber es fragt sich: welche Richtungen versprechen am Ersten zum Ziele zu führen, welche sind also in erster Reihe zu verfolgen? Denn wir wollen nicht mit blindem Umhertappen und Ausprobiren Zeit und Kräfte vergeuden. Es muss eine Kunst des Suchens geben, die sich lehren und lernen lässt: es muss für diese Kunst apriorisch geltenden Grundsätze geben, die sich aus der Natur der Sache entwickeln lassen.|154|abduction 10|Gabelentz1891|Nun fragt man sich: Können die Übereinstimmungen nicht auch auf Entlehnung beruhen? Denn dass sie nicht zufällig sind, das haben eben jene Lautvergleichungen ergeben. Die Frage betrifft sowohl die Menge als die Art des Verwandten; es kann sehr Vieles entlehnt sein, wie im Englischen aus dem Altfranzösischen, — und doch gerade das Wesentlichste nicht. Hier muss die Vergleichung des Sprachbaues, der Wortformen und Formwörter entscheiden; und somit reiht sich an die lexikalische und phonetische Vergleichung die *grammatische* an. Das ganze hier geschilderte Verfahren ist scheinbar rein mechanisch und ist es oft auch wirklich. 
Allein in vielen Fällen wird neben einem guten Gedächtnisse, das die Arbeit verkürzt, auch ein gewisser Tact erfordert, der den Forscher vor thörichten Combinationen behütet, also ein Verständnis für das, was in der Sprachgeschichte möglich und wahrscheinlich ist.|179|basic vocabulary 11|Grimm1822|Der dialect, den uns die geschichte als den ältesten, unverdorbensten weist, muß zuletzt auch für die allgemeine darstellung aller verzweigungen des stamms die tiefste regel darbieten und dann bisher entdeckte gesetze der späteren mundarten reformieren, ohne sie sämmtlich aufzuheben.|VI|family tree, branching, history of science 12|Grimm1822|Die schriftsteller dieser zwischenzeit vergröbern stufenweise die frühere sprachregel und überlassen sich sorglos den einmischungen landschaftlicher gemeiner mundart; oft weiß man nicht, ob ihre besonderheit von der alten reinen sprache her übrig geblieben oder aus dem gebiete des volksdialects eingedrungen ist. Genügende darstellung solcher besonderheiten würde weitläuftige anstalten und erörterungen verlangen.|XI|language mixture 13|Grimm1822|In der frühen zeit gelten viele dialecte gleichansehnlich nebeneinander, ihre grenzen laufen mit denen der einzelnen stämme; sobald herrschaft und bildung einem volke vorgewicht geben, fängt seine mundart an sich über benachbarte, abhängige auszubreiten, d.h. von deren edlem theile angenommen zu werden, während die einheimische mundart unter den volkshaufen flüchtet. 
Die stärkere mundart steigt, die schwächere sinkt und wird gemein, doch selbst die herrschende muß durch ihre wachsende ausdehnung unvermerkt eigenheiten der andern stämme an sich ziehen, folglich dem ungebildeten theile des stammes, von dem sie ausgieng, gleichfalls entrückt werden.|XII|dialect boundaries, dialect mixture 14|Grimm1822|Mundarten welche durch natürliche lage gehegt und von andern unangestoßen bleiben, werden ihre flexionen langsamer verändern; berührung mehrerer dialecte muß, auch wenn der siegende vollendetere formen besäße, weil er sie mit aufgenommenen wörtern der andern mundart auszugleichen hat, abstumpfung beider mundarten beschleunigen.|XIV|dialect mixture 15|Harrison2003|Comparative historical linguists have been rather more careful in stipulating what it means for linguistic symbols to be similar in form. Observe first that similarity of form must be *complete* similarity. Put rather brutally, if the front halves of two forms are similar, but the back halves aren't, then the forms are not similar. In practice, we observe this condition by segmenting each form into its component (segmental or autosegmental) parts, and then mapping the segmented forms into a set of *correspondences* between a part or parts of one form and a part or parts (possibly nil) of the other. We need not go into the mechanics of that segmentation process here. The problem of similarity of sign forms then reduces to the problem of similarity of objects in a *correspondence* relation.|219|segmentation, sound correspondences, comparative method 16|Harrison2003|Similar symbols must be similar in both form and interpretation.|219|segmentation, similarity, comparative method 17|Harrison2003|Feature (attribute-value) theories of phonological representation (and of articulatory description that precedes them) make it possible for us to measure the similarity between two representations of phonological form, in terms of shared attribute-value pairs. 
Phonological feature theories do not, of course, tell us precisely how many attribute-value pairs must be shared by two forms for them to be deemed sufficiently similar to be cognate. Nor is it clear how one would, in practice, begin to construct a method that makes such a determination.|219|synchronic similarity, phenotypic similarity, comparative method 18|Hock1991|In order to understand how sound change operates, it is important to understand something of the nature of speech sounds. For sound change by definition is controlled by the phonetic characteristics of speech. Moreover, in many cases it does not just affect a single sound, but whole classes of phonetically similar sounds at the same time. An elementary familiarity with phonetics is necessary also in order to define such classes of similar sounds.|11|sound change 19|Hock1991|But note that while certain changes are explainable only acoustically, for the majority of changes, an articulatory account is at least as feasible as an acoustic one. It seems useful to be able to capture this difference by defining the majority of segments articulatorily, and to reserve acoustic definitions for just those cases in which purely articulatory terminology would be inappropriate or insufficient.|11|sound change, explanation of sound change 20|Hoenigswald1960b|Let there be two archives, one containing straight documents; the other, documents written in a substitution cipher. Some documents exist in both forms (A, A') and hence in both archives; others exist in straight form only (B), while still others exist in cipher only (C'). The order in which each archive keeps its papers is irrelevant. The cryptanalyst studies the recurrence of the letters, spaces, etc. 
and thus finds (1) the A' documents, (2) their A originals, and (3) the cipher linking the two.|13|sound correspondences, substitution cipher 22|Hoenigswald1963|Ever since then, the term `comparative' in technical linguistic use has referred, not to comparison at large, comparison for comparison's sake (i.e. typological comparison), but to a process whereby original features can be separated from recent ones and where the aim of classification is subordinated to the aim of reconstruction.|2|comparative method, comparison 23|Hoenigswald1963|Humboldt felt, at certain periods in his writing, that `inner form' (i.e., grammatical structure) is universal (although sometimes more, or sometimes less perfectly manifested in different languages), while the observable differences are mainly those in external shape.|5|- 24|Hoenigswald1963|When the biologist, Ernst Haeckel, gave him a copy of the Origin of Species which had just been translated, Schleicher reacted with an open letter on Darwinism and Linguistics (1864) and later with a lecture on the Significance of Language for the Natural History of Man (1864).|6|August Schleicher, Darwinism, Haeckel 25|Hoenigswald1963|There is an implication that the idea had long been familiar (as we know it had been); and that he was surprised, on reading Darwin, to find its biological counterpart (he seems to have known little about Darwin's predecessors); that, in short, the agreement suggested to him new interpretations of an old and well-known concept.|6|Darwin, August Schleicher, family tree 26|Hoenigswald1963|Sound change has certain striking properties; to put it very anachronistically, sound change is not only simpler and more amenable to formula than are other change processes; it also affords the neatest examples of innovation in linguistic change.|7|sound change, uniformitarianism 27|Hoenigswald1963|It remained for his successors (Karl Brugmann in particular) to say firmly that groupings within a language family (or nodes on the 
family tree) are established by shared innovations, and not by shared traits in general.|7|shared innovation, classification, Brugmann 28|Hoenigswald1963|The \`method' was of course text criticism with particular, often exclusive, stress on the moves and rules which serve to recover a lost manuscript source or archetype, down to every place in the text.|8|family tree, stemmatics, August Schleicher 29|Hoenigswald1963|But if we were asked what their most important part is, the answer would almost certainly be: the doctrine of the shared error.|8|shared innovation, stemmatics 30|Hoenigswald1963|The analogy is evident, and the connection seems palpable. Schleicher the linguist had hit upon the unique importance of the phonological correspondences for discriminating between retention and innovation.|8|- 31|Hoenigswald1963|One is half pedagogical in nature, but very important: it is his insistence that the Comparative Method is concerned not with similarities but with correspondences, even between quite dissimilar elements, from language to language. He was fond of pointing out that such a correspondence exists between Germanic /tw/ and Armenian /erk/, two sound sequences which could not be more unlike each other.|10|comparative method, sound correspondences, Antoine Meillet 32|Hoenigswald1990|Does language grow on trees? Yes, of course it does, if we so wish; but we had better be very careful.|18|language history, family tree 33|Holzer1996|Da ein Rekonstrukt weder als bloßes Bild der Ursprache noch als Aussage über ihre Lautungsrelationen den Anspruch erhebt, die ursprachliche Phonetik oder Phonologie wiederzugeben, gibt es in so verstandenen Rekonstrukten nichts, was natürlich sein müßte oder mit realen Sprachen typologisch verglichen werden könnte. 
[\ldots] Auch die Frage der typologischen Normalität und der phonetischen Natürlichkeit des Lautwandels, der zwischen der Ursprache und den verglichenen Teilsprachenkernen anzunehmen ist, stellt sich nur bei Vorliegen einer Rekonstruktion der absoluten Lautungen der Ursprache.|172|proto-form, linguistic reconstruction, reality 34|Holzer1996|Nur in Zusammenstellungen semantisch vergleichbarer Bedeutungsträger ist es auffällig, erklärungswürdig und nicht trivial, wenn sie es gestatten, häufig rekurrente Lautungszuordnungen aufzustellen. Verzichtet man auf eine semantische Vergleichbarkeit zusammenzustellender Bedeutungsträger, könnte man zwischen beliebigen Bedeutungsträgermengen fast beliebig viele Sets von Bedeutungsträgerzusammenstellungen mit rekurrenten Lautungszuordnungen zustande bringen. In folgenden Zusammenstellungen semantisch nicht vergleichbarer russischer und deutscher Wörter: *nos* -- *Mahl*, *son* -- *lahm*, *nas* -- *Mehl*, *san* -- *Lehm*, *osa* -- *Ahle*, *ona* -- *ahme* zum Beispiel lassen sich leicht rekurrente Lautungszuordnungen vornehmen; und ebenso etwa in *nos* -- *lahm*, *son* -- *Mahl*, *nas* -- *Lehm*, *san* -- *Mehl*, *osa* -- *ahme*, *ona* -- *Ahle*. Man benötigt also sowohl das Kriterium der Rekurrenz der Lautungszuordnungen, als auch das der semantischen Vergleichbarkeit, um Bedeutungsträgergleichungen aufzustellen.|131|linguistic reconstruction, comparative method, semantic similarity 35|Holzer1996|Ein **Bedeutungsträger** ist eine Lautung mit Bedeutung (Morph, Wort, Satz...). Eine **Sprache** ist eine Menge von Bedeutungsträgern. 
Nur auf diesen stark reduzierten Aspekt von Sprache kommt es hier an; insbesondere wird von syntaktischen Beziehungen zwischen den Bedeutungsträgern und überhaupt vom "System", als das eine Sprache in anderen Zusammenhängen betrachtet werden kann, abgesehen.|49|set theory, language model 36|Makaev1977|Одна из характерных черт реконструкции общеиндоевропейского языка (а при прочих равных условиях и любого другого праязыка) это -- множественность решений, а не их единственность. Хорошо известно, что в зависимости от того, какой из индоевропейских языков будет взят в качестве эталона при восстановлении общеиндоевропейского состояния, картина праязыка всякий раз будет иной. В то же время известно, что опора на все индоевропейские языки при реконструкции исходного состояния всегда приводила к эклектичности решения и бессодержательности самого понятия праязыка [...].|88|cumulative evidence, comparative method 37|Moore1994|Cladistic theory can only be applied legitimately to qualitative data that are unidirectional, not to qualitative data that are random in their development, and not to quantitative data at all.|929|cladistics, qualitative data 38|Moore1994|The logic of cladism, and the graphics derived from that logic, demand that each entity must have one and only one parent (Ashlock 1984)|929|family tree, cladistics 39|Moore1994|The most reliable method for reconstructing protolanguages is to trace the phonemic changes that have occurred among daughter languages, following them backward in time on the basis of phonological structure and syntactic organization.[...] The method is traditional in linguistics and decidedly cladistic (Figure 5). 
It becomes less precise as it moves backward in time, since the uncertainties of reconstruction at lower levels are compounded as they are combined at higher levels (Southworth 1964; Stewart 1976).|932|comparative method, cladistics 40|Moore1994|The only possible outcome is to discover that the languages are related and then construct a cladogram indicating the extent of that relationship. It is impossible to discover, on a lexical basis, that a language might be derived from two or more parent languages. There is also the problem of loan words: if a loan word appears in both of the languages being compared, it might be mistaken for a cognate and thus add strength to the alleged historical relationship between the languages. Further, if the language contributing the loan word is extinct, it may be impossible to tell the difference between a loan word widely borrowed by the members of a language family and a word that derives from a common origin.|933|lexicostatistics, cladistics, lexical borrowing 41|Mueller1862|-|82|language mixture, mixed languages 42|Mueller1862|-|135|origin of language 43|Mueller1862|-|135|primitive languages, origin of language 44|Mueller1862|-|34|language faculty, linguistics 45|Sapir1921|Are there resistances of a more intimate nature to the borrowing of words? It is generally assumed that the nature and extent of borrowing depend entirely on the historical facts of culture relation; that if German, for instance, has borrowed less copiously than English from Latin and French it is only because Germany has had less intimate relations than England with the culture spheres of classical Rome and France. This is true to a considerable extent, but it is not the whole truth. 
We must not exaggerate the physical importance of the Norman invasion nor underrate the significance of the fact that Germany’s central geographical position made it peculiarly sensitive to French influences all through the Middle Ages, to humanistic influences in the latter fifteenth and early sixteenth centuries, and again to the powerful French influences of the seventeenth and eighteenth centuries. It seems very probable that the psychological attitude of the borrowing language itself towards linguistic material has much to do with its receptivity to foreign words. English has long been striving for the completely unified, unanalyzed word, regardless of whether it is monosyllabic or polysyllabic. Such words as credible, certitude, intangible are entirely welcome in English because each represents a unitary, well-nuanced idea and because their formal analysis (cred-ible, certitude, in-tang-ible) is not a necessary act of the unconscious mind (cred-, cert-, and tang- have no real existence in English comparable to that of good- in goodness). A word like intangible, once it is acclimated, is nearly as simple a psychological entity as any radical monosyllable (say vague, thin, grasp). In German, however, polysyllabic words strive to analyze themselves into significant elements. Hence vast numbers of French and Latin words, borrowed at the height of certain cultural influences, could not maintain themselves in the language. Latin-German words like kredibel “credible” and French-German words like reussieren “to succeed” offered nothing that the unconscious mind could assimilate to its customary method of feeling and handling words. 
It is as though this unconscious mind said: “I am perfectly willing to accept kredibel if you will just tell me what you mean by kred-.” Hence German has generally found it easier to create new words out of its own resources, as the necessity for them arose.|IX:3|lexical borrowing, borrowing constraints 46|Trubetzkoy1930|Gruppen, bestehend aus Sprachen, die eine grosse Ähnlichkeit in syntaktischer Hinsicht, eine Ähnlichkeit in den Grundsätzen des morphologischen Baus aufweisen, und eine grosse Anzahl gemeinsamer Kulturwörter bieten, manchmal auch äussere Ähnlichkeit im Bestande der Lautsysteme, -- dabei aber keine systematische Lautentsprechungen, keine Übereinstimmung in der lautlichen Gestalt der morphologischen Elemente und keine gemeinsamen Elementarwörter besitzen, -- *solche Sprachgruppen nennen wir Sprachbunde.*|18|language union 47|Trubetzkoy1930|Gruppen, bestehend aus Sprachen, die eine beträchtliche Anzahl von gemeinsamen Elementarwörtern besitzen, Übereinstimmungen im lautlichen Ausdruck morphologischer Kategorien aufweisen und, vor allem, konstante Lautentsprechungen bieten, -- *solche Sprachgruppen nennen wir Sprachfamilien*.|18|language family 48|Aikhenvald2007b|Languages can resemble each other in categories, constructions, and meanings, and in the actual forms used to express them. Categories can be similar because they are universal [...]. Occasionally, two languages share a form by pure coincidence. [...] What these two kinds of similarities have in common is that they tell us nothing about the history of languages or their speakers. In this volume we focus on two other types of similarities: those due to genetic inheritance and those due to areal contact.|1|similarity 49|Aikhenvald2007b|A shared feature may be based on common linguistic origin. Then, the languages can be shown to have descended from the same ancestor [...]. 
Related languages \`will pass through the same or strikingly similar phases': this \`parallelism in drift' (Sapir 1921: 171-2; LaPolla 1994, Borg 1994) [pb] accounts for additional similarities between related languages, even for those \`long disconnected'.|1f|common origin 50|Aikhenvald2007b|Alternatively, shared features may result from geographic proximity, contact, and borrowing.|2|language contact 51|Aikhenvald2007b|The extent of this varies, depending on a number of cultural and social factors, including the degree of speakers' awareness and sense of purism, and also on the structure of the languages in contact.|2|borrowing, language contact, borrowing degree 52|Aikhenvald2007b|No linguistic feature -- be it a form, or a pattern -- is entirely \`borrowing-proof'. [...] And yet some grammatical and other features are particularly open to -- and others are more resistant to -- diffusion.|2|borrowability 53|Aikhenvald2007b|Preferences at work in borrowing patterns and forms depend on the expression and function of a category, on its usage, and on the ways it correlates with cultural stereotypes. A plethora of social factors play a role -- these include language attitudes and receptivity to \`foreign' forms. Pre-existing structural similarities between languages in contact also facilitate contact-induced change.|2|borrowability, borrowing degree 54|Aikhenvald2007b|Linguistic diffusion is understood as the spread of a linguistic feature within a geographical area or as recurrent borrowing within a linguistic area. Diffusion within an area can be unilateral (when it proceeds from one source) or multilateral (when it involves several sources).|3|linguistic diffusion, lexical diffusion 55|Aikhenvald2007b|Contact-induced innovations are constantly being added to languages over the course [pb] of their development, as if piling tier upon tier of \`naturalized' foreign elements. 
The result is LAYERED languages: the inherited \`core' is discernible underneath the subsequent \`layers' of innovative influence from outside.|4f|stratification 56|Aikhenvald2007b|Figure 1, inspired by Owens (1996), reflects the scale of potential layering: which parts of the language are more likely to be shared with genetic relatives, and which are easily attributable to language contact and diffusion.|5|borrowability 57|Aikhenvald2007b|The idea of \`layering' is much more \`down to earth'; it reflects the procedure of easing apart subsequent \`layers' of discernible impact from neighbouring language, on the way to identifying the \`genetic core'. \`Layering' has an additional flavour to it inasmuch as this term reflects chronologically organized stages of linguistic diffusion [...]. The idea of \`layering' is also linked to the notion of \`stratification' [...], whereby different speech registers reflect different contact patterns.|6|stratification 58|Aikhenvald2007b|Detecting \`layers' in languages is a heuristic procedure. And in all the instances quoted in this section the procedure has been successful: we know how to separate the layers of diffusion from the \`core' of genetic affiliation. But in quite a few other cases the picture is blurred.|7|stratification 59|Aikhenvald2007b|Teasing apart similarities due to genetic inheritance and those due to borrowing of varied kinds is one of the hardest problems in comparative linguistics. [...] Ideally, if two languages descend from the same ancestor, the forms and their meanings must be easily relatable, via the application of established rules for phonological change and semantic change. 
In actual fact, the distinction between inherited and diffused similarities may be difficult to discern, especially in the situation of prolonged and uninterrupted diffusion of cultural and linguistic traits across an area.|7|similarity, genetic similarity, contact-induced similarity 60|Aikhenvald2007b|A long-lasting diffusion area may result in layering of patterns and forms to such an extent that genetic relationships are undiscernible.|7|genetic classification, language contact 61|Aikhenvald2007b|If languages are genetically related, we expect them to develop similar structure, no matter whether they are in contact or not. And if genetically related languages are in contact, trying to prove that a shared feature is contact induced and not a \`chance' result of Sapir's drift may be next to impossible.|9|drift, contact-induced similarity, drift-induced similarity 62|Aikhenvald2007b|Languages known as \`MIXED' or \`INTERTWINED' arise as a result of a combination of special sociolinguistic circumstances with semi-conscious efforts to \`create a language', in which different parts of grammar and lexicon come from different languages.|10|mixed languages, language mixture 63|Aikhenvald2007b|A linguistic area (or sprachbund) is generally taken to be a geographically delimited region including languages from at least two language families, or different subgroups of the same family, sharing traits, or combinations thereof, most of which are not found in languages from these families or subgroups spoken outside the area.|11|language union, linguistic area 64|Aikhenvald2007b|How to locate the diagnostic traits, especially when at least some similarities between contiguous languages can be explained by accident, universals, and parallel development? 
As shown in the study of Mesoamerica as a linguistic area, by Campbell, Kaufman, and Smith-Stark (1986: 535-6) not all shared features have the same \`weight': \`highly \`\`marked", exotic or unique shared traits weigh more than does material that is more easily developed independently, or found widely in other languages.'|12|linguistic area, language union 65|Aikhenvald2007b|A typologically well-attested property cannot by itself be considered area defining. But the way properties cluster may be area specific.|12|linguistic area, language union 66|Aikhenvald2007b|The more areally defining features a language has, the more central it is to the area. The fewer features it has, the more \`peripheral' it is.|13|linguistic area, language union 67|Aikhenvald2007b|A necessary condition for a linguistic area is some degree of bi- and/or multilingualism.|14|linguistic area, language union 68|Aikhenvald2007b|Languages borrow forms and patterns. Borrowed forms may include a lexeme, a pronoun, an affix, a phoneme or intonation pattern, or a way of framing discourse (see Campbell 1997; Curnow 2001). Borrowing patterns does not presuppose borrowing forms.|15|borrowability 69|Aikhenvald2007b|In terms of the overall impact on the language, diffusion may involve contact-induced gain, or loss, of a form, or of a pattern. The original and the diffused form, or pattern, can coexist in the language, with -- or without -- some functional differentiation. Or a hybrid form may be created.|18|language contact 70|Bonfante1931|I concetti che ò seguito nella composizione di questo lavoro sono i seguenti. Prima di tutto ò cercato di fondarmi soltanto su etimologie assolutamente sicure o per lo meno estremamente verosimili: è facile fare lunghe liste di parole, la cui parentela è dubbia o discutibile, ma che valore ànno esse? 
Forniscono materiale infido agli studiosi, che non ànno tempo o possibilità di vagliare criticamente tutti gli étimi proposti, e finiscono poi per gettare il discrédito sulla nostra scienza.

[The principles I have followed in compiling this work are the following. First of all, I have tried to rely only on etymologies that are absolutely certain or at least extremely probable: it is easy to draw up long lists of words whose relationship is doubtful or debatable, but what value do they have? They supply treacherous material to scholars who have neither the time nor the opportunity to sift all the proposed etymologies critically, and they end up bringing discredit on our science.]|71|etymology 71|Bonfante1931|Tale criterio doveva esser tenuto particolarmente presente in un lavoro come questo, che tratta dei dialetti ideur. Questi dialetti appartèngono tutti ad una medésima famiglia, come per esempio i dialetti italiani (escluso il sardo); ànno quindi più o meno tutti gli stessi casi, le stesse declinazioni, le stesse radici; le loro differenziazioni sono sottili, delicate, spesso fuggévoli; bisogna scendere a particolari minuti, che a qualcuno possono parere insignificanti, ma che sono invece i più significativi.

[This criterion had to be kept particularly in mind in a work like this one, which deals with the Indo-European dialects. These dialects all belong to one and the same family, like, for example, the Italian dialects (Sardinian excluded); they therefore all have more or less the same cases, the same declensions, the same roots; their differences are subtle, delicate, often fleeting; one has to go down to minute details, which may seem insignificant to some but are in fact the most significant.]|72|- 72|Bonfante1931|Ben di rado noi siamo certi delle isoglosse che costruiamo, e ciò per varie ragioni. Quando si tratta di fenòmeni fonétici, osserviamo fatti idèntici che si sviluppano spontaneamente e indipendentemente in territori anche lontanissimi e senza contatti fra loro (come la Lautverschiebung tedesca ed armena); ciò avviene perché gli organi fisiològici che prodúcono i suoni sono più o meno gli stessi in tutte le razze, e quindi i suoni e le loro trasformazioni sono in número limitato e si riprodúcono in tempi e luoghi diversissimi. Quando si tratta di fenòmeni morfològici, può valere in parte quel che ò detto or ora; ma più di frequente osserviamo concomitanza di evolu[pb]zione; perché le leggi psicològiche si somigliano sotto ogni cielo, e sotto ogni cielo troviamo con nomi diversi le stesse leggende, gli stessi miti, le stesse divinità, le stesse fàvole, con monotonia quasi esasperante (1). Ma anche quando questi fattori siano esclusi, di fronte alla concordanza di due o più lingue indoeuropee contro le altre resta sempre la possibilità che le rimanenti lingue abbiano perduto il suono, o la forma, o la parola di cui si tratta; ed è in genere difficile dimostrare il contrario (2). 
Ciò avviene perché le leggi del mutamento linguistico ci sono sostanzialmente ancor oggi ignote, e potremo scoprirle solo se accumuleremo e ordineremo molto materiale linguístico, che ci permetta di trarre conclusioni dalla realtà delle cose, evitando di imporre alle cose le nostre astrazioni.

[Very rarely are we certain of the isoglosses we construct, and this for several reasons. When dealing with phonetic phenomena, we observe identical facts that develop spontaneously and independently in territories that may be very distant from each other and without contact between them (like the German and Armenian Lautverschiebung); this happens because the physiological organs that produce the sounds are more or less the same in all races, and therefore the sounds and their transformations are limited in number and recur at the most diverse times and places. When dealing with morphological phenomena, what I have just said may hold in part; but more frequently we observe concomitant develop[pb]ments, because the psychological laws resemble each other under every sky, and under every sky we find, under different names, the same legends, the same myths, the same deities, the same fables, with almost exasperating monotony (1). But even when these factors are excluded, when two or more Indo-European languages agree against the others, there always remains the possibility that the remaining languages have lost the sound, the form, or the word in question; and it is generally difficult to demonstrate the contrary (2). This happens because the laws of linguistic change are still substantially unknown to us today, and we shall only be able to discover them if we accumulate and order a great deal of linguistic material that allows us to draw conclusions from the reality of things, while avoiding imposing our abstractions on them.]|75f|isoglosses, contact-induced similarity, genetic similarity 73|Bonfante1931|Generalmente si fa delle lingue indoeuropee il disegno di una catena, per cui ogni anello è collegato con altri due soli anelli, uno a destra e uno a sinistra. [...] secondo questo schema, potrebbe toccare solo l'ario a oriente, l'itàlico a occidente. Io mi figuro la cosa in modo un po' diverso. 
Prescindendo dai gruppi minori, attribuisco alle lingue indoeuropee nella loro fase preistòrica una posizione rispettiva di questo tipo: [graphic] Si vede da questo schizzo che una qualunque lingua indoeuropea (all'infuori del cèltico e dell'ario) può avere contatti diretti (ciò senza scavalcare un'altra lingua) con vari gruppi e non con 2 soli. Le isoglosse non vanno solo da oriente a occi[pb]dente e viceversa, come fin qui si è creduto, ma anche da nord a sud e da sud a nord. [...] Lo schema da me disegnato somiglia assai a quello di Hirt [...]; ma nel testo egli non applica la mia idea. :translation:`Generally, the Indo-European languages are depicted as a chain, every link of which is connected with only two other links, one to the right and one to the left. [...] according to this schema, it could touch only Aryan in the east and Italic in the west. I picture the matter in a slightly different way. Apart from the minor groups, I attribute to the Indo-European languages in their prehistoric phase a relative position of the following type: [graphic] From this sketch it can be seen that any Indo-European language (apart from Celtic and Aryan) can have direct contacts (that is, without crossing another language) with several groups and not with only two. The isoglosses do not only run from east to west and vice versa, as has been assumed up to now, but also from north to south and from south to north. [...] The sketch drawn by me closely resembles that of Hirt [...]; but in the text he does not apply my idea.`|174f|- 74|Francois2008|Even though this may fail to represent faithfully the language-internal perception of an English native speaker, at least this serves efficiently the purpose of cross-linguistic comparison: it becomes then easy to state the facts by saying that these two senses are treated the same in English, and not in French. 
|169|vagueness 75|Francois2008|This empirical method of defining senses based on cross-linguistic comparison has the valuable advantage that it helps “sidestep the vexing problem of distinguishing between polysemy and vagueness” (Haspelmath 2003: 231). |169|vagueness 77|Francois2008|For example, Brown (2005a) suggests that the colexification of 〈hand〉 – 〈arm〉 may be influenced by the geographical situation of the community. According to him, the use of “tailored clothing covering the arm” in colder environments tends to make the contrast between the hand and the arm more salient, thus favoring the existence of two separate lexical items.|173|colexification 78|Francois2008|In sum, colexification may result historically from typological convergence, from genetic inheritance, or from contact-induced change ... just like any other structural feature of a language.|174|colexification 79|Francois2008|The method suggested by Haspelmath indeed resorts to observable data from actual languages. The basic idea is that senses should be arranged in space in such a way that each lexical unit in one language “occupies a contiguous area on the semantic map” (2003: 216). Furthermore, each specific connecting line should reflect the existence of at least one attested case of a direct lexical connection between these two senses, in any of the world’s languages.|180|networks, colexification, polysemy 80|Burlak2005|Регулярные фонетические соответствия могут быть разделены на несколько типов. Первый тип -- взаимно-однозначные соответствия, как, например, рус. {\em г} -- укр. {\em г}. Второй тип соответствий можно назвать соответствиями-дроблениями: на приведенном романском примере можно видеть, что различные звуки языков-потомков развились из звуков, представлявших одну фонему праязыка и находившихся в дополнительном распределении (т.е. одна праязыковая фонема «раздробилась», или, как иногда говорят, «расщепилась», на несколько в языке-потомке). 
Чтобы показать, что имеет место соответствие-дробление, необходимо установить дополнительное распределение между несколькими звуками одного языка, соответствующими одному звуку другого. [\ldots] [pb] Третий тип регулярных фонетических соответствий -- соответствия-совпадения. [\ldots] Для того чтобы решить, с каким типом соответствий мы имеем дело в конкретном случае, надо попытаться установить правило распределения. Если это удастся, значит, в праязыке на этом месте была одна фонема, если же не удастся -- это не значит ничего: вполне возможно, что через некоторое время (возможно, с накоплением новых данных, возможно, с появлением другой гипотезы и т. д.) правило будет сформулировано.|46f|sound correspondences 81|Burlak2005|Специально отметим, что сходство звуков между собой при установлении регулярных фонетических соответствий никакой роли не играет. |47|synchronic similarity, sound correspondences 82|Burlak2005|Теоретическим основанием для применения такого метода служит предположение [\ldots] о возможности выделить такие группы звуков, что изменения в пределах группы более вероятны, чем переходы из одной группы в другую.|?|sound classes 83|Brugmann1904|Wir können nicht wissen, wie viel in jenen vorgeschichtlichen Zeit [sic!] ein Idiom dem andern nur entlehnt hat in der Art, wie z. B. lat. *poena* griechisches Lehnwort (ποινή) war; nur bis zu einem gewissen Grad haben wir hier, z.B. an den Lautverhältnissen, einen Anhalt zur Beurteilung. Auf Entlehnung beruhende Übereinstimmungen aber dürfen für die Bestimmung der Verwandtschaft, dieses Wort in seiner üblichen Bedeutung genommen, nicht in Rechnung gestellt werden. Im ganzen ist also nur wenig, was aus den spezielleren Übereinstimmungen zwischen einzelnen von den acht Hauptgruppen für die Beziehungen der Völker zu einander in sogen. [pb] voreinzelsprachlicher Zeit mit grösserer Wahrscheinlichkeit entnommen werden kann. 
Und jedenfalls treten, so viel wir heute wissen, nirgends speziellere Gemeinsamkeiten, die als gemeinsame Neuerungen erscheinen, in *so grosser Anzahl* entgegen, dass man auf Grund derselben die betreffenden Sprachzweige in derselben Art zu Einheiten zusammenschliessen dürfte, wie man z.B. das Indische mit dem Iranischen, das Baltische mit dem Slavischen zu vereinigen pflegt. Dies gilt selbst für den Fall, dass man keine von diesen Übereinstimmungen als nur zufällig und keine als auf Entlehnung beruhend betrachten wollte. Man mag also immerhin konstatieren, welche und wie viele besondere Übereinstimmungen jedesmal zwischen zwei Nachbargebieten vorhanden sind -- die meisten und signifikantesten gibt es zwischen italisch und keltisch --, aber da uns eine nähere Einsicht in die Art und Weise, wie sie zustande gekommen sind, abgeht, so spreche man hier nicht von näherer oder fernerer \`Verwandtschaft', z.B. nicht von einer näheren Verwandtschaft des Italischen mit dem Keltischen als mit dem Griechischen, weil das Wort Verwandtschaft hier allzu leicht ungerechtfertigte Vorstellungen erweckt. |21f|subgrouping, genetic classification 84|Brugmann1967|Das einzige nun, was auf das Verhältnis der einzelnen Sprachzweige zu einander, auf die Art des Hervorgangs der Einzelsprachen aus der idg. Ursprache Licht werfen kann, sind die besonderen Übereinstimmungen zwischen je zwei oder mehreren von ihnen, die Neuerungen, durch die jedesmal gewisse Sprachzweige gegenüber den andern in der Entwicklung vorangeschritten erscheinen. 
:translation:`The only thing that can shed light on the relation among the individual language branches, that is, on how the individual languages originated from the Indo-European proto-language, are the specific correspondences between two or more of them, the innovations, by which each time certain language branches have advanced in comparison with other branches in their development.`|24|subgrouping, genetic classification, shared innovation 85|Trask2000|**Cognates** 1. Narrowly, and most usually, one of two or more words or morphemes which are directly descended from a single ancestral form in the single common ancestor of the languages in which the words, or morphemes, are found, with no borrowing. [...] 2. Broadly, and less usually, one of two or more words which have a single common origin but one or more of which have been borrowed. [...] |63|cognates 86|Trask2000|The relationship which holds between cognates: descent from a common ancestor.|64|cognacy 87|Trask2000|**lookalike** Linguistic forms in different languages which are noticeably similar in form and meaning but which are not known to be related and which cannot be fitted into systematic correspondences. The great majority of lookalikes are chance resemblances, and hence few linguists attach any importance to them in comparison.|202|lookalikes, similarity, 88|Trask2000|A striking similarity in form and meaning between words or other elements in two or more different languages which results from nothing more than a chance accident.|55|chance resemblance 89|Trask2000|**root cognates** Words or forms which are constructed upon cognate roots but at least some of which also contain additional morphological material which is not cognate.|290|root cognates, etymological relations 90|Trask2000|Narrowly, the transfer of a word from one language into a second language, as a result of some kind of contact [...] 
between speakers of the two.|44|borrowing, lexical borrowing 91|Trask2000|**etymon** An earlier linguistic form from which a later one is directly derived. [...] The opposite is **reflex**.|110|etymon, etymological relations 92|Trask2000|**reflex** A word or other linguistic form which is directly descended within a particular language from an ancestral form taken as a reference point. [...] The opposite is etymon.|278|reflex, residue 93|Trask2000|**loan word** A word which has been taken into one language from a second language in which it was already present.|201|loan word, borrowing 94|Trask2000|**oblique cognates** Two or more words in related languages which continue alternant forms of a single root in the ancestral language. |234f|oblique cognacy, etymological relations 95|Trask2000|**partial cognates** Linguistic forms which contain morphological material that is narrowly cognate but at least some of which contain additional material not present in the others.|248|partial cognacy, etymological relations 96|Trask2000|A word which is coined by adults for the purpose of addressing small children, often one which deliberately imitates the child’s first utterances. Nursery words [...] are exceedingly common in the world’s languages in such senses as ‘mother’, ‘father’, ‘breast’; such words are useless as comparanda in comparative linguistics, because they are so often independently created.|234|nursery word, etymological relations 97|Fitch2000|Homology is the relationship of two characters that have descended, usually with divergence, from a common ancestral character. This is important because most of the terminological problems stem from different definitions of homology. Characters can be any genic, structural or behavioral feature of an organism. |227|homology, etymological relations 98|Fitch2000|The relationship of any two identical character states that must have arisen independently, given a specific phylogenetic tree. 
|229|homoplasy, etymological relations 99|Fitch2000|The relationship of any two homologous characters whose common ancestor lies in the cenancestor of the taxa from which the two sequences were obtained. |229|orthology, etymological relations 100|Fitch2000|The relationship of any two homologous characters arising from a duplication of the gene for that character. |229|paralogy, etymological relations 101|Fitch2000|The relationship of any two homologous characters whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material for at least one of those characters.|229|xenology, etymological relations 102|Allen1953|The systematization of similarities presupposes the identification of comparable items; and in traditional comparative linguistics the criteria for such identification are mainly of one type. The upper limit for the item is the word, more often the [pb] morpheme; and the comparable items in two or more languages are identified on a combined formal and semantic basis -- the \`\`linguistic meaning form" as one writer has called it (`cf. Dyen 1951 `_). for analytical purposes these two aspects of the identification may here be considered one at a time. The semantic identification is by translation into some common language -- the \`language of reference' we may term it; if the translations of the two or more items are identical, then the identification is accepted without question; if they are not, the identification depends largely upon the intuition or the preconceptions of the scholar concerned. Identifications of the latter type are not infrequently \`\`supported" by the citation of appropriate aphorism from the system of rhetoric entitled by its Greek authors Περὶ Τρόπων and by its modern exponents \`\`Semantic Changes" -- a mode of support which has been expressly condemned by Meillet (`Meillet: Méthode comparative `_). |57f|semantic similarity 103|Allen1953|[...] 
only those semantic equations are therefore considered for which it proves possible to state a corresponding formal equation: and the criterion for such an equation is that the correspondences between its terms should be congruent with those set up for other acceptable equations. It is precisely the insistence upon and possibility of stating these systematic correspondences that has given comparative linguistics its special character and puts the formal identifications on a more favourable footing than the semantic identifications. The formal correspondences are stated monosystematically in what are, to all intents and purposes, phonemic terms -- it is to be noted that we in fact owe the term \`\`phoneme" to Kruszewski's comparative-historical studies of morphemic alternation (`Kazan 1881: Über die Lautabwechslung `_, `Firth: Firth, J. R. (1934) The word “Phoneme”, Le Maître phonétique 46: 44–6 `_) the Roman and romanized orthographies of much of the comparative material [pb] were already all but phonemic, so that little work upon them was necessary in order to make a beginning, but only a study, as Hjelmslev has pointed out, of *literarum permutationes* [!??!]. By the formal criterion, then, Lat. *ovis* and Skt. *avi\d{h}* \`\`sheep", for example, are accepted as comparable, in that they are identifiable phoneme by phoneme in accordance with the established system of correspondences, to which they themselves add support; by the same criterion an identification of Lat. *manus* and Skt. *hasta\d{h}* \`\`hand", is rejected. Thus only those examples are admitted to comparison for which a correspondence-system can be established -- in other words, from the point of view of such a system, only corroborative items are selected. From this it would appear that the regularity of the \`\`sound-laws" was not so much a blind as a logical necessity. 
|60f|formal similarities 104|Chambers2004|In so far as dialect variation is the result of waves of linguistic innovation spreading throughout a region, there is an intrinsic chronological dimension (in terms of ‘apparent’ time rather than real time, a distinction that is discussed in Chapter 10), a domain shared with comparative-historical linguistics.|21|dialectology, temporal dimension 105|Schleicher1848|Die bildung der Sprache, die aufsteigende geschichte ihrer Entwicklung fällt in die vorhistorische Periode der Völker; es giebt, wie schon gesagt, kein historisches Beispiel einer sich bildenden Sprache. In der historischen [pb] Periode ist die Sprachengeschichte die Geschichte des Verfalls der Sprachen als solcher in Folge ihrer Knechtung durch den Geist.|16f|- 106|Schleicher1848|Hieraus folgt, dass, wie die Geschichtsentwicklung eine gesetzmässige ist, so der Verfall der Sprache bestimmte Gesetze zeigen, einen regelmässigen Verlauf haben müssen und ferner dass, wie die Geschichte aller Völker wesentlich einen Gang geht -- wie auch die Entwicklung eines Individuums doch im Ganzen denselben Typus zeigt -- so auch die Sprachengeschichte überhaupt, die Geschichte aller Sprachen einen im Wesentlichen übereinstimmenden Verlauf zeigen müsse.|25|- 107|Schleicher1848|Der Verfall ist wirklich ein allmählicher wie die geschichtliche Entwicklung, er ist in Perioden theilbar, wie diese, je nach dem grösseren oder geringeren Grade der Entfernung vom Ursprünglichen und er verläuft bei allen Sprachen in analoger Weise, wie die Geschichte. Aus letzterem Satze folgt die Möglichkeit und der Vortheil einer vergleichenden Behandlung der Sprachengeschichte.|25|- 108|Schleicher1848|Der systematische Theil der Sprachforschung im Gegensatze zur historischen hat -- irre ich nicht, so sagt [pb] diess Bopp irgendwo -- eine unverkennbare Aehnlichkeit mit den Naturwissenschaften. Dies stellt sich namentlich bei der Eintheilung der Sprachen in Klassen heraus. 
Der ganze Habitus einer Sprachenfamilie lässt sich unter gewisse Gesichtspunkte bringen, wie der einer Pflanzen- oder Thierfamilie. Wie in der Botanik gewisse Merkmale -- Keimblätter, Beschaffenheit der Blüthe -- vor anderen sich als tauglich erweisen, eben weil diese Merkmale gewöhnlich mit anderen coincidiren, so scheinen in der Eintheilung der Sprachen innerhalb eines Sprachstammes, wie z.B. des Semitischen, Indogermanischen, die Lautgesetze diese Rolle zu übernehmen.|27f|sound law 109|Schleicher1848|Bei der systematischen Zusammenstellung der Sprachen darf man aber eine Erscheinung nicht übersehen, die falsch aufgefasst, zu grossen Fehlgriffen führen könnte, zumal, wenn der Lautlehre der ihr allerdings gebührende massgebende Einfluss auf die Eintheilung der Sprachen verstattet wird. Es ist nämlich unbestreitbar, dass örtlich neben einander liegende Sprachen, besonders wenn dieselben redenden Völker in lebhaftem Verkehre mit einander stehen, gewisse Färbungen von einander annehmen (selbst wenn sie ganz verschiedenen Familien angehören), die einen weniger genauen Beobachter leicht zur der Annahme einer näheren Verwandtschaft verleiten könnten.|29|language contact 110|Schleicher1848|Dennoch würde es eben so falsch sein das Lettische eine slawische Sprache zu nennen, als das Ossetische mit den Mingrelischen, Suarischen, u.s.w. zu einer Klasse zu rechnen. 
Dass dergleichen Färbungen von einer zur anderen Sprache sich verpflanzen können, scheint daher durch die Erfahrung gerechtfertigt und ist auf diese Erscheinung bei der Eintheilung der Sprachen gebührende Rücksicht zu nehmen.|30|genetic similarity, contact-induced similarity 111|Schleicher1848|Die historische Betrachtung kann nicht etwa die zahlreichen Sprachen, die nur in einer Phase ihrer Entwicklung, vielleicht nur in der allerneuesten vorliegen, unter dem Vorwande aus ihrem Kreise ausschliessen, dass ja in diesen nichts eigentlich Geschichtliches, keine Einsicht in ihr Werden, ihre Veränderungen im Laufe der Zeit gegeben sei; jede Sprache trägt mehr oder minder auch in späteren Formationen die früheren an sich, repräsentirt überhaupt immer eine bestimmbare Stufe sprachlicher Entwicklung und fällt so durchaus in das Gebiet des Sprachhistorikers. |32f|- 112|Schleicher1848|[...] so mag es nicht überflüssig sein, hier, wo es sich darum handelt, das Ossetische dem Iranischen beizuzählen, auf Einiges Wenige hinzuweisen, das zufällig aus dem herausgegriffen ist, was das Ossetische speciell Iranisches hat; die ganze Masse dessen, was das Ossetische zwar auch mit dem übrigen iranischen, dieses aber mit dem Indogermanischen gemeinsam besitzt, möge vor der Hand mit Absicht ganz unberücksichtigt bleiben.|66|shared innovation, genetic classification 113|Schleicher1848|Das Wichtigste und Entscheidende aber ist, dass das Ossetische die das Iranische charakterisierenden, es von den andern indogermanischen Familien unterscheidenden Lautgesetze aufzuweisen hat.|67|sound law, genetic classification 114|Schleicher1861|Indogermanische sprachen nent man eine bestimte reihe von sprachen des asiatisch-europäischen erdteiles von so übereinstimmender und von allen andern sprachen verschidener beschaffenheit, daß sie sich deutlich als auß einer gemeinsamen ursprache entstanden erweist.|4|- 115|Schleicher1861|Nur von den Indern, die zu allerlezt den stamsitz verließen, wißen wir mit 
völliger sicherheit, daß sie auß iren späteren wonsitzen ein stamfremdes älteres volk verdrängten, auß dessen sprache manches in die irige über gieng. Von mereren der übrigen indogermanischen völker ist änliches teilweise in hohem grade warscheinlich.|6|language mixture 116|Schleicher1861|Die ältesten teilungen des indogermanischen bis zum entstehen der grundsprachen der den sprachstamm bildenden sprachfamilien laßen sich durch folgendes schema anschaulich machen. Die länge der linien deutet die zeitdauer an, die entfernung derselben von einander den verwantschaftsgrad.|6|family tree 117|Schleicher1869|Die Grundsprache *A* theilt sich in die Sprachen *a* und *b* in der beschriebenen Weise nämlich so, daß der Theil *b* des Sprachgebietes stärkeren Veränderungen unterliegt als der mit *a* bezeichnete. Bis zum Durchschnitt *xx* hat also *b* sich viel weiter von *A* entfernt als *a*, und dieß macht eben unser Schema dadurch anschaulich, daß es *bx* stärker von der geraden Richtung abweichen läßt als *ax*, das mehr als eine directe Fortsetzung von *A* erscheint (wir können uns unter *A* die litauische Grundsprache, unter *ax* die litauische und unter *bx* die lettische Sprache denken, oder in ähnlicher Weise sich verhaltende Sprachen oder Mundarten).|59|language split, family tree 118|Schleicher1869|Gesetzt, die acht indogermanischen Grundsprachen wären in vollkommen gleicher Weise mit einander verwandt, jede stünde gleichweit von der andern ab, keine überragte an Ursprünglichkeit die andere, so müßten wir annehmen, daß sie alle acht gleich lange leben und daß sie alle auf gleichmäßige Art durch Theilung der gemeinsamen Ursprache in acht Sprachkörper gleichzeitig hervorgegangen seien. 
So verhält sich nun aber die Sache nicht.|80|genetic classification 119|Schleicher1869|Ferner erweisen sich Griechisch (Albanisch), Italisch und Celtisch deutlich als näher untereinander verwandt, als mit irgend einer der andern indogermanischen Sprachen.|80|- 120|Schleicher1869|Da die Sprache ein so wesentliches Moment der Nationalität bildet, daß weder zwei oder mehr Sprachen einem Volke, noch einer [pb] Sprache zwei Völker entsprechen können, sondern jede besondere Sprache nur auf dem Gebiete einer einzigen Nationalität wachsen kann, so können wir die Urgeschichte der indogermanischen Sprachsippe *\`mutato nomine'* zugleich als Urgeschichte der indogermanischen Völkersippe gelten lassen.|82f|language history 121|Schleicher1873|Von den sprachlichen Organismen gelten nämlich ähnliche Ansichten, wie sie Darwin von den lebenden Wesen überhaupt ausspricht, theils fast allgemein, theils habe ich zufällig im Jahre 1860, also in demselben Jahre, in welchem die deutsche Uebersetzung von Darwins Werk erschienen, über den \`Kampf ums Dasein', über das Erlöschen alter Formen, über die grosse Ausbreitung und Differenzierung einzelner Arten auf sprachlichem Gebiete mich in einer Weise ausgesprochen, welche, den Ausdruck abgerechnet, mit Darwins Ansichten in auffälliger Weise zusammen stimmt.|4|biological parallels 122|Schleicher1873|Ich wenigstens weiss sehr wohl, was ich dem Studium von Werken, wie Schleidens wissenschaftliche Botanik, Carl Vogts physiologischen Briefe u.s.f für die Erfassung des Wesens und des Lebens der Sprache zu danken habe. Habe ich doch aus diesen Büchern zuerst erfahren, was entwickelungsgeschichte ist. 
Bei den Naturforschern kann man einsehen lernen, dass für die Wissenschaft nur die durch sichere, streng objective Beobachtung festgestellte Thatsache und der auf diese gebaute richtige Schluss Geltung hat; eine Erkenntniss, die manchem meiner Collegen von Nutzen wäre.|6|- 123|Schleicher1873|Da die Beobachtung des allerdings nur sehr kurzen Zeitraumes des jüngsten Erdenlebens nur ein allmähliches Verändern ergibt, so haben wir durchaus kein Recht für die Vergangenheit eine andere Art des Lebensverlaufes vorauszusetzen. Von derselben Ansicht ging ich von je her bei der Betrachtung des Lebens der Sprachen aus, welches ebenfalls nur in seinen für uns letzten und jüngsten, verhältnissmässig sehr kurzen Perioden innerhalb der unmittelbaren Beobachtung fällt. Diese kurze Zeit von einigen Jahrtausenden lehrt uns mit unumstösslicher Gewissheit, dass das Leben der Sprachorganismen überhaupt nach bestimten Gesetzen in ganz allmählichen [pb] Veränderungen verlaufe und dass wir nicht im entferntesten ein Recht haben vorauszusetzen, dass diess jemals sich anders verhalten habe. |10f|uniformitarianism 124|Schleicher1873|Was Lyell für die Lebensgeschichte der Erde, das hat Darwin für die Lebensgeschichte der Bewohner der Erde ausgeführt. Darwins Lehre ist also keine zufällige Erscheinung, sie ist nicht die Ausgeburt eines absonderlichen Kopfes, sondern ein rechtes und echtes Kind unseres Jahrhunderts. Darwins Lehre ist eine Nothwendigkeit.|12|Charles Lyell 125|Schleicher1873|Von Sprachsippen, die uns genau bekannt sind, stellen wir eben so Stammbäume auf, wie diess Darwin (S. 121) für die Arten von Pflanzen und Thieren versucht hat.|14|family tree 126|Schleicher1873|Als Beispiel möge hier der Stammbaum der indogermanischen Sprachsippe Platz finden, wie er nach unserem Dafürhalten als Bild des allmählichen Entstehens derselben aufzustellen ist; man vergleiche ihn mit Darwins bildlicher Darstellung (S. 
121), wobei man nicht ausser Acht lasse, dass Darwin ein ideales Schema aufstellt, wir aber das Bild der Entstehung einer gegebenen Sippe zeichnen.|14|family tree, Indo-European 127|Schleicher1873|In einer frühreren Lebensperiode des Menschengeschlechtes gab es eine Sprache, die wir aus den aus ihr hervorgegangenen indogermanisch genannten Sprachen ziemlich genau erschliessen können, die indogermanische Ursprache. Nachdem sie von einer Reihe von Generationen gesprochen ward, während dem wahrscheinlich das sie redende Volk sich mehrte und ausbreitete, nahm sie auf verschiedenen Theilen ihres Gebietes ganz allmählich einen verschiedenen Charakter an, so dass endlich zwei Sprachen aus ihr hervorgingen. Möglicher Weise können es auch mehrere Sprachen gewesen sein, von denen aber nur zwei am Leben blieben und sich weiter entwickelten; dasselbe gilt auch von allen späteren Theilungen. Jede dieser beiden Sprachen unterlag dem Differenzierungsprozesse noch zu wiederholten Malen. Der eine Zweig, den wir nach dem, was später aus ihm ward, den slawodeutschen nennen wollen, theilte sich abermals durch allmähliche Differenzierung (durch die fortgesetzte Neigung zur Divergenz des Charakters, wie es bei Darwin heisst), in deutsch und slawolettisch, von denen das erstere [pb] die Stammutter aller deutschen (germanischen) Sprachen und ihrer Mundarten, das letztere die der slawischen und litauischen (baltischen, lettischen) ward. 
Die andere sprache, die sich durch Differenzierung aus der indogermanischen Ursprache heraus entwickelt hatte, das ariograecoitalokeltischen -- man verzeihe diese langathmige Bezeichnung -- theilte sich später ebenfalls in zwei Sprachen, von denen die eine, die graecoitalokeltische, die Mutter der griechischen, albanesischen und der Sprache ward, aus welcher später keltisch und italisch hervorgiengen und die wir deshalb die italokeltische nennen, die andere aber, die arische Sprache, die nah verwandten Stammmütter der indischen und der eranischen (persischen) Sprachfamilie erzeugte. Weitere Übersetzung des Bildes in Worte ist wohl überflüssig.|15f|- 128|Schleicher1873|[...] je verschiedener von einander die Sprachen einer Sippe sind, desto früher setzen wir ihre Loslösung aus gemeinsamer Grundform an, indem wir die Verschiedenheit auf Rechnung einer längeren individuellen Entwickelung schreiben.|17|diversification, language split 129|Schleicher1873|Die Beobachtung ist in Beziehung auf Entstehung neuer Formen aus früheren auf sprachlichem Gebiete leichter und in grösserem Maassstabe anzustellen, als auf dem der pflanzlichen und thierischen Organismen. Ausnahmsweise sind wir Sprachforscher hier einmal im Vortheile gegen die übrigen Naturforscher. :translation:`In linguistics, observing how new forms developed from earlier ones can be accomplished more easily and on a larger scale than in the realm of plants and animals. 
This is one of the rare cases where we linguists have an advantage compared to the other scientists.`|18|linguistic reconstruction, reality 130|Schleicher1873|Wir kennen sowohl das Altlateinische, als auch die durch Differenzierung und durch fremden Einfluss -- Ihr würdet sagen durch Kreuzung -- nachweislich aus ihm hervorgegangenen romanischen Sprachen.|18|language mixture 131|Schleicher1863|Wir wissen also geradezu aus vorliegenden Beobachtungsreihen, dass die [pb] Sprachen sich verändern, so lange sie leben, und diese längeren Beobachtungsreihen verdanken wir der Schrift.|18f|language change, nice quote, August Schleicher 132|Schleicher1873|So aber haben wir mehr Beobachtungsmaterial als die übrigen Naturforscher und sind daher früher auf den Gedanken der Unursprünglichkeit der Arten gekommen.|19|biological parallels 133|Schleicher1873|Wenn herüber Darwin sagt (S. 57): 'Eine bestimmte Grenzlinie ist bis jetzt sicherlich nicht gezogen worden, weder zwischen Arten und Unterarten, d. i. solchen Formen, welche nach der Meinung einiger Naturforscher den Rang einer Species nahezu aber doch nicht gänzlich erreichen, noch zwischen Unterarten und ausgezeichneten Varietäten, noch endlich zwischen den geringeren Varietäten und individuellen Verschiedenheiten. Diese Verschiedenheiten greifen, in eine Reihe geordnet, unmerklich ineinander, und die Reihe weckt die Vorstellung von einem wirklichen Übergang', so brauchen wir nur die Benennungen Art, Unterart, Varietät mit den in der Sprachwissenschaft üblichen (Sprache, Dialekt, Mundart, Untermundart) zu vertauschen, und das von Darwin Gesagte gilt vollkommen für die sprachlichen Unterschiede innerhalb der Sippen, deren allmähliches Entstehen wir so eben an einem Beispiele vor Augen geführt haben. :translation:`If now Darwin says (p. 57): 'A certain border cannot be drawn until now, neither between species and subspecies [...] 
nor between subspecies and specific varieties, neither, finally, between the lowest varieties and the individual differences. These differences form an ordered chain, and the chain gives the impression of a real transition.', we only need to replace the terms species, subspecies, variety with the terms common in linguistics (language, dialect, speech variety [Mundart], sub-speech-variety [Untermundart]), and what Darwin says will completely hold for the linguistic differences within the families, whose gradual emergence we have just illustrated with an example.`|21|biological parallels 134|Schleicher1873|Eine so zu sagen materielle Abstammung aller Sprachen von einer einzigen Ursprache können wir also unmöglich voraussetzen.|22|monophyly 135|Schleicher1873|Für alle Sprachen nehmen wir also einen formell gleichen Ursprung an.|24|origin of language 136|Schleicher1873|Wir setzen deswegen eine unzählbare Menge von Ursprachen voraus, aber für alle statuieren wir eine und dieselbe Form.|24|origin of language 137|Schleicher1873|Das Reich der Sprachen ist von dem der Pflanzen und Thiere zu verschieden, als dass die Gesammtheit der Darwinschen Ausführungen mit ihren Einzelheiten für dasselbe Geltung haben könnte. :translation:`The realm of languages is too different from the one of plants and animals, as to allow that Darwin's postulations in all their detail would hold for it.`|31|biological parallels 139|Schuchardt1900|Jede ächte Klassifikation, sagt Darwin, ist eine genealogische. Wir besitzen nun eine Klassifikation des Romanischen in Gruppen, Sprachen, Mundarten, Untermundarten. ist dieselbe eine ächte? Ist sie nicht vielmehr bloss eine äusserliche? erlangen wir von denjenigen, die sie aufstellen, eine genaue Begründung auf [pb] gemeinschaftliche und nicht gemeinschaftliche Kennzeichen, so werden sie uns in ungemein vielen Punkten ihre Unsicherheit und Ungewissheit eingestehen, oder sich untereinander in erklärtem Widerspruch befinden. 
Sie mögen dies als eine Folge sei es mangelhafter Erkenntnis sei es gewisser Erscheinungen von untergeordneter Bedeutung erklären; wir aber sehen darin die unvermeidliche Folge einer Thatsache von erster Bedeutung, mit der das System überhaupt nicht zu vereinigen ist. Ich meine die Thatsache der *geographischen Abänderung*, die Thatsache dass über das ganze romanische Gebiet hin die dialektischen Differenzen sich im Verhältnis ihrer räumlichen Vertheilung abstufen.|5f|genetic classification 140|Schuchardt1900|Was aber dann für die jüngste Generation, für die Wipfel des Stammbaums gilt, gilt jedenfalls auch für die früheren, da die gleichen Bedingungen immer vorhanden gewesen sind; und zwei Sprachvarietäten können nicht erst unabhängig sich entwickelt und, wenn sie fertig waren, einander beeinflusst haben, sondern diese Wechselwirkung hat mit der Divergenz selbst ihren Anfang genommen. Wir verbinden die Äste und Zweige des Stammbaums durch zahllose horizontale Linien, und er hört auf ein Stammbaum zu sein.|11|family tree 141|Schuchardt1900|Denken Sie sich etwa, von [pb] einem Punkte gingen nach verschiedenen Richtungen zwei Kolonieen aus, zwischen denen jede Beziehung abgebrochen würde; von den Pflanzorten zweigten sich neue Niederlassungen und von diesen wieder andere und so fort ab, doch immer so dass jede ganz isolirt fortlebte. Dann würde ein Sprachstammbaum sich erheben an dem nicht das Geringste auszusetzen wäre. Ein solcher Wunderbaum, der doch weite Schatten werfen müsste, ist indessen so viel ich weiss noch nicht entdeckt.|11f|family tree, language split 142|Schuchardt1900|Ich möchte Ihnen das Bild des Stammbaums, das ich zurückweise, durch ein anderes ersetzen. 
Es sei der ganze Länderkomplex romanischer Zunge mit einer und derselben Farbe, mit Weiss, bedeckt, welches die allgemeine Vulgärsprache repräsentire; dieses Weiss verdunkle sich, nehme verschiedene matte Töne an, welche stärker und immer stärker hervortreten, bis endlich die Farben des Regenbogens unmerklich ineinander überfliessend vor unsern Augen stehen. Dieses Bild ist, weil es verschiedene, nicht einen einzigen Moment der Anschauung erfordert, zwar ein weniger einfaches als jenes, kommt aber eben darum dem auch keineswegs einfachen Sachverhalt näher.|21f|family tree 143|Schuchardt1900|Wir können daher nicht sowohl das Gebiet eines einzelnen Dialektes als die Gebiete aller seiner einzelnen Lautbehandlungen beschreiben.|26|- 144|Schuchardt1900|Das steht im Allgemeinen fest dass eine Lauteigenthümlichkeit die ein gewisses grösseres Terrain beherrscht, nicht überall gleichzeitig, durch die gleichen oder ähnlichen natürlichen Bedingungen ins Leben gerufen wurde, sondern nur an einem Punkte und dass sie von da allmählich um sich griff. Sie wird sich nach allen Seiten ziemlich gleichmässig ausgebreitet haben, ihr Ursprung also gewöhnlich nicht an den Grenzen ihres Gebietes liegen. Entwerfen wir nun eine Karte, auf der wir die Umfassungslinien aller nur möglichen Laut- und Formerscheinungen vermittelst deren das Latein zum Romanischen sich abgeändert hat, angeben, so werden wir in diesem Durcheinander von Linien einige dichtere oder dunklere Stellen wo sich mehr kreuzen, wahrnehmen, d.h. wir werden Übergänge statuiren. Aber wir werden dadurch noch lange keine Klassifikation gewinnen, sondern im günstigsten Falle gewisse Hauptpunkte der Ausstrahlung erkennen, die aber doch wieder mit denen zweiter, dritter u.s.w. Grösse eine fortlaufende Reihe bilden. Ein rein mechanisches Abwägen aller verschiedenen merkmale ist im Einzelnen unausführbar, unstatthaft, überdies deswegen weil ja der Werth dieser Merkmale nicht gleich ist. 
Wollen wir je das wichtigste als Eintheilungsgrund herausgreifen, so ist die Frage nach der [pb] Wichtigkeit ausserordentlich schwer zu entscheiden, mit Sicherheit nur bei der Erkenntnis eines Lebenszweckes der Sprache. Nehmen wir aber mehrere wichtigste Charaktere an, so sehen wir dass in einem zwei Sprachen miteinander übereinstimmen, in dem andern voneinander abweichen.|29f|sound change, wave theory 145|Trubetzkoy1939a|Unter (direkt oder indirekt) phonologischer Opposition verstehen wir also jeden Schallgegensatz, der in der gegebenen Sprache eine intellektuelle Bedeutung differenzieren kann. Jedes Glied einer solchen Opposition nennen wir **phonologische** (bzw. **distinktive**) Einheit. Aus dieser Definition ergibt sich, daß die phonologischen Einheiten recht verschiedenen Umfangs sein können.|296|phoneme 146|Trubetzkoy1939a|Phonologische Einheiten, die sich vom Standpunkt der betreffenden Sprache in nicht noch kürzere aufeinanderfolgende phonologische Einheiten zerlegen lassen, nennen wir **Phoneme**. Somit ist das Phonem die kleinste phonologische Einheit der gegebenen Sprache. Die bezeichnende Seite jedes Wortes im Sprachgebilde läßt sich in Phoneme zerlegen, als eine bestimmte Reihe von Phonemen darstellen.|297|phoneme 147|Trubetzkoy1939a|Natürlich darf man die Sache nicht zu sehr vereinfachen. Man darf sich die Phoneme nicht etwa als Bausteine vorstellen, aus denen die einzelnen Wörter zusammengesetzt werden. Vielmehr ist jedes Wort eine lautliche Ganzheit, eine **Gestalt**, und wird auch von den Hörern als eine Gestalt erkannt, ebenso wie man etwa einen bekannten Menschen auf der Straße an seiner ganzen Gestalt erkennt. Das Erkennen der Gestalten setzt aber ihre Auseinanderhaltung voraus, und diese ist nur dann möglich, wenn die einzelnen Gestalten sich voneinander durch gewisse Merkmale unterscheiden. Die Phoneme sind eben die **Unterscheidungsmerkmale** der Wortgestalten. 
|297|- 148|Trubetzkoy1939a|Jedes Wort muß so viele Phoneme und in einer solchen Reihenfolge enthalten, daß es sich von jedem anderen Worte unterscheidet. Die ganze Phonemreihe ist nur dem einzelnen Worte eigen, jedes einzelne Glied dieser Reihe kommt aber auch in anderen Wörtern als Unterscheidungsmal vor. Denn die Zahl der als Unterscheidungsmale verwendeten Phoneme ist in jeder Sprache viel kleiner als die Zahl der Wörter, so daß die einzelnen Wörter immer nur eine bestimmte Kombination der auch in anderen Wörtern bestehenden Phoneme darstellen. |298|phonetic sequence, phoneme sequence 149|Weinreich1974|The ways in which one vocabulary can interfere with another are various. Given two languages, *A* and *B*, morphemes may be transferred from *A* into *B*, or *B*-morphemes may be used in new designative functions on the model of *A*-morphemes with whose content they are identified; finally, in the case of compound lexical elements, both processes may be combined.|47|language contact 150|Weinreich1974|In the case of simple (non-compound) lexical elements, the most common type of interference is the outright transfer of the phonemic sequence from one language to another. |47|direct transfer 151|Weinreich1974|The other major type of interference involves the extension of the use of an indigenous word of the influenced language in conformity with a foreign model.|48|loan transfer 152|Weinreich1974|As a theoretical point, it may be noted that an adjustment in the content of signs with a considerable degree of homophony is a borderline case between the alternatives of (1) word transfer and (2) semantic extension due to contact. [...] 
If there is a \`\`leap" in meaning, a HOMONYM is established in the recipient language.|49|direct transfer, loan transfer 153|Weinreich1974|Finally, a mild type of lexical interference occurs when the expression of a sign is changed on the model of a cognate in a language in contact, without effect on the content.|50|- 154|Weinreich1974|Transfer of analyzed compounds occurs when the elements of a compound or phrase are adapted to word-formative or syntactic patterns of the recipient language (if the elements are transferred unanalyzed, the word is to be considered simple).|50|loan transfer 155|Weinreich1974|Reproduction in terms of equivalent native words can be carried out with compounds, phrases, and even larger units such as proverbs. Thus, English *skyscraper* has served as a more or less exact model for German *Wolkenkratzer*, French *gratte-ciel*, Spanish *rascacielos*, Russian *neboskrjób* [...]|50|loan translation 156|Weinreich1974|LOAN TRANSLATIONS proper, in which the model is reproduced exactly, element by element.|51|loan translation 157|Weinreich1974|LOAN RENDITIONS ({\em Lehnübertragungen}), in which the model compound only furnishes a general hint for the reproduction, e.g. German {\em Vater-Land} after Latin {\em patr-ia} [...].|51|loan translation 158|Weinreich1974|LOAN CREATIONS ({\em Lehnschöpfungen}), a term applied to new coinages which are stimulated not by cultural innovations, but by the need to match designations available in a language in contact. 
|51|loan creation 159|Weinreich1974|Among loan translations, one can also distinguish those in which the components appear with their familiar semantemes (only the particular combination of them being due to another language) from those where one or more of the components is involved in a semantic extension.|51|- 160|Weinreich1974|The third type of interference in compound lexical units involves the transfer of some elements and the reproduction of others.|51|hybrid transfer 161|Weinreich1974|Among the hybrid compounds one may also distinguish those in which the stem is transferred and a derivative affix reproduced, [...] and those in which the stem is indigenous and an affix transferred [...]. This last type might be called interlingual portmanteaus.|52|- 162|Weinreich1974|Only the most concrete loanwords, such as designations for newly invented or imported objects, can be thought of as mere additions to the vocabulary. |53|loanword integration 163|Weinreich1974|Except for loanwords with entirely new content, the transfer or reproduction of foreign words must affect the existing vocabulary in one of three ways: (1) confusion between the content of the new and old word; (2) disappearance of the old words; (3) survival of both the new and old word, with a specialization in content.|54|loanword integration 164|Weinreich1974|CONFUSION IN USAGE, or full identity of content, between the old and the new word is probably restricted to the early stages of language contact.|54|loanword integration 165|Weinreich1974|Old words may be DISCARDED as their content becomes fully covered by [pb] the loanword.|54f|lexical replacement, loanword integration 166|Weinreich1974|Finally, the content of the clashing old and borrowed words may become SPECIALIZED. 
Strictly speaking, the specialization in content usually affects both the old word and the loanword if both survive.|55|loanword integration, specialization, lexical change 167|Weinreich1974|Stylistic specialization|56|- 168|Weinreich1974|The need to designate new things, persons, places, and concepts is, obviously, a universal cause of lexical innovation.|56|lexical borrowing 169|Weinreich1974|One such internal factor is the LOW FREQUENCY of words. It has been shown that, other things being equal, the frequent words come easily to mind and are therefore more stable; relatively infrequent words of the vocabulary are, accordingly, less stable, more subject to oblivion and replacement.|57|lexical borrowing 170|Weinreich1974|Another internal factor conducive to lexical innovation is pernicious HOMONYMY. [pb] [...] the explanation by homonymy must nevertheless be applied to the behavior of bilinguals only with great care, for in many cases the existence of homonyms has definitely not prevented borrowing.|57f|homonymy 171|Weinreich1974|A third reason for lexical innovation is related to the well-known tendency of affective words to lose their expressive force. [...] In such semantic fields as `talking', `beating', `sleeping', `tallness', or `ugliness', there is in many languages a constant NEED FOR SYNONYMS, an onomastic low-pressure area, as it were.|58|lexical borrowing 172|Weinreich1974|Whereas the unilingual depends, in replenishing his vocabulary, on indigenous lexical material and whatever loanwords may happen to be transmitted to him, the BILINGUAL has the other language as a constantly available source of lexical innovations.|59|bilingualism, lexical borrowing 173|Weinreich1974|Three additional factors may prompt lexical borrowing on the part of bilinguals. 
First, a comparison with the other language to which he is exposed may lead him to feel that some of his semantic fields are INSUFFICIENTLY DIFFERENTIATED.|59|bilingualism, lexical borrowing 174|Weinreich1974|A second consideration affecting bilinguals in particular is the symbolic association of the source language in a contact situation with SOCIAL VALUES, either positive or negative.|59|lexical borrowing 175|Weinreich1974|Finally, a bilingual's speech may suffer from the interference of another vocabulary through mere OVERSIGHT; that is, the limitations on the distribution of certain words to utterances belonging to one language are violated.|60|lexical borrowing 176|Weinreich1974|Yet, remarkably, some words seem never to be transferred. [...] This resistance to transfer on the part of some words has so far not received any explanation. It is one of the unsolved problems of language contact.|61|borrowing constraints 177|Weinreich1974|In a specific instance, an existing word in the recipient language might be supposed to repel a homophonous transfer; to avoid homonymy, loan translation, (or semantic extension) might be preferred.|61|borrowing constraints, homonymy 178|Weinreich1974|Theoretically, too, a language with many restrictions on the form of words may be proportionately more resistant to outright transfer and favor semantic extensions and loan translation instead. Such resistance would, of course, be a function not of the structure of the recipient language, but of the difference in the structures of the recipient and source languages. 
If Tibetan resisted transfers from Sanskrit and restricted its borrowing to loan translations, it was only because the structure of Sanskrit words was so different from its own structure; the resistance to transfers from a language with more congenial word structure, like Chinese, has not been so great.|61|borrowing constraints 179|Taub1993|For example, in Zur vergleichenden Sprachengeschichte (1848) he suggested that languages are natural organisms because, like plants and animals, they can be grouped into families; such a system of classification can be used to describe the relationships between languages.|178|biological parallels, August Schleicher, organism 180|Taub1993|In this book, Schleicher explained that he had first read Darwin's ideas in H. G. Bronn's 1860 translation of the \emph{Origin}, at the urging of his friend and colleague at the University of Jena, Ernst Haeckel (1834-1919).|179|August Schleicher, Darwin, Haeckel 181|Taub1993|Like Lyell, Schleicher thought that a linguist would have an easier time demonstrating the principles of evolution than would a naturalist.|181|biological parallels, August Schleicher 182|Taub1993|Although some scholars have claimed that Charles Darwin exercised an important influence on Schleicher's thought, most of Schleicher's work on language was published before Darwin's \emph{Origin of Species}. Whatever evolutionary ideas Schleicher held were present in his mind long before he had read the \emph{Origin}; from a chronological point of view, Darwin could not have had much influence on Schleicher's ideas about language.|184|biological parallels, August Schleicher, Darwin 183|Taub1993|He had studied at Bonn with Friedrich Ritschl (1806-76), an important figure in the field of textual criticism. While Ritschl had not invented the method of recovering a lost manuscript source or archetype, he emphasized the procedure and rules by which manuscripts, both extant and reconstructed versions, are placed in a pedigree or stemma. 
One of the most important doctrines of this method is that of the shared error, which served as a criterion in setting-up subfamilies in a manuscript tradition. The established way to represent relationships between various manuscripts was the genealogical tree, or stemma; the branches, symbolizing events in time, represented the [pb] origin, by copying, of manuscript from manuscript source (either extant or inferred). The \`method' as applied and refined by a great many scholars, remained an important aspect of textual criticism. Though one might be tempted to suggest that Schleicher got the idea for his *Stammbaum* from natural history and taxonomy, it seems just as likely that the impetus came from Ritschl's method, a method which bears important similarities to Schleicher's own method of linguistic reconstruction.|185f|stemmatics, August Schleicher 184|Taub1993|Schleicher suggested that he was himself a practitioner of the uniformitarian doctrine of Lyell, explaining that he had always believed that languages change very gradually, according to definite laws which limit variation. [...] In any case, it is evident that it was primarily Lyell's method rather than his geological theories, which attracted Schleicher.|189|Charles Lyell, uniformitarianism, August Schleicher 185|Taub1993|Affirming his own commitment to \`Baconian' principles, Schleicher was very confident of the ability of his method to describe the real development of language, and made strong claims regarding the ability of philologists to study linguistic change, arguing that philologists \`know positively from the observation of collected facts that languages change as long as they live'. 
Schleicher recognized, but did not question, the possibility that facts might not have a perfect fit with theory; he included a *Stammbaum* in his book as an illustration of the genealogical relationships between languages, warning that \`in comparing this with Darwin's diagram, one should not forget that the author of the \`\`Origin of Species" had to draw up an ideal scheme, whereas we have represented the actual process of development of a given family'.|189|family tree, biological parallels, August Schleicher 186|Taub1993|In fact, to a large extent *Die Darwinsche Theorie* can be regarded as an argument for Schleicher's own method, using Darwin's \`success', with what he regards as a similar method, as a justification for his devotion to genealogical trees.|190|biological parallels, August Schleicher, family tree 187|Trubetzkoy1939|:comment:`Note on structural features shared by the Indo-European languages` * Es besteht keinerlei Vokalharmonie. [...] * Der Konsonantismus des Anlauts ist nicht ärmer als der des Inlauts und des Auslauts. [...] * Das Wort muss nicht unbedingt mit der Wurzel beginnen. [...] * Die Formbildung geschieht nicht nur durch Affixe, sondern auch durch vokalische Alternationen innerhalb der Stammorpheme. [...] * Ausser den vokalischen spielen auch freie konsonantische Alternationen eine morphologische Rolle. [...] * Das Subjekt eines transitiven Verbums erfährt dieselbe Behandlung wie das Subjekt eines intransitiven Verbums.|84f|- 188|Trubetzkoy1939|Jedes von diesen Strukturmerkmalen kommt auch in nichtindogermanischen Sprachen vor, alle sechs zusammen aber nur in indogermanischen Sprachen. Eine Sprache, die nicht alle genannten Strukturmerkmale besitzt, darf nicht als indogermanisch gelten, selbst wenn sie in ihrem Wortschatze viele Übereinstimmungen mit indogermanischen Sprachen aufweist. 
Und umgekehrt ist eine Sprache, die den grössten Teil ihres Wortschatzes und ihrer formativen Elemente aus nichtindogermanischen Sprachen entlehnt hat, dennoch indogermanisch, wenn sie die genannten 6 spezifischen Strukturmerkmale besitzt, und sei es eine nur ganz kleine Anzahl lexikalischer und morphologischer Übereinstimmungen mit anderen indogermanischen Sprachen, die sie aufweist.|85|- 189|Trubetzkoy1939|Somit kann eine Sprache aufhören, indogermanisch zu sein, und umgekehrt, kann eine Sprache indogermanisch werden. Der Zeitpunkt, wo alle obenerwähnten sechs spezifischen Strukturmerkmale sich zum ersten Male in einer Sprache zusammenfanden, deren Wort- und Formschatz eine Anzahl regelmässiger Übereinstimmungen mit den später überlieferten indogermanischen Sprachen aufwies -- dieser Zeitpunkt war die Geburtsstunde des \`\`Indogermanischen". Es ist nicht ausgeschlossen, dass ungefähr um dieselbe Zeit mehrere Sprachen in diesem Sinne indogermanisch geworden sind. Retrospektiv können wir sie heute nur als Dialekte der indogermanischen Ursprache betrachten, es ist aber logisch nicht notwendig, sie alle auf eine gemeinsame Quelle zurückzuführen. Nur ein geographischer Kontakt zwischen diesen ältesten indogermanischen Dialekten darf mit hohem Grad der Wahrscheinlichkeit angenommen werden. |85|genetic relationship 190|Alvarez-Ponce2013|Problematically, phylogenetic tree reconstruction is particularly challenging in the presence of highly divergent sequences (26, 34). First, it relies on gene families delimited using clustering methods such as the Markov cluster algorithm (35), which detects communities of closely related sequences from BLAST results. This approach may exclude the most divergent homologs in a family, which might be the most informative for an event as ancient as the origin of Eukaryotes. Second, multiple sequence alignments cannot be accurately constructed in the presence of a high number of substitutions. 
Third, long divergence times may have eroded at least part of the phylogenetic signal, or even deleted any detectable similarity between homologous sequences (34). Finally, generating phylogenetic hypotheses using very divergent sequences is very dependent on the model of sequence evolution used (e.g., ref. 36), and in practice highly divergent sequences almost always produce highly questionable placements in phylogenetic trees. Therefore, it is desirable to explore whether new data types exist that might provide new insight into deep evolutionary events.|E1|sequence comparison, phylogenetic tree 191|Alvarez-Ponce2013|As an alternative to phylogenetic trees, the relationships among genes can be represented more generally in the form of gene similarity networks, in which nodes and edges represent genes and similarity statements (e.g., BLAST hits), respectively (37–43).|E2|gene similarity network 192|Alvarez-Ponce2013|By considering indirect relationships, gene similarity networks have the potential to explore deeper relationships than phylogenetic trees, thus being particularly appropriate for exploring deep evolutionary events such as the origin of Eukaryotes.|E3|- 193|Bapteste2012|Indeed, many genetic similarities between biological objects are not caused by vertical descent, where the genetic material of a particular entity is propagated by replication inside its own lineage.|E1|- 194|Bapteste2012|For instance, adaptive lateral genetic transfer between genomes of entities from different lineages that share the same environment or lifestyle (29, 32, 46) indicates additional (non-vertical) mechanisms for the integration of genetic material into one host. 
Hence, another type of descent is fundamental to the reconstruction of an accurate evolutionary picture of the evolutionary units.|E2|- 195|Bapteste2012|The typical biological outcomes of these interlineage and interlevel assortments, namely the mosaic objects, and the multilineage coalitions of genetic partners involved in these processes can be stabilized and selected, becoming important evolutionary players in their own right (46). Therefore, introgressive descent generates non-genealogical bonds between biological objects, producing a reticulate evolutionary framework.|E3|mosaic history 196|Bapteste2012|This range has two extremes. First, there are mergers. Mergers arise when two or more components, not hitherto coexisting within the same unit, are brought together, and these components are subsequently replicated or propagated within or by a new single corporate body (9). [...] Fused genes conferring drug resistance (35), new viral genomes (49), lineages treated from symbioses (39, 56), and Russian dolls of mobile genetic elements (52, 53) are among the best known examples of mergers. The offspring of sexual reproduction are also obligate mergers, because their parts come from distinct — although closely related—sources (two parents instead of one last common ancestor).|E4|introgressive descent 197|Bapteste2012|Second, there are multilineage clubs. Members of these clubs form coalitions of entities that replicate in separate events and exploit some common genetic material that does not trace back to a single locus in a single last common ancestor of all of the members (26, 29, 31, 32, 57, 58).|E5|- 198|Bapteste2012|For instance, gene networks represent sequences by nodes, and these nodes are connected by edges when they manifest significant similarity (63). 
Genome networks represent genomes as nodes, and these nodes are connected by edges when they share common features (e.g., the same sequence or the same gene family) (47–49).|E6|genome networks, gene networks 199|Bapteste2012|In genome networks, monophyletic groups will generally produce cliques (Figs. 1 and 2A and Table 1) (i.e., subgraphs in which all nodes are directly connected to one another), because all entities under study share some coalescent orthologs. However, when the similarity of characters decreases under a given threshold through evolution, a different pattern is produced: some edges disappear, and cliques are replaced by intransitive chains, with adjacent objects of the chain presenting similarity up to a certain threshold (Fig. 2B). In agreement with the terminology of graph theory, we call such a subgraph of three nodes (A, B, and C) a P3 (64), where A is linked to B, B is linked to C, and A is not linked to C. This concept can be easily extended to the case where A, B, and C are not nodes but instead, cliques; in graph theory, B is called a minimal clique separator (65).|E7|clique, network, genome networks 200|Bapteste2012|By contrast, we call mosaic-P3 (M-P3) a P3, in which two entities, A and C, are indirectly connected through a third entity, B, by one or more characters that are not coalescent orthologs (Fig. 2C). Such an M-P3 unites at least two distantly related and/or unrelated lineages through a third entity acting as an intermediate binder. By definition, this structure is beyond the reach of a single-tree analysis; A and C cannot be compared directly, because they lack homology for the traits under study.|E8|- 201|Bapteste2012a|One strategy to handle massive amounts of unknowns and an overwhelming wealth of data is called exploratory studies. 
Such studies go from (microbiome) data to hypotheses and rely heavily on the experimental design of most inclusive methods of genetic diversity, which seek for patterns in huge data sets with the fewest assumptions possible to ease the discovery of unrecognized regularities, phenomena and interactions. The assumed goal of exploratory studies is to foster the discovery of many unrecognized patterns and to actively generate novel hypotheses, in our case about genetic diversity [2].|E1|exploratory data analysis 202|Bapteste2012a|As such, exploratory approaches differ from standard (or targeted) approaches that go from hypotheses to (microbiome) data and either support or reject pre-existing hypotheses.|E2|- 203|Bapteste2012a|The tree hypothesis a priori constrains the patterns and the processes to be identified and the discoveries to be made in a data set (e.g. genealogical relationships between taxa and genes in the microbiome).|E3|phylogenetic tree 205|Bapteste2012a|For instance, in genome networks, two nodes (genomes) are connected when they share at least a gene family (i.e. two genomes of Escherichia coli, each with a copy of a glucose dehydrogenase, will be connected). In gene networks [5], two nodes (individual sequences) are connected when they display more than a certain threshold of similarity (i.e. two glucose dehydrogenases will be connected when their reciprocal best BLAST score is :math:`< 1e-5` and/or when they display :math:`> 70%` sequence identity).|E4|gene networks, genome networks 206|Bapteste2013|Evolutionary networks today are most often used for population genetics, investigating hybridization in plants, or the lateral transmission of genes, especially in viruses and prokaryotes. 
However, the more we learn about genomes the less tree-like we find their evolutionary history to be, both in terms of the genetic components of species and occasionally of the species themselves.|E1|phylogenetic network 207|Bapteste2013|A wide variety of evolutionary processes lead to mosaic patterns of relationships among taxa: sex in eukaryotes, recombination in its variety of forms, gene conversion between paralogs, intron retrohoming, allopolyploidization, partial non-orthologous replacement, the selection of new genetic assemblages leading to modular entities as in operon formation, the emergence of new families of transposons, independent lineage-sorting among alleles, and unequal rates of character loss between lineages, among others (Table 1).|-|mosaic history, phylogenetic network 208|Bapteste2013|Reticulate patterns can also emerge from improper data processing and analysis, such as model misspecification, data management error, and poor alignment of sequences.|E3|- 209|Keidan2013|Indeed, a common substrate (or typologically similar substrates) can lead to similar innovations in a group of genetically unrelated languages. |E1|substrate, shared innovation 210|Keidan2013|In our understanding, the drifts are processes that have a common genetic background but keep developing in a parallel way in two (or more) languages also after their genetic separation. |E2|drift, shared innovation 211|Keidan2013|If two languages can be proven to develop a common feature not originating from any kind of possible common genetic background in some preceding stages, then it must be considered a true contact‐based branch‐crossing innovation. Otherwise, if diachronically we observe a divergent development of the shared feature, the genetic origin is the default explanation (see Nichols 1992). 
|E3|drift-induced similarity, contact-induced similarity, genetic similarity 212|Beauregard-Racine2011|While networks are very useful and fast tools to unravel some patterns and processes of genetic diversity, they are incomparably more powerful when coupled with analyses of phylogenetic forests.|E1|network, forest 213|Branner2000b|My ability to confirm the soundness of another linguist's classification depends on my having access to enough data to be able to consider alternative possibilities.|22|- 214|Branner2000b|A language, however mixed, belongs to the same language class as another, when the most essential, most concrete, most indispensable [and] very first words, the foundations of language, are common to them both. In contrast, nothing can be inferred about original kinship of languages from technical terms, words of courtesy and commerce, i.e. that part of language which association with others, social intercourse, culture, and scholarly activity have rendered it necessary later to add onto the oldest vocabulary; for it depends on many circumstances, which can be known only from history, whether a people has simply borrowed these words from the tongues of others or developed them out of their own. :comment:`Original quote from Rasmus Rask 1818`|25|basic vocabulary, genetic classification, Rasmus Rask 215|Branner2000b|Sad to say, in Chinese linguistics, it is still rare to find these “most essential, most concrete, most indispensable and very first words, the foundations of language” in most published reports.|25|basic vocabulary, Chinese 216|Branner2000b|Chinese linguistic journals and books burgeon with essays that give only the briefest and most tantalizing glimpse of the phonology of some interesting dialect or another, using only the phonological categories of medieval formalism and without presenting enough whole data to answer a single question about affiliation. 
Even the long lexicons and full-length dictionaries and surveys that are finally beginning to be published in China often lack certain basic and very common words, so that if one takes the time to assemble correspondence sets from these poorly-indexed works, there are holes everywhere. Comparative linguistics cannot be done without data of sufficient detail for each sample used.|25f|Chinese dialectology, Middle Chinese 217|Branner2000b|The point of all this is that if one wants to do etymology rigorously, one needs to use comparative principles; to use comparative principles, one has to have a reasonably full record of the speech of the dialect, or at least a good selection of the really important, basic words.|28|Chinese dialectology, genetic classification, basic vocabulary 218|Branner2000b|A difference of opinion has emerged among students of Chinese dialect classification as to whether it is better to use as many features as possible or to narrow oneself down to as few features as possible. Conventional dialect classification in Chinese used to focus on a single exemplary dialect, always the speech of a large city, and define the whole group by describing the various features of the exemplar. 
This is obviously an extreme case of the first approach, the use of many features.|29f|dialect classification, genetic classification, Chinese dialectology 219|Branner2000b|As data from more and stranger dialects has become available, that approach has been found untenable, and students have tried to embrace as many dialects as possible in classifications that use only a few highly significant features|30|feature, genetic classification, Chinese dialectology 220|Branner2000b|Among Chinese students of dialectology, it is unusual not to include etymological Chinese characters, the so called beentzyh 本字, 'original characters', in dialect reports.|35|Běnzì, cognates, Chinese 225|Branner2000b|The use of beentzyh leads people to see the characters as absolute symbols of the Common Chinese morphemes underlying all dialect forms. This is one of the implications of the Chinese writing system that has both helpful and misleading results.|35|Benzi, cognates 226|Wang1996|The result of these decades of empirical research is that we now have a great abundance of data, gathered in the dialect surveys, monographs, and journals.

Some of these data have been entered into the computer so that the new technology on data base management can be applied. Paradoxically, very little work has been done on what to do with this abundance of data, how to interpret them and distill from them the most important implications. In this dimension of Chinese linguistics, there is a severe imbalance of being data-rich and theory-poor.|244f|- 227|Wang1996|A third approach I would like to discuss briefly has to do with the internal temporal relations among the linguistic changes. The importance of studying these relations has been captured in the remarks of Ting Panghsin 丁邦新, who proposes to base the larger groupings of dialects on earlier changes, and smaller groupings on later changes.|255|Chinese dialectology, genetic classification 228|Wang1997|A major justification for calling all these seven varieties dialects is the great amount of socio-cultural continuity and homogeneity that is reflected in them for many centuries.|57|Chinese dialectology 229|Wang1997|This continuity and homogeneity is probably more accelerated today than ever before, since virtually everyone in China speaks Putonghua to some extent. This situation serves as a powerful centripetal force that is constantly feeding lexical items and grammatical features from Putonghua into all varieties of Chinese.|57|centripetal forces, Chinese dialectology, language history 230|Wang1997|Furthermore, the traditionalist would point out, that the shared written language is another strong argument. The design of the writing system is such that a group of characters which share a phonetic component have identical or similar pronunciations to each other. But the value of the phonetic component itself has no fixed phonetic value and varies according to space and time. 
This is the feature that made the characters serviceable to such a large population over these many centuries.|57f|centripetal forces, Chinese dialectology, language history 231|Wang1997|To balance out the picture, however, we should note that the characters which come from the political center are often not completely adequate to meet regional needs. Speaking from a Min viewpoint, R. Cheng put the point across well when he wrote an article entitled "Taiwanese morphemes in search of Chinese characters." In it, he gave a succinct description of the methods that have been used to deal with this problem. One of these methods is to create new characters to meet regional needs.|58|dialect writing 232|Wang1997|Going back to Pulleyblank's statement, note that the shared basic vocabulary between English and German is only 58%, as compared with 74% for Beijing and Guangzhou. Using basic vocabulary as a yardstick, it would be more accurate to say that Cantonese and Mandarin are as different from each other as French and Spanish. More generally, we can also say that the Chinese dialects are about as different from each other as the common Romance languages in Table 2, and certainly more different from each other than Norwegian and Swedish. Since the latter are considered separate languages, it is certainly justifiable to refer to the units in Table 1 |60|linguistic diversity, Chinese 233|Wang1997|On the other hand, the socio-cultural situations which ensued are fundamentally different in the two situations. France, Italy, and Spain evolved along largely separate and independent paths, in their societies as well as in their languages. 
In sharp contrast to this, for most of the time over these two millennia, there has been a continuous and unified heritage in China, with a prestige form of the language from the capital successively layering itself over all the dialects, influencing every aspect of linguistic development, especially the vocabulary.|60|centrifugal forces, Chinese, language history 234|Wang1997|Since all Chinese dialects have a comparable syllable structure, a written word can be borrowed from dialect to dialect, and be easily pronounced within the phonological system of any dialect.|61|borrowability, Chinese, lexical borrowing 235|Jachiet2013|Sequence similarity networks were first proposed in a study conducted by Tatusov et al. (1997) and used for larger scale studies in the study conducted by Enright et al. (2002).|E1|gene similarity network 236|Leigh2011|The notion of phylogenetic incongruence predates molecular phylogeny, though the many biological sources of incongruence in molecular data (e.g., hybridization [McBreen and Lockhart 2006; Koblmüller et al. 2007], incomplete lineage sorting [Hudson 1983; Hobolth et al. 2011], lateral gene transfer [Bapteste et al. 2009]) have certainly raised awareness of the importance of incongruence among evolutionary biologists in recent years.|E1|phylogenetic incongruence 237|Leigh2011|The topology-based methods use as null hypothesis complete lack of correlation between trees and directly compare topologies (Lapointe and Rissler 2005; de Vienne et al. 2007; Nye 2008; Puigbò et al. 2009). [...] Other methods, such as the Congruence Among Distance Matrices test (Campbell et al. 
2009, 2011), compare distance matrices rather than trees to assess the null hypothesis of incongruence. Topology-based methods have been very useful in fields such as phylogeography and studies of coevolution, where any correlation in different trees is of interest.|E2|phylogenetic incongruence 238|Leigh2011|In this case, a null hypothesis in which genes share the same topology is used and rejected in order to identify incongruence. Normally, these methods are classified as character-based because topologies are not compared directly; rather, the methods evaluate the fit of different topologies to different markers.|E3|phylogenetic incongruence 239|Bapteste2009|Forcing a single bifurcating scheme onto prokaryotic evolution disregards the non-tree-like nature of natural variation among prokaryotes and accounts for only a minority of observations from genomes.|E1|family tree, prokaryotic evolution 240|Bapteste2009|Hence, there is a high risk that the traditional approach produces circular phylogenetic analyses, in which assumptions of a common tree are supported by assumptions about how the data should be represented.|E2|family tree, circularity 241|Bapteste2009|First, in purely statistical terms, this failure to reject does not mean that they support the consensus tree, and that they have evolved according to this very topology [51]. Second, individual genes with a weak phylogenetic signal will always fail to reject the consensus tree.|E3|phylogenetic incongruence 242|Bapteste2009|Methods that search for a single universal tree often involve steps of data exclusion in which lateral gene transfer is conceived as noise. 
The use of such eliminative criteria allows these phylogeneticists to ignore LGT, but also leaves them without any trustworthy genes with which to study prokaryote evolution.|E4|conflicting data, phylogenetic incongruence 243|Bapteste2009|For those who take a monistic approach, sidelining or deprioritizing data that conflicts with the model of a single tree may appear to be a less extreme alternative than large-scale data exclusion.|E6|- 244|Bapteste2009|Several observations question the validity of equating the consensus or average phylogenetic pattern with a bifurcating evolutionary organismal history, or with the tree-like evolutionary history of the species [58-61]. At least some of the consensus signal found in core genomes [60] might reflect not a shared history but instead, artefactual phylogenetic reconstruction.|E7|family tree 245|Bapteste2009|The analyst lumps together a great deal of data that did not evolve by a common tree-like process, analyzes it with methods that deliver only trees as their result (as opposed to more-general models such as networks), obtains a tree, and then asserts that this exercise provides evidence in favour of the existence of a tree.|E8|family tree 246|Bapteste2009|Today however, if embracing a monist perspective to describe microbial evolution, the question is not to ask whether the tree model still represents the best framework to infer and depict evolutionary relationships, but rather to ask which of the competing approaches already available is best suited to produce the most satisfactory tree.|E9|family tree 247|Sankoff1978|This rejection is usually phrased in terms of the conviction that randomness and indeterminacy play no significant role in the structure of or inter language, particularly with respect to the linguistic competence, the individual speaker-hearer. 
The genesis of this attitude is threefold, and we will discuss the three aspects in order of increasing importance.|-|- 248|Yan2006|With its long history of civilization, many waves of migration, and various types of topography, China has evolved into a multitude of subdialects of Chinese, especially in the southern part of the country where the geographical conditions have contributed to the isolation of numerous settlements and, hence, the independent development of dialects.|3|wave theory, family tree, Chinese 249|Yan2006|From the 1930s, some linguists have begun to use linguistic characteristics, phonological features in particular, as criteria for dialect classification. The criteria, on which the dialect groupings and dialect identifications were based, always involve the reconstructed sounds or tonal categories of Middle Chinese as referent point.|9|genetic classification, Chinese dialectology 250|Yan2006|In 1937, Fang-kuei Li (李方桂) proposed some phonological features such as the criteria to classify Chinese dialects into eight major groups. His dialect classification has been widely accepted or modeled upon for future dialect studies. Later, scholars such as R. A. D. Forrest (1948), Tung T'ung-huo 蕫同龢 (1953), Yuán Jiāhuá 袁家驊 (1960), Zhān Bóhuǐ 詹伯慧 (1981, 1991), Ting Pang-hsin (Dīng Bāngxīn) 丁邦新 (1982), Huáng Jǐnghú 黄景湖 (1987), S. Robert Ramsey (1987), Jerry Norman (1988), and Lau Chufat (Liú Zhènfā) 劉鎮发 (1999), have also proposed different features for dialect classification or dialect identification, and differing numbers of groupings. Most of these scholars have used only phonological features as criteria for the classification.|14|genetic classification, Chinese 251|Yan2006|Norman (1988: 182) is the first linguist who has proposed ten criteria (or diagnostic features) of phonological, grammatical and lexical features for the dialect classification. 
|16|genetic classification, Chinese 252|Yan2006|In 1988, with a joint effort of the scholars of China and Australia, a set of Language Atlas of China 中国语言地图集 (Volume I & II), chief edited by S. A. Wurm et al. were published. Chinese dialect texts were edited by Lǐ Róng (李荣), Xióng Zhènghuī (熊正挥), Zhāng Zhènxìng (张振兴), and Hè Wěi (贺巍), et al. In the Atlas, the Chinese dialects have been mainly divided into ten major fāngyánqún 方言群 (dialect supergroups), [...]|16f|genetic classification, Chinese 253|Yan2006|In Zìhuì, the data included 17 localities and readings of 2,722 character entries of seven major dialects. For each character entry, information of the Middle Chinese phonological categories (initial, final and tone) and dialect pronunciation were provided.|26|Chinese, dialect data 254|Yan2006|In Cíhuì, the data included 18 localities and readings of 905 lexical (word) entries of seven major dialects. The readings of the entries were basically in colloquial form, thus no literary readings were registered.|26|Chinese, dialect data 255|Yan2006|Both Zìhuì and Cíhuì have become two important resources for the dialectology outside of Chinese since 1960's.|26|Chinese, dialect data 256|Yan2006|One of the most important activities of Chinese dialectology from the mid-1980s to 1990s was a survey project of individual dialects, prepared for the compilation of individual dialect dictionaries. [...] For the sake of future dialect comparison the Biānjí [...] had designed a Hànyǔ Fāngyán Cíhuì Diàochábiǎo [...] for the fieldworker to use to collect at least the lexical items included in the list which itself included about 2000 entries. The entries were divided into twenty-nine categories [...].|33|Chinese, dialect data 257|Yan2006|The contents of the dictionaries were presented in a uniform format, including these major features: (1) Introduction [...], (2) main text [...], (3) indexes; (4) references. 
|36|Chinese, dialect data 258|Yan2006|During the 1990s, besides the publication of the above mentioned dialect dictionaries and Yángqǔ Fāngyánzhì, one new endeavor of the Chinese dialectology has been the compilation of a Xiàndài Hànyǔ Fāngyán Yīnkù 现代汉语方言音库 (Sound Database of Modern Chinese Dialects), under the direction of and chief-editor Hóu Jīngyī (侯精一) of Zhōngguó Shèhuì Kēxuéyuàn Yǔyán Yánjiūsuǒ 中国社会科学院语言研究所 (Linguistic Institute of the Chinese Academy of Social Sciences). So far, forty sound files have been completed and some have been published beginning 1998.|38|Chinese, dialect data 259|Yan2006|As early as the late 1960s, Wang and his research associate, Cheng Chin-chuan (鄭錦全), felt that modern linguistic theories had been developed basically based on European languages (non-tonal languages). [...] Thus in 1966, with the collaboration of Cheng Chin-chuan [...] and others, Wang directed and launched a computer dialect database project called “DOC” (Dictionary on Computer), which computerized the sounds of over 2,700 words in 17 dialects in the Hànyǔ Fāngyán Zìhuì 汉语方言字汇 (Chinese Dialect Character Pronunciation List, Beijing University 1962). Later they added the sounds of Shanghai, the Zhōngyuán Yīnyùn (中原音韵) of the 14th century, Kan-on and Go-on Sino-Japanese, and Sino-Korean to the data pool.|45|Chinese, dialect data 260|Yan2006|With the convenient reference tool “DOC”, they and other of Wang's students, Matthew Y. Chen [...] and Hsieh Hsin (谢信一), etc. 
have been able to employ the Chinese dialect data in DOC to do the analyses and to test different linguistic change theories|45|Chinese, dialect data 261|Yan2006|His lexical diffusion theory – the gradual spread of phonological change from morpheme to morpheme, has a great impact on the fields of Chinese dialectology and historical phonology.|45|lexical diffusion, linguistic diffusion, Chinese 262|Norman1989|In searching for classificatory criteria, it is highly desirable to find features that are both necessary and sufficient.|323|feature, genetic classification, Chinese, dialect classification 263|Norman1989|Ideally, one would like to find a single criterion of this type, but this may not be possible in every case. An attempt should be made, nonetheless, to find the smallest number of criteria on which to establish the major dialect groups.|323|Chinese, dialect classification, genetic classification, feature 264|Norman1989|Another useful distinction to keep in mind is that between diachronic and synchronic criteria. [...] Clearly a genetic classification (one that purports to be based on historical descent) must be based on diachronic criteria.|323|classification criteria, dialect classification, Chinese, genetic classification 265|Norman1989|I take it as evident that all Chinese dialect classifications, at least implicitly, are intended to be genetic; that is, the groups arrived at in each classification are in some sense to be considered as groups of dialects sharing a common origin.|323|Chinese, dialect classification, genetic classification 266|Norman1989|Languages and dialects, to be sure, are not the same thing as living organisms, but genetic descent is undeniably one of the most important aspects of linguistic development.|324|genetic relationship, genetic descent, Chinese, language history 267|Norman1989|It is important that Chinese dialect classification be based on the actual spoken forms of dialects and not on lists of character readings. [...] 
The danger of what one might call the "character approach" is that one frequently misses the most critical forms for establishing the proper classification of the dialect being studied.|324|Chinese, genetic classification, dialect classification, basic vocabulary 268|Norman1989|It is better to view the uniform and unconditional shift of older voiced stops and affricates to voiceless aspirates in various regions of China as cases of parallel development lacking any strong diagnostic value in the classification of Chinese dialects. I would even go so far as to say that the development of the old voiced stops in Chinese dialects is in general a weak classificatory criterion and that its main value is a secondary corroborative trait which can be used, in the case of some dialect groups but not others.|327|classification criteria, parallel development, homoplasy, Chinese, dialect classification, genetic classification 269|Norman1989|The Kèjiā lexicon, hitherto little studied, provides a very rich area for comparative work. Perhaps the most interesting thing about the lexicon of these dialects is that there are very few lexical items that can be considered purely Kèjiā. This is in part to be explained by the fact that Kèjiā is surrounded on all sides by dialects of other groups and that it has lived in a symbiotic relationship with these dialects for many centuries.|335|Hakka, Chinese, lexical borrowing 270|Norman1989|One of these represents an archaic stratum that shows significant links to forms found in the Mǐn dialects. 
This body of words is not large; the larger part of the Kèjiā lexicon belongs to a later and more general variety exhibiting strong bonds to the popular lexicon of the other non-Mǐn dialects of the Southeast.|336|Hakka, Chinese, lexical strata 271|Norman1989|Even from this rather cursory examination it can be seen that the Kèjiā lexicon falls into two distinct types, one a more archaic type showing links to the Mǐn dialects and another later type, having links to Gàn, Wú and Yuè.|337|Hakka, Chinese, lexical strata 272|Norman1989|As I have proposed elsewhere (Norman 1983: 209-210), the older stratum A goes back to the Chín-Hàn era and represents an ancient popular form of Chinese spread by the first imperial conquests. Stratum B, I believe, owes its origin to a later wave of Chinese, historically connected with the massive immigration of northerners into the region south of the Yangtze in the Eastern Jìn period.|338|Hakka, Chinese, lexical change 273|Norman1989|The core vocabulary of the Mǐn dialects derives for the most part from this earlier stratum. I believe that Kèjiā originally also belonged to this early dialectal milieu.|339|Hakka, Mǐn, Chinese, genetic relationship 274|Baxter2000c|Intuitively, the least upper bound of a set S of phonologies is a minimal possible ancestor for the phonologies in S: it includes just enough distinctions to derive the distinctions found in any member of S, but no more.|104|- 275|Baxter2006a|This paper investigates the phylogenetic relationships (the relationships of shared ancestry) of ten Chinese dialects: eight generally classified as Mandarin, and two others (Chángshà 长沙 and Hángzhōu 杭州). Using six of the Mandarin dialects, a 'Proto-Macro-Mandarin' (PMM) phonological system is reconstructed, from which the other four dialects also appear to be derivable. Taking 29 phonological innovations as a basis, the phylogeny of the ten dialects is estimated using a technique borrowed from biology (Camin-Sokal parsimony). 
The results suggest that a classification of Chinese dialects based on phylogeny could be rather different from the generally accepted classification.|071|- 276|Baxter2006a|When two dialects share a certain feature, this could be for any of three reasons: (1) coincidence, (2) borrowing, or (3) inheritance. If the two dialects developed the feature independently of each other, then this is coincidence. If one of the dialects acquired the feature from the other, this is borrowing. If each dialect has a certain feature because it was already present in a common ancestor, this is inheritance.|72|feature, etymological relations, Chinese, inheritance, coincidence, borrowing 277|Baxter2006a|The distinction between borrowing and inheritance is crucial to analyzing linguistic history.|73|borrowing, inheritance 278|Baxter2006a|The diagram in Figure 1 is not intended as a representation of the historical relationships among English, German, and French. Rather, it represents a single aspect of their history, namely the relationships of common ancestry. A full account of their history would have to include much more, of course, including many kinds of influences. Following Schmidt (1872), textbooks of historical linguistics often present trees and waves as competing theories for representing linguistic history, as if linguists must choose between them; but this is a false dichotomy. 
Any adequate history of a language family must deal with both inheritance and borrowing.|74|language history, family tree, wave theory, genetic relationship 279|Schmidt1872|Fallen also die in neuerer zeit construierten grundsprachen die europäische, nordeuropäische, slawodeutsche, südeuropäische, graecoitalische oder italokeltische dem reiche des mythus anheim, so schwindet auch die mathematische sicherheit, welche man für die reconstruction der indogermanischen ursprache schon gewonnen zu haben glaubte. Zwar gibt es eine ganze reihe von worten und grammatischen formen, deren vorhistorische grundformen wir zuverlässig erschliessen können, selbst wenn sie in keiner einzigen sprache unverändert erhalten sind [...]. In anderen |28|- 280|Baxter2006a|But this argument presupposes a positivist approach to linguistic reconstruction, which assumes that if one follows the proper scientific method, one will never be led into error.|74 [footnote]|linguistic reconstruction 281|Baxter2006a|To turn now to Chinese: most modern attempts to classify Chinese dialects (such as Professor Li Fang-kuei's 'Languages and dialects of China', 1937, 1973) have been based on a few easily identifiable characteristics, such as initial consonant types and patterns of tonal development. Such a procedure allows dialects to be classified quickly and easily.|75|Chinese, dialect classification, classification criteria 282|Baxter2006a|But a classification based on a small number of features, while convenient, is essentially arbitrary: a different choice of features would produce a different classification, and there is no guarantee that either classification will reflect the phylogeny of the dialects. Of course, there can be different classification schemes for different purposes. 
But in this study we are concerned with relationships of common ancestry, and it is not clear that conventional dialect classifications reflect these relationships accurately.|75|dialect classification, Chinese, classification criteria 283|Baxter2006a|The main advantage of using phonological mergers is that one can usually say with confidence that the unmerged state is the ancestral one, and the merged state is the innovation;|75|merger, linguistic reconstruction, shared innovation, ancestral states 284|Baxter2006a|The reason for including so many Jiāng-Huái dialects is that the phylogeny of this group is especially problematic, because of the unusual features of the Tàizhōu-Rúgāo subgroup on the one hand, and the striking similarity of Bǎoshān (assigned to Southwest Mandarin) to Nanjing on the other.|79|Chinese, Mandarin 285|Baxter2006a|But the dialects of Chángshà and Hángzhōu, neither of which is generally considered Mandarin, also appear to be derivable from Proto-Macro-Mandarin; I have been unable to find in them any systematic phonological distinction not reflected in the PMM reconstruction.|88|Chinese, Mandarin, Xiāng, Wú 286|Baxter2006a|These observations suggest that current dialect classifications, such as that in Wurm et al. (1988) and Lǐ Róng (1989), may not be valid from a phylogenetic point of view. If Mandarin is defined broadly enough to include Rúgāo, then it is probably broad enough (from a phonological point of view, at least) to include Chángshà and Hángzhōu. If we use a narrower definition of Mandarin which would exclude Chángshà and Hángzhōu, it would probably exclude Rúgāo as well. 
If Chángshà and Hángzhōu really are descendants of Proto-Macro-Mandarin, then this raises the question whether the other dialects now assigned to the Xiāng and Wú groups are also derivable from it; if not, then Xiāng and Wú as currently defined may not be valid phylogenetic units.|89|Xiāng, Wú, Chinese, Mandarin, dialect classification 287|Baxter2006a|Dialects diverge from a common ancestor as a result of innovations which affect some dialects but not others; the resulting phylogenetic relationships can be investigated by studying the pattern of shared innovations among the dialects. If a dialect classification is based on a small number of features, the job of identifying shared innovations is not especially complex and can be done by hand. But given the fact that the same change may happen repeatedly by chance, or may spread from one dialect to another, it is questionable whether any classification based on only a few features will be phylogenetically meaningful.|90|homoplasy, dialect classification, inheritance, shared innovation, borrowing 288|Baxter2006a|I also rejected a number of mergers which I judged likely to have occurred independently or to have been adopted from prestige dialects. For example, the palatalization of velars (e.g. the change of [k] etc. to [tɕ] etc.) before high front vowels is a natural assimilation frequently found in languages all over the world, and the subsequent merger of [tɕ] with [ts] in the same environment is also a very natural [pb] change; moreover, since this merger is a feature of standard Mandarin (*pútōnghuà* 普通话) it has spread widely for sociolinguistic reasons.|90|Chinese, shared innovation, sound change, naturalness 289|Baxter2006a|Biologists face similar problems when trying to infer the phylogenies of living organisms, and they have developed a wide range of algorithms which can be used for this task. 
The algorithm used here comes from Joseph Felsenstein's PHYLIP package (Phylogeny Inference Package), which is available on the internet free of charge. PHYLIP includes a great many different programs for phylogenetic analysis, not just maximum parsimony programs, but the maximum parsimony program PENNY is simple to use, and it is especially powerful when one is using features whose ancestral states are known.|92|dialect classification, maximum parsimony 290|Baxter2006a|The tree suggests that this north-south division is the oldest and deepest split among the dialects in the sample. Interestingly, this division is consistent with the division proposed by Zav'jalova (1983) on the basis of the more traditional method of observing bundles of phonological isoglosses.|95|Chinese, Mandarin, genetic classification, dialect classification 291|Baxter2006a|One final question: Is it appropriate to use phonology as a basis for phylogeny? I chose to use phonological mergers in this study largely because we can usually say with confidence that the unmerged state is the ancestral one, and the merged state is the innovative one. But as noted above, some phonological innovations are natural and may occur many times independently; and some innovations (not necessarily the same ones) are easily imitated, and likely to spread across dialect boundaries, if the sociolinguistic conditions are right. In either case, the innovations are unlikely to be useful in inferring phylogeny.|100|Chinese, dialect classification, genetic classification, classification criteria 292|Baxter2006a|On the other hand, phonological innovations which are subtle and unusual are less likely to recur independently or to be successfully imitated by speakers of neighboring dialects. In the Chinese context, a dialect can easily lose the distinction between retroflex and dental sibilants ([tʂ] etc. and [ts] etc.) 
across the board; but dialects which have the distinction show several subtly different patterns (see the discussion in Baxter (1999)), which would probably be difficult for outsiders to imitate successfully. The innovations which produced these patterns are therefore potentially useful for inferring phylogeny. In any case, a classification based on a large number of phonological innovations is more likely to reflect phylogeny than traditional classifications based on a few arbitrarily chosen features, some of which are innovations and some of which are not|100|dialect classification, Chinese, genetic classification, classification criteria, subgrouping 293|Cheng1973|In summary, a computer file of the tones of 737 dialect locations has allowed us to arrive at some majority and minority trends of general Chinese tonal patterns. For the first time, we have been able to statistically validate the claim that, with the exception of 4A and 4B, B tones are lower than A tones. The data has shown, among other things, that northern dialects have fewer tones and higher pitch values than southern dialects, that high tones predominate in most dialects, and that the falling tone contour occurs more frequently than any other tonal contour. These are only a few of the possible conclusions that can be drawn from the data made readily available by the computer file.|105|tone, Chinese, Chinese dialects 294|Hsieh1973|Traditional methods of subgrouping such as the tree method, the wave method, and the lexico-statistic method sometimes fail to apply, particularly when the material involves incomplete sound changes -- changes that are still in progress or have terminated prematurely. The author proposes the diffusion overlapping method as a phono-lexico-statistic procedure of subgrouping to replace traditional methods where they do not apply or to supplement them where they do apply but the results need further confirmation.
This method is based on a corollary of the concept of lexical gradualness in sound change: that the more phonological forms two or more dialects share with respect to a commonly initiated but independently executed sound change, the longer they must have developed together, and hence the more closely related they are.|064|- 295|Hsieh1973|Unfortunately, there does not seem to be any effective tool for distinguishing the former cause from the latter.|64|contact-induced similarity, genetic similarity, drift-induced similarity, chance resemblance 296|Hsieh1973|However, since the separation of dialects is not always along the geographic dimension, linguistic areas will correspond to genetic subgroups only when the split of dialects proceeds strictly along the geographic line.|65|wave theory, dialect classification, genetic classification, geography, linguistic area 297|Hsieh1973|The tree and the wave theorists are incapable of treating this problem not only because they tend to ignore incomplete changes but because they are merely interested in the qualitative differences among dialects. In using these two methods, one only asks the question whether two dialects share or do not share a particular change. One does not ask to what extent they share it.|65|family tree, wave theory, lexical diffusion 298|Hsieh1973|Although the lexico-statistic method does provide us with a quantitative way of measuring genetic affinity, it usually relies on different degrees of similarity in the vocabulary for clues. Yet, dialects do not always differ enough in the vocabulary to allow meaningful measurement.|65|lexicostatistics, genetic classification, dialect classification 299|Hsieh1973|DIFFUSION OVERLAPPING. Since language split may happen in the mid-course of a sound change, the change originating in the previously uniform speech community will develop independently in each dialect.
|70|lexical diffusion, language history, genetic classification 300|Hsieh1973|Since sound change is lexically gradual, a change may continue independently in two or more dialects after they split. The more phonological forms two or more dialects share with respect to a commonly initiated but independently executed sound change, the longer they must have developed together, and hence the more closely related they are.|88|- 301|Li1973|The linguistic situation in China is a very complicated one. Aside from the Chinese language with its numerous dialects, there are many other languages, our knowledge of which is incomplete. Some of them have not been adequately studied, some of them are scarcely known to us, and many of them have not been sufficiently recorded. The material of these languages is therefore scanty, their history unknown, and their relation with other groups very vaguely understood.|1|Chinese, Chinese dialectology 302|Li1973|With the relations among the different languages thus formulated, it is apparent that these similarities cannot be due to chance or mere borrowing, but are due to the fact that these languages are the descendants of a common parent speech.|1|genetic classification 303|Li1973|The Northern Mandarin group occupies a large area in North China [...]. It is characterized by the unvoicing of the ancient voiced stops, affricates, and fricatives, and by the disappearance of the 'entering tone' 入声. There are as a rule only four tones: 'ying-ping' 阴平, 'yang-ping' 阳平, 'shang' 上, 'qu' 去. Further division into subgroups is possible.|3|subgrouping, Chinese, Mandarin 304|Li1973|The Eastern Mandarin group is spoken along the lower Yangtze in the provinces of Anhui 安徽 and Jiangsu 江苏. It is differentiated from the Northern group by the existence of the 'entering tone' as a short tone, but the original final consonants *-p, -t,* and *-k*, which accompanied the 'entering tone', are substituted by the glottal stop.
It has, therefore, five tones.|3|subgrouping, Chinese 305|Li1973|The Southwestern Mandarin group is a fairly uniform type of speech spoken in Sichuan 四川, Yunnan 云南, Guizhou 贵州, and parts of Hubei 湖北. It has as a rule no 'entering tone', but in the central part of Sichuan 四川 along the Yangtze, the 'entering [pb] tone' is preserved as a special tone, but the final consonants have completely disappeared.|3f|Chinese, subgrouping, Mandarin 306|Li1973|The Wu 吴语 group of dialects is spoken south of the Yangtze in Jiangsu 江苏, Zhejiang 浙江, and in a few districts in the eastern part of Jiangxi 江西. It is characterized by the preservation of the ancient voiced stops, etc., as aspirated voiced consonants and by the preservation of the 'entering tone' as a short tone with the loss, however, of the final *-p, -t, -k* (or rather, substituted by the glottal stop). It often has six or seven tones.|4|Wú, subgrouping, Chinese 307|Li1973|The Gan-Hakka group 赣客家 is spoken principally in the provinces of Jiangxi 江西 and Guangdong 广东. It is characterized by original tone classes (aspirated in ping-sheng only in the three Mandarin groups.) The 'entering tone' is preserved and the final *-p, -t, -k* are more or less preserved according to different dialects and there are often six or seven tones. The Northern Gan 赣 group, particularly around Boyang Lake 鄱阳湖, has the tendency to voice all aspirated surds in connected speech. The Hakka 客家 group preserved the final consonants such as *-m, -p, -t, -k* much better.|4|Gàn, Hakka, subgrouping, Chinese 308|Li1973|The Min 闽语 can be further divided into two subgroups. The Northern group is spoken in the northern part of Fujian 福建 and the Southern group is spoken in the southern part of Fujian, in the eastern part of Guangdong, in Hainan Island 海南岛 and in parts of the Leizhou Peninsula 雷州半岛.|4|Mǐn, Chinese, subgrouping 309|Li1973|The Cantonese or the Yue group 粤语 is spoken in the provinces of Guangdong and Guangxi.
It is characterized by the preservation of the final consonants *-m, -n, -ŋ, -p, -t, -k*. It has a system of eight, nine, or more tones. This distinction of long and short vowels, as in Cantonese, is also a special feature.|4|Cantonese, Yuè, Chinese, subgrouping 310|Li1973|The Xiang group 湘语 is spoken principally in Hunan 湖南. The Ancient voiced stops, etc. are as a rule kept as truly voiced consonants (except the Changsha 长沙 dialect). The final *-p, -t, -k* are usually lost, but the 'entering tone' is preserved as distinct tone classes.|4f|Xiāng, Chinese, subgrouping 311|Li1973|Aside from the phonological features specific to the groups mentioned above, there are also lexical items more or less peculiar to each of these groups, but these will not be discussed here. Among the various groups some are mutually intelligible.|5|- 312|Cheng1988|905 个词条共有6,454个变体。 :translation:`The 905 glosses correspond to a total of 6,454 states.` |90|cognates 313|Cheng1988|:comment:`The coding in Cheng's analysis is strictly numeric: no IPA forms of the cíhuì are encoded in any form, so it cannot be used for further studies (apart from the fact that the data is not available at all).`|90|data format, cognate judgments 314|Cheng1988|每一对方的关系度 可以涌*phi*[???]算出来。 :translation:`We use the phi-statistics to calculate the correlation between all dialect pairs.`|90|quantitative analysis, Chinese, genetic classification 315|Cheng1988|In this article, Cheng describes different methods to carry out quantitative assessments for the genetic affinity of the Chinese dialects. The paper discusses various aspects, such as * lexical applications based on the Cíhuì, using a simple phi-correlation of shared cognates to obtain affinity scores [90f] * phonetic applications by comparing reflexes of traditional Middle Chinese phonological categories (mainly initials) [91f] Basically, the clustering approach of Cheng makes use of simple linkage clustering, using the BMDP cluster package.
In other papers, Cheng mentions the use of SPSS. Nevertheless, it is not entirely clear how exactly the trees are computed.|00|quantitative analysis 316|Keidan2013|The present paper is devoted to the description of a new research program that a group of scholars of Historical linguistics, from the universities of Rome, Ghent, Tübingen and Leiden, are about to embark. The results of our research are, for now, only hypothetical. However, we wish to share with the audience of this conference our essential ideas and methodological criteria. Our desire is to elicit the interest of other scholars and to invite them to collaborate with the project. The possible contribution to the project range from the help with data analysis to the participation to a forthcoming conference (Rome, May 2014), where some theoretical questions related to the project and, hopefully, some early results will be presented to the scientific community and discussed.|000|- 317|Byun2004|This thesis introduces the isolated Mandarin dialect of Gànzhōu, which is spoken in the province of Jiangxi and has preserved many archaic Mandarin features. The dissertation also contains a Swadesh-200 wordlist in IPA of the lexicon of the Gànzhōu variety.|000|Chinese, Mandarin, subgrouping 318|Iwata1995|"Project on Han Dialects" was designed to investigate the history of Chinese dialects using a geographical approach. This paper presents an outline of the project and some of its results. Several types of distribution patterns are discerned from the four maps on the word 'knee'. These four maps and four other maps will give a rough sketch on the diachronic changes of words and dialect distributions. It is assumed that the Jianghuai area played an important role as a core area in the diffusion of lexical items.|000|- 319|Iwata1995|On one hand, it is evident that dialects may change gradually along a continuum from one locality to another, therefore showing no boundaries.
People speak of "Guanhua", "Wu" and so on, but there is no such thing, what really exists is the dialect of each speech community, and ultimately an idiolect. Or we can safely say that the only existence is the linguistic features which are shared by the particular dialects. On the other hand, it is equally evident that we witness bundles of isoglosses dividing particular regions and we can say these are boundaries. This has been exemplified by such works as Zavjalova (1983) and numerous works devoted to the atlas, "LANGUAGE ATLAS OF CHINA" (LAC).|195|isoglosses, dialect classification, Chinese dialectology 320|Iwata1995|Iwata describes a highly ambitious research project on Chinese dialects in which lexical and other data was collected for many different locations in order to find isoglosses for the Chinese dialects. Data was collected computationally, but the format for lexical identification of cognates was rather inconsistent and arbitrary.|001|- 321|Iwata1995|A problem in classifying the word forms is that the exact etymology of the words is often unknown and the morphemes are written in various homophonous characters. In such cases it makes no sense to juxtapose the forms without any synchronic or diachronic consideration. A speculation on 'knee' reveals that most of the forms, in Shandong, Henan, Hebei and north Jiangsu, are composed of three constituents and can be classified into two major types, p-l-k and k-l-p, with respect to the initial consonants of the syllables (Iwata 1986).|198|cognates, cognate detection 322|Iwata1995|Some of the results are introduced in this chapter by presenting four maps on one lexical item, 'knee'. Several types of distribution patterns can be discerned from the maps: (I) northern vs. southern, (II) distribution specific to a particular dialect group, (III) eastern vs. western (coastal vs. inland), (IV) "ABA" or concentric (central vs. peripheral) distribution and (V) the characteristic distribution along the Changjiang River (长江).|200f|isoglosses, Chinese 323|Iwata1995|It has been a generally accepted idea that the form in the center is relatively newer in chronology than those in the surrounding area, even though counter-evidence is not scarce.|208-210|isoglosses 324|Iwata1995|Wu-Min specific and Gan-Xiang-Yue specific. Ting (1988) posited a hypothesis on the Min strata in the Wu dialects, Zhang G.-Y. (1993) argued for affinity between Min and Wu, and Ballard (1989, 1992) suggested that northern Min and southern Wu may have some special historical relationship.|215f|Mǐn, Wú, Chinese, linguistic area 325|Iwata1995|Among the eastern group the Min dialect has been more conservative than Wu, resisting influence from the north and west, as every map presented above demonstrates and hundreds of scholars have argued. The Wu area, on the other hand, has been exposed to a considerable degree of influence from the north.|218|Wú, Mǐn, Chinese, linguistic area 326|Iwata1995|Single word examples are presented on these pages, especially dialect words for "father" etc.|218f|Chinese, Chinese dialects 327|Iwata1995|People speak of the northern influence on the south, but the maps presented here allow us to posit a hypothesis that northern influence, if it did exist, came through the Jianghuai area, not directly from the Central Plain in the north. Lexical diffusion from the Jianghuai area is assumed to have proceeded in two directions: one was to the west along the Changjiang and the other to the south, i.e. to the Wu area.|220|- 328|Iwata1995|It seems to me that past historical studies of Chinese dialects have concentrated too much on establishing the authenticity of the dialect words to see the reality of the changes. A typical instance is Benzikao 本字考, in which an etymology of a word (morpheme) is found by looking for phonetic and lexical correspondences (similarities) between the literary words and dialect words. Researchers should be aware that such studies are founded on a fragile presupposition: the continuity of evolution has been maintained throughout the whole history of the particular word.|222|Benzi, cognates, Chinese 329|Iwata1995|It seems to me that scholars in Chinese historical dialectology have overestimated a factor, *migration*, which they assume would influence or even replace aboriginal dialects.
However, it is true that a more normal and frequent manner of dialect diffusion is a geographically continuous spreading of an influential form, a viewpoint which has been underestimated or even ignored in the research history.|222|- 330|Iwata1995|Even in cases where legends are true, we need to know the exact locality from where people came and whether the people migrated in a mass or only individually, since fragmentary migration from various areas could hardly influence the aboriginal dialect, as immigrants would soon conform to it.|223|- 331|Iwata1995|In Hashimoto (1992), he demonstrated that particular linguistic features in Hakka show resemblances with those found in other dialects, particularly with those in peripheral zones, claiming that those phenomena should be interpreted in terms of the wave theory.|223|wave theory, Chinese, Hakka 332|Wu2005|This is the first book in Chinese linguistics which discusses the grammar of a dialect group, in this case the Xiang dialect spoken in Hunan, from both a synchronic and diachronic perspective. The author uses new data and new frameworks to present her analysis. The synchronic part covers contemporary grammar across localities within the Xiang-speaking area by using the methods and theories of comparative and typological linguistics. The diachronic analysis reconstructs earlier grammatical systems based mainly on modern data but also on historical written records, and analyses the development of the syntactic systems of the Xiang dialects, adopting the methods and theories of historical linguistics and grammaticalization. The discussions in this book raise new issues on dialect research which have not yet been fully acknowledged by Chinese dialectologists. The author shows, for example, how the earlier layers of grammar may be reconstructed on the basis of modern data, and how the path of grammaticalization of functional words may be traced.
The discussions reveal that the Xiang dialect group forms a transitional zone between northern and southern dialects. The syntactic constructions in these two areas often co-exist or are mingled in Xiang. Thus, the grammatical constructions in different localities of the Xiang dialect group often provide a bridge connecting the constructions of northern and southern Chinese, or Modern Chinese and Chinese of earlier periods. This book is of interest to scholars and students who are working on grammar, dialectology, historical linguistics, comparative linguistics, typological linguistics, and grammaticalization, as well as those researchers focusing on language policy, language acquisition, and education.|000|- 333|Branner2000b|This book discusses the methodology of systematic Chinese Dialect classification, with particular attention to the conservative Miin and Hakka groups spoken in southern China. The primary linguistic methodology employed is the historical-comparative method, and the dialects chosen as examples of classification are those spoken in and around the township of Wann'an in western Fukien's Longyan country. The book features extensive comparative tables of dialect forms, and a two-hundred page appendix outlining the diasystem of the four principal Wann'an dialects.|000|Mǐn, Hakka, Chinese 334|Kurpaska2010|The major aim of the book is to trace the current structuring of the Chinese language(s) on the ground of Chinese linguistics. The research presented is based on the newest and most renowned sources, namely The Great Dictionary of Modern Chinese Dialects, and the Language Atlas of China. The author discusses the role The Great Dictionary plays in analyzing the spectrum of linguistic differentiation in China and gives a detailed account of the kind of information the dictionary provides. As background, she sketches the development and current state of Chinese dialectology and dialect research. 
One of the author's aims is to show respect for the grand achievement the Dictionary undoubtedly is, but also to emphasize a critical distance to some of the views presented in it. Apart from being an analysis of this particular Dictionary, the book presents data about the state of modern Chinese dialectology. It provides information about different classifications of the dialects and explains on what basis the classifications are made. Looking at Chinese dialectology from a Western point of view, the author aims to understand and present the Chinese perspective. The book fills an important gap in the field of Western sinology. So far, despite lively discussions concerning the status of the varieties of Chinese and their taxonomy, full-scale studies on Chinese dialects have been almost non-existent in the Western World.|000|Chinese, Chinese dialects, dialect classification, genetic classification 335|Alvarez-Ponce2013|* gene similarity networks are the basis of the study, not networks of taxonomic similarities * gene similarity networks can be used to study deeper evolutionary relations in biology * similarity of hits is modeled by * comparing general similarity (e.g. 30 %) * and length (e.g. 70%) * connected component analysis as the basis of initial analyses * general topology analysis of the network * select specific connected components which consist of a specific constellation of gene families * analyze the specific connected components more closely This is a classical example of how BL usually work. Mentioning our own study (Lopez et al. 2013) as an initial analysis may be useful to point to first experiments here. We should, however, also mention that differences between biological and linguistic sequences need to be accounted for in the future. Nevertheless, similarity networks could start from connected component analyses, as this one.
In order to improve them and make them applicable to linguistic studies, we only need to refine our similarity scores in linguistic analyses.|000|- 336|Alvarez-Ponce2013|The complexity and depth of the relationships between the three domains of life challenge the reliability of phylogenetic methods, encouraging the use of alternative analytical tools. We reconstructed a gene similarity network comprising the proteomes of 14 eukaryotes, 104 prokaryotes, 2,389 viruses and 1,044 plasmids. This network contains multiple signatures of the chimerical origin of Eukaryotes as a fusion of an archaebacterium and a eubacterium that could not have been observed using phylogenetic trees. A number of connected components (gene sets with stronger similarities than expected by chance) contain pairs of eukaryotic sequences exhibiting no direct detectable similarity. Instead, many eukaryotic sequences were indirectly connected through a “eukaryote–archaebacterium–eubacterium–eukaryote” similarity path. Furthermore, eukaryotic genes highly connected to prokaryotic genes from one domain tend not to be connected to genes from the other prokaryotic domain. Genes of archaebacterial and eubacterial ancestry tend to perform different functions and to act at different subcellular compartments, but in such an intertwined way that suggests an early rather than late integration of both gene repertoires. The archaebacterial repertoire has a similar size in all eukaryotic genomes whereas the number of eubacterium-derived genes is much more variable, suggesting a higher plasticity of this gene repertoire. Consequently, highly reduced eukaryotic genomes contain more genes of archaebacterial than eubacterial affinity. Connected components with prokaryotic and eukaryotic genes tend to include viral and plasmid genes, compatible with a role of gene mobility in the origin of Eukaryotes.
Our analyses highlight the power of network approaches to study deep evolutionary events.|000|- 337|Bapteste2012|* vertical descent is simple replication * introgressive descent is propagation of evolutionary units into different host structures (?) * introgressive descent leads to a mosaic history of certain evolutionary units * #mosaic_history is a history that, simply put, cannot be told in a simple story, but needs to be told in many different stories * paper describes a method for finding non-genealogical bonds in evolutionary networks gene vs. genome networks In this study, they show how genome networks can be analyzed and searched for specific patterns. The idea of irregularly shared homologs, or connections between taxonomic units that are not transitive or not clique-like are an interesting starting point for similar network analyses in historical linguistics.|000|- 338|Bapteste2012|All evolutionary biologists are familiar with evolutionary units that evolve by vertical descent in a tree-like fashion in single lineages. However, many other kinds of processes contribute to evolutionary diversity. In vertical descent, the genetic material of a particular evolutionary unit is propagated by replication inside its own lineage. In what we call introgressive descent, the genetic material of a particular evolutionary unit propagates into different host structures and is replicated within these host structures. Thus, introgressive descent generates a variety of evolutionary units and leaves recognizable patterns in resemblance networks. We characterize six kinds of evolutionary units, of which five involve mosaic lineages generated by introgressive descent. To facilitate detection of these units in resemblance networks, we introduce terminology based on two notions, P3s (subgraphs of three nodes: A, B, and C) and mosaic P3s, and suggest an apparatus for systematic detection of introgressive descent.
Mosaic P3s correspond to a distinct type of evolutionary bond that is orthogonal to the bonds of kinship and genealogy usually examined by evolutionary biologists. We argue that recognition of these evolutionary bonds stimulates radical rethinking of key questions in evolutionary biology (e.g., the relations among evolutionary players in very early phases of evolutionary history, the origin and emergence of novelties, and the production of new lineages). This line of research will expand the study of biological complexity beyond the usual genealogical bonds, revealing additional sources of biodiversity. It provides an important step to a more realistic pluralist treatment of evolutionary complexity.|001|- 339|Bapteste2012a| * main point of paper is the study of the human gut genome, which has many, many unknowns * exploratory studies are a first step to approach cases where little is known * exploratory studies are the opposite of basic research, since they go from data to hypotheses, while "traditional" research goes from hypotheses (tree, divergence time, etc.) to data investigation * short note on difference between genome and gene networks * analyze genetic diversity in the gut by using gene networks and connected component analysis Exploratory data analyses are introduced and described as a specific technique which proceeds the other way round compared to traditional quantitative analyses. In traditional quantitative analyses, a hypothesis is selected and applied to a dataset. In exploratory data analysis, no hypothesis drives the analysis. EDA is thus unbiased regarding a certain bias resulting from underlying assumptions and expectations. |000|- 340|Bapteste2012a|In order to study complex microbial communities and their associated mobile genetic elements, such as the human gut microbiome, evolutionists could explore their genetic diversity with shared sequence networks. 
In particular, the detection of remarkable structures in gene networks of the gut microbiome could serve to identify important functions within the community, and would ease comparison of data sets from microbiomes of various sources (human, ape, mouse etc.) in a single analysis.|000|genetic diversity, exploratory data analysis 341|Keidan2013|* article proposes a new way of studying language history, by rigorously collecting all shared similarities ("isoglosses") between notably unrelated Indo-European languages (unrelated regarding their most recent history) * the idea is to use the data to obtain closer insights into the nature of change * sources of common isoglosses: * natural tendencies * substrate influence * common drift (compare Sapir 1921) :comment:`This is an important paper in so far as it deals explicitly with problems that should have been addressed in historical linguistics a long time ago. Basically, we need to (a) get clear about the possible causes of similarities between languages, and (b) provide methods to distinguish between the causes. The problem here is that we currently don't have the means to distinguish in all cases between inherited and transferred, and coincidentally emerged similar traits. This needs to be specifically addressed in future research on networks: Using, for example, geographic distance as some factor, or other similar things, we should try to develop methods that help us explicitly distinguish which traits we are dealing with.`|000|isoglosses, wave theory, family tree, genetic similarity, contact-induced similarity, drift-induced similarity 342|Beauregard-Racine2011|# * use forest-based and network-based methods to explore microbial evolution * use gene networks to find genes with atypical evolution models * use genome networks to characterize the evolution between E. coli and mobile genetic elements Another interesting paper.
In contrast to previous ones, this paper introduces the notion of "forests" and the corresponding "clanistic forest analysis". This could be taken as yet another specific analysis to be mentioned in the proposal. This means we then have: 1. character mapping, 2. language similarity networks, 3. word similarity networks, and 4. forest analysis # |000|- 343|Branner2000|this article * discusses many basic questions of Chinese dialect classification * says explicitly that words should serve as the basis of classification, not some abstract syllables * says that data should be generally published, and that this has not been often done so far [18-22] * speaks extensively about the important role that the lexicon plays for language classification [25-28] * turns to features to be used for dialect classification, mentioning controversy between multi- and few-feature propagators [27-32] |000|dialect classification, Chinese dialects 344|Wang1996|
  • article gives general introduction to language diversity in China
  • article shows computations of different datasets (Xu 1991 vs. Wang & Shen 1992), using different methods, such as Neighbor-Joining or Parsimony
  • article makes a nice comparison between the whole lexicon of about 1000 words and part of the lexicon (basic words); this approach makes it possible to compare different stages, provided that the premise that basic vocabulary is more stable still holds
  • article illustrates how parsimony can be used for phonetic character states on a very small sample of 7 Chinese dialects and only four characters
Nice article on some basic issues in language data and dialectology in China. |000|linguistic diversity, Chinese, Chinese dialects 345|Wang1998|In this paper the author describes different methods to infer past events. Using Chinese dialect data from Xu (1991), the author illustrates the use of quantitative approaches to Chinese dialect classification. This paper is a further example of quantitative applications in Chinese linguistics.|000|quantitative analysis, Chinese, Chinese dialects 346|Wang1997|This paper can be nicely quoted for cases in which the specific sociolinguistic situation of China is mentioned. The author points explicitly to differences between the Chinese dialects, comparing their diversity to the diversity of the Romance languages. Chinese languages are therefore surely more diverse than Scandinavian languages. The author also points to the centripetal forces induced by Pǔtōnghuà in recent development of the Chinese dialects. |000|Chinese, Chinese dialectology, sociolinguistic situation 347|Wang1994|Quote this paper in order to refer to early applications of quantitative methods in historical linguistics and Chinese historical linguistics. |000|quantitative analysis, Chinese, Chinese dialects 348|Jachiet2013|Quote this article in order to underline that network methods can be used to find cases of partial cognacy or fused genes. * algorithms and methods for the identification of fused genes are presented * algorithms are based on gene similarity networks * gene fusion is thought to be not very frequent, yet it is wide-spread, occurring in most domains of life |000|gene fusion, partial cognacy, biological parallels 349|Leigh2011|The paper describes methods for congruence analysis. Congruence analysis basically tests whether the characters which are being analyzed point to identical phylogenetic histories. Quote this paper in order to point to congruence analysis in phylogenetics (along with Leigh et al.
2008) and mention the importance of this kind of analysis to find evolutionary histories and analyze datasets.|000|phylogenetic incongruence, congruence analysis 350|Bapteste2009|This article emphasizes that trees account for only a very small share of the evolutionary processes which characterize prokaryotic evolution. The authors name methodological and epistemological issues resulting from tree analysis. Quote this text in order to emphasize that biologists now turn to network analysis in order to describe prokaryotic evolution.|000|- 351|Sankoff1978|The author mentions that probability in linguistics is treated with suspicion due to (a) the Chomsky paradigm, (b) introspective methodology of grammatical investigation, and (c) the difficulties of applying probabilistic models to the prevailing non-probabilistic mathematical models of language in linguistics. Quote this paper as an example of why linguists are suspicious of probabilistic approaches.|000|probability, historical linguistics 352|Chen1973a|The author examines the gradual attrition of the syllable final consonants (*p, t, k* and *m, n, ŋ*) which is manifested in its various phases in a large number of contemporary Chinese dialects. The results of this case study provide a starting point for discussion on issues of general interest to a phonological theory: (1) the reality of universal metarules which define the general pattern underlying similar processes in individual languages and the margin within which these processes may vary from language to language; (2) the phonetic motivation of recurring phonological processes; (3) the concept of coherent 'rule systems' reflecting the functional unity of formally quite diverse rules; (4) the hypothesis of 'latitudinal' reconstruction based exclusively on coexisting reflexes in related languages; and (5) a standard for measuring the distance between two given phonological systems.|000|Chinese,
Chinese dialects, linguistic reconstruction, phonetic distance 353|Chen1973a|The author presents some of the DOC data and analyses it by comparing mergers in the character readings, especially of initial and final consonants in the Chinese dialects. The resulting patterns can, according to the author, be interpreted in several ways and imply new ideas for historical linguistics. Among these are universal rules of sound-change patterns ("metarules", 48f), new hints for phonetic explanation (50), temporal interpretation of synchronic sound correspondence patterns ("latitudinal reconstruction", 52-55), and a new approach for the measuring of phonological distances by means of a comparison of "rule sharing" between phonological systems (58-61). This article is not directly important for quantitative approaches to Chinese dialect classification, but one might consider quoting it as an example of the belief in cross-linguistic sound-change patterns.|000|patterns of sound change 354|Iwata2010|Although the international trend in geolinguistics has been to utilize dialect resources via computer-based GIS, research in the Chinese field is still grounded in exploring the history of words using the classical method of linguistic geography. After describing the historical background of how we Japanese researchers carry out projects on Chinese dialects, this paper will demonstrate some of our findings: 1) The distribution of modern dialects is well accounted for in terms of “Northernization” and “Southern kernel area”; 2) Linguistic geography can make it possible to reconstruct the history of words unbiased by historical documents; 3) The development of stress accent in Northern dialects has caused some word groups to acquire grammatical elements in their forms, due to the function of analogical attraction; 4) Any word can be in collision with others due to internal and external factors. 
Chinese cases are explained in terms of homonymic and synonymic collisions.|000|dialect geography, Chinese, Chinese dialects 355|Iwata2010|In China, on the other hand, linguistic geography ceased to exist after 1948. Meanwhile, mainstream linguistic study was directed toward two purposes: one, to reconstruct ancient Chinese phonology, and the other, to classify its dialects and demarcate their respective areas of distribution. While the current international trend in geolinguistics is to utilize dialect resources via computer-based GIS, we researchers in the Chinese field still have enough reasons for emphasizing the necessity of applying this classical method, linguistic geography, to the study of Chinese languages (Grootaers 1943, 1945).|98|linguistic area, Chinese 356|Iwata2010|A contribution of Bernhard Karlgren (1889-1978) was to treat “Modern Sounds” (his “Ancient Chinese”) as a reference point for studying the whole history of Chinese; namely studying “Old Sounds” (his “Archaic Chinese”) in terms of the projection from Ancient Chinese. This approach was much like that of the Qing Philologists, while unlike the Chinese precursors, this explained the phonetic forms of modern dialects as reflexes of Ancient Chinese. In his masterpiece, Études sur la phonologie chinoise (1915-1926), he reconstructed the sound system of Ancient Chinese, based on his own survey of twenty-four dialects, thus establishing his comparative method. For this dialect survey, however, a severe criticism was offered by Grootaers (1943).|99|- 357|Iwata2010|The most noteworthy event at the time of writing is the publication of the volume, *Linguistic Atlas of Chinese Dialects*, edited by Zhiyun Cao, Beijing Language and Culture University (Cao ed. 2008). Unlike LAC, this atlas is item based, comprising [pb] 205 phonetic maps, 203 lexical maps and 102 morphological and syntactic maps, which were compiled using a GIS based computer system. 
For this atlas, Cao and his colleagues surveyed 930 localities all over the Han Chinese speaking area. Most of the localities surveyed were local villages or towns, instead of big cities or county seats. This is in accordance with the policy proposed by Grootaers (1957), and the speakers selected were mostly males born during the years from 1931 to 1945. This atlas is comparable in scale to such distinguished atlases as ALF and the Wenker Atlas, and it is remarkable that the authors completed all necessary processes, including the dialect survey, data editing and cartography, within seven years.|101f|- 358|Iwata2010|Northernization refers to a long time process in which Northern features incessantly moved southwards, causing a varying degree of deformation to occur to the Southern dialects on an individual basis.|102|- 359|Iwata2010|Even with the strong influence of the Tang koiné on Southern dialects, we can still confidently say that this is not the whole story. The relationship between the Tang koine and modern dialects may be comparable to that between classical Latin and the modern Latin languages, as most modern Chinese dialects are not the direct descendants of the Tang koine.|103|Táng koine 360|Iwata2010|Northernization of the Southern dialects could not have been accomplished simply by the southern movement of population. There are two views we consider common sense but which are not shared by most Chinese dialectologists. One regards the effect of migration on the subsequent development of the host dialects. Based on historical documents, scholars in the Chinese field have interpreted dialect distribution in terms of migration. 
However, evidence shows that the language of immigrants has a tendency to assimilate to the host dialect and will basically fade away after three generations, if the immigration is to the village of Han Chinese inhabitants (Iwata 2007a: 125).|104|migration, language history 361|Iwata2010|The second piece of common sense for us is the fact that *words could travel by walking*, instead of flying from one place to another by migration. This is to say, the most prevalent medium for dialect diffusion or transmission should have been daily communication of farmers living in one village with those in another, therefore it should have taken a long time for one word to move from one place to another. In Chinese dialectology, however, this view has been least recognized by researchers due to their overestimation of the factor of migration (Iwata 1995: 222-223).|104|linguistic diffusion, lexical diffusion, Chinese 362|Iwata2010|The *Southern kernel area* specifically refers to the Jianghuai area, the area situated between the Huai River and the lower reaches of the Yangtze. It had two kernel cities around its southern border: Nanjing and Yangzhou. Nanjing was established as the Southern capital during the era of the Six Dynasty (222-589 AD.), and through to the 20th Century it performed the role of political center for the whole of the Southern area, namely the region south of the Huai River. The latter city, Yangzhou, flourished as a large economic and commercial center starting from the Tang era.|105|- 363|Iwata2010|One internal factor that may have had a serious effect on lexical changes was the development of word accent, namely stress, which was brought about by the increase of polysyllabic words, in effect a compensation effect caused by the simplification of phonological structure. Word stress thus produced the following patterns, and is currently observed in the majority of Northern dialects, typically the dialect of Beijing. 
* Bi-syllabic structure: Strong-weak (trochee) * Tri-syllabic structure: Medium-weak-strong|112|- 364|Iwata2010|Turning back to the topic of time words, we find that the tri-syllabic form thus produced by the function of analogical attraction, namely jin ri ge, has completely disappeared, with its phonetic variant jin er ge being retained in a small number of localities in North China. This is because the accent rule for tri-syllabic structure, i.e., medium-weak-strong, has been applied to the time words, giving birth to a successive change described as follows: * jin ri ge > jin er ge > jinr ge > jin ge or ji ge This is considered a weakening process of the head, ri, which ultimately changed to the non-syllabic retroflex ending by fusing with the preceding syllable, namely jinr ge, the form still existing in Beijing. In Jianghuai, the change has proceeded one step further, leaving no trace of the original head in the form jin ge.|114|ordered character states 365|Iwata2010|Synonymic collision is defined as the conflict between different forms for a single referent. It is mostly triggered by external factors.|117|- 366|Iwata2010|Homonymic collision is, so to speak, a conflict between different referents for a single form. It is mostly triggered by internal factors, and some sorts of remedies are usually adopted for rescuing the abandoned words.|116|- 367|Nunn2011|As discussed in the previous chapter, parsimony refers to procedures for minimizing the number of changes on the tree; it is an optimality approach that prefers evolutionary scenarios that minimize the amount of change. Rather than minimizing the number of changes in a whole matrix of data to construct a tree, however, here the goal is to map a single trait onto a given phylogeny in a way that minimizes evolutionary change in that trait.|59|parsimony, character mapping 368|Nunn2011|Parsimony assumes that branches are equal in length. 
In other words, it assumes that evolutionary transitions are equally probable on the longest branch and the shortest branch on a phylogeny. It is also important to realize that parsimony cannot always reconstruct a character state unambiguously. In such cases, multiple equally parsimonious reconstructions can result, and equivocal values are indicated with a node having multiple states.|59|character mapping, parsimony 369|Nunn2011|For example, traits can be ordered or unordered, and this affects how trait changes on the tree are converted into different numbers of evolutionary steps. In an ordered series of more than two character states, it is assumed that traits pass through intervening states to go from one end of the trait spectrum to the other end.|60|ordered character states, parsimony 370|Nunn2011|For example, we might assume that traits are irreversible: once they are gained, they cannot be lost. Or we could assume that it is easier to lose a trait than to gain it, in what is called Dollo parsimony. In this case, only one gain is allowed, but multiple losses can occur. Another option is to base the transformation on the ages of strata, or, stratigraphic parsimony, which allows paleontologists to assess how well a phylogeny corresponds to stratigraphic age.|61|evolutionary model, parsimony, character mapping 371|Nunn2011|Thus, the person seeking to use parsimony has at hand a wide array of options. This can be both exciting and overwhelming.|63|character mapping, parsimony 372|Nunn2011|Maximum likelihood methods use an explicit evolutionary model to reconstruct ancestral states in a way that makes the observed character states in extant species most likely [...].|63|maximum likelihood, character mapping 373|Nunn2011|Maximum likelihood methods offer many advantages over reconstructions based on parsimony. 
For example, maximum likelihood methods use information on branch lengths; this is important, because a longer branch offers more opportunities for change to occur than a shorter branch. Another important advantage of maximum likelihood is that it provides a way to assess statistical support for one reconstructed ancestral state over one or more other reconstructed states.|64|maximum likelihood, character mapping 374|Nunn2011|In this application of Bayesian methods, the user estimates the probability that an ancestral node has a particular character state based on data involving the character states in extant species and how they are related (a dated phylogeny), and also takes into account mapping uncertainty by quantifying uncertainty in the transition rate parameters that are used to reconstruct trait evolution (Ronquist 2004).|71|Bayesian approaches, character mapping 375|Nunn2011|Thus, rather than estimating the reconstruction based only on optimization of the transition rates, they examined the distribution of reconstructed states given potential error in estimating these rates. Importantly, one can also sample different phylogenies using MCMC, and this effectively deals with the problem that different trees can produce different reconstructions of evolutionary history (Huelsenbeck and Bollback 2001; Lutzoni et al. 2001; Pagel et al. 2004).|72|- 376|Nunn2011|The author gives an interesting introduction into the specifics of evolutionary analysis. The book contains an important overview chapter on "Reconstructing Ancestral States for Discrete Traits" (Chapter 3, 52-78). Here, the basic approaches to character mapping are discussed by the author (character mapping being understood as approaches in which one seeks to reconstruct ancestral states relying on a reference tree).|000|- 377|Norman1974|Article contains a short description of the Shaowu dialect, a Chinese dialect spoken in Fújiàn 福建. 
Norman argues that the Shaowu dialect, originally classified as a Hakka dialect, is probably a Mǐn dialect. He also shows that Mǐn features are found in terms of Shaowu vocabulary.|000|- 378|Norman1974|Traditional treatments of Chinese phonology take the syllable as the basic unit of analysis. The syllable is further analyzed into an initial, a final and a tone. The initial is the consonantal onset to the syllable; however, following traditional practice, many researchers also include
the lack of any consonantal onset as a 'zero' or empty member of the set of initials. The final consists of the remainder of the syllable: this may be a vowel, a diphthong or a triphthong, or some vocalic combination plus a final consonant. The number of final consonants in all Chinese dialects is very limited.|329f|syllable structure, Chinese 379|Zavjalova1996|This is an introductory study on Chinese dialectology and Chinese dialects. It is especially interesting, since the author was involved in dialect geography in earlier studies and here gives some account of the research in this area.|000|- 380|Zavjalova1996|Один из важных критериев, который наряду с фонетическими признаками всегда учитывался при делении китайского языка на диалектные группы -- это возможность взаимопонимания: к одной группе относили только те диалекты, представители которых способны понимать друг друга без предварительной специальной подготовки. :translation:`One of the most important criteria which along with the phonetic characteristics always was considered for the subgrouping of Chinese dialects is the possibility of mutual intelligibility: only those dialects were grouped together whose speakers could understand each other freely, without the need of additional preparation.`|18|mutual intelligibility, classification criteria, Chinese dialects 381|Zavjalova1996|Важнейшим классификационным признаком в китайской диалектологии остается судьба исторических звонких инициалей (начальнослоговых согласных). :translation:`The most important criterion for Chinese dialectology is the fate of the voiced initials (syllable-initial consonants).`|23|classification criteria, subgrouping, Chinese dialects 382|Zavjalova1996|Хотя лексика по-прежнему плохо изучена в китайской диалектологии, уже сейчас стало ясно, что наиболее важным с классификационной точки зрения являются так называемые "пустые" слова. 
:translation:`Although the lexicon is still not sufficiently understood in Chinese dialectology, even now it has become clear that the so-called "empty" words may play an important role from the viewpoint of classification.`|36|xūcí, classification criteria, Chinese dialects 383|Zavjalova1996|[...] лексико-статистический анализ диалектов позволил установить, что внутри китайского языка в лексическом плане прежде всего следует противопоставить всем прочим диалектам наиболее древние миньские. :translation:`Lexicostatistical analyses of the dialects made it possible to ascertain that among the dialects of Chinese -- as far as the lexicon is concerned -- especially the Mǐn dialects are most archaic.` |38|Mǐn, lexicostatistics, classification criteria, Chinese dialects 384|Zavjalova1996|The latest classification of the Chinese dialects [...], like all the previous ones, is based on phonological criteria. Being really very important, these criteria are still very often analysed separately -- each map is usually devoted to only one phonological feature. As a result, in many cases phonology turns out to be insufficient even for the separation of those dialect varieties, which are intuitively felt as rather distant by the speakers themselves. When taken in combination, however, phonological features prove to form some previously unknown, but rather important linguistic boundaries.|198|Chinese, dialect classification, Chinese dialects, classification criteria 385|Zavjalova1996|The Jin dialects cover most of the Shanxi province, the northern elevated part of Shenxi and those regions of Hebei, Henan and Inner Mongolia, that border on Shanxi. In the new Chinese classification these dialects are treated as a separate group outside guanhua. Phonologically, however, there are no reasons for such a separation. On the one hand, some phonological features of the Jin dialects bring them together with the Northern guanhua area as a whole [...]. 
On the other, the Jin dialects have some comparatively archaic features [pb] on a post-Middle Chinese level, similar to those of the Jiang-Huai dialects of the Yangtze basin.|201f|Jǐn, Jiānghuái, Mandarin, Chinese, subgrouping, dialect classification 386|Zavjalova1996|In Shanghai and in other systems of the Wu group [...] the tone of the first syllable automatically predicts the characteristics of the whole phonetic word, spreading over all its syllables (usually two or three) and thus reminding of the tone expansion in some of the [pb] African languages.|204f|- 387|Zavjalova1996|For Chinese linguistics the XX-th century is really an epoch of dialectology. In her study Dr Olga Zavyalova who is a Senior Researcher at the Institute of Far Eastern Studies of the Russian Academy of Sciences dwells upon the most important aspects of Chinese dialectology. The first chapter contains a discussion on the classification of the Chinese dialects. Two further chapters deal with the problems of segmental and suprasegmental morphophonemics taking into account the vast data having been published on these subjects over the last decades. Chapter IV is devoted to the linguistic geography of the *guanhua* and Jin dialects covering most of the country. And finally, an analysis of the post-Middle Chinese dialect sources is given in Chapter V. |000|Jǐn, Jiānghuái, Chinese dialects, Mandarin, subgrouping 388|Zavjalova1982|The author discusses the subgrouping of the Mandarin dialects. Based on the distribution of phonetic features, she proposes to divide the dialects into a Northern and a Southern zone, the Northern zone including Jǐn and the rest of the Northern dialects, and the Southern zone including South-Western dialects of the Jiānghuái region, and the Chǔ dialects.|000|Jiānghuái, Mandarin, dialect classification, Chinese, subgrouping, Jǐn 389|Yuan1983|This book represents the very influential Běidà-standard on Chinese dialect classification. 
The author repeats the traditional classification first presented by Li (1937[1973]) and divides Chinese dialects into the well-known seven groups of Mandarin, Wú, Xiāng, Gàn, Hakka, Cantonese (Yuè), and Mǐn (22f). All groups are introduced and discussed separately in the work. Basic criteria for classification are -- as always -- phonological categories from Middle Chinese. |000|- 390|You1992|The book gives an introduction to current issues in Chinese dialectology. It introduces topics such as * basic aspects of investigation, data-storage, and description of Chinese dialects (17-41), * geographical aspects of Chinese dialects (42-84), * Chinese dialect history (85-114), * Chinese dialect change (115-133), * Chinese dialect contact (134-148), * Chinese dialect comparison (149-176), * Chinese dialect writing (177-188), and * history of Chinese dialectology Furthermore, the book gives a glossary in which Chinese terminology is contrasted with English translations (very useful for students of Chinese linguistics), and a list of bible translations (old and new testament (?)) in the Chinese dialects before 1945. This book also contains an interesting summary on speech islands (方言岛, language isolates), and a list of known dialect islands is given on pages 67-69. A very important chapter (good for quotation) on diglossia (bilingualism) in Chinese dialects is also given on pages 146-148.|000|Chinese, dialect classification, Chinese dialectology 391|You1992|特征判断法[...].方言地图上的同言线(isogloss, [...])的两边,方言特征不同,也可医用同言线在方言地图上圈定一个地域,圈内的方言特征相同,例如方言岛上的方言。好几条同言线密集或重合在一起就成为“同言线束“(bundle of isoglosses). |45|- 392|You1992|古今比较判断法。这个方法是试图从历史来源的角度来区分方言。我国至今还没有编制过大型的汉语方言地图集,目前将汉语大致分为七大方言区,实际上用的就是这个方法。 :translation:`Genetic classification. This classification method seeks to classify dialects according to their origin. Until today, large dialect atlases have not been compiled in China. 
The current broad classification of Chinese dialects into seven major dialect groups is based on this kind of classification.`|47|genetic classification, Chinese, Chinese dialectology, subgrouping 393|You1992|综合判断法。这个方法放弃以同言线作为方言分区的基础,它设想首先列出成系统的语言,语法,词汇等方面的项目,然后就这些项目比较特点间的异同,再根据异同项目的多寡及其出频率的高下,来划分方言区。 |48f|- 394|You1992|可懂度(intelligibility)测定法。 |49f|classification criteria, mutual intelligibility, Chinese 395|You1992|本书是在下述意义上使用“方言岛“ (speech island)这个术语的。|55|dialect isolates, Chinese 397|You1992|上述南方七大方言,从方言发生学的角度来看,吴,湘,粤,平,赣是从北方汉语直接分化出来的,可以说是原生的,闽语,客家则是次生的(secondly developed),即是由某一种南方方言派生出来的。 |98|- 398|You1992|各大方言互相关系及其与原始汉语的关系可以用一张树形图示意 [...]. .. image:: static/img/chinese-dialects.png :name: chinese_dialects :width: 500px [Family tree of the Chinese dialects]|104|- 399|Schleicher1861|Die ältesten teilungen des indogermanischen bis zum entstehen der grundsprachen der den sprachstamm bildenden sprachfamilien laßen sich durch folgendes schema anschaulich machen. Die länge der linien deutet die zeitdauer an, die entfernung derselben von einander den verwantschaftsgrad. .. image:: static/img/schleicher-1861.jpg :name: schleicher-1861 :width: 500px [Schleicher's famous family tree]|6f|family tree, August Schleicher, figure 403|Dessimoz2008|LGT events between real biological genomes can be simulated by introducing a gene from one species into another, either as substitute for its ortholog (“orthologous replacement”) or as additional sequence. Such artificially introduced LGT event allows the testing of the algorithm on real biological data while having a positive control. However, only the specific case of very recently introduced genes can be simulated. 
Furthermore, real occurrences of LGTs may already be present in the dataset and their signals may conflict with the artificially introduced ones.|323|lateral gene transfer, evaluation, LGT detection 404|Dessimoz2008|This paper presents an algorithm to detect lateral gene transfer (LGT) on the basis of pairwise evolutionary distances. The prediction is made from a likelihood ratio derived from hypotheses of LGT versus no LGT, using multivariate normal theory. In contrast to approaches based on explicit phylogenetic LGT detection, it avoids the high computational cost and pitfalls associated with gene tree inference, while maintaining the high level of characterization obtainable from such methods (species involved in LGT, direction, distance to the LGT event in the past). We validate the algorithm empirically using both simulation and real data, and compare its predictions with standard methods and other studies. |000|LGT detection, lateral gene transfer 405|Sagart2011|Talk held at the Séminaire Sino-Tibétain du CRLAO. Proposes an innovation-based classification of the Chinese dialects. This classification is a family tree of the Chinese dialects in the true sense. Every split is established with help of specific innovations for the dialect groups identified by Sagart. Note, what is also important, that most innovations are lexical in their nature. Thus LS explicitly takes lexical innovations as the basis for dialect classification.|000|- 406|Sagart2011|Li's theory was formed at a time when very few Chinese dialects had been analyzed for correspondences with Middle Chinese. Those that had been were mostly the dialects of large cities.|1|- 407|Sagart2011|Li did not give a tree. He seems to have assumed that first the different dialect groups that compose Sinitic individualized, and that devoicing then happened separately in each dialect group, before each group had time to differentiate into subdialects. 
So one feature (devoicing of voiced stops) was enough in his view to classify Chinese dialects.|2|family tree, Chinese, Li Fang-Kuei, dialect classification, genetic classification, subgrouping, classification criteria 408|Sagart2011|There are two kinds of problems with Li's analysis. First, there are many more dialects analyzed now than there were in 1938, and there are many, many exceptions to Li's generalizations. [...] The second problem relates to Wu. To say that the Wu dialects are a branch of Sinitic supposes that they have a common ancestor which is more recent than the common ancestor of all Chinese dialects. To make that claim one needs to find a feature that they share exclusively; a feature that is not found in other Chinese dialects and that the ancestor of all Chinese dialects did not share either. We could then say that the change that brought this feature about took place in the ancestor of Wu, and that is why all Wu dialects have it, and all non-Wu dialects do not have it.|2f|Wú, genetic classification, subgrouping, shared innovation, Li Fang-Kuei 409|Sagart2011|The principle we should remember is that one should classify languages on shared innovations (=changes that occurred after the breakup of the most recent ancestral language of the group to be classified). You cannot use shared retentions (=features that were already present in the ancestor). You can use the change of b, d, g into p, t, k as a classification feature, but you cannot use the absence of a change. This principle was first formulated by August Leskien in 1876. Evolutionary biologists also use it. Classifying on shared retentions is a common fallacy.|3|shared innovation, genetic classification 410|Sagart2011|Norman defined three large groups of dialects: a northern group, corresponding to Mandarin, defined by lexical innovations like having 他 for 3SG pronoun; 的 for attributive particle, 不 for general negation, 站 'to stand', 走 'to walk', 儿子 'son', 房子 'house'. 
This is sound and the move towards using lexical features is welcome. However his southern group (Min + Hakka + Cantonese) is defined by the fact of NOT having these innovations. That is the same fallacy as in Li's classification (`Norman `_ also has a central group: Wu+Gan+Xiang, which is transitional between north and south).|4|genetic classification, Chinese, Southern Chinese hypothesis 411|Sagart2011|This is because sound changes spread easily (for instance palatalization of velars has spread across Chinese dialects from Manchuria to south Jiangxi and SW Fujian in 300 years), while basic vocabulary and morphology spread less. Sound changes will be used only as supporting material.|4|genetic classification, shared innovation, subgrouping 412|Sagart2011|Because of language contact, not all genuine innovations will stay in the language where they first occurred. When that happens an innovation will give a false testimony. Not all the innovations one has assembled will be mutually compatible, in other words, compatible with the same genetic tree. The approach followed here is to look for the largest possible set of innovations (in the basic vocabulary/morphology) that is compatible with the same tree, and regard that tree as an approximation of the true phylogeny. This is called a compatibility method (Meacham and Estabrook 1985).|4f|conflicting data, shared innovation, genetic classification, subgrouping 413|Sagart2011|The first language to branch off from the mainstream was likely located in the south or southwest on Chinese. It has two modern descendants: the Wǎxiāng 瓦鄉 dialect of northwest Hunan and the Càijiā 蔡家 language of western Guizhou (1000 speakers, all old!). This last language has only been minimally described (`Bo 2004 `_) and it is not even clear that it is a Chinese dialect, but at the very least it includes a Chinese layer in it and that layer is clearly connected to the Waxiang dialect. 
Both CJ and WX are very archaic, lexically and phonologically: they preserve the old words for \`dog' 犬, 3sg 伊, \`face' 面, \`love' 字, \`walk' 行. No other Chinese dialect preserves \`love'.|6|Chinese, subgrouping, Càijiā 414|Bo2004|Article gives a brief sketch of the Càijiā language. In the remainder of the paper, the author also gives a list of the basic words of Càijiā. The language has not more than 1000 speakers left. It is very archaic, and `Sagart (2011) `_ assumes that it was among the first languages to split off the group of Chinese dialects. |000|Càijiā, Chinese, subgrouping 415|Bai2011|This paper lists the connotations of some local words such as “西”[ 揶i 55 ]( look , look carefully ), “息”[ 揶i 35 ] ( grandchildren ’ s children , general reference for future generations ),“耶 ”[ 捺e 35 ]( father ),爹 [ tia 55 ]( dad ),“间 ”[ kan55 ]( unit of house measure ),“龚 ”[ 諬i蘅耷55 ],“老 革 革 ”[ lau 51 ke 214 ke 214 ],牙 桠 [ 捺a 35 捺a 44 ],“达 搭 ”[ ta 35 ta 55 ]( elder brother ),颟 颟 [ ma耷 55 ma耷 55 ] ( eat ), “满 忙 ” [ man 51 man 35 ] ( father ’ s younger brother , uncle ),“喷香”[ p蘅耷 55 cian 55 ],“攀臭”[ p'a耷 55 鬗'藜u 214 ],“倒”[ tau 51 ],“起”[ 諬'i 51 ], etc. Some of the connotations are put forward for the first time by us , while some are supplementary to the old expressions of our ancestors. These local dialects preserve some old accents. The preservation of “ Waxiang Dialect ” in the Youshui River basin of Southeast Chongqing is especially precious for its stronger regional color. The causes of these phenomena are that Southeast Chonqing is bounded on the northwest of Hunan , northeast of Guizhou and southwest of Hubei with mountains and rivers connected , water systems complicated but interlinked , and people and culture often interacting. The Chinese and ethnic dialects in these areas are similar or almost the same. 
These dialects have preserved many ancient accents and typical oral expressions which are similar to or the same as those of Chongqing proper and its surroundings , Chengdu proper and its surroundings and Dafang County of Guizhou. It needs further investigation and research.|000|Wǎxiāng, Chinese, Chinese dialects 416|Wang1982|This article first introduces basic aspects (mainly phonological) of the Wǎxiāng dialect spoken in Húnán. This dialect shows many strange, archaic features and does not neatly fit into the 7 or 10 traditional dialect groups. `Sagart (2011) `_ groups this dialect as one of the first that split off from the main stock of Chinese dialects during Chinese dialect history.|000|Wǎxiāng, Chinese dialects 417|Nunn2011|While many phylogenetic lineages are denoted as branching dichotomously, that is, into two lineages, phylogenies can also contain a polytomy, where more than two lineages emanate from a single node.|22|polytomy, multifurcation, family tree 418|Nunn2011|More often, however, polytomies reflect uncertainty about the true branching order of a group of organisms. It could be that speciation events happened so closely in time that the available morphological or molecular evidence cannot discern the evolutionary relationships, or different genes may have conflicting evolutionary histories (e.g., Maddison 1997), resulting in ambiguity at a particular node. When a polytomy reflects uncertainty due to conflict in the data or general lack of knowledge, it is referred to as a *soft polytomy*, with the polytomy thus reflecting ignorance in our knowledge of the true evolutionary history for a group of organisms.|22|polytomy, family tree 419|Nunn2011|Many trees represent only the branching pattern, that is, a tree topology, rather than the exact age of the splits on the tree. Such a topology is often called a *cladogram*. 
Without dates, we can make inferences about the relative relatedness of nested taxa on the tree, but not about the timing of the splits.|22-24|cladogram, family tree 420|Nunn2011|Other trees give details on branch lengths in what is called a *dated phylogeny*, or a *chronogram*.|24|chronogram, family tree 421|Nunn2011|A *phylogram* is similar to a dated phylogeny but shows amounts of character changes along branches rather than amounts of time; in such cases, the tips of the tree usually do not line up to give the same age for all living species in the clade (i.e., the tree is not *ultrametric*).|24|phylogram, family tree 422|Nunn2011|Distance methods calculate a measure of distance between all pairs of species and then generate a tree that is most consistent with this matrix.|29|distance-based methods, phylogenetic reconstruction 423|Nunn2011|Parsimony attempts to minimize the number of changes over evolutionary time and thus selects the tree that can account for the data with the fewest evolutionary changes (i.e., most “parsimoniously”).|30|parsimony, phylogenetic reconstruction 424|Nunn2011|The *consistency index* (CI) and *retention index* (RI) are two statistics that are commonly used in parsimony analyses to assess the degree of homoplasy (see Maddison and Maddison 2000). These metrics have become important in evolutionary anthropology, because they are used to study aspects of cultural evolution (Collard et al. 2006a; see also chapter 10). The CI provides a measure of the amount of homoplasy in a tree and can be calculated for an entire set of characters (the ensemble CI) or for a single character (Maddison and Maddison 2000). For a single character, it is calculated as *CI = m / s*, where *m* is the minimum number of possible evolutionary steps on a tree and *s* is the actual number of reconstructed steps. A higher amount of homoplasy results in higher *s* and thus a lower CI, while a tree without homoplasy has a CI of 1. 
The RI measures the degree to which potential synapomorphy is exhibited on the tree. For a single character, it is calculated as *RI = (h − s) / (h − m)*, where *h* is the maximum number of steps possible for a character and the other variables correspond to those just given for the CI. An RI of 1 indicates that the character is completely consistent with phylogeny (i.e., it shows no homoplasy), while an RI of 0 indicates the maximum amount of homoplasy that is possible. The CI increases as the number of species in the analysis increases and thus requires a correction term when comparing values among data sets with different numbers of species (Sanderson and Donoghue 1989; Hauser and Boyajian 1997). *Ensemble indices* are calculated in a similar way, but for the whole set of characters in a data set rather than single characters (Maddison and Maddison 2000).|31|parsimony, consistency index, retention index, phylogenetic reconstruction, homoplasy 425|Nunn2011|Maximum likelihood methods make use of the observed trait data, a probability model of character evolution, and a hypothesis of phylogeny with branch lengths. From these, one can obtain the likelihood of the data given the proposed tree and evolutionary model. After conducting such calculations on many tree and parameter value combinations, the tree and model parameters offering the highest likelihood of the data are selected.|33|maximum likelihood, phylogenetic reconstruction 426|Nunn2011|Bayesian methods are the “new kids on the block” in terms of phylogenetic methodology, and they are revolutionizing phylogenetics by providing an efficient approach to searching tree space that is not based on the strict optimization procedures of maximum likelihood or parsimony approaches (Huelsenbeck et al. 
2001, 2002; Holder and Lewis 2003).|35|Bayesian approaches, phylogenetic reconstruction 427|List2014a|Since Latin never ceased to serve as a cultural adstrate language (a language that co-exists in some form in parallel with another language with which it is in contact), with a particularly great impact on written vernaculars, only 33% of all 1,000 words were completely lost, and about 50% survive as borrowings from the ancestor language in the daughter languages [37].|143|- 428|Qian2007|由于北方话与吴语差异较大,上海话到现在在语音音系上没有接受任何一个普通话的音位,但是在词汇上,接受书面语和普通话的影响方面在近年来有长足的发展。:translation:`Due to the great phonetic differences between Northern dialects and Wú dialects, Shanghainese has -- until now -- not borrowed any phonemes from the standard language. However, the influence of the literary language and *pǔtōnghuà* on the Shanghainese lexicon has steadily increased during the last years.`|25|Shanghainese, pǔtōnghuà, lexical borrowing, Chinese 429|Branner1999|Here I shall only repeat my philosophy in a few words. There are many ways of considering the “classification” of a dialect sample. But in order for this work to be truly systematic, it is necessary for a significant portion of the sample to be shown to correspond to a significant portion of other dialect samples. The one intellectual process that, above all others, is fundamental to classification is comparison, which leads to the establishment of rigorously attested correspondence sets. In ordinary circumstances correspondence sets should be made up of words that are in some sense basic to the language. This work can be done with a minimal set of highly diagnostic forms, but far better results come with large corpora and extensive correspondence sets. 
A correspondence set is never a finished thing; each new dialect that is fitted into it alters its character somewhat.|37|Chinese dialects, dialect classification, basic vocabulary, sound correspondences 430|Qian2007| Known borrowings from pǔtōnghuà into Shanghainese: * 公鸡 -> 雄鸡 (rooster) * 母鸡 -> 雌鸡 (chicken) * 太阳 -> 日头 (sun) * 卫生间 -> 马桶间 (toilet) * 火烧 -> 火着 (burn) * 感冒 -> 伤风 (catch a cold) * 老婆 -> 家主婆 (wife) * 胖 -> 壮 **in source written as 奘** *zàng/zhuǎng* (fat) * 保姆 -> 娘姨 (nurse) * 厨师 -> 烧饭司务 **in source written as 饭师条** (cook) Cf. pages 24-27|25-27|Chinese, Shanghainese, lexical borrowing, pǔtōnghuà 431|Hamed2005|As with species studied by evolutionary biologists, languages are evolving entities. They can evolve in tree-like patterns, possibly blurred by borrowing, but they can also develop in non-tree-like schemes. For instance, diglossia, as in the case of Chinese, can counterbalance the hierarchical pattern expected from differentiation by internal change associated with isolation by distance of speech communities. Using two lexical datasets, either the basic lexicon supposedly more immune to borrowing or a representative sample of the whole lexicon, we investigate the development pattern of Chinese dialects using a neighbour-net approach, which is an unprejudiced technique for representing object relationships. The resulting graphs are consistent with a dialect continuum shaped by counterbalanced effects of homogenizing diglossia and borrowing versus differentiating spread of speech communities. Historical events and linguistic claims can be mapped on these graphs.|000|bilingualism, Chinese dialects, subgrouping, Neighbor-Net 432|Hamed2005|However, diglossia, combined with massive and stratified borrowing, is expected to counterbalance the tree-like differentiation pattern by bringing genetically distant dialects into greater similarity. 
Given these two factors of deviation from the pure tree model, exploring the relationships between Chinese dialects in a Stammbaum framework is, *a priori*, inadequate.|1016|family tree, Chinese, Chinese dialects 433|You1992|:comment:`Wave model of Chinese dialects` .. image:: static/img/fangyan_wang.jpg :name: fangyan-wang :width: 800px :comment:`Wave model of Chinese dialects` |134|family tree, network, Chinese dialects 434|Norman2003|Before proceeding to a discussion of the various groups of dialects, it should be made clear that Chinese is not entirely amenable to a Stammbaum formulation. The fact that these dialects have been spoken in close proximity to one another for two millennia and the pervasive influence of various quasi-standards and koinés on all Chinese dialects over a very long period easily obscures underlying relationships. Nonetheless, to go the other extreme and claim that Chinese dialects cannot be classified in any meaningful way would be equally [pb] erroneous. Below I will discuss each of the major groups in turn, paying especial attention to how each group is delimited from the others.|76f|family tree, Chinese dialects 435|Aikhenvald2007b|[borrowability scale] .. image:: static/img/bscale.png :name: bscale :width: 500px [scale of borrowability]|7|borrowability, borrowing scale 436|Lopez2013|Network-based studies of genetic diversity typically foster the discovery of many unrecognized patterns, and thus contribute to actively generate novel hypotheses about the evolution of genetic diversity.|182|exploratory data analysis 437|Norman1988|This general introduction to the study of Chinese traces the language's history from its beginnings in the second millennium B.C. to the present, and provides a clear picture of the contemporary language and its sociolinguistic status. 
Chinese, in its numerous dialects, has more speakers than any other language in the modern world, and this vast extension in time and space brings to its study an exceptional complexity. Nevertheless, Norman's crisp organization and lucid elegance make this extraordinary range of material easily accessible even to those with an elementary understanding of linguistics. Chinese includes information on the genetic and typological connections of the language, the writing system, the classical and early vernacular tongues, the modern language and non-standard dialects, and the history of linguistic reform in China.|000|- 438|Norman1988|We have already indicated in Chapter 1 that Chinese is monosyllabic from a morphemic point of view: almost every syllable corresponds to a morpheme. There is a sense in which Chinese is also phonologically monosyllabic. In almost all descriptions of Chinese, the syllable is taken as a kind of self-contained entity which forms the basis of phonological description. It may be that this approach is to some extent influenced by the writing system in which each unit represents a single syllable, but other factors enter in as well. It would seem, for example, in historical comparison that the syllable is the largest relevant unit; another important feature of Chinese dialects, (and perhaps of other monosyllabic languages as well) is that any one dialect contains a fixed number of possible syllables. Even when new terms are borrowed from foreign languages, they are interpreted in terms of the existing set of syllables [...]. A further consideration is that most phonological processes affect the syllable without reference to its lower level constituents.|138|Chinese, syllable structure, monosyllabicity 439|Fox1995|A further factor in establishing these correspondences is that of POSITION in the string of phonemes [...]. This case demonstrates the importance of aligning the positions within the morpheme appropriately. 
Our concept of «correspon[pb]dence» must therefore be refined to imply «occurring in comparable positions in genetically related forms».|67f|phonetic alignment, sound correspondences 440|Fox1995|We may note finally the question of the number of forms that are required to constitute a reliable correspondence set, and to allow us to say that the correspondence is a regular one. @Meillet<1937> (1937: 340) proposes that, for the purposes of lexical comparison at least, a minimum of three «witnesses» is required (the «three witness» requirement). Elsewhere he writes: «An agreement of two languages, if it is not complete, risks being fortuitous. But if the agreement is extended to three, four or five very different languages, chance becomes less likely» (@Meillet1925: 38, quoted from the English edition of 1967). @Hoenigswald<1950> (1950) comments, however, that: .. pull-quote:: to guard against the effects of secondary developments in daughter languages, we may refer to Meillet's rule that in reconstructing the vocabulary of a proto-language we need the testimony of three, rather than two, independent witnesses. For many other purposes, however, reconstruction from more than two witnesses may well be viewed as a mere extension of the fundamental operation involving only two. In practice, therefore, the reliability of the reconstruction may increase with the number of witnesses, but it is not really possible to stipulate how many witnesses are actually required to ensure that a correspondence is regular; the principles of reconstruction are the same, regardless of the number of languages compared.|68|comparative method, sample size, sound correspondences 441|Snel2002|In the course of evolution, genomes are shaped by processes like gene loss, gene duplication, horizontal gene transfer, and gene genesis (the de novo origin of genes). 
Here we reconstruct the gene content of ancestral Archaea and Proteobacteria and quantify the processes connecting them to their present day representatives based on the distribution of genes in completely sequenced genomes. We estimate that the ancestor of the Proteobacteria contained around 2500 genes, and the ancestor of the Archaea around 2050 genes. Although it is necessary to invoke horizontal gene transfer to explain the content of present day genomes, gene loss, gene genesis, and simple vertical inheritance are quantitatively the most dominant processes in shaping the genome. Together they result in a turnover of gene content such that even the lineage leading from the ancestor of the Proteobacteria to the relatively large genome of Escherichia coli has lost at least 950 genes. Gene loss, unlike the other processes, correlates fairly well with time. This clock-like behavior suggests that gene loss is under negative selection, while the processes that add genes are under positive selection.|000|- 442|Snel2002|This article mentions in the end a technique to handle fused genes. This may be important and interesting, since fused genes are similar to cases of partial cognacy in historical linguistics. The technique for inference is based on some kind of parsimony, but it is not entirely clear, which principles it is generally based upon: inclusion of LGT is modelled as separate events on the reference tree, which is standard. However, the penalties for LGT events are assumed to be always more than those for loss events. This seems to have some obvious reason in biology which I'm not aware of. 
|000|character mapping, lateral gene transfer 443|Katicic1966|Hier soll die Bestimmung genügen, daß zwei Mengen, deren Elemente innerhalb jeder Menge in gewissen Beziehungen zueinander stehen, *isomorph* sind, wenn jedem Element der einen Menge eindeutig je ein Element der anderen zugeordnet werden kann *und umgekehrt*, so daß das zugeordnete Element zu den übrigen Elementen seiner Menge im gleichen Verhältnis steht, wie das Element, dem es zugeordnet wird, zu denen der seinen.|50|isomorphism 444|Katicic1966|Wenn von zwei solchen Mengen jedes Element der einen eindeutig einem Element der anderen zugeordnet werden kann, *aber nicht notwendigerweise umgekehrt*, so daß die Elemente der zweiten Menge im gleichen Verhältnis zueinander stehen wie die jedem von ihnen zugeordneten Untermengen der ersten, dann ist die zweite Menge der ersten *homomorph*.|51|homomorphism 445|Katicic1966|Als *Modell* einer Menge, deren Elemente in gewissen Verhältnissen zueinander stehen, werden wir eine ihr homomorphe Menge betrachten, die wenn alle für unsere jeweilige Untersuchung unwesentlichen Unterschiede in der ersten Menge außer acht gelassen werden, ihr isomorph ist.|54|model 446|Katicic1966|Die Vielfalt der Einheiten und Verhältnisse in der Organisation der sprachlichen Texte, also eine natürliche Sprache, kann beim heutigen Stand unserer Beschreibungstechnik und Unterscheidungskapazität durch ein isomorphes Gebilde nicht dargestellt werden. Selbst wenn es gelingen sollte, das Modell zu konstruieren, wäre es für uns ebenso unübersichtlich und in seiner Vielfalt unfaßbar, wie die Organisation der Texte selber. In der beschreibenden Sprachwissenschaft muß man also mit homomorphen Darstellungen arbeiten. 
Nur so kann die nötige Vereinfachung erzielt und ein bestimmter Teil der Einheiten und ihrer Verhältnisse je nach dem Gesichtspunkt des Forschers als Gegenstand der Forschung ausgesondert werden.|55|language, language model 447|Katicic1966|Die Gesamtheit der Entsprechungen und ihrer gegenseitigen Verhältnisse kann wieder nur durch ein Modell dargestellt werden. |56f|sound correspondences, language model 448|Katicic1966| Unter vergleichender Sprachforschung verstehen wir aber nicht jede mögliche Vergleichung von beschreibenden Sprachmodellen, sondern nur solche, durch welche die genetische Verwandtschaft verschiedener Sprachen festgestellt und ihre Vorgeschichte erforscht werden kann. |57|comparative linguistics, genetic relationship 449|Katicic1966|Die Elemente der sprachlichen Vielfalt existieren nur in einer bestimmten Zeit, an einem bestimmten Ort und in einer bestimmten gesellschaftlichen Einheit.|57|- 450|Katicic1966|Zwei Sprachen sind genetisch verwandt, wenn eine von ihnen die ältere Entwickelungsstufe der anderen ist, oder wenn sie eine gemeinsame ältere Entwickelungsstufe haben. |62|genetic relationship 451|Katicic1966|Es ist dem Wesen dieser Erscheinung durchaus angemessen und ergibt sich, wie gezeigt wurde, von selbst, wenn man versucht, die genetische Sprachverwandtschaft einigermaßen exakt darzustellen. Es ist also gar kein Wunder, daß dieses Modell in den Vorstellungen der vergleichenden Sprachwissenschaftler trotz der heftigen an ihm geübten Kritik immer eine so wichtige Rolle gespielt hat. Diese zwiespältige Lage, in der man die Stammbaumtheorie verwarf, gleichzeitig sich aber ihrer geflisstentlich bediente, war die Folge eines theoretischen Mißverständnisses. 
Es wurde nicht erkannt, daß der Stammbaum ein *Modell* der genetischen Verwandtschaftsbeziehungen und keine Schilderung des Ablaufs geschichtlicher Ereignisse ist.|63|family tree, genetic classification 452|Katicic1966|Außerdem wurden nur zu oft die Stammbäume nach willkürlich ausgewählten Isoglossen und nicht nach den gegenseitigen Verhältnissen ganzer Sprachen aufgestellt, was dann auch mit Recht heftig kritisiert wurde, weil Stammbäume von vereinzelten Isoglossen nicht für Stammbäume von Sprachen ausgegeben werden dürfen.|63|classification criteria, family tree, genetic classification, isoglosses 453|Katicic1966|Wenn die Wortformen zweier Sprachen durch eindeutige Verwandlung der Phoneme einer dritten Sprache abgeleitet werden können, werden ihre Phoneme charakteristische Entsprechungen nach ihrem Platz in den Wortformen aufweisen. Diese Entsprechungen stehen zum phonologischen System der älteren Sprache in einem bestimmten Verhältnis. Dieses Verhältnis ist notwendigerweise ein Homomorphismus, weil die Verwandlungregeln an *phonologischen Einheiten* angewandt werden, und so müssen auch die Entsprechungen zu den phonologischen Einheiten der älteren Entwickelungsstufe in einem eindeutigen Verhältnis stehen, wenn auch nicht notwendig in einem ein-eindeutigen. Somit können die Entsprechungen mit den Verhältnissen, in denen sie zueinander stehen, als ein *Modell der gemeinsamen älteren Stufe* der verglichenen Sprachen betrachtet werden. Falls dieses Modell mit einer der verglichenen Sprachen isomorph ist, dann ist eben diese Sprache die gemeinsame ältere Entwickelungsstufe. |65|sound correspondences, comparative method 454|Katicic1966|Es sind nicht die Grundsprachen, die erschlossen werden, sondern ihre mehr oder weniger vollständigen, mehr oder weniger getreuen Modelle. [...] 
Ähnlich wie bei Schleichers Stammbaumtheorie handelt es sich nicht um eine geschichtliche Tatsache, sondern um ein Modell, das recht verschiedenen geschichtlichen Tatsachen homomorph sein kann.|65|linguistic reconstruction, language model, proto-language 455|Katicic1966|Über das Zustandekommen einer solchen Menge ist in dem Begriff der Grundsprache nichts präjudiziert. Sie kann in einigen Fällen durch divergente Entwickelung einer wirklich einheitlichen Sprache entstehen, aber auch durch Entlehnung und tiefgehende gegenseitige Beeinflussung. Daraus ergibt sich aber die überaus wichtige und folgenschwere Einsicht, daß die genetische Verwandtschaft von Sprachen nicht in jedem Falle das Ergebnis eines gleichen geschichtlichen Vorganges sein muß.|65|genetic relationship, language history 456|Katicic1966|Die Grundsprache als Modell einer gemeinsamen älteren Entwickelungsstufe ist ein Begriff, der auf Grund ausschließlich sprachlicher Entsprechungsverhältnisse bestimmt ist, und sagt somit über geschichtliche Begebenheiten nur wenig aus. Natürlich müssen Sprachen, in deren Entsprechungen das Modell einer gemeinsamen älteren Entwicklungsstufe kodiert ist, früher einmal in engen Beziehungen zueinander gestanden sein. Doch diese recht allgemeine Feststellung ist auch alles, was auf Grund der genetischen Verwandtschaft über geschichtliche Vorgänge geschlossen werden kann.|66|language history, genetic relationship, history 457|Ross1950|Among other problems, Ross discusses a method for measuring the genetic closeness between languages based on the count of shared roots retained from the ancestor language. Concretely, his calculations are based on the etymological dictionary of Indo-European of Walde and Pokorny. His statistics are based on a hypergeometric distribution. 
Part of the data of Walde-Pokorny are given in numerical form.|000|- 458|Ross1950|It seems that nearly all the evidence for assessing the closeness of the relationship between two of the branches of Indo-European is comprised in a table made up as follows: Allot a column to each branch and a row to each attested Indo-European root; if a root is present in a branch put a cross in the appropriate cell of the table. |26f|presence-absence pattern, phyletic pattern 459|Ross1950|The question "Is Italic closely related to Greek?" would seem to be equivalent to the question, "Given the number of crosses in the two relevant columns, what is the probability of obtaining the given number of cases of a row with a cross in each of the two columns (or a greater number) if the crosses were placed in the two columns at random?"|27|genetic relationship, genetic closeness 460|Ross1950|**Comment by Kendall on Ross' article:** The unknown parameters :math:`N_{1+2}, α_1, α_2`, are also three in number; of these the first is the number of roots present in the common parent at the moment of separation, while :math:`α_j (j = 1,2)` denotes the chance of survival for an individual root along the line of descent to the language labelled *j*. I do not assume the *α*'s to be equal, for the two lines of descent may represent different periods of historical time, and in any case represent different sets of historical circumstances. One must, however, assume that along a given segment of a given line of descent the chance of survival is the same for every root exposed to risk, and one must also assume that the several roots are exposed to risk independently. 
|41|- 461|Ross1950|**Comment by Kendall:** Thus we shall call a pair of languages closely related when the estimate :math:`N_{1+2}` turns out to be much lower than that obtained for most of the other pairs of languages in the family.|42|- 462|Shapiro2007|**Grammatical function as a problem of translation in Chinese** As one can see, it is a question of grammatical function rather than frequency or style.|397|translation, Swadesh list 463|Turchin2010|The idea that the Turkic, Mongolian, Tungusic, Korean, and Japanese languages are genetically related (the “Altaic hypothesis”) remains controversial within the linguistic community. In an effort to resolve such controversies, we propose a simple approach to analyzing genetic connections between languages. The Consonant Class Matching (CCM) method uses strict phonological identification and permits no changes in meanings. This allows us to estimate the probability that the observed similarities between a pair (or more) of languages occurred by chance alone. The CCM procedure yields reliable statistical inferences about historical connections between languages: it classifies languages correctly for well-known families (Indo-European and Semitic) and does not appear to yield false positives. The quantitative patterns of similarity that we document for languages within the Altaic family are similar to those in the non-controversial Indo-European family. Thus, if the Indo-European family is accepted as real, the same conclusion should also apply to the Altaic family.|000|sound classes, consonant classes, cognate detection 464|Turchin2010|Presents a new method of calculating the probability of language relationship based on Swadesh lists and Dolgopolsky consonant classes (`Dolgopolsky 1964`_, `Swadesh 1952 `_). 
|000|lexicostatistics 467|Lees1953|If (1) the morpheme inventory of a language, or a definable portion of it, is observed over a span of time, and if (2) the individual members of the inventory at a given time are identified as cognates of members at some previous time, and if (3) some statable regularity can be found in the time-rate at which members disappear from the inventory to be replaced by new items, then the number of items in a certain subset which are present at any one time can be used as a measure of time elapsed since some previous time-point for which a similar count is available.|113|- 468|Lees1953|If the morphemes correlated with a certain subset of cultural universals in some language at a given time is compared with the corresponding morphemes correlated with the same meanings in the derivative cognate language at some later time, many corresponding morphemes will be found to be cognate; but a certain number may not be cognate.¹ In the latter case, certain morphemes of the original set have disappeared and have been replaced by new, non-cognate morphemes. This temporal decrease in the size of the original subset is called MORPHEME DECAY. ¹By cognate we mean, of course, derivable one from the other by the use of a systematic set of phoneme correspondences, furnished by the traditional comparative method as applied to the language family in question.|114|- 469|Lees1953|The reasons for morpheme decay, i.e. 
for change in vocabulary, have been classified by many authors; they include such processes as word tabu, phonemic confusion of etymologically distinct items close in meaning, change in material culture with loss of obsolete terms, rise of witty terms or slang, adoption of prestige forms from a superstratum language, and various gradual semantic shifts such as specialization, generalization, and pejoration.|114|lexical change 470|Lees1953|the basic-root-morpheme inventory is sampled in the following way: A small set of basic morphemes (say 200) is selected from the inventory of some control language (say English), and each item in it is translated into the common colloquial expression of the test languages. These translations will then comprise, for the most part, root-morphemes which can be compared by the usual etymological techniques. Corresponding terms in two test languages will be either cognate or non-cognate, the latter label including terms borrowed by one language from the other. It is assumed that all the various causes of morpheme decay add up in both languages to some total amount of change which is dependent only upon the length of time during which these causes have been active.|115|lexicostatistics, wordlist, lexical change 471|Starostin2013a|С нашей точки зрения, «сетевое» моделирование межъязыковых отношений по определению носит смешанный «генетическо-ареальный» характер и, следовательно, никоим образом не может замещать «древесную» схему, претендующую на отражение исключительно генетических связей; таким образом, построение исторически осмысленных «деревьев» и «сетей» — не взаимоисключающие, а скорее взаимодополняющие задачи. 
При этом ни для одной «сети» по определению невозможно предложить оптимальную историческую интерпретацию, если она не будет сопровождаться «деревом», обратное же неверно.|24|family tree, wave theory, network 472|Lopez2013|Linguists, as well as biologists, study historical objects that form lineages, undergoing transformations over time. Biologists, as well as linguists, therefore, are very dependent on comparative analyses to structure and analyze their data. Thus, it seems intuitive that conceptual and methodological researches in both fields could inform each other, and benefit to both fields. In particular, the comparative approaches elaborated in biology are experiencing massive developments that could be explored in linguistic studies.|000|- 473|Schleicher1853|Diese Annahmen, logisch folgend aus den Ergebnissen der bisherigen Forschung, lassen sich am besten unter dem Bilde eines sich verästelnden Baumes anschaulich machen.|787|family tree 474|Schleicher1853|.. image:: static/img/schleicherstree.png :name: schleicher-tree :width: 500px :comment:`Schleicher's early tree.` |787|family tree 475|McMahon1994|Convergence takes place within a convergence area, linguistic area, or Sprachbund \`which includes languages belonging to more than one family but showing traits in common which are found not to belong to the other members of (at least) one of the families' (Emeneau 1956: 16). In other words, in a convergence area, \`genetic heterogeneity is gradually replaced by typological homogeneity' (Lehiste 1988: 59). [...] Convergence typically occurs in situations where communication between linguistic groups is essential, and all, or the majority of speakers must learn and use two (or more) grammars, each with its own lexicon and set of rules. It will clearly be easier for an individual to learn the grammars, and therefore master the languages, if the grammars are similar. 
What seems to happen in extreme cases of convergence is a gradual approximation of the rules that generate the two languages over time, so that the structures generated correspondingly become more and more similar.|213|contact-induced similarity, convergence, language union, language mixture, language contact 476|McMahon1994|However, there is usually little effect on the lexical material; the languages [pb] retain their own words and morphemes, but become markedly similar in structure, producing ultimate intertranslatability with effectively a single set of syntactic rules and two sets of lexical items. |213f|convergence, language contact 477|Trask2000|**genetic relationship** (also **genetic affiliation**) the relationship which holds between two or more languages which share a single common ancestor – that is, they all started off at some time in the past as no more than regional varieties of that ancestral language, but each has undergone so many changes not affecting the others that they have diverged into distinct languages. All the languages sharing such a common ancestor constitute a single language family, and all those languages which share a single common ancestor at some intermediate time constitute a single branch of that family. 
The identification of genetic relationships is the principal business of comparative linguistics.|133|- 478|Jones1798|The *Sanscrit* language, whatever be its antiquity, is of a wonderful structure; more perfect than the *Greek*, more copious than the *Latin*, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, [pb] both in the roots of verbs and the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists; there is a similar reason, though not quite so forcible, for supposing that both the *Gothic* and the *Celtic*, though blended with a very different idiom, had the same origin with the Sanscrit; and the old *Persian* might be added to the same family, if this were the place for discussing any question concerning the antiquities of *Persia*.|422f|Sir William Jones, genetic relationship 479|Szemerenyi1970|Das bedeutet, daß bei Vergleichen der Form unbedingt der Vorzug gegeben werden muß. Wenn zwei Formen sich genau – oder den Regeln nach entsprechen – wiegt das auch gewisse Abweichungen in der Bedeutung auf.|15f|semantic similarity, formal similarities 480|Huang2002|[pǔtōnghuà, definition of phonological perspective] *yǐ Běijīng yǔyīn wéi biāozhǔn* 以北京语音为标准 :translation:`phonology of Běijīng as standard` [pǔtōnghuà, definition from grammatical perspective] *yǐ diǎnfàn de báihuàwén zhùzuò wéi yǔfǎ guīfàn* 以典范的白话文著作为语法规范 :translation:`classical báihuà-grammar as grammatical reference` |4|pǔtōnghuà, Chinese 481|Gabelentz1881|Denn der Deduction ist nicht Alles erreichbar, weil in der Sprache wohl Alles gesetzlich, aber nicht Alles nothwendig ist. Auch die Freiheit, die **Geschichte** hat ihr Antheil an ihr. 
Wo jene Grundgesetze nur ein Bereich von Möglichkeiten umgränzen konnten, da hatte die Geschichte zu entscheiden, welche dieser Möglichkeiten zu Thatsachen werden sollten.|16|- 482|Patthy2003|Modular assembly of novel genes from existing genes has long been thought to be an important source of evolutionary novelty. Thanks to major advances in genomic studies it has now become clear that this mechanism contributed significantly to the evolution of novel biological functions in different evolutionary lineages. Analyses of completely sequenced bacterial, archaeal and eukaryotic genomes has revealed that modular assembly of novel constituents of various eukaryotic intracellular signalling pathways played a major role in the evolution of eukaryotes. Comparison of the genomes of single-celled eukaryotes, multicellular plants and animals has also shown that the evolution of multicellularity was accompanied by the assembly of numerous novel extracellular matrix proteins and extracellular signalling proteins that are absolutely essential for multicellularity. There is now strong evidence that exon-shuffling played a general role in the assembly of the modular proteins involved in extracellular communications of metazoa. Although some of these proteins seem to be shared by all major groups of metazoa, others are restricted to certain evolutionary lineages. The genomic features of the chordates appear to have favoured intronic recombination as evidenced by the fact that exon-shuffling continued to be a major source of evolutionary novelty during vertebrate evolution.|000|gene fusion 483|Morrison2011|Exploratory data analysis (EDA), or descriptive data analysis, involves evaluating the characteristics of the data *before* proceeding to the definitive analysis in relation to the scientific question at hand. In this regard, a data-display network is designed to display any character (or tree) conflict in a dataset, without prior assumptions about the causes of those conflicts. 
Compatible phylogenetic data patterns are displayed as a tree, while incompatibilities in the data are displayed as reticulations in the tree. The data may be either raw character data (e.g. sequences, AFLP, microsatellites, SNPs) or they may be characters summarized as a set of trees (e.g. gene trees).|51|exploratory data analysis, EDA 484|Morrison2011|Standard piece of literature on network applications in evolutionary biology. Important parts include the following: * Exploratory data analysis [51-55] * Causes of Reticulation [44-46] * Evolutionary networks (in contrast to data-display networks) [105-162]|000|exploratory data analysis, phylogenetic network 486|Hamed2006|.. image:: static/img/hamed-wang-ctree.png :name: hamed-wang :width: 800px :translation:`Majority rule consensus tree.`|46|- 487|Hamed2006|.. image:: static/img/hamed-wang-ctree2.png :name: hamed-wang :width: 800px :translation:`Summary of the majority rule consensus.`|47|- 488|Chen2005|Languages of ethnic minorities affect Chinese in two ways. First, a new Chinese dialect emerges through mother tongue interruption, when a minority people speak Chinese. Second, a new Chinese dialect emerges through mother tongue transition, when a minority people substitute Chinese for their own mother tongue. Study of conversational situations is crucial for the understanding of the two models. The early conversational situation of many so-called creole languages can be judged by its rank distribution of correlative morphemes. These languages can be treated as form-transited Chinese dialects. The mother tongue interruption under an island condition is crucial for the emergence of form-transited Chinese dialects.|000|Chinese, Chinese dialects, family tree 489|Chen2005|.. 
image:: static/img/chen-2005-famtree.png :name: chinese_dialects :width: 700px [Family tree of the Chinese dialects]|43|family tree, Chinese dialects 490|Norman1988|:translation:`The "famous" Southern Chinese hypothesis is discussed on these pages.`|210-214|Southern Chinese hypothesis 491|Edgar2004a|Distance matrices are clustered using UPGMA (11), which we find to give slightly improved results over neighbor-joining (12), despite the expectation that neighbor-joining will give a more reliable estimate of the evolutionary tree. This can be explained by assuming that in progressive alignment, the best accuracy is obtained at each node by aligning the two profiles that have fewest differences, even if they are not evolutionary neighbors.|1792|Neighbor-Joining, UPGMA, guide tree, multiple sequence alignment 492|Sechehaye1908|Il constitue lui aussi une «forme» dans le sens où nous avons entendu ce terme, car on peut concevoir le système phonologique sous son aspect algébrique et remplacer les trente, cinquante ou cent éléments qui le composent dans une langue donnée, par autant de symboles généraux qui fixent leur individualité, mais non pas leur caractère matériel. |151|linguistic sign 493|Jakobson1978|For Saussure, changes in sounds are blind and fortuitous, and \`alien to the system of the language'. But observation has shown on the contrary that changes cannot be understood except in relation to the phonological system which undergoes them. Consequently the system of sounds considered as linguistic values can be studied in its evolution just as well as in its given state, and phonology includes the historical study of phonemes.|49|linguistic sign 494|Jakobson1978|The ontological problem of what form of reality is concealed behind the idea of the phoneme is in fact not at all specific to the idea of the phoneme. |52|linguistic sign 495|Jakobson1978|The linguistic value [...] 
of any phoneme in any language whatever, *is only its power to distinguish the word containing this phoneme from any words which, similar in all other respects, contain some other phoneme.*|61f|linguistic sign, phoneme, phoneme sequence 496|Jakobson1978|What corresponds to the difference between two phonemes is solely the *fact* of a difference in meaning, whereas the *content* of these different meanings varies from one word to another. |62f|linguistic sign, phoneme 497|Haldeman1857|:translation:`Interesting example of chance resemblances: English gnaw is compared to Chinese 咬, with the wrong cognate assumption being actually triggered by the fact that the Chinese word is not written properly!`|201|chance resemblance 498|Jakobson1978|From a strictly articulatory point of view there is no {\em succession} of sounds. Instead of following one another the sounds overlap; a sound which is acoustically perceived as coming after another one can be articulated simultaneously with the latter or even in part before it.|11|phoneme sequence, articulation, speech signal 499|IPA1999|Phonetic analysis is based on the crucial premise that it is possible to describe speech in terms of a sequence of segments, and on the further crucial assumption that each segment can be characterized by an articulatory target.|6|segmentation, phoneme, phoneme sequence, speech signal 500|Durbin2002|Comparative analysis is a painstaking art. Inferring the correct structure by comparative analysis requires knowing a structurally correct multiple alignment, but inferring a structurally correct multiple alignment requires knowing the correct structure. A structure is \`solved' by an iterative refinement process of guessing the structure based on the current best guess of the multiple alignment, then realigning based on the new guess at the structure. 
The sequences to be compared must be sufficiently similar that they can be initially aligned by primary sequence identity alone to start the process but they must be sufficiently dissimilar that a number of covarying substitutions can be detected. |266|- 501|PeirceCP|Accepting the conclusion that an explanation is needed when facts contrary to what we should expect emerge, it follows that the explanation must be such a proposition as would lead to the prediction of the observed facts, either as necessary consequences or at least as very probable under the circumstances. A hypothesis then, has to be adopted, which is likely in itself, and renders the facts likely. This step of adopting a hypothesis as being suggested by the facts, is what I call *abduction*. I reckon it as a form of inference, however problematical the hypothesis may be held. |7.202|abduction 502|Starostin2013a|Из новейших разработок такого рода следует особо выделить статью `[Steiner et al. 2011] `_, посвященную подробному описанию формального алгоритма, цель которого (едва ли не впервые в западной квантитативной исторической лингвистике) определена не просто как обнаружение неслучайных совпадений звучания в фиксированных списках, а как частичная симуляция работы компаративиста. Такое развитие необходимо всячески приветствовать, несмотря на то, что конкретные результаты тестирования, описанные в статье, пока что не дают возможности понять, будет ли соответствующий алгоритм в состоянии успешно справляться с «нетривиальными» ситуациями (дальнее языковое родство, умение различать между генетически общими когнатами и следами ареальных когнатов и т. п.). В частности, из двух протестированных семей алгоритм выдает позитивные результаты для цезских языков (близкородственных в рамках одной ветви нахско-дагестанской семьи), но неопределенные для языков матако-гуайкуру (гипотетическая семья в Южной Америке, относительно генетического единства которой не существует единой точки зрения), т. 
е., по сути, застревает на тех же сложностях, на которых дает сбои ручная процедура, осуществляемая компаративистом на «субъективных» основаниях.|30|cognate detection 503|Gates2012|Corresponding lexical items that are bimorphemic must have both morphemes cognate in order to be considered as ‘lexically similar’. For example ‘roof’ in P-BJ, pronounced cəmˈkʉʔ, and D-JCD, pronounced spə³ku¹, are considered ‘lexically dissimilar’ because, although the second syllable is cognate, the first syllable is not.|51|partial cognacy, Sino-Tibetan 504|Wettig2013|This paper introduces several models for investigating and evaluating etymological data. Our main point of departure is alignment of etymological data. Specifically, given a raw set of etymological data, we first aim to find the “best” alignment at the sound or symbol level. The reason for this focus is given in section 1.1. Sets of etymological data are found in digital etymological databases, such as ones we use for the Uralic language family. A database is typically organized into cognate sets; all elements within a cognate set are posited (by the database creators) to be derived from a common origin. The origin is some word-form in the ancestral language.|000|linguistic reconstruction, phonetic alignment 505|Shneiderman1991|The traditional approach to representing tree structures is as a rooted, directed graph with the root node at the top of the page and children nodes below the parent node with lines connecting them. Knuth (1968, p. 305-313) has a long discussion about this standard representation, especially why the root is at the top and he offers several alternatives including brief mention of a space-filling approach. However, the remainder of his presentation and most other discussions of trees focus on various node and edge representations. 
By contrast, this paper deals with a two-dimensional (2-d) space-filling approach in which each node is a rectangle whose area is proportional to some attribute such as node size.|000|family tree, genetic classification, visualization, treemaps 506|Arvelakis2005|Over recent years the field of phylogenetics has witnessed significant algorithmic and technical progress. A new class of efficient phylogeny programs allows for computation of large evolutionary trees comprising 500–1.000 organisms within a couple of hours on a single CPU under elaborate optimization criteria. However, it is difficult to extract the valuable information contained in those large trees without appropriate visualization tools. As potential solution we propose the application of treemaps to visualize large phylogenies (evolutionary trees) and improve knowledge-retrieval. In addition, we propose a hybrid tree/treemap representation which provides a detailed view of subtrees via treemaps while maintaining a contextual view of the entire topology at the same time. Moreover, we demonstrate how it can be deployed to visualize an evolutionary tree comprising 2.415 mammals. The respective software package is available on-line at http://www.ics.forth.gr/~stamatak. |000|family tree, genetic classification, visualization, treemaps 507|Hall2002|The Parasitic Hypothesis, formulated to account for early stages of vocabulary development in second language learners, claims that on initial exposure to a word, learners automatically exploit existing lexical material in the L1 or L2 in order to establish an initial memory representation. At the level of phonological and orthographic form, it is claimed that significant overlaps with existing forms, i.e. cognates, are automatically detected and new forms are subordinately connected to them in the mental lexicon. 
In the study reported here, English nonwords overlapping with real words in Spanish (pseudocognates), together with noncognate nonwords, were presented to Spanish-speaking learners of English in a word familiarity task. Participants reported significantly higher levels of familiarity with the pseudocognates and showed greater consistency in providing translations for them. These results, together with measures of the degree of overlap between nonword stimuli and translations, were interpreted as evidence for the automatic use of cognates in early word learning.|000|second language learning, mental lexicon 508|Beckwith2002|This is a chapter on words from the Pyu language, a Tibeto-Burman language which is completely unknown so far. More information available on: * http://glottolog.org/resource/reference/id/95905 * http://glottolog.org/resource/languoid/id/burm1262 |000|Pyu language, Tibeto-Burman, Sino-Tibetan 509|Palla2007|The rich set of interactions between individuals in society 1–7 results in complex community structure, capturing highly connected circles of friends, families or professional cliques in a social network 3,7–10. Thanks to frequent changes in the activity and communication patterns of individuals, the associated social and communication network is subject to constant evolution 7,11–16. Our knowledge of the mechanisms governing the underlying community dynamics is limited, but is essential for a deeper understanding of the development and self-optimization of society as a whole 17–22. We have developed an algorithm based on clique percolation 23,24 that allows us to investigate the time dependence of overlapping communities on a large scale, and thus uncover basic relationships characterizing community evolution. Our focus is on networks capturing the collaboration between scientists and the calls between mobile phone users. 
We find that large groups persist for longer if they are capable of dynamically altering their membership, suggesting that an ability to change the group composition results in better adaptability. The behaviour of small groups displays the opposite tendency—the condition for stability is that their composition remains unchanged. We also show that knowledge of the time commitment of members to a given community can be used for estimating the community’s lifetime. These findings offer insight into the fundamental differences between the dynamics of small groups and large institutions.|000|- 510|Allwood2003|The purpose of this paper is to suggest a view of word meaning on the type level based on “meaning potentials” rather than on reified type meanings founded on either of the two traditional approaches of abstract generalization (Gesamtbedeutung) and typical or basic meaning (Grundbedeutung). It is suggested that actual meaning on the occurrence level is produced by context sensitive operations of meaning activation and meaning determination which combine meaning potentials with each other and with contextually given information rather than by some simple compositionality operations yielding phrase and sentence meaning from simple type meanings of one of the two traditional kinds. To establish this goal, I first present the traditional notions and discuss some problems which arise when trying to handle variation in meaning. I then specifically discuss the relation of homonymy and polysemy to the traditional notions. 
In section 3, I introduce the notion of “meaning potentials” as an alternative to the traditional notions and then discuss in section 4 how this notion might be used to handle problems of meaning variation, focusing especially on homonymy and polysemy.|000|reference potential, meaning potential, meaning, reference 511|Prokic2013|In this paper we combine the geographic variation of closely related language variants (‘dialects’) with the distribution of sound correspondences through the lexicon. One of the central problems with sound correspondences at the dialect level is that they are not very regular, especially when they are investigated in sufficient detail. Sound changes spread both through a language (e.g., from one word to another) and through the population of speakers (in our case through a population of villages with different dialects). Both processes happen at the same time, and the challenge is to reconstruct what has happened from a snapshot of synchronic data. The method described in this paper allows us to track the geographic spread of sound changes and the underlying patterns of linguistic diversity simultaneously. By combining the two, it is possible to detect areas of intensive linguistic contact and gain better insight into the mechanisms of language change.|000|networks, sound correspondences, linguistic area 512|Goodfellow2014|Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. 
We employ the DistBelief (Dean et al., 2012) implementation of deep neural networks in order to train large, distributed neural networks on high quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving 97.84% accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over 90% accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to transcribing synthetic distorted text from a popular CAPTCHA service, reCAPTCHA. reCAPTCHA is one of the most secure reverse turing tests that uses distorted text as one of the cues to distinguish humans from bots. With the proposed approach we report a 99.8% accuracy on transcribing the hardest category of reCAPTCHA puzzles. Our evaluations on both tasks, the street number recognition as well as reCAPTCHA puzzle transcription, indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.|000|character recognition, ocr, machine learning, neural network 513|Chen1606|故士人篇章,必有音節,田野俚典,亦名諧聲,豈以古人之詩而獨無韻乎?蓋時有古今,地有南北,字有更革,音有轉移,亦埶所必至。故以今之音讀古之作,不免乖剌而不入。 :translation:`The writings of scholars must be made of adequate sounds. Even in the rural areas everybody orders the sounds harmonically. Can it be that the ancients solely did not have rhymes? 
One can say that in the same way in which ancient times differ from modern times, and places in the North differ from places in the South, characters change and sounds shift. This is a natural tendency. Therefore, it is inevitable that reading the ancient writings with modern pronunciation will sound improper and wrong.`|原序|shījīng, language change, sound change, Chinese 514|Chen1606|古今一音,古今一聲,以吾之意而逆古人之意,其理不遠也,以吾之聲而調古 人之聲,其韻不遠也。:translation:`Zwischen Altertum und Gegenwart sind die Laute identisch, so wie zwischen Altertum und die Gegenwart auch die Stimmen identisch sind. Wenn ich mit meinen eigenen Konzepten die Konzepte der Menschen des Altertums rekonstruiere, werden deren Regelmässigkeiten nicht allzu entrückt sein; wenn ich mit meiner eigenen Stimme die Stimmen der Menschen des Altertums abgleiche, werden ihre Reime nicht allzu entrückt sein.` [Translation taken from @Behr2014]|原序|language change, sound change, shījīng, Chinese 515|Duan1750|時有古今,地有南北,音不能無流變。音既變矣,文人學士聘才任意,又從而 汩之,古音於是亦淆訛,如分絲之不可理。《三百篇》者,古音之叢,亦百世 用韻之準。稽其入韻之字,凡千九百有奇,同今音者十七,異今音者十三。試 用治絲之法,分析其緒,比合其類,綜以部居,緯以今韻,古音犁然。起間不 無方語差池,臨文假借,案之部分,間有出入之篇章,然亦可指數矣。 :translation:`Da es bei der Zeit Altertum und Gegenwart gibt, bei den Regionen Süd und Nord, sind die Laute unweigerlich dem Wandel unterworfen. Sobald ein Laut sich gewandelt hat, werden die Literaten und Gelehrten sich aufs Geratewohl verhalten. Wenn ihre Praxis zahlreiche Nachahmer findet, werden auch die altertümlichen Aussprachen sodann korrumpiert, wie ein Seidenfaden, den man nicht mehr entwirren kann. Die Dreihundert Kapitel (des Buches der Lieder) sind Repositorien der altertümlichen Lautungen und zugleich die Norm für den Reimgebrauch durch die Generationen hinweg. Untersucht man die Schriftzeichen, die sich in einer Reimposition befinden, so stellt man fest, dass bei den rund 1.900 auftretenden Fällen das Verhältnis zwischen denjenigen, die mit der heutigen Aussprache übereinstimmen zu denen, die es nicht tun, etwa 70:30 ist. 
Versucht man es gleichsam mit der Methode des Entwirrens von Seide, d.h. indem man ihre einzelnen Fäden separiert, sie zu gleichen Typen bündelt und jeweils zu Reimklassen zusammenfaßt und sie mit den heutigen Reimen verwebt, dann werden die altertümlichen Reimklassen wie Ackerfurchen von selbst deutliche Gestalt annehmen. Zunächst wird es aufgrund von Dialektausdrücken und phonetischen ad-hoc-Entlehnungen immer Unregelmäßigkeiten geben, Abschnitte, in denen Spezialfälle vermerkt werden müssen, oder gelegentlich auch Verse, die aus dem Schema ausbrechen; aber solcherlei Fälle kann man wohl an einer Hand abzählen.` [Translation by @Behr2014]|?|sound change, language change, Duàn Yùcái, Chinese 516|Behr2014|Text provides a summary on different sinologists who were teaching at the university Frankfurt. The text gives an outline on some etymologies of Old Chinese, including the one for 首 and 頭, along with interesting translation of early passages on language change as reflected in the Chinese literature.|000|Chinese, language change, sound change, Old Chinese 517|Labov2007|The transmission of linguistic change within a speech community is characterized by incrementation within a faithfully reproduced pattern characteristic of the family tree model, while diffusion across communities shows weakening of the original pattern and a loss of structural features. It is proposed that this is the result of the difference between the learning abilities of children and adults. Evidence is drawn from two studies of geographic diffusion. (i) Structural constraints are lost in the diffusion of the New York City pattern of tensing short-a to four other communities: northern New Jersey, Albany, Cincinnati, and New Orleans. (ii) The spread of the Northern Cities Shift from Chicago to St. Louis is found to represent the borrowing of individual sound changes, rather than the diffusion of the structural pattern as a whole.|000|language change, family tree, wave theory 518|Prokic2013|.. 
math:: \text{Poisson Association} = \text{sign}(O-E) * (O*\text{log}(O/E)-(O-E)) :comment:`[O is "observed" and E is "expected" frequency]`|51|phonetic alignment, log-odds ratio, Poisson distribution 519|Apeltsin2011|**Motivation:** Clustering protein sequence data into functionally specific families is a difficult but important problem in biological research. One useful approach for tackling this problem involves representing the sequence dataset as a protein similarity network, and afterwards clustering the network using advanced graph analysis techniques. Although a multitude of such network clustering algorithms have been developed over the past few years, comparing algorithms is often difficult because performance is affected by the specifics of network construction. We investigate an important aspect of network construction used in analyzing protein superfamilies and present a heuristic approach for improving the performance of several algorithms. **Results:** We analyzed how the performance of network clustering algorithms relates to thresholding the network prior to clustering. Our results, over four different datasets, show how for each input dataset there exists an optimal threshold range over which an algorithm generates its most accurate clustering output. Our results further show how the optimal threshold range correlates with the shape of the edge weight distribution for the input similarity network. We used this correlation to develop an automated threshold selection heuristic in order to most optimally filter a similarity network prior to clustering. This heuristic allows researchers to process their protein datasets with runtime efficient network clustering algorithms without sacrificing the clustering accuracy of the final results. 
**Availability:** Python code for implementing the automated threshold selection heuristic, together with the datasets used in our analysis, are available at http://www.rbvi.ucsf.edu/Research/cytoscape/threshold_scripts.zip.|000|threshold, homolog detection, automatic threshold selection 520|Frank2010|This paper examines how historical cognitive linguistics can benefit methodologically through the application of the notion of language as a complex adaptive system. The idea that languages are complex adaptive systems (CAS) was introduced initially in computational evolutionary linguistics, a discipline that was and remains inspired by biological, systems theoretical approaches to the evolution of life. Here the way that the CAS approach serves to replace older historical linguistic notions of languages as organisms and languages as species is explained as well as how the CAS approach can be generalized to encompass linguistic domains. Specifically, an overview of the CAS approach and its implementation in linguistics is provided with an emphasis on stigmergic, embodied, usage-based and socio-culturally situated language studies in particular.|000|language model, complex adaptive system 521|Frank2010|Paper gives an interesting overview of the idea of modeling language as a *complex adaptive system*. By contrasting the new idea with the older theory of language as organism, or language as a simpler evolutionary system (as in @Croft2000), the new model is introduced and discussed. Of greater interest is the fact that *language as a complex adaptive system* implies * that agent-based models and simulations are especially apt to study language evolution, and * that language cannot be described in isolation from the speakers (as implied by UG approaches) Since the CAS approach to language modeling works with a direct dynamic component, it might be the model of choice which one should consider when trying to model language history or simulations of language history. 
Under #artificial_agents, we collect issues regarding this thread.|000|language model, complex adaptive system, organism, language evolution, language history 522|Bradley2007|When it comes to dealing with indels, molecular evolution lags heuristic bioinformatics by decades. Sophisticated alignment algorithms have been widely known since the 1960s (and in bioinformatics since 1970), but we are still struggling to understand the corresponding phylogenetic models. Big ideas drive change: as we dream of reconstructing ancestral genotypes, it is ever clearer that indels cannot be ignored. We need to develop a robust understanding of probabilistic indel analysis and its relationship to alignment.|3258|sequence alignment, sequence comparison, transducer, insertion, deletion, indel 523|Bradley2007|We believe that a suitable foundation for such analysis already exists, where evolutionary models meet automata theory: the framework of finite-state transducers. This framework links Hidden Markov Models (Brown et al., 1993; Churchill, 1992), sequence alignment algorithms (Gotoh, 1982; Miller and Myers, 1988; Needleman and Wunsch, 1970; Smith and Waterman, 1981), finite-state machines and Chomsky grammars (Durbin et al., 1998) and molecular phylogenetics (Miklós et al., 2004; Thorne et al., 1991). In this letter we outline this framework, also describing a preliminary analysis of one recent algorithm— Indelign—for reconstructing ancestral indel histories (Kim and Sinha, 2007).|3258|hidden markov models, transducer, sequence alignment 524|Holmes2001|Motivation: We review proposed syntheses of probabilistic sequence alignment, profiling and phylogeny. We develop a multiple alignment algorithm for Bayesian inference in the links model proposed by Thorne et al. (1991, J. Mol. Evol., 33, 114–124). 
The algorithm, described in detail in Section 3, samples from and/or maximizes the posterior distribution over multiple alignments for any number of DNA or protein sequences, conditioned on a phylogenetic tree. The individual sampling and maximization steps of the algorithm require no more computational resources than pairwise alignment. Methods: We present a software implementation (Handel) of our algorithm and report test results on (i) simulated data sets and (ii) the structurally informed protein alignments of BAliBASE (Thompson et al., 1999, Nucleic Acids Res., 27, 2682–2690). Results: We find that the mean sum-of-pairs score (a measure of residue-pair correspondence) for the BAliBASE alignments is only 13% lower for Handel than for CLUSTALW (Thompson et al., 1994, Nucleic Acids Res., 22, 4673–4680), despite the relative simplicity of the links model (CLUSTALW uses affine gap scores and increased penalties for indels in hydrophobic regions). With reference to these benchmarks, we discuss potential improvements to the links model and implications for Bayesian multiple alignment and phylogenetic profiling. Availability: The source code to Handel is freely distributed on the Internet at http://www.biowiki.org/Handel under the terms of the GNU Public License (GPL, 2000, http://www.fsf.org./copyleft/gpl.html). |000|transducer, hidden markov models, multiple sequence alignment, sequence alignment 525|Wiener1987|Linguistic and organic systems are both characterized by descent with modification through time. There is an unbroken stream of ancestor-descendant relationships with the great majority of characters possessed by one generation passed on unchanged to the next. Lines of descent may split, resulting in different streams of descent for the various parental characters. 
The task of a systematist or historical linguist is to identify, delineate, and classify taxa in such a way that the subgroupings explain the observed character distributions and yield an understanding of the historical development of the group in question.|217|biological parallels 526|Wiener1987|Languages do not adapt in the way in which organisms can be said to adapt. The success of a language is not dependent on the characters of the language itself, but on the social status of the people who speak it. Thus, new characters spread through a language if they are perceived as being associated with a prestige group. Also, the language characters of the parents do not usually determine the first language of their children. The peer group of a child is largely responsible for determining the characters of a child's speech. Thus, if parents are native speakers of German and move to America their children will be native speakers of English.|219|language history, biological parallels 527|Kilbury2011|This is a draft on the use of finite state transducers in historical linguistics. The paper is basically interesting in so far as it shows the relevant literature and the ideas behind it. Furthermore, the paper also shows that the actual implementations are very tedious to maintain in their current form, and not very powerful regarding the problem of realization, since many basic problems, such as tokenization of input strings, or a full handling of IPA, are *not* supported. The tool may thus be more powerful in theory, yet, as mentioned in the discussion about #transducers, it is far less powerful in practice.|000|transducer, finite state transducer, historical linguistics, sound change 528|BonTempo2004|This thesis gives an overview on an open source program which can be used to model language change with help of artificial agents. 
|000|artificial agents, language change, language model, evolutionary model 529|Abramson2004|Perhaps linguists who speak tone languages natively wonder why so many languages are without tones! [...] Thus, one might ask whether the first languages were not all tonal. |17|tone, tonogenesis, sound change 530|Abramson2004|One way or another, it seems to be the dominant view in the literature that tones arose from a toneless state. It is this outlook that led to the widely used term *tonogenesis*, which was apparently coined by James Matisoff (1970, 1973). The concept of tonogenesis has become a broad rubric to cover not only the rise of tones in the first place but also the subsequent splitting of tones into more tones. |17|tone, tonogenesis, sound change 531|Abramson2004|The general physiological and acoustical conditions leading to tonogenesis ought to be the same wherever tonal systems have arisen, even if from area to area and language to language there are differences in detail arising, perhaps, from both internal and external linguistic factors.|17|tone, tonogenesis, sound change 532|Abramson2004|I accept the normal convention among linguists working with East Asian languages that the phonology of a tone language includes a number of pitch levels or contours that, along with segmental phonemes, serve to differentiate morphemes. Each syllable is potentially a tone-bearer, even though the lexicon may contain words with certain toneless syllables. Thus, the child in such a speech community must learn for each new word not only a string of supraglottal gestures with various excitation sources (voicing, turbulence, *etc.*) but also distinctive states and movements of pitch. |17f|tone, tone language 533|Abramson2004|It is truly impressive that using the techniques of comparative linguistics, scholars began linking the rise of tones in Asian language families with certain presumed phonetic properties of initial and final consonants in the ancestral languages. 
Henri Maspero (1911, 1912) did so at a time when experimental phonetic data bearing on the topic were not yet available. In spite of André-G. Haudricourt's scientific background in botany, it apparently did not occur to him to look into the existing literature on the production and perception of speech. At the time of his paper (@1954) on the rise of tones in Vietnamese some research providing acoustic underpinnings to his arguments had been done (*e.g.*, House & Fairbanks, 1953), but, as far as I know, he never showed awareness of such research in his later writings.|18|tonogenesis, sound change 534|Haudricourt1954|This article describes the origin of tones in Vietnamese and was one of the first articles that was published on #tonogenesis in South East Asian languages.|000|sound change, Vietnamese, tonogenesis 535|Abramson2004|The gist of his reasoning [i.e. the arguments of @Haudricourt1954, JML] is as follows. At the beginning of the Common Era Vietnamese was toneless. As shown in Table 1, there were three syllable types that ultimately gave rise to tones; all three had distinctive voicing in initial stop consonants: (1) open syllables, *i.e.*, those ending in a vowel or nasal, (2) syllables formerly checked with a final voiceless spirant that had become /h/, and (3) syllables formerly checked with some kind of stop, symbolized by /X/, that had become glottal stop. .. image:: static/img/abramson.png :name: abramson :width: 500px [...] The emergence of three tones in Vietnamese by the end of the sixth century is apparent to Haudricourt. As shown in Table 2, characteristics of the syllable endings given in Table 1 created two contour tones as deviations from an uninflected level, the mid tone. Before disappearing, the final aspiration of the second column of Table 1 caused relaxation of the vocal folds ("le relâchement des cordes vocales") and a concomitant lowering of the pitch of the preceding voiced portion of the vowel. 
On the other hand, before dropping out of the system, the final glottal closure [pb] of the third column of Table 1 led to an increase in tension of the vocal folds ("l'augmentation de tension des cordes vocales") with a resulting rising pitch. With the loss of the old syllable-endings the remaining pitch features took over the role of phonological distinctiveness. For this stage of the language the voicing distinction in initial position is still in place. .. image:: static/img/abramson2.png :name: abramson2 :width: 300px [...]|18f|tonogenesis, Vietnamese 536|Abramson2004|:comment:`Report on findings of speech acoustic reports by House & Fairbanks 1953` They had young adults, native speakers of American English at the University of Illinois, record many utterances of six English vowels with identical initial and final consonants, English /ptkfsbdgvzmn/. Thus the voicing, oral-nasal, stop-fricative, and place-of-articulation contrasts were represented, with each syllable beginning and ending with the same consonant. The vocalic properties measured were duration, fundamental frequency (*F₀*), and relative power. Of concern to us here is their finding that vowels in a voiceless environment had a higher *F₀* (109, Figure 2). [...] They point out (106) that a search of the available literature failed to show any effect of syllable-finals on *F₀*; however, in a recent study of Italian (Esposito, 2002) there was no evidence of an effect of syllable-final consonantal voicing on preceding vowels. Also in another paper (Lea, 1973), acoustic evidence of no effect on *F₀* of final segments is presented. :comment:`mentions more and more explicit studies in which no effect of final consonants on vowel pitch was revealed`|19|tonogenesis, speech acoustics, experimental phonetics 537|Abramson2004|The foregoing acoustic findings demonstrated a clear correlation between voicing states of initial consonants and the *F₀* heights of accompanying vowels. 
At least one piece of research lends credence to the posited effect of syllable-final elements. Fairly early in this program of research into tonogenesis, then, there seemed to be a substantive base for the logic of the historical linguists.|20|tonogenesis, speech acoustics 538|Matisoff1973|If the laryngeal mechanisms we have been considering are really universal, why haven't all human languages been tonal at some point in their history, like Chinese, Burmese, or Jinghpaw? Some language families seem more hospitable to the development of tones than others, and the same goes for geographic areas of the world. It is as if the seeds of tone potential required a particularly fertile soil of a certain structural type in order to take root and flourish. In particular, it appears that to become truly tonal a language must have a basically **monosyllabic** structure (i.e. the morphemes must be only one syllable long). Polysyllabic languages like Japanese, Swedish, or Serbo-Croatian may develop "pitch-accent" systems, but these differ from true tone-systems in many important respects.|77|monosyllabicity, tonogenesis, sound change, language universals 540|Abramson2004|:comment:`Quotes work by Timothy Light (1978: 119) on tonogenesis, who replies to` @Matisoff1973 :comment:`:` "In this [pb] revised version of tonogenesis, then, tones come about because of a cluster of losses combined with constraints that will not permit these losses to be compensated for on the segmental level. Metaphorically, the language has 'nowhere to go' but to the suprasegmental level if it is to maintain the order of magnitude of distinctions it has had." |20f|tonogenesis, monosyllabicity 541|Thurgood2002|Our most widely-used model of tonogenesis is Haudricourt’s @1954 classic analysis of Vietnamese tonogenesis. 
This paper examines Vietnamese evidence and this dominant model of tonogenesis, arguing that the Haudricourt analysis should be updated, replacing its segmentally-driven model by a laryngeally-based model, incorporating the effects of voice-quality distinctions. This proposed model provides phonetically-plausible paths of change, not just for Vietnamese, but also for the widely-attested correlations between initial voicing and pitch height and between voice quality and vowel quality. At the same time, these same laryngeal considerations provide a phonetic motivation for the preference for the development of breathy voice from voiced stop onsets over sonorants and fricatives. Of equal importance, the model appears to provide significant insights into tonogenesis in Southeast Asia, East Asia, South Asia, Africa, Europe, and the Americas, that is, the applicability model is not restricted to any particular geographical area. |000|tonogenesis, sound change 542|Abramson2004|@Thurgood<2002> develops a model based on properties of laryngeal control. One such crucial property is phonation type with voice quality as its principal auditory correlate. Two departures from modal (clear or normal) voice of concern here are creaky voice and breathy voice. In some languages these phonation types are grouped with other phonetic properties to form phonologically distinctive register complexes. These complexes may be distinguished by differences in such features as voice quality, pitch, and vowel quality. In his broad overview of both the historical linguistic and the instrumental phonetic literature Thurgood concludes that not voiced sonorants or fricatives normally but voiced obstruents, since only they are likely to impart breathy voice to following vowels, bring about lowering of pitch. 
He believes that his model is better able to handle tonogenesis not only in Southeast Asia but also elsewhere in the world.|21|tonogenesis, sound change, phonation type 543|Abramson2004|:comment:`Author elaborates on the question of whether syllable-final position finds acoustic support for tonogenesis. According to` @Haudricourt1954, final /h/ lowered the *F₀* of the preceding vowel :comment:`(in Vietnamese)` and final /?/ raised it. Supposedly this happened in other families too. Only one source that I can find (Hombert, 1978, 92-95), a substantial one at that, lends direct support to the acoustic and perceptual plausibility of this reconstruction. |22|tonogenesis, sound change, final consonants 544|Abramson2004|The phonetic evidence reviewed here lends credence to the broad lines of the historical linguistics arguments for tonogenesis. The seeds of change are small perturbations of fundamental frequency that are not under the control of the speaker. That is, in times of relative stability these minor pitch shifts are presumably not noticed as such. They are automatic in the context of certain syllable-final or syllable-initial consonantal properties. For example, they are neither produced nor heard as part of the stress pattern of a word or as part of an intonation pattern. [...] Thus, in the "tone-prone" language families of concern to us here, the weakening of the consonantal voicing contrast, probably with an intermediate stage of a phonation-type contrast, could well have facilitated the enhancement of previously redundant pitch to tonal status.|26f|tonogenesis, experimental phonetics 545|Abramson2004|A way of understanding this process is through the concept of parsing (Fowler & Brown, 1997). [...] the listener parses out the *F₀* perturbations from the intonational, accentual, or tonal contour of the linguistic expression and uses them only for aid in identifying the voicing state of the consonant. 
Such a perturbation is ignored, it would seem, in responding to the overall pitch contour of the utterance. It, however, is ripe for advancement to a new phonological function if the language, for one reason or another, begins to lose the consonantal distinction with which the perturbation has been linked and the language is threatened with burdensome homophony.|27|tonogenesis, homonymy, homophony, monosyllabicity 546|Schipke2011|Furthermore, the proportion of innovations in children’s production amounted to a small percentage: 94.80% of all tokens and 89.60% of all types of word formations were conventional German lexemes. Innovations already appeared at the first recording and were found throughout the period of observation. Approximately half of the innovations (46.67%) were based on nouns, for example compounds consisting of two nouns as in Automensch ‘carman’ or Murmeltreppe ‘marblestair.’ The other half of the innovations (53.34%) were based on verbs, for example conversions and/or compounds consisting of verbs and adverbs or prepositions as in einklopfen ‘to knock sth. in sth.’ or anleitern ‘to ladder sth. at sth.’|75|word formation, language acquisition 547|Schipke2011|This study investigates the development of German word formation as an important step in mastering complex lexical items for the language learning child. Thirty mother–child dyads participated. Means of word formation and resulting word categories were analyzed in children’s spontaneous speech at ages 1;9, 1;11, 2;6, and 3;0. In contrast to the acquisition of English, the results show simultaneous development of compounds and derivations. German toddlers produce more verbal than nominal derivations and more compounds based on verbs than on nouns. 
The findings suggest that (1) there are cross-linguistic differences in the development of word formation devices, and (2) children rely heavily on verbs in word formation|000|word formation, language acquisition 548|Mielke2008|This book represents my attempt to find out where distinctive features come from. I should say that it started with a general uneasiness about the innateness of features. Uneasiness by itself is not helpful, so my goal was to take the issues of innateness and universality seriously, and to assemble data that would help to answer some open questions and evaluate some assumptions. It has never been my intention to discredit or disparage the work that has been done in the framework of innate distinctive features, and most of the questions I have intended to address could not even be formulated without it. Innate feature theory has provided a way to talk about interesting generalizations about sound patterns, and I believe emergent features mean reconsidering the specific mechanisms behind some of these generalizations (history or physiology vs. Universal Grammar), but do not undermine the generalizations themselves. In most cases, evaluating innate feature proposals is not a matter of right vs. wrong but a matter of literal vs. metaphorical, and I understand that for a lot of researchers, it was metaphorical all along. I believe that reinterpreting features as emergent allows feature theory to be better equipped to deal with language, and in this book I have made an effort to illustrate how the insights of innate feature theory can be retained and how new insights are made possible when features are treated as emergent.|000|distinctive features, sound classes 549|Kemler2003|Two studies investigated whether four-year-old children (12 in Experiment 1 with a mean age of 4; 8 and 36 in Experiment 2 with a mean age of 4; 7) invent names for new artifacts based on the objects’ functions as opposed to their perceptual properties. 
Children informed about the intended functions of novel objects provided more name innovations that were clearly function-based than perception-based. This tendency was observed when children were shown the objects’ functions, even if they were also given verbal descriptions of the objects’ perceptual properties and parts. Only when ignorant of the objects’ intended functions did children tend to use perceptual features to create substantial numbers of names. Accordingly, results from this name-innovation methodology converge with findings from some recent studies of lexical categorization suggesting that functional information is critical to how preschoolers extend artifact names. Children appear to appreciate an intimate relation between the functions of artifacts and how they are named.|000|name inventions, denotation 550|Sloman1996|Distinctions have been proposed between systems of reasoning for centuries. This article distills properties shared by many of these distinctions and characterizes the resulting systems in light of recent findings and theoretical developments. One system is associative because its computations reflect similarity structure and relations of temporal contiguity. The other is "rule based" because it operates on symbolic structures that have logical content and variables and because its computations have the properties that are normally assigned to rules. The systems serve complementary functions and can simultaneously generate different solutions to a reasoning problem. The rule-based system can suppress the associative system but not completely inhibit it. The article reviews evidence in favor of the distinction and its characterization.|000|modes of reasoning, categorization, cognition 551|Tverksy1984|Concepts may be organized into taxonomies varying in inclusiveness or abstraction, such as furniture, table, card table or animal, bird, robin. 
For taxonomies of common objects and organisms, the basic level, the level of table and bird, has been determined to be most informative (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). Psychology, linguistics, and anthropology have produced a variety of measures of perception, behavior, and communication that converge on the basic level. Here, we present data showing that the basic level differs qualitatively from other levels in taxonomies of objects and of living things and present an explanation for why so many measures converge at that level. We have found that part terms proliferate in subjects' listings of attributes characterizing category members at the basic level, but are rarely listed at a general level. At a more specific level, fewer parts are listed, though more are judged to be true. Basic level objects are distinguished from one another by parts, but members of subordinate categories share parts and differ from one another on other attributes. Informants agree on the parts of objects, and also on relative "goodness" of the various parts. Perceptual salience and functional significance both appear to contribute to perceived part goodness. Names of parts frequently enjoy a duality not evident in names of other attributes; they refer at once to a particular appearance and to a particular function. We propose that part configuration underlies the various empirical operations of perception, behavior, and communication that converge at the basic level. Part configuration underlies the perceptual measures because it determines the shapes of objects to a large degree. Parts underlie the behavioral tasks because most of our behavior is directed toward parts of objects. Labeling appears to follow the natural breaks of perception and behavior; consequently, part configuration also underlies communication measures. 
Because elements of more abstract taxonomies, such as scenes and events, can also be decomposed into parts, this analysis provides a bridge to organization in other domains of knowledge. Knowledge organization by parts (partonomy) is contrasted to organization by kinds (taxonomy). Taxonomies serve to organize numerous classes of entities and to allow inference from larger sets to sets included in them. Partonomies serve to separate entities into their structural components and to organize knowledge of function by components of structure. The informativeness of the basic level may originate from the availability of inference from structure to function at that level.|000|taxonomy, partonomy, knowledge organization, cognition 552|Tversky1986|Geometric models impose an upper bound on the number of points that can share the same nearest neighbor. A much more restrictive bound is implied by the assumption that the data points represent a sample from some continuous distribution in a multidimensional Euclidean space. Analysis of 100 data sets showed that most perceptual data satisfy the geometric-statistical bound, whereas many conceptual data sets exceed it. The most striking discrepancies between the data and their multidimensional representations arise in semantic fields when the stimulus set includes a focal element (e.g., a superordinate category) that is the nearest neighbor of many of its instances. Theoretical and methodological implications of nearest neighbor analysis are discussed.|000|conceptualization, cognition, categorization 553|Ungerer1994|If we try to explain word meanings in terms of underlying mental concepts, we find that concepts like 'car', 'table' and 'dog' have a crucial function in identifying and classifying the objects and organisms of the world around us, and may therefore be said to represent a basic level of categorization. 
Starting from this premise, the paper postulates that basic level concepts can also be assumed for actions, states, physical properties and locations, a view supported by previous research on colours, shapes and prepositional concepts. Compared with 'car', 'table' and 'dog', superordinate concepts like 'vehicle', 'furniture' and 'animal' have a different structure, which highlights one or several salient attributes ('moving people around' in the case of 'vehicle'). Additional atttributes [sic!] are drawn from related basic level concepts ('car', 'bus', 'bicycle') if necessary, which means that they are available via 'parasitic' categorization, and this has important implications for our understanding of lexical hierarchies. Emotion concepts and concepts of other mental phenomena have yet another structure in which metonymies and metaphors play an important part. It is claimed that these relationships can also be understood in terms of parasitic categorization, and that lexical concepts in general can be integrated into a centric or centripetal view of conceptual taxonomies.|000|cognition, conceptualization, categorization, semantic hierarchies 554|Ungerer1994|:comment:`Authors summarize the notion of "basic level concepts":` - basic level concepts are commonly expressed by words which have a simple morphological form, which first come to mind and which are first learned by children (such as *dog*); - basic level concepts have a large number of attributes which are shared by category members (e.g. by various types of dog) and are at the same time distinct from attributes of other basic level concepts (e.g. 'elephant' and 'bird'); - basic level concepts refer to objects and organisms which stimulate typical actions or motor movements (chairs are connected with the action of sitting down, etc.); - members of basic level concepts (e.g. 
all kinds of dogs) have a characteristic overall shape that is readily identified as such; this seems to favour holistic (or gestalt) perception; - last but not least: basic level concepts are the prime examples of the prototype-cum-periphery structure, i.e. they comprise good and bad examples, with the prototype functioning as model for the categorization of the other category members. |149|basic level concepts, basic vocabulary 555|Ungerer1994|:comment:`discussion of superordinate concepts:` First and foremost, superordinate concepts do not have a prototype which may serve as a model for lesser category members. Even if in Rosch's experiments (Rosch/Mervis 1975: 579), informants rated peas as best examples of vegetables and cars as best examples of vehicles, the concept 'pea' could not serve as a model for all other kinds of vegetables, nor could the concept 'car' be used as a model for a bicycle. [...] In sum, in order to categorize the concept of vegetable or vehicle, we find all sorts of ways of reverting to basic level concepts, and this is what I propose to call 'parasitic categorization'. |150|superordinate concepts, conceptualization 556|Ungerer1994|To summarize the discussion so far, it seems to have suggested two things: first, conceptual structure may indeed vary and deviate from the cognitive standard model of a rich category structure; second, since basic level concepts hold a central position in the categorization of the world, other concepts, whether logically superordinate or subordinate, depend on them for their conceptual structure; this is what I have called parasitic categorization.|150|categorization, basic level concepts, parasitic categorization 557|Ungerer1994|On an applied level, verbal meanings can be divided up into dynamic and stative meanings. 
If one first looks for dynamic verbs which may meet our criteria for basic level concepts (prototype & periphery, numerous specific attributes, possibility of gestalt perception), one will intuitively name everyday verbs such as *eat*, *drink*, *walk*, *work*, and even the linguistically notorious verb *kill*.|151|basic vocabulary, basic level concepts, verb 558|Ungerer1994|The text introduces some elaboration of the idea of "basic level concepts". What is interesting in this context is that these basic level concepts seem (at first sight) to come quite close to basic vocabulary concepts. So when working in the context of basic vocabulary, it might be worthwhile to read this text again and carry out some closer comparison of the actual concepts named by Ungerer.|000|basic level concepts, basic vocabulary, conceptualization, cognition, categorization 559|Ungerer1994|1. The centrality of basic level concepts applies not only to nominal, but to non-nominal concepts such as actions, physical properties, locations and evidential states as well. It is supported by the notion of parasitic categorization, by which various other types of concepts are related to basic level concepts. The result is a centric or even centripetal conceptual system, which can be regarded as an alternative to 'logical' taxonomies based on class inclusion. 2. Apart from the cognitive 'standard model' of category structure (prototype and periphery, a large number of attributes, gestalt perception), which applies to concrete basic level concepts, there seems to exist a number of other category structures, of which sali[pb]ent attribute concepts (the superordinates), 'undifferentiated gestalt' concepts (the subordinates), metaphorical concepts and abstract concepts have been provisionally identified. 3. Basic level concepts seem to be linked to major word classes, yet these distinctions are overridden on the other levels, especially the metaphorical and abstract levels. 
One of the reasons is that parasitic categorization cuts across word class distinctions which apply on the basic level.|159f|basic level concepts, parasitic categorization 560|Rooij2006|Most work in ‘evolutionary linguistics’ seeks to motivate the emergence of linguistic universals. Although the search for universals never played a major role in semantics, a number of such universals have been proposed concerning connectives, property and preposition denoting expressions, and quantifiers. In this paper we suggest some evolutionary motivations for these proposed universals using game theory.|000|semantic universals, conceptualization, cognition 561|King1967|The idea that functional load offers a tool of potentially great explanatory power in diachronic linguistics is shared by a number of contemporary linguists, particularly those influenced at first or second hand by Prague. It is the purpose of the present paper to investigate the hypothesis that functional load plays a significant role in sound change. I will attempt to demonstrate that functional load, if it is a factor in sound change at all, is one of the least important of those we know anything about, and that it is best disregarded in discussions centering on the cause and direction of phonological change.|000|sound change, functional load, explanation of sound change 562|King1967|Author gives a summary on different notions on functional load in the literature and tests their suitability to account for actual cases of sound change by computing functional load from empirical data, using a specific formula for the quantification of #functional_load and comparing the actual functional load of merged elements in Germanic languages.|000|functional load, sound change, explanation of sound change 563|King1967|The term FUNCTIONAL LOAD is customarily used in linguistics to describe the extent and degree of contrast between linguistic units, usually phonemes. 
In its simplest expression, functional load is a measure of the number of minimal pairs which can be found for a given opposition. More generally, in phonology, it is a measure of the work which two phonemes (or a distinctive feature) do in keeping utterances apart - in other words, a gauge of the frequency with which two phonemes contrast in all possible environments.|831|functional load, sound change 564|King1967|One of the most important of these devices was the avoidance of homonymy (Gilliéron 1918:14). His concept was not, of course, the same thing as functional load, but it was close. The crucial point is that sound change in Gilliéron's view was restrained (though to an unstated and indefinite degree) by the communicative function of language.|832|homonymy, sound change, functional load 565|King1967|Credit is usually given to the founder of the Prague Circle, Vilém Mathesius, for being the first to publish on functional load (Vachek 1966:65). Actually, Mathesius did not use the term in either of his two early articles concerned with the degree of utilization of phonemic oppositions (1929, 1931), using instead circumlocutions such as 'valeur fonctionnelle', 'l'utilisation fonctionnelle', and 'Grad der Ausnützung von phonologischen Einheiten'. It is, however, clear that he was dealing with the concept of functional load, even if he did not use the specific French or German term. 
Aside from the minor question of what terms Mathesius did or did not use, it is of more importance in the present paper to observe that he regarded functional load as a purely descriptive device - as one part of a complete phonological description of a language along with the roster of phonemes, phonemic variants, distinctive features, and the rest (Mathesius 1931:148).|832|functional load, sound change 566|King1967|Thus, although functional load is felt, by most linguists acquainted with the term, to be characteristic of the Prague Circle approach to phonology, the truth of the matter is that, in the task of shaping functional load into a viable tool of either synchronic or diachronic phonology, Prague did not get much beyond Gilliéron's concept of the clash of homonyms. And, after all, one did not have to be a member of the Prague Circle to suggest that Gilliéron's ideas might be fruitfully incorporated into historical discussions based on the emerging concept of the phoneme.|833|functional load, sound change 567|King1967|Given Martinet's initial hypothesis that sound changes do not operate independently of communicative needs, it is but a short step to his conjecture about functional load: '... toutes choses égales d'ailleurs, une opposition phonologique qui sert à maintenir distincts des centaines de mots parmi les plus fréquents et les plus utiles n'opposera-t-elle pas une résistance plus efficace à l'élimination que celle qui ne rend de service que dans un très petit nombre de cas?' (1955:54). 
Thus, the requirement that communicative needs must be satisfied should help prevent the merger of two phonemes whose opposition bears a high functional load, while such a therapeutic factor need not assert itself to prevent the eradication of an opposition whose functional load is very small.|834|sound change, explanation of sound change, functional load 568|King1967|THE WEAK POINT HYPOTHESIS states that, if all else is equal, sound change is more likely to start within oppositions bearing low functional loads than within [pb] oppositions bearing high functional loads; or, in the case of a single phoneme, a phoneme of low frequency of occurrence is more likely to be affected by sound change than is a high-frequency phoneme. THE LEAST RESISTANCE HYPOTHESIS states that, if all else is equal, and if (for whatever reason) there is a tendency for a phoneme x to merge with either of the two phonemes y or z, then that merger will occur for which the functional load of the merged opposition is smaller. THE FREQUENCY HYPOTHESIS states that, if an opposition x :math:`\neq` y is destroyed by merger, then that phoneme will disappear in the merger for which the relative frequency of occurrence is smaller: i.e. x > y if the relative frequency of x is smaller than that of y, and y > x if the relative frequency of y is smaller than that of x.|834f|sound change, functional load 569|King1967|It should be clear from the discussion which accompanied the presentation of data that none of the hypotheses stated in §1.2 stands up very well in confrontation with empirical findings from the Germanic languages studied here.|847|functional load, sound change 570|Surendran2006|Paper gives an overview on different attempts in the literature to quantify the concept of #functional_load. 
They show how it can be computed, and which difficulties the computation has to face.|000|functional load, sound change 571|Surendran2006|Perhaps the most common definition of FL(x,y) — the functional load of the x–y opposition — is the number of minimal word pairs that are distinguished solely by the opposition. The major flaw with this definition is that it ignores word frequency. Besides, it is not generalizable to a form that takes into account syllable and word structure, or suprasegmentals. We shall say no more about it.|46|functional load 572|Surendran2006|The definition was based on the information theoretic methods introduced by Shannon (1951), and assumes that language is a sequence of phonemes whose entropy can be computed. This sequence is infinite, representing all possible utterances in the language. We can associate with a language/sequence L a positive real number H(L) representing how much information L transmits. Suppose x and y are phonemes in L. If they cannot be distinguished, then each occurrence of x or y in L can be replaced by the occurrence of a new (archi-)phoneme to get a new language :math:`L_{xy}`. Then the functional load of the x–y opposition is as in (1): :math:`FL(x,y) = \frac{H(L)-H(L_{xy})}{H(L)}` This can be interpreted as the fraction of information lost by L when the x–y opposition is lost. :comment:`Note that the authors further show that this formula is not feasible for computation!`|46|functional load 573|Keller1990|Sprachwandel ist bis auf wenige Ausnahmen ein unbeabsichtigter, unreflektierter "Nebeneffekt" kommunikativen Handelns und ein Sonderfall soziokultureller Evolution. Eine zentrale These des Buches lautet daher: Sogenannte natürliche Sprachen sind weder Naturphänomene noch Artefakte. Als "Phänomene der dritten Art" stellen sie spontane Ordnungen dar. Der adäquate Erklärungsmodus führt über die Frage: Wieso erzeugen wir durch unser Kommunizieren Sprachwandel, und welches sind seine Mechanismen? 
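Die hier vorgestellte evolutionäre These des Sprachwandels wird wissenschaftshistorisch und systematisch entwickelt, an Beispielen erläutert und mit konkurrierenden Konzepten verglichen. :translation:`With few exceptions, language change is an unintended, unreflected 'side effect' of communicative action and a special case of sociocultural evolution. A central thesis of the book is therefore: so-called natural languages are neither natural phenomena nor artifacts. As 'phenomena of the third kind' they constitute spontaneous orders. The adequate mode of explanation proceeds via the question: why do we produce language change through our communicating, and what are its mechanisms? The evolutionary thesis of language change presented here is developed historically and systematically, illustrated with examples, and compared with competing concepts.`|000|language change, explanation of language change, invisible hand 574|Keller1990|:comment:`Summary of Popper's World III compared with the invisible-hand view advocated in Keller's book. The basic thesis is that Popper's World III and language as a phenomenon of the third kind do not coincide directly. To illustrate this, Keller gives a table on page 172 that summarizes the point.` .. raw:: html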
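<table>
<tr><th></th><th>World 1</th><th>World 2</th><th>World 3</th></tr>
<tr><td>Natural phenomena</td><td>+</td><td>+</td><td>-</td></tr>
<tr><td>Artifacts</td><td>+</td><td>+</td><td>+</td></tr>
<tr><td>Phenomena of the third kind</td><td>+</td><td>-</td><td>+</td></tr>
</table>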
|164-174|Popper's World III, language change, language model, invisible hand 575|BermudezOtero2007|One of the basic challenges for diachronic phonology is the problem of innovation: how does a phonological variant that has never existed previously in a speech community first come into being? Here, it is commonly agreed that the potential for innovation leading to sound change arises whenever speaker and listener fail to solve the coordination problem posed by speech: the speaker must produce a phonetic stimulus that enables the listener to recover the intended phonological representation; the listener must decide which properties of the incoming stimulus are intended by the speaker as signal, and which properties are accidental noise; neither participant can read the other’s mind. The innovation mechanisms proposed by @Ohala<1989>, *hypocorrection* and *hypercorrection*, both involve failures of coordination: the listener does not parse the stimulus in the way that the speaker intended (see e.g. Ohala 1989, Alderete & Frisch 16.3, Kingston 17.3.3).|497|sound change 576|BermudezOtero2007|In a pretheoretical sense, all phonological change is gradual: developments such as the raising of :sampa:`/A:/` to :sampa:`/O:/` in southern dialects of Middle English – and, a fortiori, large-scale upheavals like the Great Vowel Shift – do not take place overnight. However, this obvious fact does not imply that phonological change advances gradually in all dimensions. One must first distinguish between graduality in implementation and graduality in propagation.|498|sound change, sound change mechanisms 577|BermudezOtero2007|A change is said to be phonetically gradual – or gradient – if it involves a continuous shift along one or more dimensions in phonetic space, such as the frequency of the first formant of a vowel as measured in hertz. 
In contrast, a change is phonetically abrupt – or categorical – if it involves the substitution of one discrete phonological category for another: e.g. replacing the feature [-high] with [+high] (see Harris 6.2.1). Deciding whether the pattern created by a change is gradient or categorical often requires careful instrumental analysis, as well as a global understanding of the phonology–phonetics interface in the language in question (Myers 2000).|499|sound change, sound change mechanisms 578|BermudezOtero2007|Indeed, laboratory research has in recent times redressed the balance between gradient and categorical rules in phonology (21.3.1). Languages have been shown to vary with respect to the phonetic realization of phonological categories down to the finest detail: for example, contrary to the assumptions of SPE (Chomsky & Halle 1968: 295), patterns of coarticulation are not mechanical and universal, but cognitive and acquired (Keating 1988c:287–288; Pierrehumbert et al. 2000: 285–286). In addition, many phenomena previously thought to be categorical have proved to be gradient (Myers 2000: 257).|499|sound change, sound change mechanisms 579|BermudezOtero2007|As a first approximation, lexically abrupt – or *regular* – implementation can be defined as follows: a change is regular if it applies at the same time to all words that are identical with respect to the relevant phonological, morphological, and syntactic conditions. In contrast, a change is lexically gradual – or *diffusing* – if it affects certain words earlier than others with an equivalent phonological and morphosyntactic makeup, i.e. if lexical identity plays an irreducible rôle in controlling the advance of the change. 
When applying this definition in practice, one must take account of sociolinguistic variation: lexical diffusion can manifest itself through a difference in the relative frequency with which two words display the innovative variant, as long as this difference is not determined by phonological, morphological, or syntactic conditions, or by sociolinguistic factors (e.g. sex, age, social status, style, register, etc.). Accordingly, establishing whether a particular change is regular or diffusing often requires large data-sets and powerful statistical methods (e.g. Labov 1994:Ch.16).|500|sound change, lexical diffusion, regular sound change 580|BermudezOtero2007|When two initially homophonous words cease to be phonologically identical by undergoing different processes of change, we have strong evidence for lexical diffusion (@Chen<1972> 1972: Sec. 6).|499|sound change, lexical diffusion 581|Chen1972|Students of historical linguistics have generally underestimated the diagnostic significance of residual forms of sound changes which are, by and large, regular. Linguists of the Neogrammarian persuasion understandably tend to see only regular correspondences and overlook irregularities. Some linguists show remarkable tolerance level to exceptions to sound laws either because they cherish the glib optimism that somehow some day a rule will be discovered to account for these exceptions, or because they can always relegate these irregularities to marginal developments such as borrowing and analogical leveling. On the other hand, for linguists who do not share the Neogrammarian view, residues of regular sound changes simply do not pose a problem. As a result, while irregularities are occasionally felt as annoying necessary evil, they have never been given more than passing attention. Without subscribing to the theoretical and empirical arguments underlying the Neogrammarian doctrine, we can adopt the regularity hypothesis as a working assumption. 
On this premise, a high tolerance level to irregularities is incompatible with the accountability principle which has characterized the methodology and style of modern linguistics. We want to seek an explanation for irregular changes not only because exceptions are embarrassingly numerous and because the Neogrammarian optimism is unfounded but, more importantly, because the inability to account for residual forms is diagnostic of a fundamental inadequacy in a model of historical phonology in which the time dimension of sound change is totally absent.|457|sound change, irregularity of sound change, regular sound change, lexical diffusion 582|Chen1972|When a phonological innovation enters a language it begins as a minor rule, affecting a small number of words [...]. As the phonological innovation gradually spreads across the lexicon, however, there comes a point when the minor rule gathers momentum and begins to serve as a basis for extrapolation. At this critical cross-over point, the minor rule becomes a major rule, and we would expect diffusion to be much more rapid. The change may, however, reach a second point of inflection and eventually taper off before it completes its course, leaving behind a handful of words unaltered.|474|sound change, lexical diffusion, sound change mechanisms 583|BermudezOtero2007|In the 1980s and 1990s, the work of William Labov and Paul Kiparsky brought about a convergence of empirical results and theoretical perspectives on the implementation problem. Labov’s (@1981, @1994) empirical findings confirmed the existence of two long-recognized mechanisms of phonological change: *Neogrammarian change* (Osthoff & Brugmann @1878) and *classical lexical diffusion* (@Wang<1969> 1969). .. raw:: html
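<table>
<tr><th></th><th></th><th colspan="2">Dimensions</th></tr>
<tr><th></th><th></th><th>*Phonetic*</th><th>*Lexical*</th></tr>
<tr><th rowspan="2">Modes</th><td>*Neogrammarian sound change*</td><td>gradual</td><td>abrupt</td></tr>
<tr><td>*Classical lexical diffusion*</td><td>abrupt</td><td>gradual</td></tr>
</table>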
Kiparsky (@1988, 1995) then showed how the existence of these two modes of implementation follows from the architecture of grammar in generative theory, particularly in Lexical Phonology. Lately, however, the received view has come under challenge, as the claim that all phonological change is both lexically and phonetically gradual (@2000, 2001) gains increasing currency. The ensuing debate bears directly on a central issue in phonological theory: whether or not lexical representations contain gradient phonetic detail.|501|sound change, lexical diffusion, regular sound change, Neogrammarian sound change 584|Wang1969|Phonological change may be implemented in a manner that is phonetically abrupt but lexically gradual. As the change diffuses across the lexicon, it may not reach all the morphemes to which it is applicable. If there is another change competing for part of the lexicon, residue may result. Several fundamental issues in the theory of phonological change are raised and discussed.|000|sound change, lexical diffusion, Neogrammarian sound change 585|Labov1981|Recent investigations of the history of Chinese have given new support to the view that sound change diffuses gradually across the lexicon. Yet instrumental studies of sound change in progress support the Neogrammarian position that change affects all words that include the sound according to their phonetic environment. The paradox can be resolved by distinguishing abstract phonological change from change in low-level output rules. Both types of rules can be observed in recent studies of sound change in progress in Philadelphia: the lexical split of short a shows lexical diffusion in progress, while raising, lowering, fronting, and backing rules show Neogrammarian regularity. 
A review of the literature on completed changes and other changes in progress tends to support the relevance of a hierarchy of abstractness in determining the nature of the transition from one stage to the other.|000|sound change, Neogrammarian sound change, lexical diffusion 586|Osthoff1878|Aller lautwandel, soweit er mechanisch vor sich geht, vollzieht sich nach ausnahmslosen gesetzen, d.h. die richtung der lautbewegung ist bei allen angehörigen einer sprachgenossenschaft, ausser dem Fall, dass dialektspaltung eintritt, stets dieselbe, und alle wörter, in denen der der lautbewegung unterworfene laut unter gleichen verhältnissen erscheint, werden ohne ausnahme von der änderung ergriffen. :translation:`All sound change, insofar as it proceeds mechanically, follows exceptionless laws, i.e., the direction of the sound shift is always the same for all members of a speech community, except where a dialect split occurs, and all words in which the sound subject to the shift appears under the same conditions are affected by the change without exception.`|XIII|sound change, regular sound change, Neogrammarian sound change, regularity hypothesis 587|Bybee2002|The literature on frequency effects in lexical diffusion shows that even phonetically gradual changes that in some cases are destined to be lexically regular show lexical diffusion while they are in progress. Change that is both phonetically and lexically gradual presents a serious challenge to theories with phonemic underlying forms. An alternate exemplar model that can account for lexical variation in phonetic detail is outlined here. This model predicts that the frequency with which words are used in the contexts for change will affect how readily the word undergoes a change in progress. This prediction is tested on data from /t, d/ deletion in American English. Finally, the effect of bound morphemes on the diffusion of a sound change is examined. 
The data suggest that instances of a bound morpheme can affect the rate of change for that morpheme overall.|000|lexical diffusion, sound change, regular sound change, Neogrammarian sound change 588|Bybee2002|The hypothesis that sound change is lexically regular seems well supported by the facts of change. When we observe that two languages or dialects exhibit a phonological difference, it is very likely that this difference is regular across all the words that have the appropriate phonetic environment. This observation is fundamental to the comparative method; the establishment of genetic relations and the reconstruction of protolanguages are based on the premise that sound change affects all words equally. Schuchardt (1885) was one of the detractors from this position. When he observed sound change in progress, he noted that all words did not change at the same rate, and that the differences were not due to “dialect mixture,” as was often claimed by the Neogrammarians, who supported the regularity position.|262|sound change, linguistic reconstruction, Neogrammarian sound change, regular sound change, genetic classification 589|Bybee2002|A major challenge to the regularity position in the 20th century is expressed in the work of Wang and his colleagues (Wang, 1969, 1977; Wang & Cheng, 1977), which documented changes that seem to occur word by word over a long period of time. While some of these changes result in lexical regularity, Wang and his colleagues also identified changes that appear to be arrested after affecting only part of the lexicon.|262|sound change, regular sound change, lexical diffusion 590|Bybee2002|Labov (@1981, @1994) also dealt with the issue, availing himself of the data from his numerous studies of sound change in progress. 
He proposed two types of sound change: “regular sound change” is gradual, phonetically motivated, without lexical or grammatical conditioning, and not influenced by social awareness, whereas “lexical diffusion” change, such as the phenomena studied by Wang, is “the result of the abrupt substitution of one phoneme for another in words that contain that phoneme” (@Labov<1994>, 1994:542).|262|sound change, Neogrammarian sound change, lexical diffusion, sound change mechanisms 591|Bybee2002|Hooper (1976) identified a lexical diffusion paradox. Reductive sound change tends to affect high-frequency words before low-frequency words, but analogical leveling or regularization tends to affect low-frequency words before high-frequency words.|263|lexical diffusion 592|Bybee2002|Sound changes that are complete can be identified as regular or not, depending upon whether they affected all lexical items existing at the time of the change. Ongoing changes cannot be designated as regular or not, since they are not complete. However, one can reference the typical characteristics of a change to project whether it will be regular or not. That is, a phonetically gradual change with clear phonetic conditioning falls into Labov’s first type, and thus we can project its regularity. 
Here I document lexical diffusion from high-frequency to low-frequency words in ongoing changes that can be expected to be regular, as well as in certain reductive changes that may never be complete because of the nature of the lexicon.|263|lexical diffusion, Neogrammarian sound change, regular sound change 593|Bybee2002|:comment:`The author elaborates on t/d deletion in American English and cites her own work and that of others, which apparently confirms that t/d deletion occurs much more frequently in high-frequency words.`|263-265|sound change, lexical diffusion 594|Bybee2002|:comment:`The data mentioned above, along with some evidence from laboratory experiments, suggest that t/d deletion is phonetically gradual, with the duration of t and d varying across words. Based on this observation, the author concludes:` The data on this obstruent deletion process, then, suggests both lexical and phonetic gradualness. It thus cannot be said that obstruent deletion is the abrupt deletion of a phoneme. In fact, these data are problematic for any version of phonemic theory. A model that can accommodate these data is presented later in the article.|265|sound change, regular sound change, lexical diffusion 595|Bybee2002|One might therefore predict that, in general, reductive changes tend to occur earlier and to a greater extent in words and phrases of high frequency.|268|sound change, reduction, deletion 596|Mowrey1995|Although directionality in internal phonetic evolution has been noted by many historical linguists, current models of change have been unsuccessful in incorporating this property directly into their theoretical structures. This paper explores some of the issues standing in the way of such an enterprise, including the definition of which kinds of changes must be accounted for and the selection of descriptive primitives. 
A general theory of unidirectional reductive articulatory evolution is advanced which unites the shallow diachronic processes found in casual speech with longer-term processes common in the historical record.|000|deletion, reduction, sound change, articulation 597|Mowrey1995|Is sound change a phenomenon *sui generis*, distinct from such things as borrowing and analogy, and thus deserving of separate theoretical and analytical treatment? Is it essentially a physical (phonetic) or psychological (phonological) phenomenon? If phonetic in nature, is its source to be sought in production or perception? Are sound changes phonetically abrupt or gradual? Are they lexically abrupt or gradual? What is the relation between synchronic variation and change?|37|sound change 598|Hockett1965|:comment:`After quoting Sir William Jones and others and calling this "genetic hypothesis" the first important hypothesis of historical linguistics, the author goes on to name the second important hypothesis, the "regularity hypothesis":` **The regularity hypothesis.** The second breakthrough was achieved in the 1870's, in the emergence of what I shall call the REGULARITY hypothesis.|186|regularity hypothesis, sound change 599|Hockett1965|:comment:`Summary of Brugmann and Osthoff and their Neogrammarian Manifesto:` Brugmann and Osthoff founded a journal, Morphologische Untersuchungen, as an outlet for their own papers; the first issue appeared in 1878 and included an introduction signed by both but written by Brugmann. In this, he accepted the originally humorous epithet Junggrammatiker and used it as a rallying cry 'summoning students of comparative linguistics "forth from the hypothesis-laden atmosphere where Indo-European prototypes are fabricated ... 
into the clear air of tangible actuality of the present"', and odiously naming himself and a few of his closest colleagues as the sole possessors of The Truth.|187|Neogrammarian Manifesto 600|Hockett1965|I regard the regularity hypothesis as the second great breakthrough of our science, and I think of it as an achievement largely of the 1870's; but I do not identify it with any single neogrammarian attempt to formulate it. Perhaps Verner's almost plaintive remark already quoted, 'there must be a rule for irregularity; the problem is to find it', comes closest - if we may be allowed to interpret as plaintive an expression that Verner may not have intended so. More important is the sequence of remarkable discoveries in Indo-European, made during the decade or so in question in part by the Junggrammatiker and in part by their opponents; the working assumptions ACTUALLY USED as these discoveries were made, rather than any contemporary expression of them, are the true breakthrough.|188|regularity hypothesis, sound change 601|Hockett1965|Let me observe, not completely as an aside, that genetic relatedness is reflexive, symmetric, and transitive. We take the reflexivity so much for granted that you may have been surprised when I spoke, a moment ago, of comparing a language with itself.|189|transitivity, reflexivity, symmetricity, genetic relationship 602|Hockett1965|Nor was there any great fuss about a basic triad, the familiar BORROWING, ANALOGY, and SOUND CHANGE. Whether each of these was to be interpreted as a KIND of change, a CAUSE of change, or a MECHANISM of change is obscure; apparently the scholars of that time had not the habit of making distinctions of this sort. Even so, this classification afforded some answer to another question left open by the genetic hypothesis, which, you will recall, acknowledged the fact of linguistic change but had nothing much to say about either how or why language changes. 
The threefold classification was to some extent an answer to the how. Everyone agreed that forms may pass from one language or dialect to another. Everyone acknowledged instances of analogic reshaping. And everyone felt it fitting and proper, in setting forth the development of a daughter language or later stage from a parent language or earlier stage,|190|mechanism of language change, language change, analogy, borrowing, sound change 603|Hockett1965|There is a mechanism of linguistic change, hereafter to be called SOUND CHANGE, not to be confused with and not reducible to analogy or borrowing; its effects are not necessarily reflected in every sound shift that has been reliably discovered and reported, but without it there would be no discernible underlying regularity in linguistic change and the comparative method would yield no results.|191|regularity hypothesis, sound change, sound change mechanisms 604|Jespersen1922|**Gradual Shiftings** We shall do well to put aside such artificial theories and look soberly at the facts. When some sounds in one century go one way, and in another, another, while at times they remain long unchanged, it all rests on this, that for human habits of this sort there is no standard measure. Set a man to saw a hundred logs, measuring No. 2 by No. 1, No. 3 by No. 2, and so on, and you will see considerable deviations from the original measure perhaps all going in the same direction, so that No. 100 is very much longer than No. 1 as the result of the sum of a great many small deviations perhaps all going in the opposite direction but it is also possible that in a certain series he was inclined to make the logs too long, and in the next series too short, the two sets of deviations about balancing one another. 
It is much the same with the formation of speech sounds at one moment, for some reason or other, in a particular mood, in order to lend authority or distinction to our words, we may happen to lower the jaw a little more, or to thrust the tongue a little more forward than usual, or inversely, under the influence of fatigue or laziness, or to sneer at someone else, or because we have a cigar or potato in our mouth, the movements of the jaw or of the tongue may fall short of what they usually are. We have all the while a sort of conception of an average pronunciation, of a normal degree of opening or of protrusion, which we aim at, but it is nothing very fixed, and the only measure at our disposal is that we are or are not understood. What is understood what does not meet this requirement must be repeated with greater correctness as an answer to 'I beg your pardon?' Everyone thinks that he talks to-day just as he did yesterday, and, of course, he does so in nearly every point. But no one knows if he pronounces his mother-tongue in every respect in the same manner as he did twenty years ago. May we not suppose that what happens with faces happens here also? One lives with a friend day in and day out, and he appears to be just what he was years ago, but someone who returns home after a long absence is at once struck by the changes which have gradually accumulated in the interval. Changes in the sounds of a language are not, indeed, so rapid as those in the appearance of an individual, for the simple reason that it is not enough for one man to alter his pronunciation, [pb] many must co-operate: the social nature and social aim of language has the natural consequence that all must combine in the same movement, or else one neutralizes the changes introduced each individual also is continually under the influence of his fellows, and involuntarily fashions his pronunciation according to the impression he is constantly receiving of other people's sounds. 
But as regards those little gradual shiftings of sounds which take place in spite of all this control and its conservative influence, changes in which the sound and the articulation alter simultaneously, I cannot see that the transmission of the language to a new generation need exert any essential influence: we may imagine them being brought about equally well in a society which for hundreds of years consisted of the same adults who never died and had no issue. |166f|sound change 605|Hockett1965|The quintessence of the quantization hypothesis can be stated as follows. In any speech community, only certain DIFFERENCES of speech sound are functional. This breaks the continuous multidimensional space of all possible speech sound into a finite number of regions; in at least some environments it matters, to hearer and thus to speaker, whether a given sound falls into one region or the next, but does not matter where it falls within a single region. Successive articulations aimed into the same region show a scatter, clustered around a LOCAL FREQUENCY MAXIMUM. The local frequency maxima are, or create, the 'sort of conception of an average pronunciation' of which Jespersen spoke. The frequency maxima, [pb] then, are the neat discrete functioning units of the phonological system of the language.|194|phonological principle 606|Hockett1965|:comment:`There is a very boring passage before this, but based on the idea that a phoneme is just the sum of its allophonic pronunciations in a given speech variety, Hockett concludes in the end that sound change is possible only because of this synchronic variation (more or less):` Sound change is not reducible to analogy: the latter leads only to the replacement of one array of distinctive features by another as the 'realization' of some lexical or grammatical form. 
Sound change is affected by certain kinds of borrowing, in that so-called 'fashions of pronunciation', as in imitation of a prestigious model, can alter the density distribution; but it is not REDUCIBLE to borrowing because the density distribution is largely altered by innumerable tiny imprecisions of pronunciation and by constant channel noise-the kind of thing Jespersen was talking about, but mostly on a much finer-grained scale-that take place totally out of awareness.|202|sound change, explanation of sound change 607|Hockett1965|There remains to ask WHY sound change should go on. We must be careful that our answer is not more complicated and obscure than the question. I shall argue in general biological and physical terms, not from human psychology; indeed, what I am about to say has really already been said in the preceding discussion. I shall argue, first, that there is nothing in the design or use of language to PREVENT sound change or, within wide limits, its structural consequences, and second, that there is a factor in the universe to cause it. Sound change CAN go on because language is redundant: most of the time, a hearer need register only a suitably scattered small fraction of the stigmata of the speech signal in order to know what has been said. Language, in turn, is redundant because otherwise [pb] it would have had no survival value for our species. Sound change DOES go on because of NOISE. Only in a permanently noiseless universe would sound change cease; but I think I could show on elementary physical principles that in a permanently noiseless universe there would be no events, hence no information, hence nothing to talk about and no language to talk with.|203f|reasons for sound change, sound change 608|Bybee2000|In many areas of linguistics, examining the nature of diachronic changes leads to a more accurate modeling of synchronic systems. 
Lexical diffusion - the way a sound change spreads through the lexicon - has, as yet, not been exploited as a potential source of evidence about the phonological shape of lexical representations. The present study contains evidence that sound change is both phonetically gradual and lexically gradual, and that the rate at which words undergo sound change is positively correlated with their text frequency. This correlation is found in monomorphemic words, regularly inflected words and irregularly inflected words. I argue that frequency effects in sound change may be explained by assuming that cognitive representations are impacted by every token of use.|65|sound change, explanation of sound change, lexical diffusion, regular sound change, sound change mechanisms 609|Bybee2002|:comment:`Based on the findings above, the author proposes a new model of language that accounts for them. This approach is based on` !exemplar_theory :comment:`and apparently assumes that humans store some exemplars of pronunciations of words in their memory.` The model that can handle the facts just described must allow for gradual changes in the phonetic representations of words, and it must also allow for the expression of certain relations within and among exemplar clusters. As already demonstrated, reductive change occurs in production as already automated sequences of linguistic elements are further reduced. These online reductions feed back into memory representations, since the language user’s accumulated experience is [pb] represented in memory. Words are represented as clusters of exemplars, and the relative weight of exemplars with different patterns may change over time as reduction proceeds. 
If the distribution of words in actual discourse contexts differs, the rate at which their exemplar clusters change, and thus the rate at which they undergo a change, may differ.|281f|sound change, exemplar theory 610|Bybee2002|Effects of frequency and context demonstrate that language use has an effect on mental representations. In this view, representations and the grammatical structure that emerges from them are based on experience with language. New linguistic experiences are categorized in terms of already stored representations, adding to the exemplar clusters already present and, at times, changing them gradually. Various levels of abstraction emerge as exemplars are categorized by phonological and semantic similarity—morphemes, words, phrases, and constructions can all be seen as the result of the categorization of linguistic experiences.|288|sound change, exemplar theory 611|Martinet1984|The difference between language in general and individual languages should always be kept in mind. The Saussurean dichotomy of langue vs. parole should yield before linguistically relevant vs. linguistically irrelevant. Languages should not be identified with codes. Proceeding inductively when trying to define the object of our science has proved impracticable. We should rather stipulate what we want to call a language. From a language we expect that it be actually used for communicating experience by means of a succession of vocal products analyzable into segments equated with some features of the total experience; each of those segments being analyzable into a succession of well-defined vocal units. Speaking here of dual patterning would obscure the fundamental hierarchy of the double articulation of language. The economical nature of double articulation is obvious: the vocal auditory nature of language, determining the linearity of speech, will automatically lead to it. 
But there is more to it than sheer economy: the analysis of experience into features corresponding to the significant units of a language makes it possible to communicate new experience by means of unexpected combinations of these units. The second articulation into phonemes is instrumental in stabilizing the perceptible forms by making them independent of the corresponding meanings.|000|double articulation, language model 612|BermudezOtero2007|:comment:`Discusses traditional models in generative linguistics, which proceed from lexical representations via phonological rules to phonological representations, and from there via phonetic rules to realization. The following two assumptions can be derived from (or underlie) these models:` * *Lexical and phonological discreteness* - In lexical and phonological representations, attributes have discrete values. * *Modularity* - Phonetic rules cannot refer directly to lexical representations. |502|language model, phonological principle 613|BermudezOtero2007|* By (3a), phonetically gradual change can take place only through the alteration of the phonetic rules that assign realizations to phonological categories. But, by (3b), any such alteration must be free of lexical conditioning. This is the key insight behind Bloomfield’s (@1933:351) slogan ‘Phonemes change’. * Diffusing change involves the alteration of the lexical representations where lexical information is stored. By (3a), however, such alterations must be categorical. In fact, given all the logically possible interactions between the phonetic and lexical dimensions, the architecture in (2) predicts the existence of not two but three modes of implementation for phonological change: .. raw:: html
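    <table>
    <tr><th colspan="2">Mode of implementation</th><th rowspan="2">Possible?</th><th rowspan="2">Innovation in what component of grammar?</th></tr>
    <tr><th><em>phonetic dimension</em></th><th><em>lexical dimension</em></th></tr>
    <tr><td>abrupt</td><td>gradual</td><td>Yes</td><td>lexical representations</td></tr>
    <tr><td>abrupt</td><td>abrupt</td><td>Yes</td><td>phonological rules</td></tr>
    <tr><td>gradual</td><td>abrupt</td><td>Yes</td><td>phonetic rules</td></tr>
    <tr><td>gradual</td><td>gradual</td><td>No</td><td></td></tr>
    </table>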
To my knowledge, the prediction that regular categorical change exists has rarely, if ever, been explicitly discussed in the literature. As we shall see in Section 21.3.2, however, there are powerful arguments in its favour.|503|sound change, sound change mechanisms 614|BermudezOtero2007|* Phase I - The life cycle starts with phonologization (Hyman 1976), which occurs when some physical or physiological phenomenon gives rise to a new cognitively controlled pattern of phonetic implementation through a coordination failure. * Phase II - Subsequently, the new gradient sound pattern may become categorical. In the modular feedforward architecture in (2), such a change would involve the restructuring of the phonological representations that provide the input to phonetic implementation, with the concomitant development of a new phonological counterpart for the original phonetic rule. * Phase III - Reanalysis can also cause categorical patterns to change. Over time, phonological rules typically become sensitive to morphosyntactic structure, often with a reduction in their domain of application. * Phase IV - At the end of their life cycle, sound patterns may cease to be phonologically controlled. Thus, a phonological rule may be replaced by a morphological operation (morphologization), or may disappear altogether, leaving an idiosyncratic residue in lexical representations (lexicalization).|504|sound change, sound change patterns 615|BermudezOtero2007|:comment:`Author gives arguments against non-modular approaches, based on specific, apparently attested patterns.` In nonmodular theories of the phonology–phonetics interface, in contrast, a predictable allophone can remain after the loss of its conditioning environment only if it is already present in lexical representation (@Bybee<2001> 2001: Sec.3.6). However, this approach misses the crucial rôle of opacity in triggering lexical restructuring: see (7). 
Modular feedforward models capture the right sequence of cause and effect: in the Sanskrit case [:comment:`palatalization of /k/ to /c/ in specific environments`], the stabilization of palatalization enabled it to interact opaquely with lowering; this opacity, in turn, prompted lexical restructuring, with the attendant phonemic split. |508|exemplar theory, modularity, language model 616|BermudezOtero2007|Essential to current functionalist thinking is the idea that the lexicon consists of a vast repository of highly-detailed memory traces of phonetic episodes experienced by the speaker: so-called ‘exemplars’ (Johnson 1997). Exemplars are linked to one another by a network of connections based on similarity in a high-dimensional phonetic space. Crucially, phonological categories do not exist independently of the exemplars. This idea comes in several versions.|512|exemplar theory 617|BermudezOtero2007|Moreover, gradient diffusion predicts that, over time, the lexicon will preserve remnants of old phonemes and exceptions to new allophonic patterns, all left behind by arrested changes. Indeed, if holistic phonetic targets were kept in long-term memory in quite the same way as lexicalized [pb] morphological constructs, then phonetic relics should be as unremarkable as stored morphological irregularities like children, oxen, feet, or wolves.|513f|exemplar theory, irregularity 618|Bloomfield1933|The interpretation the, of the phonetic correspondences that appear in our resemblant forms, assumes that *the phonemes of a language are subject to historical change*. This change may be limited to certain phonetic conditions [...]. This type of linguistic change is known as *phonetic change* (or *sound change*). In modern terminology, the assumption of sound-change can be stated in the sentence: *Phonemes change*.|351|sound change 619|Ohala1989|[...] we can identify some of the causes of sound change or at least locate the domain in which they lie. 
The ultimate proof of this is being able to duplicate sound change in the laboratory.|173|sound change 620|Ohala1989|A further qualification is that we can identify only some of the causes of sound change and, perhaps, make improved speculation on some of the others. [:comment:`There follows a discussion of the problem of causation in general, comparing it with diseases, where the ultimate cause is often not that easy to determine. The basic example is that of a heart attack.`] I propose to treat sound change the way public health workers treat heart attacks. By this I do not mean that I intend to try to prevent sound change -- indeed, I do not think it is harmful at all and, in any case, there is little we could do to control it -- but rather that I will try to identify those causes which are the preconditions for sound change. As for the immediate triggers of sound change in a particular language at a particular time, I will have little to say about them except to suggest that these things are bound to happen and that it is not so interesting to try to identify them. |174|sound change 621|Ohala1989|There exists in any speech community at any point in time a great deal of hidden variation in the pronunciation of words. [:comment:`rather lengthy part on what is not meant by "hidden"`] By "hidden" I mean rather that speakers exhibit variations in their pronunciation which they and listeners usually do not recognize as variation. When pronunciation is transmitted, however, the existence of this variation can create ambiguity and lead to the listener's misapprehension of the intended pronunciation norm. A misapprehended pro[pb]nunciation is a changed pronunciation, i.e., sound change. Analogues of the mechanism I have in mind may be found in scribal errors made by medieval manuscript copyists, in transcription errors between DNA and RNA, in the transmission errors of signals over telephone lines. 
In all of these cases as well as in normal speech transmission there is sufficient redundancy in the message to allow most such errors to be corrected, but the error correction is not perfect and so occasionally the signal is changed between the source and the destination.|175f|sound change, synchronic variation, production 622|Ohala1989|One of the most important discoveries of modern instrumental phonetics is the incredible amount of variation that exists in pronunciation, not only between speakers but also in the speech of a single speaker. [...] In fact, if synchronic variation is the stuff out of which sound change emerges, as I claim, the surprising thing is that sounds do not change more often. |176|sound change, robustness, biological parallels 623|Kitano2007|Robustness is one of the fundamental characteristics of biological systems. Numerous reports have been published on how robustness is involved in various biological processes and on mechanisms that give rise to robustness in living systems (Savageau, 1985a, 1985b, 1998; Barkai and Leibler, 1997; Alon et al, 1999; von Dassow et al, 2000; Bhalla and Iyengar, 2001; Csete and Doyle, 2002, 2004; Kitano et al, 2004, 2004a, 2004b; Stelling et al, 2004; Kitano and Oda, 2006; Kitano, 2007a). With increasing interest in systems biology, properties at the system level such as robustness have attracted serious scientific research. Nevertheless, a mathematical foundation that provides a unified perspective on robustness is yet to be established. For systems biology to mature into a solid scientific discipline, there must be a solid theoretical and methodological foundation. Often, systems biology is equated with computer simulation of cells and organs. Although computer simulation is a powerful technique for clarifying the complex dynamics of biological systems, it is also a useful tool for exploring the foundation of biological systems. 
While investigation on the dynamic properties of specific aspects of organisms is scientifically significant and can be widely applied, it is a study on specific instances of design within a design space that is shaped by fundamental principles, structural, environmental, and evolutionary constraints. The scientific goal of systems biology is not merely to create precision models of cells and organs, but also to discover fundamental and structural principles behind biological systems that define the possible design space of life (Figure 1). The value of understanding fundamental and structural theories is that they provide deeper insights into the governing principles that complex evolvable systems including biological systems follow. Building a solid theoretical foundation of biological robustness, and in particular defining a mathematical formulation of robustness, represents a key challenge in Systems Biology. Such a framework would be enormously useful, as it would provide general constraints on possible architectural features of living organisms.|000|robustness, biology, biological parallels 624|Ohala1989|The mapping between vocal tract shape and the output sound is a many-to-one mapping, i.e. the same or similar sound may result from two or more different vocal-tract configurations. When listeners repeat what they have heard they may use an articulation different from the original. Henry Sweet (1874: 15-16, 1900: 21-22) was among those who recognized this mechanism for sound change, exemplifying it with the variant forms of English 'through' as :sampa:`[TRu]` and :sampa:`[fRu]`. Modern acoustic analysis has revealed a great many other acoustic similarities between distinct articulations. 
:comment:`There follows a very nice explanation of labiovelar change in Indo-European as being the result of mis-perception, rather than mis-production.`|182|synchronic variation, sound change, perception 625|Ohala1989|:comment:`Gives examples for hypo-correction as a trigger of sound change. Here, the idea is not generally clear, but Ohala seems to think of a speaker who misinterprets a given pattern in some way. The basic example for this is nasalization, which proceeds via successive stages (nasal vowel + nasal) => (nasal vowel alone).`|184-188|hypo-correction, sound change 626|Ohala1989|The failure to implement the corrective rules was labelled 'hypo-correction'; implementing the rules when they are not called for, then, is 'hyper-correction'.|188|sound change, hyper-correction 627|Ohala1989|:comment:`Last passage is devoted to laboratory experiments for the investigation of sound change.` What is important about "laboratory evaluation" is the possibility of conducting controlled observations on sound change in order to carefully evaluate competing hypotheses concerning its mechanisms. In this sense "laboratory" should not be interpreted literally as "a space equipped with an elaborate array of instruments" but rather as a set of techniques or an arsenal of methods which can be used anywhere. Having recourse to such methods is the only way the study of sound change is going to progress beyond the Kiplingesque *Just-So* stories offered in the past.|194|sound change 628|Kiparsky1988|Text deals with the mechanisms of sound change and tries to explain the mechanisms within the framework of phonological theory. Kiparsky also gives an overview on major sound change types which apparently serves as the basis for the Wikipedia articles on the matter. 
The term "sound change mechanism" is also used by Kiparsky, although less explicitly than in my dissertation.|000|sound change, lexical diffusion, phonological theory, exceptionlessness hypothesis 629|Kiparsky1988|:comment:`mentions uniformitarianism as the basis for the exceptionlessness principle of the Neogrammarians.` The uniformitarian principle had two important consequences: first, that properties of language change can and should be investigated not on the basis of hypothetical reconstructions but on the basis of known languages and known historical changes; second, that reconstructed proto-languages are constrained by the same principles as are valid for actual languages. :comment:`The observation that languages and dialects would show regular phonological systems was taken as an indicator for the exceptionlessness hypothesis.`|364|exceptionlessness hypothesis, uniformitarianism 630|Kiparsky1988|A corollary of this new realism was that sound changes are language-specific historical processes situated in space and time, rather than simply general tendencies of 'phonetic erosion,' as they had previously been thought to be. This not only raised new questions about their phonetic interpretation, but also provided a basis for investigating their relative chronology.|364|catastrophism, exceptionlessness hypothesis, uniformitarianism 631|Kiparsky1988|The EH was in essence a radicalization of the insight that sound changes are rule-governed historical processes: not only are sound changes rule-governed, they are governed by rules of a very special and restricted sort. Apparent exceptions or nonphonetic conditions, it was claimed, are *always* due either to analogy, or to borrowing, or to other interacting sound changes. A neogrammarian historical phonology accordingly consists of a list of sound changes together with an account of the apparent exceptions to each by means of these three factors. 
The considerable success with which the history of a wide range of languages could be covered in this constrained framework was an impressive argument for the correctness of the EH.|365|sound change, analogy, borrowing, exceptionlessness hypothesis 632|Kiparsky1988|But this methodological argument is obviously worthless, since it depends on the false identification of exceptionlessness and regularity. Historical linguistics is viable because there are regularities, and does not depend at all on exceptionlessness or on the absence of nonphonetic conditioning.|365|working hypothesis, exceptionlessness hypothesis, sound change 633|Kiparsky1988|First, @Bloomfield<1933> was able to add a simple and powerful argument in support of the EH: if sound changes were lexically idiosyncratic, then languages would have assorted \`stray sounds' left over from the previous stages of their history. [:comment:`points to specific residues such as exceptions in morphology here, and gives examples.`] In reality, of course, languages always have concise phonemic inventories, and where marginal segments exist they are normally the result of borrowing or of expressive processes, not residues left behind by supposedly sporadic sound change.|366|sound change, exceptionlessness hypothesis, irregularity of sound change 634|Bloomfield1933|Theoretically, we can understand the regular change of phonemes, if we suppose that language consists of two layers of habit. One layer is phonemic: the speakers have certain habits of voic[pb]ing, tongue movement, and so on. These habits make up the phonetic system of the language. The other layer consists of formal-semantic habits: the speakers habitually utter certain combinations of phonemes in response to certain types of stimuli, and respond appropriately when they hear these same combinations. 
These habits make up the grammar and lexicon of the language.|364f|sound change, language model, regularity hypothesis, regular sound change 635|Kiparsky1988|:comment:`Mentions quote by` @Bloomfield<1933> (364f) :comment:`in which two representations of language are postulated due to the regularity of sound change:` not only will there have to be two 'layers of habit' (levels of representation), but they must be mutually independent. Only if the 'phonetic system' is independent of the grammar and lexicon will it follow that the phonological structure of utterances cannot be determined by their grammatical-lexical structure.|366|sound change, phonological theory, levels of representation 636|Kiparsky1988|However, the prevalence of language contact and diversity in no way disconfirms the EH, specifically not the *causal* claim which lies at the heart of it, namely that exceptions do not develop internally to a system but only through the interference between systems. On the contrary, it makes that hypothesis more defensible by justifying the very assumptions of heterogeneity which must be invoked to explain away contrary evidence. But by the same token it substantially insulates the EH from the actual data of change, and so makes it harder to put to an empirical test.|369|exceptionlessness hypothesis, irregularity of sound change 637|Hoenigswald1978|:comment:`Refers to "Leskien's regularity principle" and the reactions by scholars who defend it on "methodological grounds":` There was a similarly awkward counter-plea to the effect that we cannot afford to allow exceptions because that would be the end of our efforts and put us, as it were, out of business -- as though a false principle was better than none at all. Schuchardt called this an attempt at scholarship by intimidation. 
|24|regular sound change, regularity hypothesis, working hypothesis 638|Hoenigswald1978|First, there are those who speak of sporadic sound-change [pb] -- presumably the marginal or not so marginal realization of the possibility left open after the first, merely classificatory 'step' implied by the customary formulation of the regularity principle. Advocates of sporadic sound-change have occasionally asserted that the question of sporadicity arises typically where particular phonetic properties and processes are involved; dissimilations and haplologies are among the most familiar examples. [:comment:`lengthy example on Greek word`] It would have been best to agree, without for a moment deviating from the working rules which Leskien and his colleagues consistently observed, not that all sound changes are discovered to be regular, but rather that regular changes are also called sound-changes, regularity being essentially a term for phonological statability. |25f|sporadic sound change, irregularity of sound change, sound change, regularity hypothesis 639|Hoenigswald1978|It is a customary piece of orthodoxy to reject the notion of language 'mixture' on the ground of some kind of principle and to assert, for instance, that English may be identified in some safe, immediate, and *a priori* fashion as a Germanic language with French borrowings, rather than as a Romance language with Germanic borrowings. Perhaps this can be done, but let us note that the criteria are not altogether general ones. Crude numbers are no use since, in an even case, we reserve the freedom to decide that 'a language', even though swamped by loanwords, has remained its own self. Grammatical traits, so highly prized ever since Ludolf and Gyarmathi (if not before), can be borrowed quite massively, as students of calques and of typological isoglosses have come to know. 
Perhaps it is possible to fall back on the concept of the base vocabulary as a guide to the true strain; but there are difficulties here as well, especially when the differences between the forms of speech are only on the dialectal scale. All depends, as is wisely said, on the circumstances; and it is the general historian, and not the linguist in his Young Grammarian armour, who will deal constructively and reasonably with the historical setting.|27|basic vocabulary, lexical borrowing, inheritance, mixed languages 640|Kiparsky1988|:comment:`Kiparsky quotes` @Hoenigswald1978 :comment:`but this attribution appears to be incorrect, since the passages cannot be found in either of the two articles written by Hoenigswald in that year, so it is more likely that Kiparsky refers to` @Hoenigswald1964 :comment:`where the question of sporadicity of sound change is much more explicitly expressed.`|370f|sound change, regular sound change 641|Kiparsky1988|The first problem for the EH is that the phonological system of a language places constraints on the sound changes that language can undergo. [:comment:`Quotes` @Jakobson1929 :comment:`on this matter, who showed that certain changes follow general structural principles of a given language.`] These principles include *implicational universals*, that is, universals which state that the presence (or absence) of a property A in a language requires the presence (or absence) of property B. [...] This is crucial to the sound change issue in that the very existence of nontrivial implicational universals [...] immediately refutes the EH: if sound changes were purely *phonetically* conditioned they should be blind to what was *distinctive* in the language, and should be capable of applying in such a way as to produce phonological systems that violate implicational universals framed in terms of those notions.|372|regularity hypothesis, implicational universals 642|Kiparsky1988|Sound changes [...] 
can be initiated in or restricted to certain morphological categories [...] or blocked by morphological boundaries [...]. In some cases this could be attributed to analogical restoration in mid-change, on the basis of categories where the ending was regularly retained (@Anttila<1972> 1972). But [...] there are cases where this interpretation is impossible. |373|regularity hypothesis, sound change 643|Kiparsky1988|Sound changes sometimes apply just to frequent words of the language [...]. Frequency itself could hardly be the conditioning factor, or the 'cause' of the change. More plausibly, the causal link between frequency and change is redundancy: frequent items are more easily guessed by the hearer, so the speaker can afford more reduced pronunciations of them, which then may be lexicalized [...].|373|regularity hypothesis, sound change 644|Kiparsky1988|The word-by-word spread of sound change through the lexicon within a dialect, its *lexical diffusion*, is another kind of phenomenon which is incompatible with the EH.|373|regularity hypothesis, sound change, lexical diffusion 645|Kiparsky1988|:comment:`Kiparsky embarks on causes of sound change, mentioning that sound change often originates in middle class societies. More literature and detailed studies are mentioned in the passage.`|374-376|sound change, explanation of sound change, reasons for sound change 646|Kiparsky1988|:comment:`Kiparsky mentions three aspects from which sound change should and could be studied.` * *Typology*. A typological perspective is traditional in structural phonology [...]. Typological studies, then, are simply part of the normal activity of linguistics. [...] * *Phonetics*. The program of explaining phonological processes by the physical constraints of the vocal tract goes back to the beginnings of phonetics [pb] and flourished especially around the turn of the century [...]. Ohala and his associates [...] 
have recently revived a more sophisticated form of this line of research which brings in the entire speech mechanism, including perception. * *Phonology*. Most ongoing work in phonological theory represents some variety of generative phonology and has as its goal the development of models of (1) phonological representations, (2) phonological rules, (3) the organization of the phonology and its relation to other modules of grammar. |376f|sound change, typology 647|Kiparsky1988|*Tonogenesis*. The laryngeal articulation of consonants can affect neighboring vowels in a way which is then phonologized as tone. This process can give rise to tonal systems, and increase the number of tones in languages which are already tonal. The best-documented type, found throughout Southeast Asia and scattered elsewhere, is the development of low tone after voiced obstruents. |382|tonogenesis 648|Kiparsky1988|[...] the discontinuous transmission of language is a factor in sound change, and that this mechanism is specifically responsible for certain so-called 'minor sound changes.' These changes, notably dissimilation [...] and metathesis [...], differ from ordinary sound change in being (1) abrupt rather than gradual, (2) often sporadic, [pb] and (3) structure-preserving, in the sense that they do not result in new segment types but simply redistribute existing ones [...]. |389f|language acquisition, sound change, sporadic sound change 649|Kiparsky1988|[...] sound changes are determined by overriding structural principles, universal as well as language-specific. If change is structure-dependent then we had better know what the structure is in order to understand the change.|391|sound change, language structure 
651|Kiparsky1988|I draw the conclusion that the existence of two types of sound change, lexical diffusion and 'neogrammarian' sound change, is a consequence of the existence of two types of phonological rules, lexical rules and postlexical rules.|404|lexical diffusion, regular sound change, Neogrammarian sound change 652|Kiparsky1988|Progress in our understanding of sound change is likely to come from an integration of theories of phonology, phonetics, acquisition, and language processing (perception, production, variation). It is this interplay of mutually constraining factors which gives historical linguistics its focal role in the study of languages.|404|sound change 653|Trask2000|Any syntagmatic change in which some segment becomes more similar in nature to another segment in the same sequence, usually within a single phonological word or phrase.|30|assimilation, sound change types 654|Trask2000|Any **syntagmatic change** in which one segment changes so as to become less similar to another segment in the same form|95|dissimilation, sound change types 655|Trask2000|Any **syntagmatic change** in which the order of segments (or sometimes of other phonological elements) in a word is altered. |211|metathesis, sound change types 656|Trask2000|Any process which leads to the introduction of tones into a language which formerly lacked them. |346|tonogenesis, sound change types 657|Trask2000|Any of various phonological processes applying to sequences of segments either across morpheme boundaries (*internal sandhi*) or across word boundaries (*external sandhi*). 
|296|sound change types, sandhi 658|Trask2000|A type of phonological change (or phonological constraint) in which one of two adjacent syllables of identical or similar form is lost (or fails to appear in the first place). |146|haplology, sound change types 659|Trask2000|Any phonological change which inserts a segment into a word or form in a position in which no segment was formerly present.|107|sound change types, insertion, epenthesis 660|Trask2000|Any of various processes in which phonological segments are lost from a word or a phrase. Specific varieties of elision are often given special names like **aphaeresis**, **syncope**, **apocope**, **synaeresis**, **synizesis**, **synaloepha**. Not infrequently this name is given to specific processes in particular languages.|102|elision, sound change types 661|Trask2000|Any phonological process in which a segment acquires a nasal character which it formerly lacked.|224|nasalization, sound change types 662|Trask2000|(also **prosthesis**) The addition of a segment to the beginning of a word. [...] The opposite is **aphaeresis**.|266|prothesis, sound change types 663|Mowrey1995|:comment:`Reply to` @Wang1969 :comment:`who claims that many changes are not gradual.` Here Wang seems to overstate the case for phonetic abruptness. As we will see later on, for some changes of place of articulation for which no physiological continuum is apparent, articulatory pathways in fact exist. Others are known to be the result of language contact, and as such cannot be construed as evidence for abruptness in internal change. [...] In regular metathesis, it is not unusual to find that one of the segments involved in the apparent transposition is articulatorily minimal [...]. What appears to be an interchange of two entire segments may then be the result of a change in the timing of a very small number of muscular events -- similar to the phenomena which, we argue below, underlie assimilatory processes. 
|42|sound change, regularity hypothesis, metathesis 664|Mowrey1995|[...] evidence for gradience in segment addition or deletion phenomena is also available. Fourakis & Port (1986) show that the transitional (inserted) stops arising from [ns] and [ls] sequences in American English, as in *dense* and *false*, are significantly shorter than the underlying stops in *dents* and *faults*. :comment:`Authors give more examples for gradience in change patterns which are not apparently gradient.`|41f|gradient sound change, epenthesis 665|Mowrey1995|The general difficulty here is that the case for abruptness cannot be judged on the witness of orthography and broad transcription, since alphabetic units by their nature will always testify that change is abrupt. For relevant testimony we must turn to phonetic accounts, whether based on the informal observations of a Boas or Menéndez Pidal or on instrumental data. Although we expect abrupt substitution of one sound for another in cases of language contact, such finer-grained data as we possess point to the phonetic gradualness of internal changes.|42|sound change, gradual sound change, contact-induced sound change 666|Mowrey1995|Labov and his collaborators have shown that ongoing changes are typically characterized by variation; variation is not 'free' but systematic, and variants typically have social value. |42|synchronic variation, sound change, gradual sound change 667|Mowrey1995|:comment:`short summary on what was discussed so far` First, the aetiological differences between sound change proper and other phenomena effecting the change of form over time are so obvious that the de facto abandonment of the distinction by generativists may be viewed as an aberration. [...] Secondly, that sound changes proceed in a lexically gradual rather than abrupt fashion has been more than amply demonstrated by Wang and others; the role of relative frequency in diffusion also appears to have been adequately documented. 
Third, evidence for the phonetic gradualness of at least some processes assumed to be necessarily abrupt, though for obvious reasons more difficult to obtain and therefore less abundant, is sufficient to suggest that the case for it has much more merit than Wang assumed. Fourth [...] variation is not only systematic but often reflects the action of changes in progress, and is [pb] thus necessarily within the domain of phenomena which a theory of sound change must address.|45f|sound change mechanisms, phonetic gradualness 668|Mowrey1995|The existence of dialectal detail, and its successful continuance in generational transmission, suggests not only that speakers routinely command levels of phonetic detail much finer than that which our descriptive tools can accommodate, but also that audition and articulation must be very closely linked. |47|perception, production, sound change 669|Mowrey1995|We suggest that though auditory perception is implicated in changes arising in language contact, in which the differently-tuned speech perception systems of the speakers of an adopting language come into play, misperception does not provide a plausible explanatory basis for internal change, since the analyses depend on attributing to the speech perception system a tendency toward confusion and approximation which it does not appear to exhibit. Articulatory processes by which production patterns gradually change over time [...] thus appear to provide a better account of internal change. The nature of variation along the formal-to-casual speech continuum corroborates this conclusion. Not only does the relation between more formal and more casual speech modes seem to be most naturally characterizable as the product of unidirectionally reductive changes, but speakers effortlessly control the entire range of production along the continuum. 
This suggests that the reductive changes arise from on-line articulatory phenomena whose effects are incremental and continuous rather than from a series of events with a basis in misperception or reassignment.|48|perception, production, sound change, explanation of sound change 670|Mowrey1995|:comment:`Mention theories on phonology which argue that phonology is important to guarantee that the memory load for humans learning languages is as low as possible, and argue against this view.` We believe that it does not, and that the level of phonetic detail in the core lexical material of any given dialect in itself suggests that such assumptions cannot be correct. In fact, given that speakers must acquire and store detail anyway, the linking of phonetic and semantic form -- the pairing of sound and meaning that linguists regard as their responsibility to explicate -- might be effected in a smoother way in a holistic model. [...] We have argued that it is articulation directly -- rather than articulation mediated by perception, or psychological units of the nature of phonemes -- that changes in internal sound change, and that the locus of change is specific meaningful expressions, and not, or at least not in any primary sense, sounds or sound patterns independent of expressions. |53|sound change, explanation of sound change, holistic model 671|Mowrey1995|Authors make an interesting distinction between external and internal change, which may also occur already in @Labov1994 or earlier, by treating internal change as the change which is not driven by any other elements than the language itself and would also occur if the language were spoken in isolation. External change is then the change which is triggered by language contact. 
This may be problematic, since dialect continua, as part of our linguistic reality, will make it difficult to really distinguish between external and internal change.|000|internal and external sound change, lexical diffusion, sound change mechanisms, production, perception 672|Mowrey1995|The term *internal* will roughly correspond to those changes which are thought to be phonetically motivated within a dialect. Excluded are changes which are obviously induced by contact between distinct dialects or languages, as well as hypercorrection and abrupt changes to archaic or infrequent forms [...]. These latter all behave in much the same way as lexical material borrowed from a different language, and we group them together under the terms *external* or *non-evolutionary* change. In doing so, we follow in the tradition of theories of change which are careful to separate analogical changes and borrowings from processes which do not seem to require external motivation. |68|internal and external sound change 673|Mowrey1995|.. image:: static/img/mowrey-table.png :width: 600px :name: table :comment:`Table illustrating internal and external sound change.`|70|internal and external sound change 674|Mowrey1995|The characterization of assimilation as temporal reduction must, then, be fundamentally incomplete. Assimilation must involve or be accompanied by some kind of substantive reduction.|72|assimilation, sound change, reduction 675|Mowrey1995|.. image:: static/img/mowrey-sonority-hierarchy.png :name: sonority :width: 500px :comment:`Sonority hierarchy by the authors.`|80|sonority hierarchy, sound change 676|Mowrey1995|Fig. 3 shows typical decay pathways of voiceless aspirated stops expressed as segmental mutations. Alternative decay pathways are represented as branches from a higher node on the vertical axis. 
The parallel between consonantal decay pathways and the strength/sonority hierarchy is immediately obvious: in general terms, liquids, glides, and 'secondary' vowel or consonant qualities (represented in Fig. 3 as Q) descend from weakly-stopped or fricative configurations, which in turn descend from more fully stopped configurations. [...] [pb] .. image:: static/img/mowrey-sound-change-paths.png :name: sound change :width: 500px :comment:`Image illustrating different sound change paths.`|81f|sonority hierarchy, sound change patterns 677|Mowrey1995|:comment:`After an interesting treatment of epenthesis and other phenomena, which are quite nicely argued to be similar to the general "reduction"-pathway of change, the authors conclude:` Internal or evolutionary change (sound change) consists of just two types of changes in muscular activity patterns: temporal reduction, by which bursts of muscular activity become temporally more contiguous, and substantive reduction, by which bursts of muscular activity diminish in amplitude. These two event types appear to provide straightforward accounts of most of the major types of phonological processes. [...] Evaluating the case for evolutionary epenthesis, we have concluded that it is weak. [:comment:`it follows a lengthy passage on epenthesis and other stuff in general.`] [pb] We do not expect that all cases of internal epenthesis will be explicable, nor do we intend to explain them all away. But even if every case were valid, epenthesis would still be statistically insignificant relative to reduction, and does not seem to be observed in contemporary internal changes-in-progress. [...] The reduction of redundancy in signals or messages, especially those repeated often and therefore in part predictable, is automatic and advantageous. [...] 
It is an error to conceive of this as deriving from ease of articulation or laziness; such parochializations have detained us too long, preventing us from considering that use-induced reduction over time is characteristic of all symbolic motor behavior.|108f|reduction, internal and external sound change 678|Hoenigswald1964|There are *unconditional* sound changes and *conditioned* sound changes. Unconditional sound changes are typically *regular* and *gradual*. Most conditioned sound changes are also regular and gradual. In addition conditioned changes are very largely *assimilatory* in nature. The assimilation in question is an assimilation to a nearest segment of sound -- it is *contact* assimilation; and what is more, it is, typically, an assimilation to the nearest following segment: it is *regressive* (anticipatory). |205|conditioned sound change, regular sound change, gradual sound change 679|Hoenigswald1964|But then there are some conditioned changes which are different in all or some of the following ways: 1 they are *sporadic* rather than regular, 2 they are *sudden* rather than gradual, 3 they are sometimes *dissimilatory*, [...] 4 these sporadic, sudden dissimilations or assimilations are *distantly* conditioned rather than conditioned in contact, 5 they are not infrequently progressive, or else they are *reciprocal* [...], 6 certain particular sounds or sound configurations are said to be more subject to those processes than others: [pb] liquids and nasals for example; geminates are presumably prone to dissimilatory simplification; syllables are lost haplologically, etc. |205f|sporadic sound change 680|Hoenigswald1964|Author writes on the question of sporadicity vs. graduality of sound change. 
He argues that sporadic sound change (as a sound change mechanism) cannot be tied to any specific sound change types, but rather expresses a continuum in which regular sound change is intersected by alternative processes such as analogy.|000|sound change types, sporadic sound change, gradual sound change, sound change mechanisms 681|Hoenigswald1964|The grounds on which the so-called minor sound change processes have been contrasted with ordinary, "regular" sound change are very weak. In particular, the sporadicity of such sound changes as distant dissimilations and assimilations may be directly connected with the circumstance that distant conditioning allows more scope for morpheme boundaries to intervene and thus for "analogic" change to occur. This will be the case where the prevailing typology operates against certain (discontinuous) phoneme combinations and where inflectional or derivational morpheme boundaries are likely to fall in between.|214|sporadic sound change, analogy, gradual sound change 682|Schouwstra2014|Where do the different sentence orders in the languages of the world come from? Recently, it has been suggested that there is a basic sentence order, SOV (Subject–Object–Verb), which was the starting point for other sentence orders. Backup for this claim was found in newly emerging languages, as well as in experiments where people are asked to convey simple meanings in improvised gesture production. In both cases, researchers found that the predominant word order is SOV. Recent literature has shown that the pragmatic rule ‘Agent first’ drives the preference for S initial word order, but this rule does not decide between SOV and SVO. This paper presents experimental evidence for grounding the word order that emerges in gesture production in semantic properties of the message to be conveyed. We focus on the role of the verb, and argue that the preference for SOV word order reported in earlier experiments is due to the use of extensional verbs (e.g. 
throw). With intensional verbs like think, the object is dependent on the agent’s thought, and our experiment confirms that such verbs lead to a preference for SVO instead. We conclude that the meaning of the verb plays a crucial role in the sequencing of utterances in emerging language systems. This finding is relevant for the debate on language evolution, because it suggests that semantics underlies the early formation of syntactic rules.|000|word order, syntax, language evolution 683|Li2013|Identification of communities in complex networks is an important topic and issue in many fields such as sociology, biology, and computer science. Communities are often defined as groups of related nodes or links that correspond to functional subunits in the corresponding complex systems. While most conventional approaches have focused on discovering communities of nodes, some recent studies start partitioning links to find overlapping communities straightforwardly. In this paper, we propose a new quantity function for link community identification in complex networks. Based on this quantity function we formulate the link community partition problem into an integer programming model which allows us to partition a complex network into overlapping communities. We further propose a genetic algorithm for link community detection which can partition a network into overlapping communities without knowing the number of communities. We test our model and algorithm on both artificial networks and real-world networks. The results demonstrate that the model and algorithm are efficient in detecting overlapping community structure in complex networks.|000|link communities, community detection, community structure 684|He2014|Discovery of communities in complex networks is a fundamental data analysis problem with applications in various domains. 
While most of the existing approaches have focused on discovering communities of nodes, recent studies have shown the advantages and uses of link community discovery in networks. Generative models provide a promising class of techniques for the identification of modular structures in networks, but most generative models mainly focus on the detection of node communities rather than link communities. In this work, we propose a generative model, which is based on the importance of each node when forming links in each community, to describe the structure of link communities. We proceed to fit the model parameters by taking it as an optimization problem, and solve it using nonnegative matrix factorization. Thereafter, in order to automatically determine the number of communities, we extend the above method by introducing a strategy of iterative bipartition. This extended method not only finds the number of communities all by itself, but also obtains high efficiency, and thus it is more suitable to deal with large and unexplored real networks. We test this approach on both synthetic benchmarks and real-world networks including an application on a large biological network, and compare it with two highly related methods. Results demonstrate the superior performance of our approach over competing methods for the detection of link communities.|000|link communities, community structure, community detection 685|Francis2014|Hybrid evolution, horizontal gene transfer (HGT) and recombination are processes where evolutionary relationships may more accurately be described by a reticulated network than by a tree. In such a network, there will often be several paths between any two extant species, reflecting the possible pathways that genetic material may have been passed down from a common ancestor to these species. These paths will typically have different lengths but an ‘average distance’ can still be calculated between any two taxa. 
In this note, we ask whether this average distance is able to distinguish reticulate evolution from pure tree-like evolution. We consider two types of reticulation networks: hybridization networks and HGT networks. For the former, we establish a general result which shows that average distances between extant taxa can appear tree-like, but only under a single hybridization event near the root; in all other cases, the two forms of evolution can be distinguished by average distances. For HGT networks, we demonstrate some analogous but more intricate results.|000|hybridization, lateral gene transfer, reticulate evolution, distance-based methods 686|Dehmer2011|Overall, to order the different parts we assumed an intuitive – problem-oriented – perspective moving from *Modeling, Simulation, and Meaning of Gene Networks* to *Inference of Gene Networks* and *Analysis of Gene Networks*.|XVIII|modeling, inference, analysis 687|Graur2000|Today we are witnessing what Jim Bull and Holly Wichmann called a "revolution in evolution." The revolution has been advanced in several fronts through "changes in technology, expansion of theory, and novel methodological approaches." Moreover, the field has now become socially, economically, and politically relevant.|3|molecular evolution, quantitative turn 688|Bull1998|Evolutionary biology has emerged from its 19th-century state; the image of naturalists collecting butterflies and museum curators dusting fossils has faded. Evolution is now widely perceived and appreciated as the organizing principle at all levels of life. This principle so pervades research that the evolutionary underpinning of many experimental approaches is unstated. For example, studies in the developmental genetics of fruit flies led to the discovery of genes that control development in all segmented organisms. Protein alignments are used to identify conserved amino acid residues that are potentially important to function. 
Even the hierarchical use of model organisms, from bacteria to humans, in biomedical testing and research is implicitly based on a recognition that the same building blocks underlie all life and that levels of increasing similarity are nested. [...] The revolution in evolutionary biology has been advanced on several fronts through changes in technology, expansion of theory, and novel methodological approaches. Technological advances in molecular genetics have provided insights into the deepest mechanistic secrets of evolution. Aided by advances in computer technology and phylogenetic theory, molecular genetics has also provided a universal tool for uncovering evolutionary histories. On another front, the classical Darwinian model of natural selection has given way to a more complex view; selection on the “selfish” gene affects organization at all levels, giving rise to parent-offspring, male-female, and intragenomic conflicts of interest. A new wave of advances is promised by experimental evolutionary biology, as theories are tested from direct observations of evolution in the laboratory, and results are assessed at the phenotypic, genetic, molecular, and structural levels. |1959|biology, quantitative turn, molecular evolution 689|Gevaudan2007|Very interesting work on lexical change. Invaluable for a more decent modeling of common problems in historical linguistics. * Chapter 1, Einführung, drei Dimensionen des lexikalischen Wandels, semantischer Wandel, morphologischer Wandel, Entlehnung, 11-29 * Chapter 2, * Chapter 5, "Entlehnung" (borrowing, *stratische Filiation*), 141-163 |000|lexical change 690|Wanger2013|Scribal evidence is of great value for the study of historical sociolinguistics as it is often the sole reliable source, if not the only source, of evidence for diachronic language change. 
Observing language change over time, we can see that scribes are both active agents of change but also have a more passive role when – perhaps unknowingly and unintentionally – documenting variation or developmental tendencies and patterns in language. Scribal evidence can also shed light on mechanisms of language change that would otherwise remain obscure, as for example when revealing the mutual influence of co-existing codes (see Berg 1 and Schiegg), or when showing how rapidly language change can take place (Dolberg). Thus, studying scribal evidence is an attempt at making visible the “invisible hand” (Keller 1994) of language change.|3|language change, spoken vernacular, written language, explanation of language change 691|Eigler2001|This is an official German translation of *Phaidon*, *Das Gastmahl*, and *Kratylos* by Platon.|000|Platon, language change, physei-thesei 692|Eigler2001|Ich denke nämlich, daß die Hellenen, zumal die in der Nähe der Barbaren wohnenden, gar viele Worte von den Barbaren angenommen haben.|409de|borrowing 693|Eigler2001|Aber weißt du denn nicht, du Schwieriger, daß die ursprünglichen Namen schon ganz zusammengeschmolzen worden sind von denen, welche sie prächtig machen wollten und nun Buchstaben darum hersetzten und andere herausnahmen des bloßen Wohlklangs wegen, so daß sie auf vielerlei Weise verdreht sind, teils der Verschönerung wegen, teils aus Schuld der Zeit.|414b|language change 694|Eigler2001|Zu sagen, wenn wir etwas nicht verstehen können, dies sei ein barbarisches und ausländisches Wort. Und vielleicht ist manches unter diesen in der Tat ein solches; es kann aber auch von ihrem Alter herrühren, daß die ersten Worte uns unerforschlich sind. 
Denn da die Worte so nach allen Seiten herumgedreht werden, wäre es wohl nicht zu verwundern, wenn sich die alte Sprache zu der jetzigen nicht anders verhielte als eine barbarische.|421c-d|language change 695|Paul1886|Sehen wir nun, wie sich bei dieser natur des objects die aufgabe des geschichtschreibers stellt. Der beschreibung von zuständen wird er nicht entraten können, da er es mit grossen complexen von gleichzeitig neben einander liegenden elementen zu tun hat. Soll aber diese beschreibung eine wirklich brauchbare unterlage für die historische betrachtung werden, so muss sie sich an die realen objecte halten, d. h. an die eben geschilderten psychischen organismen. [...] Um den zustand einer sprache vollkommen zu beschreiben, wäre es eigentlich erforderlich, an jedem einzelnen der sprachgenossenschaft angehörigen individuum das verhalten der auf die sprache bezüglichen vorstellungsmassen vollständig zu beobachten und die an den einzelnen gewonnenen resultate unter einander zu vergleichen. In wirklichkeit müssen wir uns mit etwas viel unvollkommenerem begnügen, was mehr oder weniger, immer aber sehr beträchtlich hinter dem ideal zurückbleibt. |26|language change 696|Paul1886|Suchen wir zunächst ganz im allgemeinen festzustellen: was ist die eigentliche ursache für die veränderungen des sprachusus? Veränderungen, welche durch die bewusste absicht einzelner individuen zu stande kommen sind nicht absolut ausgeschlossen. Grammatiker haben an der fixierung der schriftsprachen gearbeitet. Die terminologie der wissenschaften, künste und gewerbe ist durch lehrmeister, forscher und entdecker geregelt und bereichert. In einem despotischen reiche mag die laune des monarchen hie und da in einem punkte eingegriffen haben. 
Ueberwiegend aber hat es sich dabei nicht um die schöpfung von etwas ganz neuem gehandelt, sondern nur um die regelung eines punktes, in welchem der gebrauch noch schwankte, und die bedeutung dieser willkührlichen festsetzungen ist verschwindend gegenüber den langsamen, ungewollten und unbewussten veränderungen, denen der sprachusus fortwährend ausgesetzt ist. *Die eigentliche ursache für die veränderung des usus ist nichts anderes als die gewöhnliche sprechtätigkeit.* Bei dieser ist jede absichtliche einwirkung auf den usus ausgeschlossen. Es wirkt dabei keine andere absicht als die auf das augenblickliche bedürfnis gerichtete, die ab[pb]sicht seine wünsche und gedanken anderen verständlich zu machen. Im übrigen spielt der zweck bei der entwicklung des sprachusus keine andere rolle als diejenige, welche ihm Darwin in der entwickelung der organischen natur angewiesen hat: die grössere oder geringere zweckmässigkeit der entstandenen gebilde ist bestimmend für erhaltung oder untergang derselben.|29f|language change 697|Paul1886|Wenn durch die sprechtätigkeit der usus verschoben wird, ohne dass dies von irgend jemand gewollt ist, so beruht das natürlich darauf, dass der usus die sprechtätigkeit nicht vollkommen beherrscht, sondern immer ein bestimmtes mass individueller freiheit übrig lässt. Die betätigung dieser individuellen freiheit wirkt zurück auf den psychischen organismus des sprechenden, wirkt aber zugleich auch auf den organismus der hörenden. Durch die summierung einer reihe solcher verschiebungen in den einzelnen organismen, wenn sie sich in der gleichen richtung bewegen, ergiebt sich dann als gesammtresultat eine verschiebung des usus heraus, der eventuell den alten verdrängt. Daneben gibt es eine menge gleichartiger verschiebungen in den einzelnen organismen, die, weil sie sich nicht gegenseitig stützen, keinen solchen durchschlagenden erfolg haben. 
Es ergibt sich demnach, dass sich die ganze principienlehre der sprachgeschichte um die frage concentriert: *wie verhält sich der sprachusus zur individuellen sprechtätigkeit?* wie wird diese durch jenen bestimmt und wie wirkt sie umgekehrt auf ihn zurück?|30|language change, reasons for sound change 698|Campbell2007|A kind of analogical change in which speakers make an attempt to change a form from a less prestigious variety to make it conform with how it would be pronounced in a more prestigious variety but in the process overshoot the target so that the result is erroneous from the point of view of the prestige variety being mimicked. For example, *for you and I* (for Standard English *for you and me*) is a hypercorrection based on stigmatized use of me as subject pronoun in instances such as *Billy and me saw a rat* or *me and him chased the rat*.|79|hyper-correction, sound change 699|Ohala1989|:comment:`Hypocorrection is seen similar to phonologization, i.e. to a process where one fails to detect the appropriate rule and thus creates a new one.` From this we can conclude that when the development of distinctive vowel nasalization in French is described in the typical way as in (8) * (8) bon - - → bõ it is coded in a misleading way since it collapses two distinct processes. As given in (9) there is first a synchronic physical process of vowels being * (9) bon - - → bõn bon - - → bõ nasalized before a nasal consonant, and second, there was the failure to detect the nasal consonant which therefore resulted in its omission. |186|hypo-correction, sound change 700|Berzak2014|Linguists and psychologists have long been studying cross-linguistic transfer, the influence of native language properties on linguistic performance in a foreign language. 
In this work we provide empirical evidence for this process in the form of a strong correlation between language similarities derived from structural features in English as Second Language (ESL) texts and equivalent similarities obtained directly from the typological features of the native languages. We leverage this finding to recover native language typological similarity structure directly from ESL text, and perform prediction of typological features in an unsupervised fashion with respect to the target languages. Our method achieves 72.2% accuracy on the typology prediction task, a result that is highly competitive with equivalent methods that rely on typological resources.|000|linguistic reconstruction, language typology, native language, foreign language 701|Jakielski1998|A consonant cluster is formed by sequential closant articulations. Clusters are one of the last phonetic constructs acquired by children, frequently misarticulated by individuals with disordered speech, most subject to impairment in acquired motor-based neurological disorders, and difficult for second language learners to acquire. No phonetic account of early cluster acquisition has been systematically explored. In the present study, longitudinal prelinguistic and linguistic data from five infants 7-36 months of age were phonetically transcribed. Data were analyzed to evaluate five production hypotheses. Results showed a significant group trend to form babbling and word clusters primarily from singletons in the phonetic inventory within the same utterance/word position. The majority of babbling clusters were produced in medial position, while the majority of word clusters were produced in initial position. A significant number of babbling clusters were homorganic; word clusters also were homorganic, although the trend did not reach statistical significance. In babbling, cluster-plus-vowel co-occurrence patterns showed significant associations. 
In words, consonant place of articulation was independent of vowel articulatory dimension in cluster-plus-vowel sequences. An index of cluster complexity did not reveal a change in cluster complexity from the beginning to the end of this study. Individual subjects did not show production patterns that differed considerably from the group trends. Overall, findings revealed that production constraints influence cluster acquisition in babbling, and that both production constraints and ambient language effects influence cluster acquisition in words. These findings provide preliminary evidence for a phonetic approach to understanding serial complexity in cluster acquisition.|000|phonetic complexity 702|Popov1996|A notion and a measure of linguistic complexity introduced earlier (Trifonov, 1990) were originally used for analysis of nucleotide sequences. This measure was shown to reflect multiplicity of codes (messages) of different natures superimposed in the sequences. Unlike human language texts, genetic texts are ‘read’ by cellular mechanisms in several different ways, each time using a different selection of the characters of the same text while skipping others (Trifonov, 1989). Human texts are read in one way only, sequentially and involving all characters (one code). The conceptual significance and essence of the idea on the multiplicity of overlapping codes in genetic sequences, as opposed to human languages, is discussed. The linguistic complexity technique allows a calculation to be made of the structural complexity of any linear sequence of characters irrespective of whether the text is cognized or presently undeciphered. The texts (sequences) are compared exclusively from the point of view of their structural complexity with no reference to the meaning of the texts which is beyond the scope of this article. Results of such a comparison of protein sequences with various texts, written in English, Italian and Welsh are presented. 
The human texts are found to be structurally simpler than genetic (protein) texts, reflecting, apparently, a difference in the reading modes: single code versus many codes.|000|linguistic complexity 703|Abramov2011|This article presents an approach to automatic language classification by means of linguistic networks. Networks of 11 languages were constructed from dependency treebanks, and the topology of these networks serves as input to the classification algorithm. The results match the genealogical similarities of these languages. In addition, we test two alternative approaches to automatic language classification – one based on n-grams and the other on quantitative typological indices. All three methods show good results in identifying genealogical groups. Beyond genetic similarities, network features (and feature combinations) offer a new source of typological information about languages. This information can contribute to a better understanding of the interplay of single linguistic phenomena observed in language.|000|genetic classification, dumb-ass approaches 704|Abramov2011|Article explores the use of treebanks for the task of language classification. Despite what the authors say, their approach does *not* recover common phylogenies of the languages, since they apply a lot of tuning in order to arrive at the correct phylogenies. So it is just another approach in which people try to reconstruct language phylogenies with features which have a low discriminative power.|000|genetic classification, dumb-ass approaches, discriminative power 705|Nowak2002|Language is our legacy. It is the main evolutionary contribution of humans, and perhaps the most interesting trait that has emerged in the past 500 million years. Understanding how darwinian evolution gives rise to human language requires the integration of formal language theory, learning theory and evolutionary dynamics. Formal language theory provides a mathematical description of language and grammar. 
Learning theory formalizes the task of language acquisition—it can be shown that no procedure can learn an unrestricted set of languages. Universal grammar specifies the restricted set of languages learnable by the human brain. Evolutionary dynamics can be formulated to describe the cultural evolution of language and the biological evolution of universal grammar.|000|language change, syntax, formal language theory, linguistic complexity 706|Nowak2002|Interesting article that explains nicely the aspects of grammar and syntactic complexity, especially in the light of Chomsky's hierarchies, which are not often explained in such an interesting manner. |000|Chomsky hierarchy, syntax, language model 707|Patterson2013|**Background:** Models of ancestral gene order reconstruction have progressively integrated different evolutionary patterns and processes such as unequal gene content, gene duplications, and implicitly sequence evolution via reconciled gene trees. These models have so far ignored lateral gene transfer, even though in unicellular organisms it can have an important confounding effect, and can be a rich source of information on the function of genes through the detection of transfers of clusters of genes. **Result:** We report an algorithm together with its implementation, DeCoLT, that reconstructs ancestral genome organization based on reconciled gene trees which summarize information on sequence evolution, gene origination, duplication, loss, and lateral transfer. DeCoLT optimizes in polynomial time on the number of rearrangements, computed as the number of gains and breakages of adjacencies between pairs of genes. We apply DeCoLT to 1099 gene families from 36 cyanobacteria genomes. **Conclusion:** DeCoLT is able to reconstruct adjacencies in 35 ancestral bacterial genomes with a thousand gene families in a few hours, and detects clusters of co-transferred genes. 
DeCoLT may also be used with any relationship between genes instead of adjacencies, to reconstruct ancestral interactions, functions or complexes. **Availability:** http://pbil.univ-lyon1.fr/software/DeCoLT/|000|gene tree reconciliation, reconciliation, lateral gene transfer, LGT detection 708|Nakhleh2013|An intricate relation exists between gene trees and species phylogenies, due to evolutionary processes that act on the genes within and across the branches of the species phylogeny. From an analytical perspective, gene trees serve as character states for inferring accurate species phylogenies, and species phylogenies serve as a backdrop against which gene trees are contrasted for elucidating evolutionary processes and parameters. In a 1997 paper, Maddison discussed this relation, reviewed the signatures left by three major evolutionary processes on the gene trees, and surveyed parsimony and likelihood criteria for utilizing these signatures to elucidate computationally this relation. Here, I review progress that has been made in developing computational methods for analyses under these two criteria, and survey remaining challenges.|000|lateral gene transfer, gene tree reconciliation, reconciliation, LGT detection 709|Smith2013|Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). 
The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.|000|tree alignment graph, network, phylogenetic network, reconciliation, visualization 710|Smith2013|Interesting article, which was introduced in Morrison's blogpost: * _http://phylonetworks.blogspot.com/2014/08/tree-alignment-graphs-and-data-display.html The idea seems to be fairly simple, and it should be easy to implement it in LingPy. The interesting thing is that despite the simplicity of the idea, this seems to be a fresh and interesting approach to phylogenetic analyses.|000|family tree, reconciliation, network, phylogenetic network, visualization 711|Wiedenbeck2008|In this paper, we apply algorithms for defining regions from sets of points to the problem of drawing isoglosses, the boundaries between dialect regions. 
We discuss the justifications for our method, and alternative models that could be constructed from this data. We evaluate the resultant model by comparison to the traditional method of drawing isoglosses, by hand.|000|isoglosses, isogloss, dialect classification, algorithms, visualization 712|Winter2014|As with biological systems, spoken languages are strikingly robust against perturbations. This paper shows that languages achieve robustness in a way that is highly similar to many biological systems. For example, speech sounds are encoded via multiple acoustically diverse, temporally distributed and functionally redundant cues, characteristics that bear similarities to what biologists call “degeneracy”. Speech is furthermore adequately characterized by neutrality, with many different tongue configurations leading to similar acoustic outputs, and different acoustic variants understood as the same by recipients. This highlights the presence of a large neutral network of acoustic neighbors for every speech sound. Such neutrality ensures that a steady backdrop of variation can be maintained without impeding communication, assuring that there is “fodder” for subsequent evolution. Thus, studying linguistic robustness is not only important for understanding how linguistic systems maintain their functioning upon the background of noise, but also for understanding the preconditions for language evolution.|000|robustness, spoken vernacular, language change, degeneracy, complex adaptive system 713|Kay1964|The "comparative method" is one of the principal tools of historical linguistics, but it is not the well-defined technique that the name suggests. This paper presents a formalization in terms of elementary propositional logic of one of the most crucial steps in the comparative method, namely, that in which modern derivatives of prehistoric phonemes are recognized. 
The basic assumption on which the theory rests is that the words of a hypothetical prehistoric language should be constructed in such a way as to minimize the total number of phonemes in the language and of the statements that need to be made to account for the forms of the modern words. The theory is sufficiently specific to provide an algorithm for a computer program. However, the amount of computation rises sharply with any increase in the amount of data to be considered, and with present techniques it is prohibitive even for trivially small data sets. Nevertheless, the theory provides a basis on which more efficient heuristic procedures might be built.|000|comparative method, cognate detection, cognates, algorithms 714|Kay1964|:comment:`Quoting work of` @Hoenigswald1960 :comment:`, who assumes that the step of cognate identification has already been done in his outline of the comparative method.` As we have pointed out, the problem of deciding what is inherited is distinguishable from that of actually making reconstructions only in the special cases where results can be confirmed from historical records. To pronounce a set of forms related is precisely to provide plausible reconstructions which fall within the scope of generally applicable rules.|5|comparative method, cognates, cognate detection 715|Kay1964|We shall confine the following discussion to cases where only two modern languages are to be considered. This eases the exposition and also the preliminary experiments in implementing the theory. However, little generality is lost, for to extend the method to any number of languages requires only trivial modifications to the definitions of correspondences and decomposition.|5|cognate detection, comparative method 716|Kay1964|:comment:`Mentions decomposition as a way to identify sound correspondences in cognate words. Essentially, this is close to alignment analyses, although not mentioned by the author. 
In contrast to alignment analyses, decompositions allow for more freedom, since they are alignments allowing for compression and expansion.` Consider now the correspondence 'that/dass', consisting of a pair of vocabulary items from English and German. It can be decomposed in twenty different ways, but only one of these has a correspondence for each Indo-European phoneme, namely: * th/d a/a t/ss The main problem still outstanding is to discover a satisfactory criterion for distinguishing this from the nineteen other decompositions.|8|phonetic alignment, comparative method, cognate detection 717|Hoenigswald1960|Famous attempt to formalize the comparative method. Provides many interesting insights, especially into linguistic reconstruction. However, as others note, the question of #cognate_detection is largely ignored in the work, although it is the most essential one.|000|comparative method, formal linguistics, historical linguistics 718|Morrison2014a|The use of phylogenetic methods in anthropological fields such as archaeology, linguistics and stemmatology (involving what are often called “culture data”) is based on an analogy between human cultural evolution and biological evolution. We need to understand this analogy thoroughly, including how well anthropology data fit the model of a phylogenetic tree, as used in biology. I provide a direct comparison of anthropology datasets with both phenotype and genotype datasets from biology. The anthropology datasets fit the tree model approximately as well as do the genotype data, which is detectably worse than the fit of the phenotype data. This is true for datasets with \<500 parsimony-informative characters, as well as for larger datasets. 
This implies that cross-cultural (horizontal) processes have been important in the evolution of cultural artifacts, as well as branching historical (vertical) processes, and thus a phylogenetic network will be a more appropriate model than a phylogenetic tree.|000|biological parallels, historical linguistics, phylogenetic reconstruction, phylogenetic network 719|Thiergart2014|Network modularity is a well-studied large-scale connectivity pattern in networks. The detection of modules in real networks constitutes a crucial step towards a description of the network building blocks and their evolutionary dynamics. The performance of modularity detection algorithms is commonly quantified using simulated networks data. However, a comparison of the modularity algorithms utility for real biological data is scarce. Here we investigate the utility of network modularity algorithms for the classification of ecological plant communities. Plant community classification by the traditional approaches requires prior knowledge about the characteristic and differential species, which are derived from a manual inspection of vegetation tables. Using the raw species abundance data we constructed six different networks that vary in their edge definitions. Four network modularity algorithms were examined for their ability to detect the traditionally recognized plant communities. The use of more restrictive edge definitions significantly increased the accuracy of community detection, that is, the correspondence between network-based and traditional community classification. Random-walk based modularity methods yielded slightly better results than approaches based on the modularity function. For the whole network, the average agreement between the manual classification and the network-based modules is 76% with varying congruence levels for different communities ranging between 11% and 100%. 
The network-based approach recovered the known ecological gradient from riverside – sand and gravel bank vegetation – to dryer habitats like semidry grassland on dykes. Our results show that networks modularity algorithms offer new avenues of pursuit for the computational analysis of species communities.|000|community detection, community structure, network modularity, network 720|Lupyan2010|**Background** Languages differ greatly both in their syntactic and morphological systems and in the social environments in which they exist. We challenge the view that language grammars are unrelated to social environments in which they are learned and used. **Methodology/Principal Findings** We conducted a statistical analysis of >2,000 languages using a combination of demographic sources and the World Atlas of Language Structures— a database of structural language properties. We found strong relationships between linguistic factors related to morphological complexity, and demographic/socio-historical factors such as the number of language users, geographic spread, and degree of language contact. The analyses suggest that languages spoken by large groups have simpler inflectional morphology than languages spoken by smaller groups as measured on a variety of factors such as case systems and complexity of conjugations. Additionally, languages spoken by large groups are much more likely to use lexical strategies in place of inflectional morphology to encode evidentiality, negation, aspect, and possession. Our findings indicate that just as biological organisms are shaped by ecological niches, language structures appear to adapt to the environment (niche) in which they are being learned and used. As adults learn a language, features that are difficult for them to acquire, are less likely to be passed on to subsequent learners. Languages used for communication in large groups that include adult learners appear to have been subjected to such selection. 
Conversely, the morphological complexity common to languages used in small groups increases redundancy which may facilitate language learning by infants. **Conclusions/Significance** We hypothesize that language structures are subjected to different evolutionary pressures in different social environments. Just as biological organisms are shaped by ecological niches, language structures appear to adapt to the environment (niche) in which they are being learned and used. The proposed Linguistic Niche Hypothesis has implications for answering the broad question of why languages differ in the way they do and makes empirical predictions regarding language acquisition capacities of children versus adults.|000|linguistic complexity, correlational studies, language change, language contact 721|Hay2007|This short report investigates the relationship between population size and phoneme inventory size, and finds a surprisingly robust correlation between the two. The more speakers a language has, the bigger its phoneme inventory is likely to be. We show that this holds for both vowel inventories and consonant inventories. It is not an artifact of language family. |000|linguistic complexity, phonetic complexity, correlational studies 722|Hay2007|Atkinson 2011 finds a significant positive correlation between population size and phoneme inventory size (confirming Hay & Bauer 2007) and explains it by migration: phoneme sizes are largest in Africa, and as societies spread out of Africa and around the world they went through population and cultural bottlenecks and underwent phonological simplification as a consequence. We believe the correlation is artefactual if it exists at all, and it probably does not exist.|000|phonetic complexity, correlational studies, linguistic complexity, phoneme inventory size 723|Nettle1995|Functional theories of language structure predict that as the number of contrastive segments in a language increases, the average length of a word will decrease. 
This relationship is found to hold for a sample of ten languages, and to fit the synergetic model Y = aX^b. The average length of a word is approximately 7±2 segments. This corresponds to the proposed capacity of working memory.|000|phoneme inventory size, population size, communicative efficiency, correlational studies 724|Eger2013|We provide simple generalizations of the classical Needleman–Wunsch algorithm for aligning two sequences. First, we let both sequences be defined over arbitrary, potentially different alphabets. Secondly, we consider similarity functions between elements of both sequences with ranges in a semiring. Thirdly, instead of considering only ‘match’, ‘mismatch’ and ‘skip’ operations, we allow arbitrary non-negative alignment ‘steps’ S. Next, we present novel combinatorial formulas for the number of monotone alignments between two sequences for selected steps S. Finally, we illustrate sample applications in natural language processing that require larger steps than available in the original Needleman–Wunsch sequence alignment procedure such that our generalizations can be fruitfully adopted.|000|sequence alignment, phonetic alignment, many-to-many alignment 725|Eger2013|Interesting article that proposes solutions (based on training, though) for the *different alphabets problem* in historical linguistics. Their method, however, is not directly applicable to phonetic alignment for the purpose of cognate detection, but better for cases of transliteration, and the like, where the mapping is known to be non-defective.|000|sequence alignment, phonetic alignment 726|SatterthweitePhillips2011|It has been argued that lexicostatistical methods cannot be fruitfully applied to the Tibeto-Burman (or the larger Sino-Tibetan) language family. This dissertation first develops a statistical method for determining linguistic cognates (or sound correspondences) between arbitrarily many languages. 
This method is then applied to a subset of the Sino-Tibetan languages to infer the correspondences and further develops a method for lexicostatistical analysis for subgrouping those languages. The output of these methods agrees very well with that determined by the traditional comparative method. Furthermore, this method yields both greater resolution of the subgroupings, and provides statistical inference of the confidence in the various groupings. The primary objective of this dissertation is to infer the historical relationship—or phylogeny—of several languages belonging to the Tibeto-Burman language family. However, in the interest of developing methods with broader applications, the methods developed and used herein are made general enough to apply to number of languages with minimal modification necessary. The organization of this dissertation is as follows. Chapters two through four are introductory. The second chapter will briefly review existing methods for phylogenetic inference in linguistics, including the comparative method and glottochronology, paying particular attention to problems inherent to each. The third chapter will introduce the reader to field of Tibeto-Burman language studies, providing an overview of the languages, number of speakers, and geographic distribution. This chapter will then go on to describe linguistic features that are particular to Tibeto-Burman languages, and that have made the subgrouping of these languages especially difficult. In particular, it will address the difficulties in applying existing methods. 
The fourth chapter will be a brief review of phylogenetic methods, with particular attention to the underlying assumptions in each, as well as the strengths and weaknesses of each.|000|Sino-Tibetan, Tibeto-Burman, phylogenetic reconstruction, cognate detection, historical linguistics 727|SatterthweitePhillips2011|Interesting thesis, uses a simplified approach for cognate detection, based on syllable-initial consonants in selected morphemes only. Also employs #sound_classes in a certain way, since the author explains in Chapter 6 that specific aspects, such as aspiration or voicing, are ignored in the approach. Key data of the thesis are some 20 Sino-Tibetan languages and it seems that the underlying data was created using IPA transcription. |000|historical linguistics, Sino-Tibetan, Tibeto-Burman, cognate detection, phylogenetic reconstruction 728|SatterthweitePhillips2011|:comment:`Author specifically talks about the problem of` #partial_cognacy :comment:`: Later, author quotes an example for "head" by` @Matisoff2000 :comment:`where the problem seems to be further elaborated.` Sino-Tibetan languages however, present an additional problem as described briefly in Matisoff’s criticism of glottochronology above. I return to this now to explain how it may be dealt with. Since the time of the ancestral Proto-Sino-Tibetan, most of the ST languages have become increasingly polysyllabic, constructing many new words by joining two or more monosyllabic words. Mandarin Chinese offers many of the better-known examples. The word 身體 *shenti* ‘body’ is composed of *shen* and *ti*, either of which would have occurred alone in Classical Chinese. 42 Similarly the word for ‘ear’, 耳朵 *erduo*, was simple *er* in the Classical language, and [pb] *duo* meant flower. The problem though, is that different languages paired words differently in their histories. 
[...]|99f|compounding, partial cognacy, Sino-Tibetan, Tibeto-Burman 729|SatterthweitePhillips2011|As a brief aside, note that this phenomenon is much less common in IE, though analogous cases can be made if we consider just phonemes rather than whole morphemes. For example consider the following Italian : Spanish pairs: *stare : estar* ‘to be’, *studiare : estudiar* ‘to study’, *strella : estrella* ‘star’, *scalare : escalar* ‘to climb’. The method described above would fail to recognize the s+consonant pairings in each of these. Thus the revised method given below could also be applied to search for phoneme matches anywhere in a given word.|101|partial cognacy, Sino-Tibetan, Indo-European 730|SatterthweitePhillips2011|:comment:`Explains the method of cognate detection applied in the thesis. This is not very specific, unfortunately, since it only relies on comparing all syllable pairs, instead of inferring any specific histories of compounding.` In short this means that actual cognate correspondences might occur in any given syllable, and the method must be updated to reflect this. [...] Because we have no a priori reason to indicate which positions the actual pairs are likely to occur in, all such pairings as shown here must be considered: [pb] [...] However, one more adjustment remains, because, as it stands, this method would give more weight to words with more syllables. [...] Since we do not want *head* to provide six times the weight of our hypothetical *pa : pe*, but want each word to contribute equal weight, all such entries must be adjusted so that the total row and column values sum to one. This is done by dividing the desired marginal value (1) [pb] into the number of cells in that row/column (= number of syllables in the longest word).|101-103|partial cognacy, sound correspondences, cognate detection, Sino-Tibetan, Tibeto-Burman 731|Matisoff2000|[...] Few linguists nowadays rely exclusively on the traditional sort of lexicostatistics -- i.e. 
the use of a 100- or 200-item list of 'core vocabulary' -- in order to establish a genetic relationship among languages not already recognized as related. The method has seemed somewhat more useful for the subgrouping of a well-established language family. It is my contention, however, that glottochronology is quite useless when applied to the subgrouping of Tibeto-Burman (TB), for many reasons. We will return to these TB-specific problems [...]|000|Tibeto-Burman, Sino-Tibetan, glottochronology, lexicostatistics, phylogenetic reconstruction 732|deGroot1956|Adrianus Dingeman de Groot (1914–2006) was one of the most influential Dutch psychologists. He became famous for his work “Thought and Choice in Chess”, but his main contribution was methodological — De Groot co-founded the Department of Psychological Methods at the University of Amsterdam (together with R. F. van Naerssen), founded one of the leading testing and assessment companies (CITO), and wrote the monograph “Methodology” that centers on the empirical-scientific cycle: observation–induction–deduction–testing–evaluation. Here we translate one of De Groot’s early articles, published in 1956 in the Dutch journal Nederlands Tijdschrift voor de Psychologie en Haar Grensgebieden. This article is more topical now than it was almost 60 years ago. De Groot stresses the difference between exploratory and confirmatory (“hypothesis testing”) research and argues that statistical inference is only sensible for the latter: “One ‘is allowed’ to apply statistical tests in exploratory research, just as long as one realizes that they do not have evidential impact”. De Groot may have also been one of the first psychologists to argue explicitly for preregistration of experiments and the associated plan of statistical analysis. 
The appendix provides annotations that connect De Groot’s arguments to the current-day debate on transparency and reproducibility in psychological science.|000|statistics, significance 733|Hull1988|In a paper published in 1961, Ehrlich (1961a) made what he knew would be unpopular predictions for systematics in 1970: electronic data-processing equipment would be the systematist's most important tool, nomenclature would be deemphasized, and traditional taxonomic monographs would largely be replaced by computer printouts of data matrices. At the St. Louis meeting, when one taxonomist asked indignantly, "You mean to tell me that taxonomists can be replaced by computers?" Ehrlich responded, "No, some of you can be replaced by an abacus." Thereafter, Ehrlich did not consider the give-and-take after a paper truly successful unless he brought at least one taxonomist to the point of tears. |121|systematics, biology, taxonomy, quantitative turn 734|Hull1988|* Applies evolutionary models to the cultural and conceptual change of intellectual communities. Essential reading for anyone interested in how ideas evolve, and how best to describe these processes rigorously. * "Legend is overdue for replacement, and an adequate replacement must attend to the process of science as carefully as Hull has done. I share his vision of a serious account of the social and intellectual dynamics of science that will avoid both the rosy blur of Legend and the facile charms of relativism. . . . Because of [Hull's] deep concern with the ways in which research is actually done, Science as a Process begins an important project in the study of science. It is one of a distinguished series of books, which Hull himself edits."—Philip Kitcher, Nature * "In Science as a Process, [David Hull] argues that the tension between cooperation and competition is exactly what makes science so successful. . . . Hull takes an unusual approach to his subject. 
He applies the rules of evolution in nature to the evolution of science, arguing that the same kinds of forces responsible for shaping the rise and demise of species also act on the development of scientific ideas."—Natalie Angier, New York Times Book Review * "By far the most professional and thorough case in favour of an evolutionary philosophy of science ever to have been made. It contains excellent short histories of evolutionary biology and of systematics (the science of classifying living things); an important and original account of modern systematic controversy; a counter-attack against the philosophical critics of evolutionary philosophy; social-psychological evidence, collected by Hull himself, to show that science does have the character demanded by his philosophy; and a philosophical analysis of evolution which is general enough to apply to both biological and historical change."—Mark Ridley, Times Literary Supplement |000|science, evolution, cultural evolution, biological evolution, quantitative turn 735|Maddison1997|Exploration of the relationship between gene trees and their containing species trees leads to consideration of how to reconstruct species trees from gene trees and of the concept of phylogeny as a cloud of gene histories. When gene copies are sampled from various species, the gene tree relating these copies might disagree with the species phylogeny. This discord can arise from horizontal transfer (including hybridization), lineage sorting, and gene duplication and extinction. Lineage sorting could also be called deep coalescence, the failure of ancestral copies to coalesce (looking backwards in time) into a common ancestral copy until deeper than previous speciation events. These events depend on various factors; for instance, deep coalescence is more likely if the branches of the species tree are short (in generations) and wide (in population size). 
A similar dependence on process is found in historical biogeography and host-parasite relationships. Each of the processes of discord could yield a different parsimony criterion for reconstructing the species tree from a set of gene trees: with horizontal transfer, choose the species tree that minimizes the number of transfer events; with deep coalescence, choose the tree minimizing the number of extra gene lineages that had to coexist along species lineages; with gene duplication, choose the tree minimizing duplication and/or extinction events. Maximum likelihood methods for reconstructing the species tree are also possible because coalescence theory provides the probability that a particular gene tree would occur given a species tree (with branch lengths and widths specified). In considering these issues, one is provoked to reconsider precisely what is phylogeny. Perhaps it is misleading to view some gene trees as agreeing and other gene trees as disagreeing with the species tree; rather, all of the gene trees are part of the species tree, which can be visualized like a fuzzy statistical distribution, a cloud of gene histories. Alternatively, phylogeny might be (and has been) viewed not as a history of what happened, genetically, but as a history of what could have happened, i.e., a history of changes in the probabilities of inter-breeding. |000|reconciliation, gene tree reconciliation 736|Page2002|Ancient gene duplication events have left many traces in vertebrate genomes. Reconciled trees represent the differences between gene family trees and the species phylogeny those genes are sampled from, allowing us to both infer gene duplication events and estimate a species phylogeny from a sample of gene families. We show that analysis of 118 gene families yields a phylogeny of vertebrates largely in agreement with other data. 
We formulate the problem of locating episodes of gene duplication as a set cover problem: given a species tree in which each node has a set of gene duplications associated with it, the smallest set of species nodes whose union includes all gene duplications specifies the locations of gene duplication episodes. By generating a unique mapping from this cover set we can determine the minimal number of such episodes at each location. When applied to our data, this method reveals a complex history of gene duplications in vertebrate evolution that does not conform to the "2R" hypothesis.|000|gene tree reconciliation, reconciliation 737|Maurits2014|The ordering of subject, verb, and object is one of the fundamental components of the syntax of natural languages. The distribution of basic word orders across the world’s languages is highly non-uniform, with the majority of languages being either subject-object-verb (SOV) or subject-verb-object (SVO). Explaining this fact using psychological accounts of language acquisition or processing requires understanding how the present distribution has resulted from ancestral distributions and the rates of change between orders. We show that Bayesian phylogenetics can provide quantitative answers to three important questions: how word orders are likely to change over time, which word orders were dominant historically, and whether strong inferences about the origins of syntax can be drawn from modern languages. We find that SOV to SVO change is more common than the reverse and VSO to SVO change is more common than VSO to SOV, and that if the seven language families we consider share a common ancestor then that common ancestor likely had SOV word order, but also that there are limits on how confidently we can make inferences about ancestral word order based on modern-day observations. 
These results shed new light on old questions from historical linguistics and provide clear targets for psychological explanations of word-order distributions.|000|syntax, language evolution 738|Naranan2011|We have studied the rank frequency distribution (RFD) of letters of the alphabet in Tamil language texts. In a novel application of rank frequencies, we have defined a simple intuitive distance parameter between a pair of strings (text or DNA sequence of codons). This distance correlates well with age difference in historical linguistics and evolutionary genetics. Using a distance matrix of a set of strings, we derive evolutionary trees that are broadly in agreement with historical evidence. The method has potential for refinement and application in evolutionary studies to complement other approaches to evolution. The RFD in a single string conforms to a law called the CMPL (Cumulative Modified Power Law), which we had formulated and applied to RFD’s of diverse symbol sets.|000|rank frequencies, DNA sequence, alphabet, Tamil, phylogenetic reconstruction 739|Harrison2006|The aim of this paper is to enable those who have never reconstructed a phylogeny to do so from scratch. The paper does not attempt to be a comprehensive theoretical guide, but describes one rigorous way of obtaining phylogenetic trees. Those who follow the methods outlined should be able to understand the basic ideas behind the steps taken, the meaning of the phylogenetic trees obtained and the scope of questions that can be answered with phylogenetic methods. The protocols have been successfully tested by volunteers with no phylogenetic experience.|000|phylogenetic reconstruction, guide, 740|Collier2014|Phonology and syntax represent two layers of sound combination central to language’s expressive power. Comparative animal studies represent one approach to understand the origins of these combinatorial layers. 
Traditionally, phonology, where meaningless sounds form words, has been considered a simpler combination than syntax, and thus should be more common in animals. A linguistically informed review of animal call sequences demonstrates that phonology in animal vocal systems is rare, whereas syntax is more widespread. In the light of this and the absence of phonology in some languages, we hypothesize that syntax, present in all languages, evolved before phonology.|000|language evolution, origin of language 741|Redelings2005|We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed. This sidesteps the problem in which alignments created by progressive alignment are biased toward the guide tree used to generate them. Joint estimation also allows us to model rate variation between sites when estimating the alignment and to use the evidence in shared insertion/deletions (indels) to group sister taxa in the phylogeny. Our indel model makes use of affine gap penalties and considers indels of multiple letters. We make the simplifying assumption that the indel process is identical on all branches. As a result, the probability of a gap is independent of branch length. We use a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. We describe a new MCMC transition kernel that improves our algorithm’s mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. 
Our software implementation can estimate alignment uncertainty and we describe a method for summarizing this uncertainty in a single plot.|000|multiple sequence alignment, Bayesian approaches, phylogenetic reconstruction 742|Krebernik2007|.. image:: static/img/krebernik-2007.jpg :name: homophony :width: 700px [Krebernik, Zur Entwicklung des Sprachbewusstseins im Alten Orient in: Wilcke (Hrsg.), Das geistige Erfassen der Welt im Alten Orient (2007) S. 42, see also @Edzard2006 ]|42|homophony, genetic relationship 743|Edzard2006|.. image:: static/img/edzard-2006.jpg :name: homophony :width: 700px [Edzard, Das Ebla-akkadische als Teil des altakkadischen Dialektkontinuums in: Deutscher / Kouwenberg, The Akkadian Language in its Semitic Context (2006) S. 81f., see also @Krebernik2007 ]|81f|genetic relationship, homophony 744|Dellert2014|Recently, cross-linguistic polysemies have been discovered as a valuable resource for lexical semantics. Co-occurrences of gloss lexemes in multilingual wordlists can be summarized into polysemy networks, a new type of semantic resource. In this work, this relatively new paradigm is applied to an entire dictionary database, resulting in a polysemy network that spans more than 30,000 German lexemes. Within computational historical linguistics, polysemy networks have been discussed as a possible computational model of semantic change, where short paths in such networks are expected to express possible semantic shifts. To assess the validity of the polysemy network for this purpose, I apply it to the task of finding cognate candidates, i.e. words in related languages which might have developed from the same word in a common ancestor language. On a test set of cognates shared between Finnish and Hungarian, I investigate the number of true cognate pairs whose translations are connected in the polysemy network by shortest paths of different lengths. 
The results are very promising, providing evidence that cross-linguistic polysemies are indeed closely connected to plausible semantic shifts.|000|colexification, polysemy, network 745|Altschul1990|A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.|000|BLAST, local alignment, sequence alignment, database, database search 746|Altschul1997|The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. 
The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.|000|BLAST, local alignment, iterative procedures, sequence alignment, database search 747|Brinton1891|For that reason I have thought it worth while to bring together a short list of common words, and show their renderings in a number of American tongues.|333|basic vocabulary, concept list 748|Sankoff1969|In 1950 Swadesh proposed a method for measuring the rate of change of a language in terms of the frequency of replacement of words having a given meaning. This evolved into a theory of lexicostatistics which claimed to justify a technique for estimating the time of separation of two related languages and techniques for classifying languages according to their genetic relationships. This theory has come under sharp attack, due mainly to the simplistic model involved and also to the degree of variability of the estimates produced. In this study we construct a more refined theory based on a mathematical linguistics model of word-meaning relationship. This model is not subject to the criticisms of the earlier model, but analytic and computer simulation studies show that the lexicostatistic properties hold with respect to this theory. To apply some of the refinements of this theory, an extensive data processing project is reported, involving the Indo-European languages. A number of related problems are discussed, with emphasis on the applicability of the stochastic processes approach.|000|language model, lexical change, lexicostatistics, modeling 749|Sankoff1969|Table of contents: 1. Lexicostatistics 2. Word-meaning systems and processes 3. Linguistic and semantic implications 4. The lexicostatistics of Indoeuropean 5. 
Lexicostatistics and the genetic classification of languages 6. Summary, conclusions, and suggestions for future work |000|lexicostatistics, language model, lexical change, modeling 750|Ellison2006|This paper presents a method for building genetic language taxonomies based on a new approach to comparing lexical forms. Instead of comparing forms cross-linguistically, a matrix of language-internal similarities between forms is calculated. These matrices are then compared to give distances between languages. We argue that this coheres better with current thinking in linguistics and psycholinguistics. An implementation of this approach, called PHILOLOGICON, is described, along with its application to Dyen et al.’s (1992) ninety-five wordlists from Indo-European languages.|000|language comparison, language model, internal reconstruction, morpheme detection 751|Ellison2006|One psychologically well-grounded way of describing the similarity of words is in terms of their *confusion probabilities*. Two words have high confusion probability if it is likely that one word could be produced or understood when the other was intended. This type of confusion can be measured experimentally by giving subjects words in noisy environments and measuring what they apprehend.|274|confusion probability, language-internal comparison 752|Ellison2006|For example, the *neighbourhood activation model* (NAM) (Luce et al., 1990; Luce and Pisoni, 1998) predicts confusion probabilities from the relative frequency of words in the neighbourhood of the target. Words are in the neighbourhood of the target if their Levenstein (1965) edit distance from the target is one. 
The more frequent the word is, the greater its likelihood of replacing the target.|275|neighbourhood activation model, Levenshtein distance, edit distance, confusion probability 753|Ellison2006|They use simple normalized edit distances (normalized by average word length) to derive an initial value for intra-language comparison. They integrate this value into a larger framework of confusion probabilities which are derived from this initial metric. The result is a language-specific similarity score between all words in a sample. The interesting aspect of this approach is that it yields one of the best Dyen trees I have ever seen so far. The authors also mention that it should be possible to use their system for cognate detection, which is something one should definitely look into at some point in the future. Unfortunately, the software is not available online, yet it may be possible to ask the authors whether it is possible to get it. |000|edit distance, confusion probability, language-internal comparison 754|Ellison2006|:comment:`The PHILOLOGICON algorithm:` [...] the PHILOLOGICON algorithm for measuring the divergence between two languages has the following steps: 1. determine their joint confusion probability matrices, *P* and *Q*, 2. substitute these into equation (7), equation (8) and equation (11) to calculate *k* (0), *k* (0.5), *k* (1), *k'* (0.5), and *k''* (0.5), 3. and put these into equation (1) and equation (14) to calculate the KL and RAo distances between the languages. :comment:`So in the end, it is all about distances here...`|277|edit distance, algorithms, language-internal comparison 755|Krishnamurti1983|If a sound change has lexically diffused without completing its course, one finds that among the lexical items qualified for the change, some have already changed (c), others have remained unchanged (u), and still others show variant forms (u/c). 
When such a change has affected a group of genetically related languages, the consequent comparative pattern u-u/c-c can be used to set up subrelations among languages. In this paper, we draw on data from six languages belonging to the South-Central subfamily of Dravidian, with reference to an atypical sound change called 'apical displacement'. There are 63 etymologies which qualify for the study. A total of 945 possible binary-labeled trees fall into six types for the six languages under study. In terms of our postulates, that tree is the best which scores the lowest m, i.e. the minimum number of independent instances of change needed to account for the u-c-o (o = no cognate) pattern of a given entry. Each of the 63 entries has been applied to the possible 945 trees, and the trees have been scored for the value m by computer. The one tree which scored the lowest (71 points) is identical with the traditionally established tree for these languages. This paper shows that: (a) one shared innovation is sufficient to give genetic subrelations among languages, within the framework of the theory of lexical diffusion; (b) unchanged cognates are as important as changed cognates in giving differential scores for possible trees; and (c) the notion of shared innovation can be further refined within the theory of lexical diffusion|000|subgrouping, phylogenetic reconstruction, cognates 756|GutuRomalo1959|In the numerous cases in which a word might be either verb or substantive, we have translated it, most of the time, by a verb.|58|lexicostatistics, glottochronology, Swadesh list 757|Wu1994a|We describe a method of using statistically-collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both using human evaluators and against a previously available dictionary. 
We also evaluated performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.|000|tokenization, word segmentation, Chinese, annotation, evaluation, computer-aided approaches 758|Wu1994b|The first step in Chinese NLP is to tokenize or segment character sequences into words, since the text contains no word delimiters. Recent heavy activity in this area has shown the biggest stumbling block to be words that are absent from the lexicon, since successful tokenizers to date have been based on dictionary lookup (e.g., Chang & Chen 1993; Chiang et al. 1992; Lin et al. 1993; Wu & Tseng 1993; Sproat et al. 1994). We present empirical evidence for four points concerning tokenization of Chinese text: (1) More rigorous "blind" evaluation methodology is needed to avoid inflated accuracy measurements; we introduce the nk-blind method. (2) The extent of the unknown-word problem is far more serious than generally thought, when tokenizing unrestricted texts in realistic domains. (3) Statistical lexical acquisition is a practical means to greatly improve tokenization accuracy with unknown words, reducing error rates as much as 32.0%. (4) When augmenting the lexicon, linguistic constraints can provide simple inexpensive filters yielding significantly better precision, reducing error rates as much as 49.4%.|000|tokenization, word segmentation, Chinese, computer-aided approaches, evaluation 759|DeMillo1979|It is argued that formal verifications of programs no matter how obtained, will not play the same key role in the development of computer science and software engineering as proofs do in mathematics. 
Furthermore, the absence of continuity, the inevitability of change, and the complexity of specification of significantly many real programs make the formal verification process difficult to justify and manage. It is felt that ease of formal verification should not dominate program language design. |000|formal mathematics, program verification, evaluation, computer-aided approaches 760|Bochkarev2014|The frequency with which we use different words changes all the time, and every so often, a new lexical item is invented or another one ceases to be used. Beyond a small sample of lexical items whose properties are well studied, little is known about the dynamics of lexical evolution. How do the lexical inventories of languages, viewed as entire systems, evolve? Is the rate of evolution of the lexicon contingent upon historical factors or is it driven by regularities, perhaps to do with universals of cognition and social interaction? We address these questions using the Google Books N-Gram Corpus as a source of data and relative entropy as a measure of changes in the frequency distributions of words. It turns out that there are both universals and historical contingencies at work. Across several languages, we observe similar rates of change, but only at timescales of at least around five decades. At shorter timescales, the rate of change is highly variable and differs between languages. Major societal transformations as well as catastrophic events such as wars lead to increased change in frequency distributions, whereas stability in society has a dampening effect on lexical evolution.|000|lexical change, stability, lexical replacement 761|Magee2014|The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). 
Increasingly, phylogenies are estimated from increasingly large, genome-scale datasets using increasingly complex statistical methods that require increasing levels of expertise and computational investment. Moreover, the resulting phylogenetic data provide an explicit historical perspective that critically informs research in a vast and growing number of scientific disciplines. One such use is the study of changes in rates of lineage diversification (speciation – extinction) through time. As part of a meta-analysis in this area, we sought to collect phylogenetic data (comprising nucleotide sequence alignment and tree files) from 217 studies published in 46 journals over a 13-year period. We document our attempts to procure those data (from online archives and by direct request to corresponding authors), and report results of analyses (using Bayesian logistic regression) to assess the impact of various factors on the success of our efforts. Overall, complete phylogenetic data for of these studies are effectively lost to science. Our study indicates that phylogenetic data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor, and; (3) the data are requested from faculty rather than students. Importantly, our survey spans recent policy initiatives and infrastructural changes; our analyses indicate that the positive impact of these community initiatives has been both dramatic and immediate. 
Although the results of our study indicate that the situation is dire, our findings also reveal tremendous recent progress in the sharing and preservation of phylogenetic data.|000|open access, data sharing, data exchange 762|Magee2014|The paper is thoroughly discussed in this blog post: * http://phylonetworks.blogspot.de/2014/11/archiving-of-phylogenetics-data.html Since it is related to similar problems in historical linguistics, it is probably worth keeping it in mind.|000|open access, data sharing, data exchange 763|Kuemmel2008|Ein weiteres wichtiges, wenn auch kaum vollständig erreichbares Ziel der Arbeit ist es auch, auf eine Vereinheitlichung der Terminologie und der Schreibweisen hinzuarbeiten. Allein die in diesen Punkten bestehenden beträchtlichen Differenzen zwischen der allgemeinen Phonetik und Phonologie, den betroffenen Philologien, der typologischen Sprachwissenschaft, der älteren und der neueren Indogermanistik sowie zwischen z. B. "amerikanischen" und "deutschen" Notationsweisen ([...]) behindern das Verständnis besonders für Außenstehende bisweilen unnötig. Daher werden alle Schreibungen aller herangezogenen Quellen einheitlich in die einzig universelle Schreibweise des IPA-Alphabets überführt [...]. |15|IPA, phonetic transcription, phonological transcription, representation of sound sequences 764|Kuemmel2008|Mit der "Erfindung" bzw. Entdeckung der Phonologie bzw. Phonematik und deren Rezeption in der diachronen Sprachwissenschaft änderte sich dies: grundsätzlich hätte man sich nun entscheiden müssen, ob die "Laute" nun *Phone* oder *Phoneme* sein sollten. Die traditionelle Indogermanistik setzte freilich weitgehend ihre (diachron orientierte) Schreibweise fort, d. h. es wurden durchaus auch Phone unterschieden, wenn [pb] der phonetische Unterschied diachron relevant war [...]. 
Andererseits verzichtete man aber nun nicht selten auf solche Unterscheidungen [...], und im Falle der neu entdeckten sogenannten "Laryngale" sogar meist auf jede genauere phonetische Bestimmung [...]. Dies ist auch heute noch die Praxis der meisten Autoren.|16f|phonology, phonetics, linguistic reconstruction 765|Kuemmel2008|Gerade dies aber stellt ein Problem dar: Die Rekonstruktion kann nämlich gar nicht auf rein phonologischer Basis stattfinden, da ja auch der Lautwandel Phonemgrenzen keineswegs beachten muss. Die Rekonstruktion ist lediglich eine Methode, um die Daten einer auf andere Weise nicht zugänglichen Sprache zu erfassen, sie führt damit wie die synchrone Sprachanalyse zunächst nur auf ein (möglicherweise unvollständiges) Inventar von Phonen, das erst in einem zweiten Schritt einer phonologischen Analyse unterzogen werden kann. Erst wenn man diese durchgeführt hat (wobei hier natürlich besondere Probleme zu beachten sind), kann man entscheiden, ob die Differenz zweier unterschiedlicher rekonstruierter Phone phonologisch oder nur allophonisch-phonetisch ist. |17|phonetics, phonology, linguistic reconstruction 766|Kuemmel2008|Lautgesetze haben *predictive power*. Allerdings bleibt diese dadurch eingeschränkt, dass man zwar ziemlich sicher sagen kann, wie entsprechende Wörter lauten müssen, wenn sie existieren, jedoch nicht, ob sie tatsächlich (noch) existieren. |22|predictive power, sound change, sound law 767|Kuemmel2008|Hier kann heute wohl die Grundannahme gelten, dass synchrone phonetische Variation die Grundlage aller weiteren Veränderungen ist und durch die Interaktion von Sprechern und Hörern phonologischen Status erlangen kann. [...] 
Aus dieser Grundannahme ergibt sich jedenfalls, dass diachroner Lautwandel synchron belegbare Prozesse widerspiegeln sollte und damit auch etwas von der synchronen Phonetik und Phonologie profitieren kann, wie umgekehrt diese von der Lautwandelforschung.|22|synchronic variation, sound change, phonetic theory, phonological theory, reasons for sound change 768|Kuemmel2008|:comment:`Note on articulatory sound change.` Bei artikulatorischem Lautwandel sind nach verbreiteter Auffassung keine "Sprünge" möglich, sondern jeder Lautwandel kann direkt nur zu einem nach Artikulationsort und/oder -art dem Ausgangslaut benachbarten Lauttyp führen. Dieses Postulat impliziert, dass man bei historisch belegten Prozessen anderer Art Zwischenstufen ansetzen muss (vgl. @Lass1984: 332-338).|24|sound change, articulation, Neogrammarian sound change 769|Lass1984|When we say that phonetic change is 'gradual' we don't -- or shouldn't -- necessarily mean that it's 'infinitely' gradual, as some writers have supposed. Indeed, the classic argument against the notion of gradualness (misinformed, but influential) is that it implies 'infinitesimal' change, and allows no principled limits on the size of the intermediate steps between input and output (@King1969). This purports to make gradualness an incoherent concept, but fails to, as we'll see, since the problem is a false one. |332|gradual sound change, phonetic gradualness, sound change 770|Lass1984|:comment:`On the pages before, examples for changes involving lenition have been brought up.` Our principles have provided us with hitherto unsuspected **missing links** (as I like to call them): we have used synchronic observation to create history. 
And note that by using the lenition hierarchy as a template against which to look at history, we can say that given correspondences [p]:[h]:ø, this represents a SEQUENCE of developmental types: even though of course there is no ancestor-descendant relation among the three dialects in question (Germanic isn't 'older than' Celtic, etc.). [...] A correspondence [p]:Ø then implies that the Ø-dialect went through the stages [f], [h] if it started from [p]. And similar arguments can be constructed for all types of change, though some (like vowel change) are more difficult.|337|sound change, gradual sound change, 771|Lass1984|This is obviously one "standard" book on phonology in which on pages 315-338 an interesting introduction to phonological change is given. Furthermore, there is some interesting explicit treatment of "gradual sound change" and intermediate stages on pages 332-338.|000|sound change, phonology, phonological theory 772|Chomsky1968|The decision to regard speech sounds as feature complexes rather than as indivisible entities has been adopted explicitly or implicitly in almost all linguistic studies. Specifically, it is almost always taken for granted that phonological segments can be grouped into sets that differ as to their 'naturalness'. Thus, the sets comprising all vowels or all stops or all continuants are more natural than randomly chosen sets composed of the same number of segment types. No serious discussion of the phonology of a language has ever been done without reference to classes such as vowels, stops, or voiceless continuants. On the other hand, any linguist would react with justified skepticism to a grammar that made repeated reference to a class composed of just the four segments [p r y a]. 
Thus judgments of 'naturalness' are supported empirically by the observation that it is the 'natural' classes that are relevant to the formulation of phonological processes in the most varied languages, though there is no logical necessity for this to be the case.|335|naturalness, segmentation, segment inventories, phonological segments 773|Foley1977|Philosophically interesting, however, is Chomsky and Halle's notion that the 'naturalness' of a rule is defined by its 'expectedness', that is by its statistical frequency. This is a shaky basis for determining naturalness of a rule, and is contrary to the theoretical position that the naturalness of a rule is determined by its derivation from a higher order phonological rule in conformity with a phonological principle.|9|phonological theory, frequency, naturalness, phonological change 774|Foley1977|:comment:`Arguing against naturalness of phonemic inventories as laid out in` @Chomsky1968 However, I know of no philosophical reason why statistically frequent phonemic inventories should have fewer features associated with them than statistically less frequent phonemic inventories. |11|naturalness, phoneme inventory, distinctive features 775|Foley1977|:comment:`Before, the author discusses different ways to formulate phonological rules, and emphasizes the importance of not being led astray by short, abstract rules involving just a few features when trying to detect what naturally happens.` The observations made so far indicate that *g* spirantizes more readily than *d* or *b*, and *g* and *d* spirantize more readily than *b*. We therefore establish the following relation: .. code:: g d b → → → 1 2 3 which refers to the propensity to spirantization, with the weakest element being most inclined to spirantization. |28|sound change, phonological strength, phonetic strength, 776|Grammont1933|:comment:`Quoted after` @Foley1977 :comment:`on p. 
53 (using a reprint from 1971 of the original work)` C'était un dogme de la grammaire comparée que chaque langue avait sa phonétique propre et son évolution particulière. En 1895, dans son livre sur *La Dissimilation consonantique dans les langues indo-européennes et dans les langues romanes*, M. Grammont renverse ce dogme, en établissant la première loi phonétique générale. Il montre que les lois phonétiques sont au-dessus des langues et les dominent, qu'elles sont humaines, c'est-à-dire communes à tout le langage humain. :translation:`It was a dogma of comparative grammar that every language had its own phonetics and its own particular evolution. In 1895, in his book on consonantal dissimilation in the Indo-European and Romance languages, M. Grammont overturned this dogma by establishing the first general phonetic law. He shows that phonetic laws stand above languages and govern them, and that they are human, that is, common to all human language.`|154|sound change, language universals 777|Foley1977|James Foley's book on foundations of theoretical phonology is interesting in many respects: 1. It builds on heavily criticizing @Chomsky1968 2. It develops a theory of **phonological strength** and illustrates how to adapt it. 3. It tries to make all this in a formal way that is easy but seems exact enough, so that it could even be used in computational studies. |000|sound change, phonological theory, phonetic strength, phonological strength 778|Hock1991| * Pages 26ff are very useful for the notation of changes and generalizations (like the phonological stuff in @Hall1999).|000|sound change, historical linguistics, introduction 779|Kuemmel2008|Wirksam und fruchtbar wird die Argumentation mit diachron-typologischen Daten daher nur im Zusammenhang mit anderen Daten, die zur phonologischen Analyse beitragen: Distributionseigenschaften, Besonderheiten der Schreibung, Interferenzen im Sprachkontakt und auch synchron-typologische Argumente. Durch eine Gesamtbewertung kann man so die beim jetzigen Forschungsstand wahrscheinlichste Deutung erreichen. 
Eine Verbesserung der Datenbasis durch Material aus den vielen hier nicht erfassten Sprachfamilien der Welt dürfte die Lage verbessern, doch nicht die prinzipiellen Beschränkungen typologischer Argumentation beseitigen: Zwingende Argumente können nur aus allgemeinen, theoretisch begründbaren Regeln übersprachliche Systeme abgeleitet werden, also wenn man aus typischen Phänomenen Universalien gewinnen kann. Letztlich bleibt dieses Buch also nur Stückwerk, doch hoffentlich nützlich als Steinbruch für weiterführende Forschungen.|344|linguistic reconstruction, constraints, typology, language typology, sound change, typology of sound change 780|Mace1994|Cross-cultural comparison is a common method of testing hypotheses regarding the co-evolution of elements of cultures or of the adaptiveness of a cultural practice to some aspect of the environment. It has long been recognized, however, that cultures are not independent but rather may share many cultural elements by virtue of common ancestry and proximity. Attempts to address this issue, known as Galton's problem, range from statistically removing confounding variables to using a standard sample of "independent cultures." We show here that when testing any hypothesis of co-evolution one should not attempt to identify independent cultures or to create them statistically. Rather, cross-cultural comparative studies must be based upon the identification of independent events of cultural change. Once this principle is applied, it becomes apparent that it is in fact groups of closely related cultures that are potentially the most informative for testing cross-cultural hypotheses. 
Constructing phylogenies of cultures and placing upon them independent instances of cultural elements' arising or changing is an essential part of this task.|000|cultural evolution, common origin, statistical independency, correction, co-evolution 781|Szoellosi2015|This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. 
We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.|000|gene tree reconciliation, lateral gene transfer, phylogenetic network 782|Kassian2010|The paper is a thematical follow-up to the refinements of the lexicostatistical method suggested in @Starostin2010 [Starostin G. 2010]. It discusses the issue of synonymity/polysemy, a well-known obstacle in the compilation of Swadesh wordlists for various languages, and presents a list of both syntactic/semantic contexts and explanatory notes that could help reduce the ambiguity issue in the creation and quantitative analysis of such wordlists. The notes and contexts are partially based on linguistic tradition and partially on theoretical and/or pragmatic considerations, some of which are stated explicitly.|000|basic vocabulary, basic level concepts, semantic similarity, semantic specification 783|Kassian2010|:comment:`Authors name basic principles for filling out the test lists.` :comment:`As to the style, choose the word belonging to the most basic style.` 1) In all cases, as is prescribed by the standard procedure, we advocate the choice of the most stylistically *basic*, *neutral*, and *unmarked* word, unencumbered by either pragmatic connotations or strongly emphasized additional, “extra” semantic features. [...] :comment:`Unbound use must be possible.` 2) The word also has to be encountered in unbound use, not constrained exclusively to specific idiomatic constructions or compound forms. [...] :comment:`In cases of stem alternation, choose the most basic stem.` 3) A frequent source of synonymity and, therefore, confusion concerning the Swadesh wordlist is the phenomenon of suppletion within a paradigm. 
Because proper grammatical suppletion is sometimes hard to distinguish from lexical variation (e.g., the use of special collective forms of nouns; aspectual forms of verbs, etc.), it is strongly recommended to avoid including suppletive stems as synonyms altogether. One should attempt to determine the most unmarked, morphologically unencumbered stem, which typically corresponds to such morphological values as singular number (for nouns), positive degree of comparison (for adjectives), 3rd person / present tense / singular subject or object of action (for verbs); [...] :comment:`Make ambiguous items on the list anthropocentric.` 4) The list is generally anthropocentric, meaning that most of the anatomic terms and action verbs (such as ‘sit’, ‘lie’, ‘eat’, etc.) are to be taken as human body parts and actions; e.g., the wordlist for Modern Russian should contain *рот* ‘mouth (of a human)’, but not *пасть* ‘mouth (of an animal)’, and the wordlist for German should contain *essen* ‘to eat (of a human)’, but not *fressen* ‘to eat (of an animal)’. [...] :comment:`Search for syntactic contexts that contrast functional antonymous pairs in the list.` 5) Many of the items on the Swadesh list form functional antonymous pairs, e.g., ‘big : small’, ‘black : white’, ‘man : woman’, or groups of two or three elements with tight semantic connections, e.g., ‘sun : moon’, ‘eat : drink’, ‘sit : stand : lie’ etc. It is very useful to seek out syntactic contexts that contrast these elements, since otherwise we increase the risk of a non-basic term creeping in. [...] :comment:`Allow for synonyms in cases of transitional stages.` 6) One of the cases where synonymity is unavoidable is when we deal with a transitional stage in language history, during which an older word is gradually being “ushered out” by a more recent replacement [...]. [pb] :comment:`Treatment of compound forms (only relevant for cognate judgments, but doubling, as suggested by Kassian et al. 
should NOT be allowed but can better be modeled explicitly.` 7) Synonymity must also be considered (but can sometimes be avoided) in dealing with the phenomenon of *compound* forms found on the list, consisting of two or more root morphemes; this is especially typical of “monosyllabic” languages that frequently recur to compounding in order to avoid excessive homonymy (e.g., Sino-Tibetan, Kradai, Abkhaz-Adyghe), but in small doses can be found almost anywhere. :comment:`As point number 8 (but not displayed as such), frequency of use could be named.` Most of these criteria — not always, but more frequently than not — help to single out the most *statistically frequent* of the possible equivalents. The criterion of frequency of use cannot be construed as a strong demand, but it may be related to the idea of writing down the equivalent that “first springs to mind” when dealing with an informant, or the one that is listed before all others in a context-less dictionary (unless, of course, it lists all the synonyms in alphabetic order).|48f|basic vocabulary, semantic specification 784|Wang2004|:comment:`Tree calculation of 9 Bai varieties. This calculation is based on specific features which are more or less idiosyncratically selected and described on page 139f.`|140|family tree, Bai, Sino-Tibetan, Chinese dialects 785|Chessa2014|In this work we are interested in identifying clusters of “positional equivalent” actors, i.e. actors who play a similar role in a system. In particular, we analyze weighted bipartite networks that describes the relationships between actors on one side and features or traits on the other, together with the intensity level to which actors show their features. The main contribution of our work is twofold. First, we develop a methodological approach that takes into account the underlying multivariate dependence among groups of actors. 
The idea is that positions in a network could be defined on the basis of the similar intensity levels that the actors exhibit in expressing some features, instead of just considering relationships that actors hold with each others. Second, we propose a new clustering procedure that exploits the potentiality of copula functions, a mathematical instrument for the modelization of the stochastic dependence structure. Our clustering algorithm can be applied both to binary and real-valued matrices. We validate it with simulations and applications to real-world data.|000|clustering, complex network, copula function, positional analysis, weighted bipartite network 786|Saitou2013|Recombinations are known to disrupt bifurcating tree structure of gene genealogies. Although recently occurred recombinations are easily detectable by using conventional methods, recombinations may have occurred at any time. We devised a new method for detecting ancient recombinations through phylogenetic network analysis, and detected five ancient recombinations in gibbon ABO blood group genes [Kitano et al., 2009. Mol. Phylogenet. Evol., 51, 465–471]. We present applications of this method, now named as “PNarec”, to various virus sequences as well as HLA genes.|000|phylogenetic reconstruction, network, phylogenetic network, recombination 787|Saitou2013|The good old days of constructing phylogenetic trees from relatively short sequences are over. Reticulated or ‘‘non-tree’’ structures are omnipresent in genome sequences, and the construction of phylogenetic networks is now the default for describing these complex realities.|507|network, paradigm shift, phylogenetic network, family tree 788|Everett2015|We summarize a number of findings in laryngology demonstrating that perturbations of phonation, including increased jitter and shimmer, are associated with desiccated ambient air. 
We predict that, given the relative imprecision of vocal fold vibration in desiccated versus humid contexts, arid and cold ecologies should be less amenable, when contrasted to warm and humid ecologies, to the development of languages with phonemic tone, especially complex tone. This prediction is supported by data from two large independently coded databases representing 3,700+ languages. Languages with complex tonality have generally not developed in very cold or otherwise desiccated climates, in accordance with the physiologically based predictions. The predicted global geographic–linguistic association is shown to operate within continents, within major language families, and across language isolates. Our results offer evidence that human sound systems are influenced by environmental factors.|2015|tone language, language evolution, evolutionary pressure 789|Mazaudon1988|It is generally assumed that sound changes target classes of features, rather than phonemes. Here I shall argue that while this is true of segmental features, tone changes (with one exception) do not take place by feature. This leads us to claim that tones should not be defined in terms of features, but instead should be viewed as indivisible units. If the widely held view were correct that tones should be represented by matrices of features (and that contour tones consist of sequences of such matrices), we would expect historical change to affect tones AS SERIES, as it does in the case of consonants. The extensive literature and my own comparative field work on the evolution of tonal systems in Asian languages show that tones typically evolve independently of one another. The one exception occurs when the merger of two (rarely three) series of initial consonants leads to the phonologization of a pitch feature on a vowel. 
In this case, analyses such as Yip (1988) and Clements (1983), where one feature represents the proto-tonal opposition and another feature represents the feature contributed by the consonant, can be maintained. However, once the tone system is fully constituted, each tone follows its own path and this individual evolution constitutes a counter-example for a feature analysis of purely tonal systems.|000|tone, tone language, sound change, distinctive features 790|Mazaudon1991|La phonologie historique qui a été à l'origine du développement de la linguistique moderne n'a plus autant d'adeptes que par le passé. L'une des multiples raisons est paradoxalement la remise en question de son caractère scientifique. L'histoire évoque, à tort ou à raison, la collection de faits plus ou moins importants, ayant pour beaucoup un caractère arbitraire. Peut-être l'emploi du terme linguistique génétique, proposé par Malkiel, aurait-il eu le mérite de souligner la présence en linguistique historique, de plusieurs composants, chacun appelant des méthodes distinctes pour son étude, et dont l'un à tout le moins, relève d'une méthodologie qui s'apparente à celle de l'histoire naturelle plutôt que de l'histoire événementielle. La prise en compte du caractère social de la langue et des multiples influences externes qui laissent des traces dans les structures qu'on peut y observer, corrobore plutôt qu'elle ne l'infirme la présence d'un composant régulier, « mécanique », dans les changements linguistiques « toutes choses égales d'ailleurs ». Le fait que les choses soient rarement égales d'ailleurs ne fait rien à l'affaire. Le recours à des modèles probabilistes pour décrire l'évolution historique des langues nous semble une attitude singulièrement défaitiste au vu des résultats remarquables atteints par la méthode comparative. Le logiciel de reconstruction automatique que nous présentons [FN] constitue une application rigoureuse et sans exception du principe [pb] néogrammairien. R. 
E., en anglais Reconstruction Engine, examine l'ensemble des lexiques d'un groupe de langues et trie le vocabulaire en deux groupes: les formes régulières dont les correspondances sont explicables par des « lois » phonétiques et structurales, et les formes irrégulières qui appellent d'autres explications.|000|Reconstruction Engine, automatic linguistic reconstruction, comparative method, linguistic reconstruction 791|Jacques2011|Aspirated fricatives are typologically uncommon sounds, only found in a handful of languages. This paper studies the diachronic pathways leading to the creation of aspirated fricatives. A review of the literature brings out seven such historical pathways. An eighth, heretofore unreported pattern of change is revealed by Shuiluo Pumi, a Sino-Tibetan language spoken in China. These diachronic data have non-trivial implications for phonological modelling as well as for the synchronic typology of sound patterns. First, they provide new evidence for the debate concerning the definition of the feature [+spread glottis]. Second, they explain some of the typological properties of aspirated fricatives, in particular the absence of aspirated fricatives in consonant clusters and the rarity of non-coronal aspirated fricatives.|000|sound change, sound change patterns, aspirated fricatives, fricatives 792|Ngai2015|This paper looks at the polysemous, multifunctional Shaowu verb [tie 53] which means ‘to get’ in a mono-transitive construction, and which is relexified to mean ‘to give’ in a ditransitive construction through the process of semantically coerced syntactic change. The morpheme then grammaticalises along a bifurcated pathway to become possibility modal suffix, verb complement marker, dative, benefactive, causative and passive markers, among other things. This poly-functionality may in part be due to language internal change, but may also be attributed to contact-induced grammaticalisation. 
Various historical documents are examined to follow the diachronic change, whereas languages from neighbouring dialect groups and language families are considered for the likelihood of areal diffusion of certain constructions and functions of the Shaowu GET/GIVE verb of [tie 53].|000|Shaowu dialect, Chinese, Chinese dialects, semantic change 793|Yu2014|Hybridization plays an important role in the evolution of certain groups of organisms, adaptation to their environments, and diversification of their genomes. The evolutionary histories of such groups are reticulate, and methods for reconstructing them are still in their infancy and have limited applicability. We present a maximum likelihood method for inferring reticulate evolutionary histories while accounting simultaneously for incomplete lineage sorting. Additionally, we propose methods for assessing confidence in the amount of reticulation and the topology of the inferred evolutionary history. Our method obtains accurate estimates of reticulate evolutionary histories on simulated datasets. Furthermore, our method provides support for a hypothesis of a reticulate evolutionary history inferred from a set of house mouse (Mus musculus) genomes. As evidence of hybridization in eukaryotic groups accumulates, it is essential to have methods that infer reticulate evolutionary histories. The work we present here allows for such inference and provides a significant step toward putting phylogenetic networks on par with phylogenetic trees as a model of capturing evolutionary relationships.|000|maximum likelihood, evolutionary model, reticulate evolution, network, phylogenetic network 794|Norman1983|The chief task of Chinese historical dialectology is to explain how the present-day dialects were formed. This is not an easy task since we still have only a very incomplete record of modern words. 
Historical records of ancient dialects are unfortunately even more sparse and while the recording of modern dialects is an on-going process, the amount of dialectal data known from written sources of early centuries is unlikely to increase appreciably. Moreover, dialectal data from historical sources is of a fragmentary and unsystematic nature. But despite these shortcomings, in some cases these ancient recordings can and do shed light on problems in modern dialectology. In this paper I will examine a number of such ancient dialect words and attempt to interpret their significance.|000|Yáng Xióng's Fāngyán, Chinese, Chinese dialects, Chinese dialectology, Mǐn, ancient dialect words 795|Handel2010|The Min dialects are known to have split off from mainstream Chinese before the Middle Chinese period. This paper explores the implications of this split in terms of the possible relationship between Min and Old Chinese. Through a comparison of Norman’s Proto-Min reconstruction with the most recent Old Chinese reconstruction system of Baxter and Sagart, including references to early Chinese borrowings in Tai and Hmong-Mien, it is argued that Norman’s “softened initials” *-p and *-b have their origin in Old Chinese iambic nasal pre-initials, which developed into prenasalized initials.|000|Mǐn, Chinese dialects, Chinese dialectology, Old Chinese, Middle Chinese, genetic classification 796|Norman1991|A very good overview over the Mǐn dialects, their geographic distribution, and their history. Divided into five major parts: 1. Geographical Factors 1. The Mǐn Dialects and Geography 2. The Mǐn Heartland 3. Mǐn Overseas 4. Dialect Islands 2. Historical Factors 1. The early history of Fújiàn 2. The Aboriginal Population 3. The Chinese Conquest of Fújiàn 4. Chinese Colonization of Fújiàn 3. The Problem of Stratification 1. The Components of Mǐn 2. The Yuè Substratum 3. The Hàn Foundation 4. The Mǐn Protolanguage 5. Mǐn and the *Qièyùn* System 4. Classification and Subgrouping 1. 
The Definition of a Mǐn Dialect 2. Subgrouping 5. Problems for Further Research 1. Meso-history of Micro-history 2. Dialectal Syntax |000|Mǐn, Chinese dialects, Chinese dialectology, Middle Chinese, Qièyùn 797|Wheeler2014|Language origins and diversification are vital for mapping human history. Traditionally, the reconstruction of language trees has been based on cognate forms among related languages, with ancestral protolanguages inferred by individual investigators. Disagreement among competing authorities is typically extensive, without empirical grounds for resolving alternative hypotheses. Here, we apply analytical methods derived from DNA sequence optimization algorithms to Uto-Aztecan languages, treating words as sequences of sounds. Our analysis yields novel relationships and suggests a resolution to current conflicts about the Proto-Uto-Aztecan homeland. The techniques used for Uto-Aztecan are applicable to written and unwritten languages, and should enable more empirically robust hypotheses of language relationships, language histories, and linguistic evolution.|000|sequence alignment, phylogenetic reconstruction, POY, software, phonetic alignment 798|Creanza2015|Worldwide patterns of genetic variation are driven by human demographic history. Here, we test whether this demographic history has left similar signatures on phonemes—sound units that distinguish meaning between words in languages—to those it has left on genes. We analyze, jointly and in parallel, phoneme inventories from 2,082 worldwide languages and microsatellite polymorphisms from 246 worldwide populations. On a global scale, both genetic distance and phonemic distance between populations are significantly correlated with geographic distance. Geographically close language pairs share significantly more phonemes than distant language pairs, whether or not the languages are closely related. 
The regional geographic axes of greatest phonemic differentiation correspond to axes of genetic differentiation, suggesting that there is a relationship between human dispersal and linguistic variation. However, the geographic distribution of phoneme inventory sizes does not follow the predictions of a serial founder effect during human expansion out of Africa. Furthermore, although geographically isolated populations lose genetic diversity via genetic drift, phonemes are not subject to drift in the same way: within a given geographic radius, languages that are relatively isolated exhibit more variance in number of phonemes than languages with many neighbors. This finding suggests that relatively isolated languages are more susceptible to phonemic change than languages with many neighbors. Within a language family, phoneme evolution along genetic, geographic, or cognate-based linguistic trees predicts similar ancestral phoneme states to those predicted from ancient sources. More genetic sampling could further elucidate the relative roles of vertical and horizontal transmission in phoneme evolution.|000|phoneme, phonemic variation, phonetic variation, language variation, phoneme inventory, genetic diversity 799|Bromham2015|The effect of population size on patterns and rates of language evolution is controversial. Do languages with larger speaker populations change faster due to a greater capacity for innovation, or do smaller populations change faster due to more efficient diffusion of innovations? Do smaller populations suffer greater loss of language elements through founder effects or drift, or do languages with more speakers lose features due to a process of simplification? Revealing the influence of population size on the tempo and mode of language evolution not only will clarify underlying mechanisms of language change but also has practical implications for the way that language data are used to reconstruct the history of human cultures. 
Here, we provide, to our knowledge, the first empirical, statistically robust test of the influence of population size on rates of language evolution, controlling for the evolutionary history of the populations and formally comparing the fit of different models of language evolution. We compare rates of gain and loss of cognate words for basic vocabulary in Polynesian languages, an ideal test case with a well-defined history. We demonstrate that larger populations have higher rates of gain of new words whereas smaller populations have higher rates of word loss. These results show that demographic factors can influence rates of language evolution and that rates of gain and loss are affected differently. These findings are strikingly consistent with general predictions of evolutionary models.|000|language history, language evolution, population size, rate of change, correlational studies 800|Behr2004|初別國不相往來之言也,今或同。而舊書雅記故俗語,不失其方,而後人不知,故為之作釋也。 :translation:`Words, which could initially not be communicated between different countries, are sometimes shared [by all speakers of Chinese] today, whereas words noted as vernacular in ancient writings and glossography do not fail to be widespread. However, later people will not know this and therefore we offer explanations for these [words].`|23|Yáng Xióng's Fāngyán, Chinese dialects, diachrony and synchrony 801|Norman1983|By far the richest and most useful of the ancient dialect collections for my present purpose are the glosses of the Jìn dynasty scholiast Guō Pú 郭璞 who lived from 276 until 324. In his commentaries on the Fāngyán and the Ěryá 爾雅 a large number of dialectal notes occur among which there are more than 150 words from the Jiāngdōng 江東 dialect of his day. Serruys (1962) [@Serruys1962] remarks that Guō Pú's dialect material is much less complete than that of the Fāngyán and that it does not cover as many geographical areas. 
One explanation for this may be that in the period between the Fāngyán and Guō Pú's commentaries (almost three centuries) the Hàn standard language had spread to many regions and either strongly influenced or replaced the original local dialects leading to a much more homogeneous dialectal situation in China. |202|Hàn time, Chinese dialects, Yáng Xióng's Fāngyán 802|Liu1992|:comment:`Discusses place names in the Fāngyán and distinguishes: administrative areas, physical geography, ancient place names, place names from Hàn times, names for large areas, and names for regions.` [pb] :comment:`The authors suggest distinguishing names for administrative subdivisions (行政區劃) from geographical place names (自然地理地名).`|107f|Yáng Xióng's Fāngyán, place names 803|Morrison2015|Aligning multiple nucleotide sequences is a prerequisite for many if not most comparative sequence analyses in evolutionary biology. These alignments are often recognized as representing the homology relations of the aligned nucleotides, but this is a necessary requirement only for phylogenetic analyses. Unfortunately, existing computer programs for sequence alignment are not based explicitly on detecting the homology of nucleotides, and so there is a notable gap in the existing bioinformatics repertoire. If homology is the goal, then current alignment procedures may be more art than science. To resolve this issue, I present a simple conceptual scheme relating the traditional criteria for homology to the features of nucleotide sequences. These relations can then be used as optimization criteria for nucleotide sequence alignments. I point out the way in which current computer programs for multiple sequence alignment relate to these criteria, noting that each of them usually implements only one criterion. 
This explains the apparent dissatisfaction with computerized sequence alignment in phylogenetics, as any program that truly tried to produce alignments based on homology would need to simultaneously optimize all of the criteria.|000|sequence alignment, multiple sequence alignment, homology, cognacy 804|Baxter2014|:comment:`Very interesting summary on Hàn-time dialects which shows a dialect split into dialects which resolve the voiceless clusters (n, m, r) to MC th or MC sy/trh respectively.`|112-116|Hàn time, Old Chinese, voiceless initial, Middle Chinese, dialect split 805|Zeige2015|‘Morphology’ in linguistics is the study of the structure and function of word forms. In this paper, Sections 1 and 2 will give an insight into the basic notions and subfields of linguistic morphology to illustrate the linguistic approach to structure and function. It will then proceed to identify the position of morphology within linguistics and the repeated conjunctions between biology and linguistics by glancing at the theoretical foundations (Section 3) and the history (Section 4) of morphology in linguistics as well as today's theoretical and methodological challenges (Section 5). The paper will conclude with some deliberations on the relevance of morphological studies as part of the academic canon.|000|morphology, morpheme, family tree, genetic classification, ordered character states 806|Zeige2015|.. image:: static/img/zeige-1.gif :name: chinese_dialects :width: 700px .. image:: static/img/zeige-2.jpg :name: chinese_dialects2 :width: 700px :comment:`Both images are from the article: the first illustrates an old language tree from 1600 (compare text), the second illustrates the multiple dimensions of morphology and word forms.`|?|family tree, morphology, ordered character states 807|Gaillard2013|We demonstrate an online application to explore lexical networks. 
Tmuse displays a 3D interactive graph of similar words, whose layout is based on the proxemy between vertices of synonymy and translation networks. Semantic themes of words related to a query are outlined, and projected across languages. The application is useful as, for example, a writing assistance. It is available, online, for Mandarin Chinese, English and French, as well as the corresponding language pairs, and can easily be fitted to new resources.|000|network, lexical network, semantic similarity, lexicology, dictionary 808|Kreiner2007|In a recent paper on the integration of psychology and psychometrics, Borsboom (2006) describes construct validity as a “black hole from which nothing can escape: Once a question gets labelled as a problem of construct validity, its difficulty is considered superhuman and its solution beyond a mortal’s ken”.|272|validity, construct, construct validity, test theory 809|Embretson2006|H. Blanton and J. Jaccard (2006) examined the arbitrariness of metrics in the context of 2 current issues: (a) the measurement of racial prejudice and (b) the establishment of clinically significant change. According to Blanton and Jaccard, although research findings are not undermined by arbitrary metrics, individual scores and score changes may not be meaningfully interpreted. The author believes that their points are mostly valid and that their examples were appropriate. However, Blanton and Jaccard’s article does not lead directly to solutions, nor did it adequately describe the scope of the metric problem. This article has 2 major goals. First, some prerequisites for nonarbitrary metrics are presented and related to Blanton and Jaccard’s issues. Second, the impact of arbitrary metrics on psychological research findings are described. 
In contrast to Blanton and Jaccard (2006), research findings suggest that metrics have direct impact on statistics for group comparisons and trend analysis.|000|arbitrary metrics, test theory, psychology, validity, construct validity 810|Dogan2013|Identifying shared sequence segments along amino acid sequences generally requires a collection of closely related proteins, most often curated manually from the sequence datasets to suit the purpose at hand. Currently developed statistical methods are strained, however, when the collection contains remote sequences with poor alignment to the rest, or sequences containing multiple domains. In this paper, we propose a completely unsupervised and automated method to identify the shared sequence segments observed in a diverse collection of protein sequences including those present in a smaller fraction of the sequences in the collection, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. Since shared sequence fragments often imply conserved functional or structural attributes, the method produces a table of associations between the sequences and the identified conserved regions that can reveal previously unknown protein families as well as new members to existing ones. We evaluated the biological relevance of the method by clustering the proteins in gold standard datasets and assessing the clustering performance in comparison with previous methods from the literature. We have then applied the proposed method to a genome wide dataset of 17793 human proteins and generated a global association map to each of the 4753 identified conserved regions. Investigations on the major conserved regions revealed that they corresponded strongly to annotated structural domains. 
This suggests that the method can be useful in predicting novel domains on protein sequences.|000|partial cognacy, cognate detection, partial cognate detection, clique, network 811|Huang2007|The work presented here is situated in the broader project of creating of multilingual lexical resources with a focus on Asian languages. In the paper, we describe the design of the upper-level we are creating for our multi-lingual lexical resources. Among the current efforts devoted to this issue our work put the focus on (i) the language diversity aiming at massively multi-lingual resource, and (ii) the attention devoted to the ontological design of the upper level.|000|Swadesh list, basic level concepts, concept list, wordnet 812|Robbeets2004|The Altaic origin of Japanese is among the most disputed questions of language history. Given this lack of consensus, an argument that is often advanced is: “there is no basic vocabulary relating Japanese to Korean and Altaic”. The present paper investigates this postulation from a methodological and data-oriented perspective. First it seeks to advance a more precise methodology involving the concept of basic vocabulary. Next it evaluates the etymological proposals made so far, relating Japanese basic vocabulary items to Korean and Altaic. Starting from 92 basic vocabulary items out of Swadesh 100 list for which an etymology relating the Japanese entry to Korean and Altaic has been presented in the past, 41 etymologies stand the selection criteria. The paper concludes that the similarities we find between Japanese, Korean and Altaic are more likely to be the result of common genetic inheritance than of borrowing.|000|Altaic, Japanese, Korean, Swadesh list 813|Greenhill2011|The Levenshtein distance is a simple distance metric derived from the number of edit operations needed to transform one string into another. This metric has received recent attention as a means of automatically classifying languages into genealogical subgroups. 
In this article I test the performance of the Levenshtein distance for classifying languages by subsampling three language subsets from a large database of Austronesian languages. Comparing the classification proposed by the Levenshtein distance to that of the comparative method shows that the Levenshtein classification is correct only 40% of the time. Standardizing the orthography increases the performance, but only to a maximum of 65% accuracy within language subgroups. The accuracy of the Levenshtein classification decreases rapidly with phylogenetic distance, failing to discriminate homology and chance similarity across distantly related languages. This poor performance suggests the need for more linguistically nuanced methods for automated language classification tasks.|000|Levenshtein distance, edit distance, language comparison, sequence alignment, phylogenetic reconstruction 814|Boc2010|Discovering the origin of the Indo-European (IE) language family is one of the most intensively studied problems in historical linguistics. Gray and Atkinson [6] inferred a phylogenetic tree (i.e., additive tree or X-tree [2]) of the IE family, using bayesian inference and rate-smoothing algorithms, based on the 87 Indo-European language data set collected by Dyen et al. [5]. When conducting their classification study, Gray and Atkinson assumed that the evolution of languages was strictly divergent and the frequency of borrowing (i.e., horizontal transmission of individual words) was very low. As consequence, their results suggested a predominantly tree-like pattern of the IE language evolution. In our opinion, only a network model can adequately represent the evolution of the IE languages. 
We propose to apply a method of horizontal gene transfer (HGT) detection [8] to reconstruct phylogenetic network depicting the evolution of the IE language family.|000|Indo-European, phylogenetic reconstruction, network, phylogenetic network 815|Prevot2006|The long-term goal of the research described in this paper is to develop a multilingual language resource linked to the Princeton WordNet. The paper describes the experiments we are conducting for determining a basic vocabulary and for designing a language-independent core for the future resource. More precisely, in this paper we use the universality of the Swadesh list [15] for selecting it as a basic core vocabulary and we present several options for designing a minimal upper ontology underlying the list.|000|Swadesh list, basic vocabulary, ontology 816|Cardoso2015|Automated scoring systems which evaluate content require robust ways of dealing with form errors. The work presented in this paper is set in the context of scoring learners’ responses to listening comprehension items included in a placement test of German as a foreign language. Based on a corpus of over 3000 responses to 17 questions, by test takers of different language proficiencies, we perform a quantitative analysis of the diversity in misspellings. We evaluate the performance of an off-the-shelf open source spell-checker on our data showing that around 45% of the reported non-word errors are not correctly accounted for, that is, they are either falsely identified as misspelt or the spellchecker is unable to identify the intended word. We propose to address misspellings in computer-based scoring of constructed response items by means of phonetic normalization. Learner responses transcribed into Soundex codes and into two encodings borrowed from historical linguistics (ASJP and Dolgopolsky’s sound classes) are compared to transcribed reference answers using string distance measures. 
We show that reliable correlation with teachers’ scores can be obtained, however, similarity thresholds are item-specific.|000|misspelling, sound classes, LingPy, Dolgopolsky, ASJP, sequence modeling 817|Cardoso2015|Here's a link to the full bibliography. What is nice about the paper is that they show that one can use sound classes to enhance spelling error correction. Cool result for LingPy. * http://www.ep.liu.se/ecp_article/index.en.aspx?issue=114;article=002 |000|spelling correction, sound classes, sequence modeling 818|Su2009|Iconicity should be taken into account for the comparison of lexical similarity in sign languages, but it should be excluded for the study of their historical relatedness. Woodward (1978, 1991, 1993) modified Swadesh list by excluding body part signs and pronouns for historical comparison. In addition to body part signs and pronouns, signs with similar iconic motivation are also excluded in this study for historical comparison. The preliminary result shows that Taiwan Sign Language (TSL) and Japanese Sign Language (JSL) can be considered as languages of the same family, while TSL and Chinese Sign Language (CSL) can not. The similarity between TSL and CSL are due to language contact. TSL and American Sign Language (ASL) are least similar. Signs with iconic motivation are prevalent and universal in sign languages. Lexical comparison of sign languages can also be conducted with respect to various types of iconic devices even for historically unrelated languages such as TSL and ASL.|000|Swadesh list, sign language, Chinese, Japanese, Taiwanese 819|Tsvetkov2015|Linguistic borrowing is the phenomenon of transferring linguistic constructions (lexical, phonological, morphological, and syntactic) from a “donor” language to a “recipient” language as a result of contacts between communities speaking different languages. 
Borrowed words are found in all languages, and—in contrast to cognate relationships—borrowing relationships may exist across unrelated languages (for example, about 40% of Swahili’s vocabulary is borrowed from Arabic). In this paper, we develop a model of morpho-phonological transformations across languages with features based on universal constraints from Optimality Theory (OT). Compared to several standard—but linguistically naïve—baselines, our OT-inspired model obtains good performance with only a few dozen training examples, making this a cost-effective strategy for sharing lexical information across languages.|000|lexical borrowing, sequence modeling, 820|Urban2011|This article is a contribution to the long standing issue of identifying directionality in semantic change. Drawing on evidence from a sample of morphologically complex terms in basic vocabulary for 149 globally distributed languages, it is argued that cross-linguistically preferred synchronic relationships of word-formation provide clues to likely directions of diachronic semantic developments. The hypothesis is tested against diachronic data from Indo-Aryan languages, and, in spite of a number of counterexamples, a correlation is found. In addition, it is shown how these data can be applied to semantic reconstruction, and a scenario of semantic change which involves morphological complexity in an early stage of semantic development is sketched.|000|semantic change, polysemy, colexification, morphology, directionality of semantic change 821|Swadesh1959|The use of standard vocabulary lists on printed (or mimeographed) forms reduces tremendously the work of the comparativist, since placing a set of lists side by side will reveal cognates at a glance which would take hours or days to locate in dictionaries or in ordinary field notebooks. 
However, since cognates often have different meanings, one needs in addition a guide to likely near synonyms and other related meanings based on knowledge of general semantics and the specific tendencies of particular groups of languages|34|Swadesh list, standardization, basic vocabulary, cognate detection 822|Swadesh1959|These notions have led to the following technique of locating cognates, which the author has seen used with great saving of time and increased thoroughness. Word-lists for a set of languages are placed side by side, and all apparent cognates of the same meaning are put on a slip of paper; if no cognates are observed, separate slips are made for each word in each language. A carbon copy of the slips is retained in the original numerical order as a finder list; since the word lists are in topical order an alphabetical index by meanings is needed for locating the slips when needed. The originals are regrouped to bring together all those of related meaning, and cognates thus found are joined [pb] by stapling the slips together. Sometimes different semantic groupings are tried until all the possibilities have been examined. Tentative reconstructions of phonetic form are written in for each set of cognates. The slips are refiled according to the phonetic form of the reconstructions, allowing the comparativist to study the correctness of the phonologic equations and also sometimes revealing additional cognates with unexpected meanings. Corrections in cognate identification and reconstruction are made whenever the accumulating information requires it. The file prepared for one linguistic group may be collated with another one to build up broader and broader comparisons. 
The technique we have described could be easily adapted to use with the simpler types of filing machinery, with the comparativist intervening at the critical points.|34f|Swadesh list, word list, basic vocabulary, cognate detection, comparative method 823|Lyell1830|The geologist who yields implicit assent to the truth of these principles, will deem it incumbent on him to examine with minute attention all the changes now in progress on the earth, and will regard every fact collected respecting the causes in diurnal action, as affording him a key to the interpretation of some mystery in the archives of remote ages. Our estimate, indeed, of the value of all geological evidence, and the interest derived from the investigation of the earth's history, must depend entirely on the degree of confidence which we feel in regard to the permanency of the laws of nature. Their immutable constancy alone can enable us to reason from analogy, by the strict rules of induction, respecting the events of former ages, or, by a comparison of the state of things at two distinct geological epochs, to arrive at the knowledge of general principles in the economy of our terrestrial system.|165|Charles Lyell, key to the past, inference, induction, abduction 824|Lyell1863|The supposed existence, at a remote and unknown period, of a language conventionally called the Aryan, has of late years been a favourite subject of speculation among German philologists, and Professor Max Müller has given us lately the most improved version of this theory, and has set forth the various facts and arguments by which it may be defended, with his usual perspicuity and eloquence. 
He observes that if we knew nothing of the existence of Latin, - if all historical documents previous to the fifteenth century had been lost, - if tradition even was silent as to the former existence of a Roman empire, a mere comparison of the Italian, Spanish, Portuguese, French, Wallachian, and Rhaetian dialects would enable us to say that at some time there must have been a language, from which these six modern dialects derive their origin in common.|454|Charles Lyell, Max Müller, Indo-European, Romance, inference, key to the past 825|Tamariz2014|Human communication systems evolve culturally, but the evolutionary mechanisms that drive this evolution are not well understood. Against a baseline that communication variants spread in a population following neutral evolutionary dynamics (also known as drift models), we tested the role of two cultural selection models: coordination- and content-biased. We constructed a parametrized mixed probabilistic model of the spread of communicative variants in four 8-person laboratory micro-societies engaged in a simple communication game. We found that selectionist models, working in combination, explain the majority of the empirical data. The best-fitting parameter setting includes an egocentric bias and a content bias, suggesting that participants retained their own previously used communicative variants unless they encountered a superior (content-biased) variant, in which case it was adopted. This novel pattern of results suggests that (i) a theory of the cultural evolution of human communication systems must integrate selectionist models and (ii) human communication systems are functionally adaptive complex systems.|000|language evolution, language change, selection, cultural selection, cultural evolution 826|Yao2015|Machine transliteration is often referred to as phonetic translation. 
We show that transliterations incorporate information from both spelling and pronunciation, and propose an effective model for joint transliteration generation from both representations. We further generalize this model to include transliterations from other languages, and enhance it with reranking and lexicon features. We demonstrate significant improvements in transliteration accuracy on several datasets.|000|transliteration, phonetic alignment, linguistic reconstruction, spelling, pronunciation 827|Bonfante1944|The IE character of hieroglyphic Hittite is today universally accepted. The existence of pronouns 'amu ’I’, 'ames, mes 'my', kis and (rarely) i̯as 'who(ever)', of numerals i̯as 'one', t(u)wai 'two', trai 'three', of such words as as-, es- 'to be', as- 'to sit', 'at- 'to eat', lamanese- 'to name', makes any further discussion of this question unnecessary.|172f|genetic relationship, basic vocabulary 828|Meillet1921|Pour beaucoup de langues de peuples non civilisés, on n’a que des vocabulaires, et la grammaire est ou inconnue, ou connue d’une manière toute partielle. Si, en pareil cas, on observe un très grand nombre de communautés de vocabulaire entre certaines langues, et si ces communautés concernent les mots les moins sujets à emprunt, notamment les verbes qui indiquent les actions usuelles comme aller et venir, boire et manger, vivre et mourir, entendre et voir, dire et se taire, etc., ou des adjectifs comme vieux et neuf, grand et petit, long et large, etc., ce serait pur pédantisme que de se refuser à en faire usage. Seulement il ne faut pas se faire illusion sur la rigueur de la preuve ainsi faite, bien que la possession en commun d’un certain fonds de vocabulaire indique le plus souvent une parenté. Là où l’on n’a pas d’autres données on peut provisoirement, et en faisant les réserves nécessaires, se servir des indications ainsi obtenues. 
L’observation attentive du vocabulaire conduit du reste presque toujours en pareil cas à relever quelques coïncidences grammaticales qui achèvent la démonstration. :translation:`For many languages of uncivilized peoples, only the vocabulary is available, while the grammar is either entirely unknown or known only in very fragmentary form. If in such a case one observes a great many shared vocabulary items between certain languages, and if these shared items concern the words that are least often borrowed, especially verbs denoting everyday actions such as going and coming, drinking and eating, living and dying, hearing and seeing, speaking and keeping silent, etc., or adjectives such as old and new, big and small, long and wide, etc., it would be sheer pedantry to refuse to make use of them. Nevertheless, one should have no illusions about the rigor of the proof thus obtained, even though the common possession of a certain stock of vocabulary usually points to a relationship. Where no other data are available, one may provisionally, and with the necessary reservations, make use of the indications obtained in this way. Moreover, careful examination of the vocabulary in such cases almost always leads to the discovery of some grammatical correspondences that complete the demonstration.`|???|genetic relationship, basic vocabulary 829|Dybo2008|One major reason for this is historical: it is no big secret that the Indo-European family was recognized primarily on the basis of the amazing similarity between the paradigmatic systems of Old Indian and classic European languages like Greek or Latin, and, since the general methodology of comparative linguistics grew out of working with Indo-European languages, morphological comparison, by the very force of tradition, is still held in high esteem and frequently suggested as a universal means for establishing relationship.|124f|basic vocabulary, grammar, proof of relationship 830|Dybo2008|Another reason lies in the intuitive sphere. Morphology (and grammar in general) is traditionally seen as the «skeleton» of the language, its main constituent which, in comparison with lexics that «comes and goes», is relatively stable and thus far more valid for the first stage of comparison. Thus, if the languages compared do not seem to share much common morphology, but are nevertheless quite close lexically, for many linguists the obvious explanation will be that the languages are not related, but show traces of extensive contacts («convergence»).|125|basic vocabulary, grammar, proof of relationship 831|Dybo2008|From a purely synchronic, structuralist point of view such an understanding of morphology is quite reasonable. And it is most certainly true that regular paradigmatic correspondences in morphology are necessarily indicative of genetic relationship (with the possible exception of creole and pidgin languages, whose genetic status is still debatable). But is the reverse also true — that genetically related languages absolutely have to share common morphology? 
And if not, which cases of genetic relationship would be expected to be «morphologically unprovable»?|125|basic vocabulary, proof of relationship, grammar 832|Dybo2008|An interesting test — which we refrain from performing in details since its results are all too predictable — would be to take, for instance, 00 of the most frequent grammatical morphemes reconstructed for Proto-Indo-European and see how well they have been preserved in such modern Indo-European languages as Hindi, French, or English. For the latter in particular, we are afraid, the results would be catastrophic (just a few minor traces, such as the -s marker for nominal plurals, -t in irregularly formed past participles, etc.). On the other hand, if we take the Swadesh list of 100 most basic lexical items, it seems to fare infinitely better: over 90 % of the corresponding English items can be traced back to their Indo-European ancestors, and, what’s more important, at least a good third of them can even be shown to have possessed the exact same meaning in the protolanguage as they do in modern English, including body parts (‛eye’, ‛ear’, ‛nose’, ‛tongue’, ‛foot’, etc.), numerals (‛one’, ‛two’), nature terms (‛sun’, ‛star’), pronouns, verbs, etc. In this particular case at least, «morphology» seems to be a far shakier means to ascertain genetic relationship than basic lexicon.|125f|basic vocabulary, proof of relationship, genetic relationship, grammar 833|Dybo2008|The bottomline here is as follows: we know for a fact that a language’s system of morphological markers can undergo an overwhelming collapse over a relatively short period of time, and frequently does. Thus, it hardly took more than a few hundred years for the elaborate nominal morphology of Classical Latin to be reduced to almost nothing. 
Chinese, over a period of one millennium, underwent a transformation from an essentially «Sino-Tibetan» type language to an «Austro-Thai» type language, even though genetically its ties certainly lie with the former. The basic lexicon, however, of both Chinese and Latin has had a much higher rate of survival [STAROSTIN 2000a: 256]; and, although high-scale borrowings can occasionally speed up lexical replacement, to our knowledge, there is not a single historically attested case of the Swadesh 100-wordlist losing even a quarter of its constituents over a one thousand year period.|126|basic vocabulary, genetic relationship, grammar, proof of relationship 834|Dybo2008|A serious reconstruction of Proto-Indo-European morphology, particularly paradigmatic morphology, would have been unthinkable without access to language data that significantly decreases the chronological distance from attested languages to their reconstructed ancestor — which is exactly the case for Altaic and Sino-Caucasian, where the overwhelming majority of information is gained from modern languages.|126f|proof of relationship, genetic relationship, grammar, basic vocabulary 835|Dybo2008|This means that the best results will be attained if (a) calculations are performed on the material of entire language groups rather than isolated representatives; (b) at least a provisionary set of phonetic correspondences has been established for the compared units, based on an extensive study of the compared vocabularies in their entirety.|132|genetic relationship, basic vocabulary, genetic classification 836|Dybo2008|The question of whether relationship is best demonstrated through morphology or through lexics is exactly that — a question (albeit one that, we believe, has been fully answered above); but when it comes to regularity of phonetic correspondences — the pillar of comparative linguistics — there can be no second opinion on the issue: correspondences must be regular. 
That said, before proceeding to anything else, an understanding must be reached on what exactly constitutes regularity in correspondences.|138|sound correspondences, basic vocabulary, genetic relationship, proof of relationship 837|Teeter1963|All of lexicostatistics operates with a standard diagnostic list of glosses, to which it attaches forms from different languages. These are then scored as similar or dissimilar, by various techniques, and the percentages of similar and dissimilar items thus derived supply the lexicostatistician with the basic data from which he reasons. The main attractions of the method, aside from the hope of providing dates, are the ease of gathering the necessary wordlists and the anticipation that it can provide a quick route to knowledge of the course of linguistic history. As one linguist has expressed it, there is among lexicostatisticians ’the expectation that it [the method] makes comparative results possible via a shortcut that avoids the fuss and bother of the comparative method of reconstruction by simply counting similarities in forms sharing the same gloss in a brief but particular lexical sample’. [cf. Voegelin 1962, M.L.] Now a technique that lets us say something about comparative grammar without the difficulty of comparing grammars is not to be taken lightly - if indeed it can make good this claim.|638|lexicostatistics, basic vocabulary, validity 838|Teeter1963|To be usable, the list must be universally valid, and at the same time must consist of enough items to handle statistically (for this, the more the better). These aims are opposed in practice. The closer we get to universal validity, the fewer the items we have. My own opinion is that there will be no items on the perfect list.|642|basic vocabulary, validity, lexicostatistics 839|Vovin2002|In my opinion, the final proof is based on morphology... 
One is only left to wonder why the rich morphology of ‘North Caucasian’, Yeniseian, and Sino-Tibetan is left out in these attempts to establish ‘Sino-Caucasian’ macrofamily... The only reason we do accept the etymologies with pandemic irregularity is because we have something else: regular correspondences in morphology as well as comparatively small, but sufficient corpus of lexical etymologies [...]|157f|genetic relationship, proof of relationship, morphology, grammar 840|Payne1991|This is a very interesting paper which contains ALIGNMENTS of the Arawakan data, quite a few cognate sets, all nicely aligned, and sorted according to base concepts. This is one of the earliest examples where people tried to align data in a rigorous way.|000|phonetic alignment, alignments, proto-form, Arawakan 841|Koch1996|The linguistic literature gives little guidance on how to do morphological reconstruction. I propose a few basic procedures for morphological reconstruction and compare them to procedures needed for other kinds of linguistic reconstruction. Since so much of morphological reconstruction depends on knowing what morphological changes are possible, I present a typology of morphological change with comments on the reconstruction procedures that follow from each. Finally I illustrate the methodology in a few case studies where aspects of the morphology of Australian languages, principally of the Arandic subgroup of Central Australia, are reconstructed.|000|morphological reconstruction, morphology, linguistic reconstruction 842|Koch1996|Furthermore the literature does provide a certain amount of guidance in the form of **principles** that should be observed in doing morphological reconstruction. It is agreed that reconstruction should proceed by comparing **archaic** patterns. 
If these provide cumulative and convergent evidence from different languages, one can use them as the basis for reconstructing patterns in the protolanguage (Hock 1991:610f).|219|cumulative evidence, morphological reconstruction 843|Koch1996|Another principle is that one should begin from synchronically **irregular** or **anomalous** forms since regular forms can easily result from regularising or simplifying processes at some time during the history of the language.|219|morphological reconstruction, irregular forms, archaic forms 844|Koch1996|:comment:`Quote by Hetzron (1976:358)` .. pull-quote:: If a number of cognate languages each have a system similar to its homologues in the other languages in some respects, but different in other respects, unless one can find a clear conditioning factor for differentiation, the relatively most heterogeneous system might be considered the most archaic, the closest to the ancestor, and the more homogeneous ones might be assumed to have arisen as a result of simplification.|219|morphological reconstruction, archaic heterogeneity 845|Koch1996|:comment:`A proposed methodology for morphological reconstruction:` 1. Match tentative morphs, that is, formal bits that are potentially cognate according to established phoneme correspondences and changes. These matches may be found in the same language (so we have internal reconstruction), in different but related languages (so we have comparative reconstruction), or in different but not necessarily related languages (so we have an analysis of borrowing). [...] 2. Assess the relative likelihood of each of the compared forms and/or paradigmatic patterns being archaic or innovative. 3. Posit an appropriate protoform and a series of plausible processes of morphological change that (in combination with phonetic and semantic changes) would transform the protoform into each of the attested forms. |220|morphological reconstruction, methodology 846|Koch1996|1. 
Assemble a set of tentative cognates in a group of languages assumed to be genetically related. Begin with lexemes which belong to relatively basic vocabulary. To qualify as tentative cognates the words must exhibit similarities in both their semantic and their phonological make-up that could be accounted for by a combination of phonetic and semantic changes. 2. Match the tentative cognates segment by phonological segment. 3. List the sets of matched phonological segments which recur in the matched cognates. These are correspondence sets. 4. For each group of two or more overlapping correspondence sets (that is, sets that share a phoneme in any one of the languages), check whether the sets occur in the same phonological environment (defined in terms of other correspondence sets or boundaries). 5. Group together the correspondence sets which occur in mutually exclusive environments, that is, which occur in complementary or noncontrastive distribution with one another, bearing in mind that there may be more than one possible way to group them. 6. For each such group of noncontrasting correspondence sets, posit (i) a phoneme in the protolanguage and (ii) a chronologically ordered set of changes which will transform the protophoneme into the attested phoneme in each of the languages under comparison. 7. Where two or more languages have undergone the same change—and this change must be ordered chronologically before other changes which are not shared by the languages in question—posit (i) an intermediate protolanguage ancestral to just the languages in question (which are thus defined as a subgroup) and (ii) a single change that took place only once at some time intermediate between the protolanguage and the intermediate protolanguage. 8. 
When all the correspondence sets have been accounted for in terms of a protophoneme and associated changes, indicate the phonological system of the protolanguage; that is, the inventory of phonemes and the features that distinguish the phonemes from one another and characterise their pronunciation. 9. Check the reconstructed phonological system for plausibility according to what is known from the typology of synchronic systems. If more plausible solutions are consistent with the comparative data, try them. 10. Give the reconstructed form of all the words that have reflexes in the daughter languages. 11. From the list of protowords that are reconstructed, it is possible to describe the phonotactics of the protolanguage, that is the distribution of the protophonemes. Criteria for choosing between alternative workable solutions: * *Economy.* Prefer the most economical solution, that is, the solution which in[pb]volves the fewest elements in the protosystem and/or the fewest changes between the protolanguage and each of the descendant languages. (Hock (1991) calls this the Occam's Razor Principle.) * *Plausibility.* Prefer the most plausible solution for both the protosystem and the sequence of changes. The evidence for plausibility comes from typology: The plausibility of the protosystem is judged by the evidence of synchronic typology; the plausibility of the changes is judged by the evidence of diachronic typology. (Cf. 
Hock's principle: "Given two otherwise acceptable competing analyses, we prefer the one which postulates more natural or more common processes" (Hock 1991: 535)) The criterion of plausibility should take precedence over the criterion of economy.|219f|phonological reconstruction, linguistic reconstruction, phonetic alignment 847|Koch1996|:comment:`Difference between phonological and morphological reconstruction: regularity versus irregularity (regular change versus mutation)` There is no analogue in morphological change to regular sound change. **Regularity** is, however, taken into account in phonological reconstruction not by the core reconstruction procedure (step 6) but in the correspondence sets which are the input to this core reconstruction procedure. The set of corresponding phonemes is established by extracting phonemes from the same relative position in (tentative) cognate words or morphemes, that is, in words that are similar enough in both phonological form and meaning to be considered possible reflexes of the same original word. The comparative method is applied to such sets of phonemes only if they recur in numbers of cognate sets. This recurrence follows from the basic regularity of sound change. Morphological change is not regular in the same sense, and therefore does not lead to recurrent correspondences between tentative morphs.|220|morphological reconstruction, morphological change, phonological change, sound change, mutation 848|Koch1996|The units matched in morphological reconstruction are linguistic signs, whereas the units of phonological reconstruction (phonemes) are meaningless diacritic marks that serve to differentiate the signs. Each morphological unit consists of both a stretch of phonological substance (form) and an associated grammatical or derivational meaning (function). 
Like phonological units, however, morphological formatives are characterised by certain combinatorial possibilities; thus morphotactics is comparable to phonotactics.|222|sound change, morphological change 849|Koch1996|Like phonological reconstruction, morphological reconstruction basically starts with matchings. For each set of matched formatives there is posited an original formative, that is, a phonological form, meaning/function, and distribution, together with a set of morphological changes leading to each of the attested formatives.|222|morphological reconstruction, phonological reconstruction, matching, comparanda 850|Koch1996|Change in the content side of morphology is similar to that in semantics, except that there is probably more use of grammaticalisation and degrammaticalisation in morphological change. Since much of morphology has to do with inflectional paradigms and derivational pseudoparadigms, the semantic side of morphological reconstruction will be similar to the reconstruction of semantic fields.|223|morphological change, semantic change, morphological reconstruction, semantic reconstruction 851|Koch1996|Reconstruction in all domains of language has the following features in common. Both comparative and internal reconstruction are possible. The comparativist needs to find the elements that can be compared. Appeal must be made to directionality, plausibility of changes as established by diachronic typology, as well as to a general principle of economy. In all reconstruction except phonology it is useful to rely on synchronically archaic features of the languages.|223|morphological reconstruction, syntactic reconstruction, semantic reconstruction, phonological reconstruction 852|Koch1996|A simple form of morphological change consists of the replacement of one exponent of a grammatical meaning by another exponent. 
If the earlier and later exponents differ only in terms of phonemes that are relatable by regular sound change, this is not considered to be an instance of morphological change, but merely of sound change. If the replacement morph was formerly the exponent of a different grammatical or lexical meaning, this may be described as content change (possibly grammaticalisation). Hardest to classify is a replacement where there is no obvious source in the language for the new exponent: It may simply have been borrowed from the equivalent exponent in another language or dialect. (For constraints on the borrowing of morphemes, see Heath 1978.)|224|morpheme replacement, morphological change 853|Koch1996|There are several subcategories of allomorphic change. These are discussed under the headings of: * development of allomorphy (4.1) * change in relationship between basic/underlying and derived allomorphs (4.2) * change in conditioning of allomorphy (4.3) * loss of allomorphy (4.4) * redistribution of allomorphs (4.5)|224|morphological change, allomorphic change, allomorphy, morphological reconstruction 854|Koch1996|The distribution of co-allomorphs may be altered, one allomorph replacing another in certain contexts. This replacement results in the creation of new morph combinations, and may cause the elimination of certain allomorphs and hence lead to a reduction in the number of allomorphs of a particular morpheme. If all but one allomorph are eliminated, the result is the total loss of allomorphy, and the achievement of the ideal of one meaning being expressed by just one form. Here it is useful to distinguish between allomorph redistribution in a lexical stem and in affixes. (I shall confine my discussion to inflectional affixes. Much the same principles apply to derivational affixes and to modificatory morphological processes.) Reduction of allomorphy in stems is traditionally called paradigm levelling (4.4). 
In fact, reduction of allomorphy is more likely to take place within inflectional paradigms than across derivationally related forms. Within paradigms, levelling affects (sub)paradigms that are more closely related in meaning before those that are less closely related in meaning (Bybee 1985: 64 f). Thus levelling may affect the grammatical words expressing all the person-number combinations in a given tense before [pb] it affects other tenses, or all tenses of one aspect before other aspects, or all cases in the plural before similar cases in the singular, etc. The expanding allomorph may be either the newer one (typically created by sound change) or the older one. It is not possible to give a general rule which predicts which allomorph will prevail in such levelling. A number of factors have been suggested. A combination of several factors may be responsible in many cases.|231f|analogy, allomorphic change, paradigm levelling 855|Koch1996|1. Paradigm frequency: The variant occurring in the most forms in the paradigm prevails. 2. The variant occurring in the word form that expresses the semantically most basic category in the paradigm prevails. Basic forms express singular number, nominative case, third person, present tense, indicative mood, etc. 3. The variant occurring in the word form that occurs most frequently for the particular lexeme. Thus the plural allomorph prevails in words typically used in the plural in Frisian, where 'tooth'/'teeth' kies /kjizz-en is replaced by kjizze /kjizze-n (Tiersma 1982). Similarly, in Polish place names, the stem allomorph of the local cases has penetrated to other word forms of the paradigm (Mańczak 1957-58: 396ff.). 4. 
The variant that most closely resembles invariant morphemes that occur in related paradigms.|232|paradigm levelling, morphological change, allomorphic change 856|Koch1996|A whole class of morphological changes consists in a change in the external dimensions of a morph, as one of the boundaries of a morph is shifted. (We are not discussing here the effects of phonological changes on the dimensions of a morph.) If the boundary is shifted outward, the new morph is bigger than the old and includes phonic material that was formerly part of another morph. If the boundary is shifted inward the new morph is smaller than the old, and material that was formerly part of the morph now belongs to another morph, or constitutes a morph of its own.|237|morphological change, morphological reconstruction 857|Koch1996|The mechanism that effects boundary change is reanalysis. The motivation for reanalysis is that an alternative analysis becomes possible and more plausible to speakers for various reasons. These may involve ambiguity or universal considerations of iconicity, etc. The possibility of the new analysis may depend on changes having occurred elsewhere in the system. For example, sound changes may have affected the relevant morphs, the meaning of the morph in other combinations may have changed, or the morph may have disappeared from other environments.|237|reanalysis, morphological change 858|Koch1996|Morphemes may change in their semantic content or function. This is in principle a kind of semantic change. Several subtypes can be distinguished. :comment:`on the pages after, the author mentions different types, like 6.1: grammaticalization, 6.2: regrammaticalization, and 6.3: degrammaticalization.`|240|grammaticalisation, morphological change 859|Koch1996|A morph may change with respect to its status as a formal unit. That is, it may become a different kind of morph. 
It may change with respect to its independence, from free word to bound clitic, from clitic to affix, or from affix to simply a part of another morph. (The change in status from a free lexical morpheme to a free grammatical morpheme with no cliticisation is grammaticalisation; this is syntactic but not morphological change.) :comment:`mentions later different types, like 7.1: Free morpheme becomes bound word, 7.2: Bound morpheme becomes affix (affixisation), 7.3: phrasal becomes affixal, and 7.4: Absorption.`|242|affixation, free morpheme, bound morpheme, morphological change, absorption 860|Koch1996|An affix may cease to be interpreted as a separate morph and be reanalysed as merely a part of another (typically lexical) morph. The morph is absorbed into another morph and loses its morphic status altogether. This proceeds by means of boundary deletion, which has been discussed in 5.1. This kind of change in status of a morph has been called demorphologization by Hopper: "When a morpheme loses its grammatical-semantic contribution to a word, but retains some remnant of its original form, and thus becomes an indistinguishable part of a word's phonological construction, I shall speak of the resulting phonological material as morphological residue, and of the process itself as demorphologization" (Hopper 1990: 154).|244|absorption, morphological change 861|Koch1996|Although it is generally recognised that the order of elements is more rigid within a word than between words, it is nevertheless necessary to accept the possibility that morphemes can be reordered. In the first place, reordering may take place within sequences of clitics, and these clitics may later be reinterpreted as affixes. Secondly, there are documented instances (though relatively rare) of the reordering of affixes within words.|244|morphological change, re-ordering 862|Koch1996|Another minor kind of morphological change is the doubling of morphological markers. 
This involves the addition of a productive affix onto the periphery of a word to make the analysis of the word more transparent, and typically occurs when the existing marker is obscure.|246|morpheme doubling, morphological change 863|Koch1996|Certain general and unidirectional developmental tendencies in morphological change can be described. Following Ferguson's (1990) terminology with respect to phonological change, I shall call these pathways. Hopper uses the term trajectory (Hopper 1990: 153). A pathway that leads from phonetics to morphology has already been mentioned (see 5.3). A different pathway leads into morphology from the lexicon. This long-range tendency can be described as a "diachronic trajectory for morphemes, which might start out as full words, become clitics and then affixes, and finally disappear from the scene" (Hopper 1990: 153). There are two variants of this pathway, one that leads through more grammatical territory, and one which remains in the lexical domain.|247|morphological change, pathways, trajectory 864|Koch1996|:comment:`proposes a couple of different pathways of morphological change:` **Pathways from lexicon to morphology** * Lexeme → grammatical clitic → inflectional affix → part of lexical morph * Lexeme → lexical component → derivational affix → part of lexical morph|247|pathways, trajectory, morphological change 865|Klimov1990|Методика современной компаративистики, широко известная в лингвистической литературе под довольно неудачным термином ''сравнительно-исторический метод'', представляет собой большую совокупность методов и конкретных приемов изучения истории родственных языков, генетически восходящих к некоторой единой традиции прошлого, обычно квалифицируемой в качестве праязыка или языка-основы. 
Этот методический инструментарий, призванный обслуживать решение множества задач, используется для построения системы знаний об историческом развитии языковых семей, формируемой в конечном счете в виде сравнительно-исторических грамматик. :translation:`Die Methode der heutigen Komparativistik, welche allgemein unter dem nicht sehr glücklichen Terminus ''vergleichend-historische Methode'' bekannt ist, stellt eine große Gesamtheit an abstrakten und konkreten Verfahren zur Untersuchung der Geschichte verwandter Sprachen dar, die genetisch auf eine bestimmte einheitliche Tradition der Vergangenheit zurückgehen, welche man üblicherweise als Proto-Sprache oder Grundsprache qualifiziert. Dieses methodische Instrumentarium, auf welches zurückgegriffen wird, um eine große Menge verschiedener Probleme zu lösen, wird verwendet, um ein Erkenntnissystem über die historische Entwicklung von Sprachfamilien aufzubauen, welches seine endgültige Gestalt in Form historisch-vergleichender Grammatiken erhält.` :translation:`The method of comparative linguistics today is generally known under the not very well-chosen term "comparative-historical method". It constitutes a large complex of abstract and concrete procedures for investigating the history of related languages which genetically go back to some uniform tradition of the past. Such a past tradition of speech is usually called a proto-language. This collection of methodological instruments, which is meant to solve a large number of different problems, is used to construct a system of insights into the development of language families. This system of insights is then reported in the form of comparative-historical grammars.`|6|Klimov, comparative method, system of knowledge, complex of methods, linguistic reconstruction, methodology 866|Ross1996|1. Determine on the strength of diagnostic evidence that a set of languages are genetically related, that is, that they constitute a 'family'; 2. 
Collect putative cognate sets for the family (both morphological paradigms and lexical items). 3. Work out the sound correspondences from the cognate sets, putting 'irregular' cognate sets on one side; 4. Reconstruct the protolanguage of the family as follows: a. Reconstruct the protophonology from the sound correspondences worked out in (3), using conventional wisdom regarding the directions of sound changes. b. Reconstruct protomorphemes (both morphological paradigms and lexical items) from the cognate sets collected in (2), using the protophonology reconstructed in (4a). 5. Establish innovations (phonological, lexical, semantic, morphological, morphosyntactic) shared by groups of languages within the family relative to the reconstructed protolanguage. 6. Tabulate the innovations established in (5) to arrive at an internal classification of the family, a 'family tree'. 7. Construct an etymological dictionary, tracing borrowings, semantic change, and so forth, for the lexicon of the family (or of one language of the family).|6f|workflow, comparative method 868|Mitterhofer2013|It follows the first three steps of the Comparative Method by aligning words with similar meanings and looking for sound correspondences. If certain correspondences occur at least three times in a 200-item word list they are considered as regular sound correspondences. Like in the Comparative Method, one determines whether these correspondences are conditioned sound changes or whether they are independent of their environment. The correspondences and the words where they can be found are documented.|18|Blair method, cognate detection, sound correspondences, correspondence detection 869|Wang2014|The identification of cognates between two distinct languages has recently started to attract the attention of NLP research, but there has been little research into using semantic evidence to detect cognates. 
The approach presented in this paper aims to detect English-French cognates within monolingual texts (texts that are not accompanied by aligned translated equivalents), by integrating word shape similarity approaches with word sense disambiguation techniques in order to account for context. Our implementation is based on BabelNet, a semantic network that incorporates a multilingual encyclopedic dictionary. Our approach is evaluated on two manually annotated datasets. The first one shows that across different types of natural text, our method can identify the cognates with an overall accuracy of 80%. The second one, consisting of control sentences with semi-cognates acting as either true cognates or false friends, shows that our method can identify 80% of semi-cognates acting as cognates but also identifies 75% of the semi-cognates acting as false friends.|000|cognate detection, bilingual texts 870|Gould1979|An adaptationist programme has dominated evolutionary thought in England and the United States during the past forty years. It is based on faith in the power of natural selection as an optimizing agent. It proceeds by breaking an organism into unitary "traits" and proposing an adaptive story for each considered separately. Trade-offs among competing selective demands exert the only brake upon perfection; nonoptimality is thereby rendered as a result of adaptation as well. We criticize this approach and attempt to reassert a competing notion (long popular in continental Europe) that organisms must be analyzed as integrated wholes, with baupläne so constrained by phyletic heritage, pathways of development, and general architecture that the constraints themselves become more interesting and more important in delimiting pathways of change than the selective force that may mediate change when it occurs. 
We fault the adaptationist programme for its failure to distinguish current utility from reasons for origin (male tyrannosaurs may have used their diminutive front legs to titillate female partners, but this will not explain why they got so small); for its unwillingness to consider alternatives to adaptive stories; for its reliance upon plausibility alone as a criterion for accepting speculative tales; and for its failure to consider adequately such competing themes as random fixation of alleles, production of nonadaptive structures by developmental correlation with selected features (allometry, pleiotropy, material compensation, mechanically forced correlation), the separability of adaptation and selection, multiple adaptive peaks, and current utility as an epiphenomenon of nonadaptive structures. We support Darwin's own pluralistic approach to identifying the agents of evolutionary change.|000|adaptationism, evolution, adaptation, spandrels 871|Gould1979|Uniformitarianism is a dual concept. Substantive uniformitarianism (a testable theory of geologic change postulating uniformity of rates or material conditions) is false and stifling to hypothesis formation. Methodological uniformitarianism (a procedural principle asserting spatial and temporal invariance of natural laws) belongs to the definition of science and is not unique to geology. Methodological uniformitarianism enabled Lyell to exclude the miraculous from geologic explanation; its invocation today is anachronistic since the question of divine intervention is no longer an issue in science. Substantive uniformitarianism, an incorrect theory, should be abandoned. 
Methodological uniformitarianism, now a superfluous term, is best confined to the past history of geology.|000|uniformitarianism, substantive uniformitarianism, methodological uniformitarianism, Charles Lyell 872|Baker1998|Catastrophism in the Earth sciences is rooted in the view that Earth signifies its causative processes via landforms, structures and rock. Processes of types, rates and magnitudes not presently in evidence may well be signified this way. Uniformitarianism, in contrast, is a regulative stipulation motivated by the presumed necessity that science achieves logical validity in what can be said (hypothesized) about the Earth. Regulative principles, including simplicity, actualism and gradualism, are imposed a priori to insure valid inductive reasoning. This distinction lies at the heart of the catastrophist versus uniformitarian debates in the early nineteenth century and it continues to influence portions of the current scientific program. Uniformitarianism, as introduced by Charles Lyell in 1830, is specifically tied to an early nineteenth century view of inductive inference. Catastrophism involves a completely different form of inference in which hypotheses are generated retroductively. This latter form of logical inference remains relevant to modern science, while the outmoded notions of induction that warranted the doctrine of uniformitarianism were long ago shown to be overly restrictive in scientific practice. The latter should be relegated solely to historical interest in the progress of ideas.|000|uniformitarianism, catastrophism 873|Zhang0000|This paper describes a rule-based machine learning approach to morphological processing in the system called XMAS. XMAS discovers and acquires linguistic rules from examples of morphological combinations and accomplishes the morphological analysis and synthesis by applying the rules. 
This approach is independent of languages, saves time and effort for development and maintenance, and takes small lexicon space. A Korean version of XMAS is effectively working in the English-Korean machine translation system KSHALT.|000|machine translation, morpheme detection, morphological segmentation, morphology, machine learning 874|Youn2015|How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides direct access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries. Across languages carefully selected from a phylogenetically and geographically stratified sample of genera, translations of words reveal cases where a particular language uses a single polysemous word to express concepts represented by distinct words in another. We use the frequency of polysemies linking two concepts as a measure of their semantic proximity, and represent the pattern of such linkages by a weighted network. This network is highly uneven and fragmented: certain concepts are far more prone to polysemy than others, and there emerge naturally interpretable clusters loosely connected to each other. Statistical analysis shows such structural properties are consistent across different language groups, largely independent of geography, environment, and literacy. 
It is therefore possible to conclude the conceptual structure connecting basic vocabulary studied is primarily due to universal features of human cognition and language use.|000|polysemy, colexification, network, semantic change, semantic similarity 875|Beckstette2006|**Background** In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. **Results** We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM-scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to factor 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. 
The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330. **Conclusion** Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than $\lvert A \rvert^m + m - 1$, where m is the length of the PSSM and A a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript.|000|amino acid alphabet, sound classes, alphabet reduction 876|Beckstette2006|The interesting part about this article is the fact that they use something similar to sound classes in order to reduce their amino-acid alphabet and then search for signal. Maybe it is even interesting to look at what they do directly, since they use a tree for the reduction of their amino acids. One could think of something similar for linguistic purposes (think of feature-based trees, or the like). Anyway, this paper is worth being mentioned in some context where sound classes play a role.|000|amino acid alphabet, alphabet reduction, sound classes 877|Mufwene2001|This major new work explores the development of creoles and other new languages, focusing on the conceptual and methodological issues they raise for genetic linguistics. Written by an internationally renowned linguist, the book discusses the nature and significance of internal and external factors – or “ecologies” – that bear on the evolution of a language. 
The book surveys a wide range of examples of changes in the structure, function and vitality of languages, and suggests that similar ecologies have played the same kinds of roles in all cases of language evolution. Drawing on major theories of language formation, macroecology and population genetics, Mufwene proposes a common approach to the development of creoles and other new languages. The Ecology of Language Evolution will be welcomed by students and researchers in creolistics, sociolinguistics, theoretical linguistics, and theories of evolution.|000|language evolution, language change, creolisation, creole languages, ecology, language as species, biological parallels 878|Ansaldo2009|Why do groups of speakers in certain times and places come up with new varieties of languages? What are the social settings that determine whether a mixed language, a pidgin, or a Creole will develop, and how can we understand the ways in which different languages contribute to the new grammar? Through the study of Malay contact varieties such as Baba and Bazaar Malay, Cocos Malay, and Sri Lanka Malay, as well as the Asian Portuguese vernacular of Macau, and China Coast Pidgin, the book explores the social and structural dynamics that underlie the fascinating phenomenon of the creation of new, or restructured, grammars. 
It emphasizes the importance and interplay of historical documentation, socio-cultural observation, and linguistic analysis in the study of contact languages, offering an evolutionary framework for the study of contact language formation – including pidgins and Creoles – in which historical, socio-cultural, and typological observations come together.|000|language as species, ecology, language evolution, creolisation, creole languages, South-East Asian languages, Chinese dialects, biological parallels 879|Auer2005|Dialects are constantly changing, and due to increased mobility in recent years, European dialects have ‘levelled’, making it difficult to distinguish a native of Reading from a native of London, or a native of Bonn from a native of Cologne. This comprehensive study brings together a team of leading scholars to explore all aspects of recent dialect change, in particular dialect convergence and divergence. Drawing on examples from a wide range of European countries – as well as areas where European languages have been transplanted – they examine a range of issues relating to dialect contact and isolation, and show how sociolinguistic conditions differ hugely between and within European countries. Each specially commissioned chapter is based on original research, giving an overview of current work on that particular area and presenting case studies to illustrate the issues discussed. The first ever book devoted to the position of dialects in Europe, Dialect Change will be welcomed by all those interested in sociolinguistics, dialectology, and European languages.|000|language evolution, ecology, language as species, language contact, biological parallels, Indo-European, Chinese dialectology 881|Zhang1997|Adaptive evolution at the molecular level can be studied by detecting convergent and parallel evolution at the amino acid sequence level. 
For a set of homologous protein sequences, the ancestral amino acids at all interior nodes of the phylogenetic tree of the proteins can be statistically inferred. The amino acid sites that have experienced convergent or parallel changes on independent evolutionary lineages can then be identified by comparing the amino acids at the beginning and end of each lineage. At present, the efficiency of the methods of ancestral sequence inference in identifying convergent and parallel changes is unknown. More seriously, when we identify convergent or parallel changes, it is unclear whether these changes are attributable to random chance. For these reasons, claims of convergent and parallel evolution at the amino acid sequence level have been disputed. We have conducted computer simulations to assess the efficiencies of the parsimony and Bayesian methods of ancestral sequence inference in identifying convergent and parallel-change sites. Our results showed that the Bayesian method performs better than the parsimony method in identifying parallel changes, and both methods are inefficient in identifying convergent changes. However, the Bayesian method is recommended for estimating the number of convergent-change sites because it gives a conservative estimate. We have developed statistical tests for examining whether the observed numbers of convergent and parallel changes are due to random chance. As an example, we reanalyzed the stomach lysozyme sequences of foregut fermenters and found that parallel evolution is statistically significant, whereas convergent evolution is not well supported.|000|adaptive evolution, convergent evolution, parallel evolution 882|Valls2013|We investigate ongoing linguistic changes in north-western Catalan using a contemporary corpus designed for this purpose. 
Using dialectometric methods we analyze the aggregate linguistic distance between varieties in apparent time, paying special attention to structural dialect loss due to advergence to standard Catalan in Catalonia (Spain) and Andorra. Dialect leveling there contrasts with the relative stability of the Catalan dialects in Aragon, where Catalan is not an official language. The dialect leveling in Catalonia and Andorra is due to (standard) language support policies for Catalan, which are absent in Aragon. The Catalonian and Andorran changes have strengthened the state-internal border between Aragon and Catalonia. This paper is one of the first studies of border effect not only between different states, but also between different administrative regions within a single state. We also show that urban populations in Catalonia and Andorra have pronunciations closer to standard Catalan than rural populations.|000|aggregate distances, Levenshtein distance, dialectometry, advergence, divergence, Catalan dialects 883|Stern2013|The evolution of phenotypic similarities between species, known as convergence, illustrates that populations can respond predictably to ecological challenges. Convergence often results from similar genetic changes, which can emerge in two ways: the evolution of similar or identical mutations in independent lineages, which is termed parallel evolution; and the evolution in independent lineages of alleles that are shared among populations, which I call collateral genetic evolution. Evidence for parallel and collateral evolution has been found in many taxa, and an emerging hypothesis is that they result from the fact that mutations in some genetic targets minimize pleiotropic effects while simultaneously maximizing adaptation. 
If this proves correct, then the molecular changes underlying adaptation might be more predictable than has been appreciated previously.|000|convergence, convergent evolution, parallel evolution, parallel development 884|Jones2010|The motor protein prestin confers sensitive and selective hearing in mammals. Remarkably, prestin amino-acid sequences of echolocating dolphins have converged to resemble those of distantly related echolocating bats.|000|convergent evolution, parallel evolution 885|Nishi1998|This paper aims to examine the various interpretations of the phonological system of Old Burmese (of Burma, now Myanmar) so far made and propose a conceivable framework of the history of Burmese in the light of our recent knowledge of Burmish languages and the regional dialects of Burmese, as well as orthographic variations in, and orthographic changes since, Old Burmese, from the standpoint that Present-day Standard Burmese is a later changed form of Old Burmese.|000|Tibeto-Burman, Burmese, Old Burmese, language change, linguistic reconstruction 886|Gray2010|Many of the cell's macromolecular machines appear gratuitously complex, comprising more components than their basic functions seem to demand. How can we make sense of this complexity in the light of evolution? One possibility is a neutral ratchet-like process described more than a decade ago (1), subsequently called constructive neutral evolution (2). This model provides an explanatory counterpoint to the selectionist or adaptationist views that pervade molecular biology (3). |000|constructive neutral evolution, neutral evolution, adaptationism, evolution of systems, systematic aspect of evolution 887|Covello1993|The term 'RNA editing' encompasses a variety of processes that change the primary nucleotide sequence of an RNA transcript from that of its encoding DNA. 
As in the case of certain other molecular genetic phenomena, for example RNA splicing, the discovery of RNA editing presented molecular biologists with an evolutionary puzzle, since the existence of RNA editing offers no obvious selective advantage. A three-step model for the evolution of RNA editing is proposed, based on the co-evolution of editing activity and editing sites, with genetic drift as an important component. The implications of this model for the known forms of RNA editing are discussed.|000|rna editing, genetic drift, adaptationism, neutral evolution 888|Gray2010|One possibility is a neutral ratchet-like process described more than a decade ago ( 1) (@Covello1993), subsequently called constructive neutral evolution ( 2) (@Stoltzfus1999). This model provides an explanatory counterpoint to the selectionist or adaptationist views that pervade molecular biology ( 3) (@Lynch2007).|920|constructive neutral evolution, neutral evolution, ratchet-like process, adaptationism 889|Stoltzfus1999|The neutral theory often is presented as a theory of "noise" or silent changes at an isolated "molecular level," relevant to marking the steady pace of divergence, but not to the origin of biological structure, function, or complexity. Nevertheless, precisely these issues can be addressed in neutral models, such as those elaborated here with regard to scrambled ciliate genes, gRNA-mediated RNA editing, the transition from self-splicing to spliceosomal splicing, and the retention of duplicate genes. All of these are instances of a more general scheme of "constructive neutral evolution" that invokes biased variation, epistatic interactions, and excess capacities to account for a complex series of steps giving rise to novel structures or operations. The directional and constructive outcomes of these models are due not to neutral allele fixations per se, but to these other factors. 
Neutral models of this type may help to clarify the poorly understood role of nonselective factors in evolutionary innovation and directionality.|000|constructive neutral evolution, neutral evolution, ratchet-like process, adaptationism 890|Lynch2007|Although numerous investigators assume that the global features of genetic networks are moulded by natural selection, there has been no formal demonstration of the adaptive origin of any genetic network. This Analysis shows that many of the qualitative features of known transcriptional networks can arise readily through the non-adaptive processes of genetic drift, mutation and recombination, raising questions about whether natural selection is necessary or even sufficient for the origin of many aspects of gene-network topologies. The widespread reliance on computational procedures that are devoid of population-genetic details to generate hypotheses for the evolution of network configurations seems to be unjustified.|000|non-adaptive processes, neutral evolution, constructive neutral evolution, adaptationism 891|Gray2010|When faced with such complexity, the favored (adaptationist) explanation would surely be selection for improved basic function. For example, ribosomal complexity is not generally regarded as gratuitous, but rather the result of evolutionary accretion of proteins that made this machine progressively faster, more stable, and more efficient at translation ( 6). 
For the addition of some of these proteins, selection probably did drive increased complexity, but there is no basis to assume that this explains all, or even most, of the increased complexity of these machines.|920|adaptationism, neutral evolution, constructive neutral evolution 892|Gray2010|At the organismal level, Maynard Smith and Szathmary proposed that a ratchet mechanism called contingent irreversibility might render previously independent evolutionary units interdependent for “accidental reasons that have little to do with the selective forces that led to the evolution of the higher entity in the first place” ( 8). (@Smith1995) An example is the mutual loss of autonomy by the symbionts that became mitochondria or plastids and the cells that harbored them.|921|contingent irreversibility, neutral evolution 893|Gray2010|An idealized general model of such a chain of events (see the figure) illustrates how the two components (such as an intron and TyrRS) might revert to independence, but are more likely to “ratchet” toward greater dependency over time. An initial mutation creates a dependent state, and only reversion at this site is likely to break the dependency. By contrast, mutations at any other site have the potential to create further dependencies. Random mutations are therefore unlikely to restore one component to its original state of independence from the other. If there are more ways for dependence to increase than decrease, an increase is unavoidable. Thus, constructive neutral evolution is a directional force that drives increasing complexity without positive (and in small populations, against mildly negative) selection. 
Negative selection is involved, but only as the stabilizing force that keeps this directionality from reversing.|921|negative selection, neutral evolution, directional force, ratchet-like process 894|Gray2010|Both the order of events and the potential for a ratchet-like increase in complexity are often overlooked when explaining complex systems, in particular when intricate features are interpreted as having arisen as corrections or countermeasures.|921|adaptationism, ratchet-like process, neutral evolution 895|Gray2010|Although compensation for defects caused by “selfish” (self-propagating) DNA elements may seem intuitive, it is problematic to propose that, on the way to evolving compensatory machinery, an intermediate state had to exist that was less fit than its ancestors and sisters. Why would such an intermediate not just die out in competition before its rescue by compensatory complexity yet to be invented? A more workable model is that the compensating mechanism was already present (possibly serving unrelated functions).|921|neutral evolution, adaptationism, complex systems 896|Gray2010|Indeed, although complexity in biology is generally regarded as evidence of “fine tuning” or “sophistication,” large biological conglomerates might be better interpreted as the consequences of runaway bureaucracy – as biological parallels of nonsensically complex Rube Goldberg machines that are overengineered to perform a single task (13). (@Sancar2008)|921|Rube Goldberg machine, neutral evolution, adaptationism 897|Sancar2008|Circadian rhythms are generated by cell-autonomous molecular clocks in organisms ranging from cyanobacteria to humans. 
Recent research on the mechanisms of molecular clocks calls for a reflection on the cause and necessity of complexity in biological systems.|000|Rube Goldberg machine, neutral evolution, adaptationism 898|Anthony2015|For two centuries, the identification of the “homeland” of the Indo-European (IE) languages and the details of the family’s diversification and expansion have remained unsolved problems. One reason is the difficulty of linking linguistic evidence with archaeological evidence in the absence of archaeological finds of writing; another is that the problem’s solution requires an interdisciplinary effort in an age of increasing specialization. We were trained in European archaeology (Anthony) and IE historical linguistics (Ringe), and we have both had to educate ourselves in related disciplines in order to pursue our work. However, collaboration between specialists eventually becomes necessary. It is not just a matter of avoiding elementary errors; in a case such as the IE homeland problem, a broadly satisfying solution must be global, applying methods from all relevant disciplines to act as checks on solutions that satisfy only a selected range of data. We believe that such an integrated solution is finally attainable.|000|Indo-European, Indo-European homeland, Indo-European diversification 899|Deo2015|It is well established that meanings associated with linguistic expressions evolve in systematic ways across time. We have little precise understanding, though, of why and how this happens. We know even less about its implications for our models of grammar, communication, and cognition. This article reviews developments and results from grammaticalization, typology, and formal semantics/pragmatics that can be brought to bear on addressing the problem of semantic change. 
It deconstructs the notion of grammaticalization paths and offers a set of questions for systematic investigation, following which I contextualize the small body of literature at the intersection of formal semantics/pragmatics and language change. The approach I take is programmatic rather than survey oriented, given the emergent nature of the domain of investigation and the limited body of existing literature that pertains directly to the questions raised here.|000|semantics, semantic change, semantic similarity, historical semantics, historical linguistics 900|Nishi1998|This paper aims to examine the various interpretations of the phonological system of Old Burmese (of Burma, now Myanmar) so far made and propose a conceivable framework of the history of Burmese in the light of our recent knowledge of Burmish languages and the regional dialects of Burmese, as well as orthographic variations in, and orthographic changes since, Old Burmese, from the standpoint that Present-day Standard Burmese is a later changed form of Old Burmese.|000|Tibeto-Burman, Old Burmese, Burmish languages, genetic classification 901|Thomas1967|The wordlists are based primarily on the Greenberg wordlist for African languages, as revised by Armstrong and used at the University of Ibadan. In the course of the Ijo dialect survey referred to above, however, Kay Williamson added a number of words which were easy to elicit in the Delta area and eventually formed the nucleus of the Supplementary wordlist used at Ibadan. Not all of these words had, however, been added at the time that the Epie and Engenni lists were collected, so that the items include only some of those on the Supplementary list. 
For ease [pb] of reference, the items have, except for the numerals, been arranged alphabetically; their number on the Greenberg or Supplementary list is enclosed in parentheses after the item, those on the Supplementary list being preceded by S.|4f|Ibadan concept list, concept list, African languages 902|Snider2006|Collectively, linguistic research in Africa has produced a wealth of lexical data, and while these data often serve useful purposes in their individual projects, their use to comparative linguistics is minimal, given their lack of a standard format. The SIL Comparative African Wordlist (1700 words) is therefore an attempt to offer a format for these data that is more amenable to comparative analysis. There are two main reasons for the development of this wordlist. First, many of the existing African wordlists simply do not contain enough lexical items to allow one to do serious comparative analysis. Second, many existing African wordlists are specific to a particular language family, and thus, a pan-African list offers the potential of serious comparative research. The items in this wordlist appear with both English and French glosses and are arranged semantically under twelve main headings, generally moving from human domains to non-human domains, and from concrete to more abstract items.|000|concept list, comparative wordlist, African languages 903|Zeisler2015|The paper discusses recent suggestions that Tibetan may originally have had a system of person marking, which could thus be reconstructed for proto-Tibeto-Burman. While self-evident traces of such person marking are clearly missing, the ‘irregular’ paradigm of the verb ‘eat’ has been taken as indirect evidence. This proposal, however, is in need of several further assumptions. 
The ‘irregular’ stem forms zos and zo, on the other hand, correspond to a regular, albeit obsolete modal derivation of ability in Old and Classical Tibetan.|000|Sino-Tibetan, Tibetan, irregular paradigm, morphological change 904|LaPolla2015|This chapter presents a view of communication not as coding and decoding, but as ostension and inference, that is, one person doing something to show the intention to communicate, and then another person using abductive inference to infer the reason for the person’s ostensive act, creating a context of interpretation in which the communicator’s ostensive act “makes sense”, and thereby inferring the communicative and informative intention of the person. Language is not necessary for communication in this view, but develops as speakers use linguistic patterns over and over again to constrain the addressee’s creation of the context of interpretation. Speakers choose which aspects to constrain the interpretation of, and language forms conventionalize from frequent repetition. As constraining the interpretation requires more effort than not constraining it in that way, it must be important to the speakers to constrain that particular aspect of the meaning, otherwise they would not put in the extra effort. Logically, then, the forms that do conventionalize must have been motivated by the cognition and culture of the speakers of the language when they conventionalized, even though over time the motivation is often lost and the form continues to be used only due to convention and habit.|000|cultural evolution, language evolution, language and communication 905|Lewis2001|Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. 
This paper explores the possibility of using standard, well-behaved Markov models for estimating morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent in morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses. |000|ancestral states, morphological characters, multi-state models, maximum likelihood 906|Lewis2001|Rather than propose a new model, my purpose here is to examine the consequences of applying this type of model to morphological character data and to describe modifications needed to accommodate the peculiarities of discrete morphological data sets. I will hereafter use the acronym Mk to refer to this family of models (where the “M” stands for “Markov” and “k” refers to the number of states observed). The Mk model is a generalized JC69 model, the latter representing the special case of k=4 (the JC69 model could thus be referred to as the M4 model). The Mk model assumes that a lineage is always in one of k possible states (k >=2), with no state considered plesiomorphic or apomorphic a priori. Along a particular branch of the phylogeny, a character can change state at any instant in time, with the probability of such an event being equal for all such time intervals along the branch. An instant is defined to be an infinitesimal period of time, denoted *dt*, during which there can be at most one substitution (= change of state) event. 
Different instantaneous time periods are independent of one another with respect to the probability of a character state change, and the probability of change is symmetrical (i.e., the instantaneous probability of changing from state i to state j is the same as the instantaneous probability of changing from j to i). The length of a branch under the Mk model is defined to be the expected number of changes per character across the branch, which is equal to :math:`(k - 1) \alpha t`, where :math:`\alpha` is the instantaneous rate of any particular transition between states, and :math:`t` is the amount of time represented by the branch. :comment:`Matrix is shown in the paper and further discussed.`|915|multi-state models, evolutionary model, maximum likelihood, likelihood model 907|Lewis2001|My primary tool in the defense of the Mk model is the fact that Tuffley and Steel’s (1997) parsimony model (the TS97 model), when used within the framework of ML inference, has the property of always choosing the same tree(s) as equally-weighted parsimony, even to the point of choosing multiple trees if there are multiple most-parsimonious solutions. That is, the likelihood under TS97 is a monotonically decreasing function of the parsimony score, meaning that a likelihood analysis using TS97 is identical to a parsimony analysis using equally-weighted parsimony (the tree that minimizes the number of steps also maximizes ln L). Thus, the justification for using likelihood for morphological data instead of parsimony hinges on whether the differences between Mk and TS97 are acceptable from both a biological and a statistical standpoint.|916|likelihood model, multi-state models, parsimony, likelihood 908|Lewis2001|This acquisition bias never arises in the use of likelihood for molecular phylogenetics because the linear nature of nucleic acids and proteins allows easy circumscription of a range of characters, including constant as well as variable characters. 
Acquisition bias is problematic because mean rates of evolution embodied in the branch length parameters will be overestimated if only variable characters are present in the dataset. Because branch lengths play an important role in determining the overall likelihood for a particular tree topology, such overestimation, if not corrected, would lead to bias in tree topology inferences. Fortunately, one can correct for acquisition bias, and the remainder of this section is devoted to an explanation of how this is accomplished for the Mk model.|917|acquisition bias, morphological characters, multi-state models, likelihood model 909|Lewis2001|Characters can be divided into (1) parsimony-informative characters, which can potentially have different parsimony character lengths on different trees; (2) autapomorphic characters, which are variable but have the same length on all trees; and (3) constant characters, which have only one state. Because autapomorphic characters are considered uninformative by many systematists using parsimony, these too are often left out of data sets used in phylogenetic analyses (for an exception, see Funk and Wagner, 1995). One of the reasons likelihood methods resist long-branch attraction problems is that they can accept an explanation of similarity based on convergent or parallel evolution, whereas parsimony allows only historical explanations of similarity among the terminal taxa.|917|autapomorphie, parsimony, parsimony-informative characters, morphological characters, likelihood model 910|Lewis2001|Likelihood methods may choose to keep separate two lineages having very long branches because the evidence for convergence/parallelism is (in this case) stronger than the evidence for shared history (Felsenstein, 1978). If branch lengths are incorrectly estimated (say, consistently overestimated), likelihood methods would be biased in their choice of tree topology. 
Thus, it is important to in some way correct for the systematic omission of constant characters. Autapomorphic and highly variable characters do not present the same problem as constant characters, being identifiable and at least enumerable.|917|long-branch attraction, likelihood model, multi-state models, morphological characters, acquisition bias 911|Tamariz2014|For example, variants that are easier to learn or use have an increased likelihood of being adopted, and therefore propagate in populations faster than a neutral drift model would predict.|2|content-biased selection, content-biased model, selection, cultural selection, cultural evolution 912|Ghirlanda2006|In the human sciences, cultural evolution is often viewed as an autonomous process essentially free of genetic influence. A question that follows is, If culture is not influenced by genes, can it take any path? Employing a simple mathematical model of cultural transmission in which individuals may copy each other's traits, it can be shown that cultural evolution favors individuals who are weakly influenced by others and able to influence others. The model suggests that the cultural evolution of rules of cultural transmission tends to create populations that evolve rapidly toward conservatism. Bias in cultural transmission may result purely from cultural dynamics. Freedom from genetic influence is not freedom to take any direction.|000|cultural evolution, selection, cultural selection 913|Arber2009|In recent years molecular mechanisms and natural strategies have been explored that spontaneously generate genetic variations at low rates without seriously affecting genetic stability at the level of populations. Thereby acquired knowledge suggests systemic aspects of evolutionary interdependences both in the past and in future evolutionary developments. The natural strategy of DNA acquisition by horizontal gene transfer interconnects different branches of the tree of evolution at random times. 
This makes in principle the entire global gene pool of the biosphere available to any kinds of living beings for their further evolutionary development. The relevance of this knowledge for risk assessments of genetically engineered organisms is discussed.|000|systematic aspect of evolution, evolutionary mechanisms, driving forces of evolution, drift, selection 914|Arber2009|The Darwinian theory of biological evolution has its roots in paleontological studies comparing fossils with their counterparts of living organisms, on the one hand, and in the comparison of phenotypic traits of obviously related organisms living in geographic and/or in reproductive isolations, on the other hand. Darwin proposed that phenotypic variants represented the substrate for natural selection. This means that natural populations of living organisms are steadily exposed to the selective pressure exerted by the encountered living conditions. Variants that can deal better with their environment than others are thereby favoured and will overgrow less favoured variants in the longer term. This suggested that related phenotypic variants and related species of living organisms have common descent. In order to illustrate this situation, Charles Darwin has drawn in his notebooks symbolic evolutionary trees. This testifies of the idea that phenotypically related organisms and perhaps all living organisms represent a large evolutionary system of interrelatedness and historical interdependence.|242|selection, evolution, driving forces of evolution, evolutionary mechanisms 915|Arber2009|The three pillars of the concept of Neo-Darwinian evolution are drawn on top of Fig. 1: (1) genetic variation is the driving force of biological evolution. (2) Together with ***** the at any time available genetic variants it is natural selection which directs the biological evolution. 
(3) Geographic and reproductive isolations act as modulators of biological evolution.|242|genetic variation, natural selection, geographic isolation, reproductive isolation 916|Winther2008|Darwin's 19th century evolutionary theory of descent with modification through natural selection opened up a multidimensional and integrative conceptual space for biology. We explore three dimensions of this space: explanatory pattern, levels of selection, and degree of difference among units of the same type. Each dimension is defined by a respective pair of poles: law and narrative explanation, organismic and hierarchical selection, and variational and essentialist thinking. As a consequence of conceptual debates in the 20th century biological sciences, the poles of each pair came to be seen as mutually exclusive opposites. A significant amount of 21st century research focuses on systems (e.g., genomic, cellular, organismic, and ecological/global). Systemic Darwinism is emerging in this context. It follows a “compositional paradigm” according to which complex systems and their hierarchical networks of parts are the focus of biological investigation. Through the investigation of systems, Systemic Darwinism promises to reintegrate each dimension of Darwin's original logical space. Moreover, this ideally and potentially unified theory of biological ontology coordinates and integrates a plurality of mathematical biological theories (e.g., self-organization/structure, cladistics/history, and evolutionary genetics/function). Integrative Systemic Darwinism requires communal articulation from a plurality of perspectives. Although it is more general than these, it draws on previous advances in Systems Theory, Systems Biology, and Hierarchy Theory. Systemic Darwinism would greatly further bioengineering research and would provide a significantly deeper and more critical understanding of biological reality. 
|000|systematic aspect of evolution, natural selection, adaptationism, evolutionary theory, evolutionary mechanisms 917|Ferguson1990|One of the most powerful tools in the armamentarium of linguists engaged in the study of diachronic phonology is the often implicit notion that some changes are phonetically more likely than others. Thus if a linguist finds a systematic correspondence between [g] and [dʒ] in two related language varieties, it will be reasonable to assume that the stop is the older variant and the affricate the younger one until strong counter evidence is found. The linguist makes such an assumption because experience with many languages has shown that the change of [g] to [dʒ] is fairly common and tends to occur under certain well-documented conditions whereas the reverse change is unusual and problematic. This line of argumentation has been employed, either explicitly or implicitly, since the earliest days of modern historical linguistics. Because of the importance of this methodological tool, one might expect that general treatises and introductory textbooks on historical linguistics would devote considerable space to a presentation of the relative probabilities of various possible sound changes, as well as explanatory factors accounting for them. Also, because of the centrality of alternations and processes to the field of phonological theory, one might expect that general treatises and introductory textbooks in phonology would devote considerable space to this topic. Unfortunately, authors of books on historical linguistics or phonological theory have a great deal of other ground to cover, and this simple but important concern tends to be neglected. :comment:`Quoted after @Blevins2004 p. 6f`|59f|frequency, sound change, regular sound change, probability of sound change patterns, sound change patterns 918|Blevins2004|Evolutionary Phonology addresses itself directly to this basic but central concern. 
This study fills a gap in the literature by providing a sustained argument demonstrating that a broad range of phonological phenomena can be explained in terms of common phonetically motivated sound change. Evolutionary Phonology constitutes a concrete and comprehensive attempt to explain the majority of the world’s recurrent sound patterns in terms of well-understood instances of phonetically motivated sound change. As a concrete model, it incorporates current models of articulatory phonetics, speech perception, and language acquisition. As a comprehensive model, it summarizes a great deal of work in experimental phonetics, typology, variation, and theoretical phonology, and relates this to centuries of work modeling sound change and sound patterns. As an explanatory model, it locates the domain of explanation for many recurrent synchronic patterns in the diachronic dimension.|7|evolutionary phonology, sound change, sound change patterns 919|Blevins2004|First, *all spoken language is characterized by a wide range of phonetic variation, some of which is language specific, and some of which is determined by physical properties of the [pb] human vocal apparatus.* [...] A second observation is that, though language transmission from one generation to the next is constrained by perceptual, articulatory, cognitive, and social factors, *language transmission is, by its very nature, indirect and imperfect*. Within this imperfect system of transmission, sound change may be viewed as the norm, not the exception. Since every individual will have slightly different early childhood experiences, every individual will, by definition, form a grammar based on distinct sets of surface forms.|7f|evolutionary phonology, sound change model 920|Blevins2004|First, the suggested typology of sound changes with sources in misperception, ambiguous segmentation, and ambiguity due to variation is descriptively adequate. 
Second, where sound changes appear to defy this typology, they can be shown to have non-phonetic origins. Third, and most strikingly, the general model of sound change makes predictions regarding phonetic preconditions of change which find general support in experimental and typological studies.|8|misperception, synchronic variation, evolutionary phonology, sound change 921|Blevins2004|The working hypothesis supported throughout this volume is that *recurrent synchronic sound patterns have their origins in recurrent phonetically motivated sound change*. [pb] Common instances of sound change give rise to commonly occurring sound patterns. Certain sound patterns are rare or unattested, because there is no common pathway of change which will result in their evolution.|8f|synchronic variation, evolutionary phonology, phonetically motivated sound change 922|Blevins2004|Examples of generalizations over sound systems of the world’s languages i. Segment inventories a. If a language has only three vowels, it will usually have /i, u, a/ b. All languages have voiced sonorants and voiceless obstruents in their segment inventories. c. In the series of voiced stops /b d g/, /g/ is most likely to be missing. d. No language contrasts voiceless laryngealized obstruents with their voiceless ejective counterparts. ii. Stress patterns e. There are languages in which stress falls consistently on the first syllable of the word, or the last syllable of the word, but there are no languages in which stress falls regularly on the middle syllable of the word (e.g. the second syllable of a three-syllable word, the third syllable of a five-syllable word, and the fourth syllable of a seven-syllable word.) f. There are languages in which stressed syllables must be separated by single unstressed syllables, and others where stressed syllables must be separated by two unstressed syllables, but there are no languages where stressed syllables must be separated by three unstressed syllables. g. 
There are languages with long vowels and short vowels where all long vowels must be stressed, but there are no languages with long and short vowels where all short vowels must be stressed. [pb] iii. Phonotactics h. In nearly all languages, each consonant in a syllable-internal obstruent cluster must agree in laryngeal features. i. In many languages, each consonant in an obstruent cluster must agree in laryngeal features. j. In many languages, there is no possible laryngeal contrast for obstruents in pre-obstruent position. k. In languages where there is no possible laryngeal contrast for obstruents in pre-obstruent position, laryngeal contrasts are neutralized in this position in derived environments. |9f|stress, phonotactics, segment inventories, stress patterns 923|Zipf1935|Frequency counts of phonemes, morphemes, and words in samples of written discourse in diverse languages are presented in support of the generalization that the more complex any speech element, the less frequently does it occur. Thus, the greater the frequency of occurrence of words, the less tends to be their average length, and the smaller also is the number of different words. The relation between frequency and number of different words is said to be expressed by the formula ab² = k, in which a represents the number of different words of a given frequency and b the frequency. The relationship between the magnitude of speech elements and their frequency is attributed to the operation of a "law" of linguistic change: that as the frequency of phonemes or of linguistic forms increases, their magnitude decreases. There is thus a tendency to "maintain an equilibrium" between length and frequency, and this tendency rests upon an "underlying law of economy." Human beings strive to maintain an "emotional equilibrium" between variety and repetitiveness of environmental factors and behavior. 
A speaker's discourse must represent a compromise between variety and repetitiveness adapted to the hearer's "tolerable limits of change in maintaining emotional equilibrium." This accounts for the maintenance of the relationship ab² = k; the exponent of b expresses this "rate of variegation." (PsycINFO Database Record (c) 2012 APA, all rights reserved)|000|Zipf's law, frequency, rank frequencies 924|Rimor1984|Historical change in ASL is characterized by a variety of reduction phenomena that facilitate articulation and perception. We show that similar changes can be induced under laboratory conditions. Two experiments are reported. In the first serial transmission and speeded discourse were used in the laboratory to induce reductions in signs performed in their old forms. The changes were found to be similar to historical change; reductions occurred in both signs and non-linguistic pantomime and for both signers and non-signers, implying the work of natural phonetic processes. In the second experiment, similar reductions were exhibited in signs when they underwent register shift, from citation form to conversational form. The results of the two experiments suggest that synchronic variation and diachronic change stem from the same natural phonetic processes that favor ease of articulation and ease of perception. |000|naturalness, historical change, American Sign Language, language change 925|Frege1918|Vorstellungen können nicht gesehen oder getastet, weder gerochen, noch geschmeckt, noch gehört werden. [...] Zweitens: Vorstellungen werden gehabt. Man hat Empfindungen, Gefühle, Stimmungen, Neigungen, Wünsche. Eine Vorstellung, die jemand hat, gehört zu dem Inhalte seines Bewußtseins. [...] Drittens: Vorstellungen bedürfen eines Trägers. 
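Die Dinge der Außenwelt sind im Vergleiche damit selbständig. :translation:`Ideas cannot be seen or touched, neither smelled, nor tasted, nor heard. [...] Secondly: ideas are had. One has sensations, feelings, moods, inclinations, wishes. An idea that someone has belongs to the content of his consciousness. [...] Thirdly: ideas need a bearer. The things of the external world are, by comparison, independent.`|67|Three Worlds, world II, world of mental objects, Popper's World III 926|Frege1918|So scheint das Ergebnis zu sein: Die Gedanken sind weder Dinge der Außenwelt noch Vorstellungen. Ein drittes Reich muß anerkannt werden. Was zu diesem gehört, stimmt mit den Vorstellungen darin überein, daß es nicht mit den Sinnen wahrgenommen werden kann, mit den Dingen aber darin, daß es keines Trägers bedarf, zu dessen Bewußtseinsinhalte es gehört. So ist z. B. der Gedanke, den wir im pythagoreischen Lehrsatz aussprachen, zeitlos wahr, unabhängig davon wahr, ob irgendjemand ihn für wahr hält. Er bedarf keines Trägers. Er ist wahr nicht erst, seitdem er entdeckt worden ist, wie ein Planet, schon bevor jemand ihn gesehen hat, mit andern Planeten in Wechselwirkung gewesen ist. :translation:`So the result seems to be: thoughts are neither things of the external world nor ideas. A third realm must be recognized. What belongs to it agrees with ideas in that it cannot be perceived by the senses, but with things in that it needs no bearer to the contents of whose consciousness it would belong. Thus, for example, the thought that we expressed in the Pythagorean theorem is timelessly true, true independently of whether anyone takes it to be true. It needs no bearer. It is true not only from the time it was discovered, just as a planet was in interaction with other planets even before anyone had seen it.`|69|Popper's World III, Three Worlds, logical objects 927|Popper1978|By world 3 I mean the world of the products of the human mind, such as languages; tales and stories and religious myths; scientific conjectures or theories, and mathematical constructions; songs and symphonies; paintings and sculptures. But also aeroplanes and airports and other feats of engineering.|144|Popper's World III, logical objects 928|Buffon1755|Pour donner une idée plus nette de l’ordre des chiens, de leur dégénération dans les différens climats, et du mélange de leurs races, je joins ici une table, ou, si l’on veut, une espèce d’arbre généalogique, où l’on pourra voir d’un coup d’œil toutes ces variétés : cette table est orientée comme les cartes géographiques, et l’on a suivi, autant qu’il étoit possible, la position respective des climats. Le Chien de Berger est la souche de l’arbre : ....." 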
:translation:`[The 1781 English translation by William Smellie is] "To give a clear idea of the different kinds of dogs, of their degeneration in particular climates, and of the mixture of their races, I have subjoined a table, or genealogical tree, in which all these varieties may be easily distinguished. This tree is drawn in the form of a geographical chart, preserving as much as possible the position of the different climates to which each variety naturally belongs. The shepherd’s dog is the root of the tree ....."]`|225|phylogenetic network, genealogy of dogs 929|Ragan2009|It is well-known that Charles Darwin sketched abstract trees of relationship in his 1837 notebook, and depicted a tree in the Origin of Species (1859). Here I attempt to place Darwin's trees in historical context. By the mid-Eighteenth century the Great Chain of Being was increasingly seen to be an inadequate description of order in nature, and by about 1780 it had been largely abandoned without a satisfactory alternative having been agreed upon. In 1750 Donati described aquatic and terrestrial organisms as forming a network, and a few years later Buffon depicted a network of genealogical relationships among breeds of dogs. In 1764 Bonnet asked whether the Chain might actually branch at certain points, and in 1766 Pallas proposed that the gradations among organisms resemble a tree with a compound trunk, perhaps not unlike the tree of animal life later depicted by Eichwald. Other trees were presented by Augier in 1801 and by Lamarck in 1809 and 1815, the latter two assuming a transmutation of species over time. Elaborate networks of affinities among plants and among animals were depicted in the late Eighteenth and very early Nineteenth centuries. In the two decades immediately prior to 1837, so-called affinities and/or analogies among organisms were represented by diverse geometric figures. 
Series of plant and animal fossils in successive geological strata were represented as trees in a popular textbook from 1840, while in 1858 Bronn presented a system of animals, as evidenced by the fossil record, in a form of a tree. Darwin's 1859 tree and its subsequent elaborations by Haeckel came to be accepted in many but not all areas of biological sciences, while network diagrams were used in others. Beginning in the early 1960s trees were inferred from protein and nucleic acid sequences, but networks were re-introduced in the mid-1990s to represent lateral genetic transfer, increasingly regarded as a fundamental mode of evolution at least for bacteria and archaea. In historical context, then, the Network of Life preceded the Tree of Life and might again supersede it. |000|phylogenetic network, phylogenetic tree, Darwin, history, history of science 930|Whewell1840|The Consilience of Inductions takes place when an Induction obtained from one class of facts, coincides with an Induction, obtained from another different class. This Consilience is a test of the truth of the Theory in which it occurs. |469|consilience, cumulative evidence, biological parallels 931|Whewell1840|[...] the cases in which inductions from classes of facts altogether different have thus jumped together, belong only to the best established theories which the history of science contains. And as I shall have occasion to refer to this particular feature in their evidence, I will take the liberty of describing it by a particular phrase; and will term it the Consilience of Inductions. |230|consilience, cumulative evidence, biological parallels 932|Berg1998|There is only one way out of this unfortunate state of affairs. This method is known as the *cumulative-evidence argument*. The basic idea is that, as the theoretical value of any individual test is perforce limited, the number of tests has to be fairly high to reach reliable conclusions. [...] 
A correct prediction for one particular data set is no more than suggestive of the explanatory potential of processing principles; but correct predictions for a wide range of materials may warrant the claim that processing should be taken seriously as an explanation of linguistic patterns.|66|cumulative evidence, consilience 933|Popper1978|Thought contents are, we may conjecture, products of human language; and human languages, in their turn, are the most important and basic of world 3 objects.|159|Popper's World III, language, Three Worlds 934|Popper1967|To explain this expression I will point out that, without taking the words 'world' or 'universe' too seriously, we may distinguish the following three worlds or universes: first, the world of physical objects or of physical states; secondly, the world of states of consciousness, or of mental states, or perhaps of behavioural dispositions to act; and thirdly, the world of *objective contents of thought*, especially of scientific and poetic thoughts and of works of art.|58|Three Worlds, Popper's World III, Karl Popper 935|Slingerland2012|[...] the three real worlds: crudely stated, the world of material things, the world of mental states and, finally, quite crucially, the world of meanings, ideas, or concepts -- the latter conjectured to be an observer independent realm of intellectual/unphysical objects that human beings are able to discover and grasp by means of their mental powers. [...] The intellectual objects in World 3 are not themselves mental states, yet are real, in part, because they seem to have something like an observer independent status as objects of discovery (for example, he would argue mathematical truths are neither physical nor mental but they are real), and also, in part, because the ideas or concepts that we are able to grasp by means of our mental powers can ultimately have an effect on our bodies and on the creation of material artifacts. 
|72|Popper's World III, Three Worlds 936|Hoeffe2008|Darin gründet der Stufenbau des Seienden. Als Prinzip des Zusammenhalts und der Organisation wird es im Anorganischen *hexis*, bei den Pflanzen *physis* (im engeren Sinn), bei den Tieren *psychê* und beim Menschen *logos* genannt. Der spezifische und individuelle Charakter eines realen Gegenstandes verdankt sich der besonderen energetischen Verfassung, in der sich das göttliche Pneuma in dem betreffenden Ausschnitt der Welt befindet. :translation:`The hierarchy of beings is grounded in this. As the principle of cohesion and organization, it is called *hexis* in inorganic things, *physis* (in the narrower sense) in plants, *psychê* in animals, and *logos* in humans. The specific and individual character of a real object is owed to the particular energetic constitution in which the divine pneuma finds itself in the relevant section of the world.`|98|Three Worlds, Stoics 937|Olson2015|Some adaptationist explanations are regarded as maximally solid and others fanciful just-so stories. Just-so stories are explanations based on very little evidence. Lack of evidence leads to circular-sounding reasoning: “this trait was shaped by selection in unseen ancestral populations and this selection must have occurred because the trait is present.” Well-supported adaptationist explanations include evidence that is not only abundant but selected from comparative, populational, and optimality perspectives, the three adaptationist subdisciplines. Each subdiscipline obtains its broad relevance in evolutionary biology via assumptions that can only be tested with the methods of the other subdisciplines. However, even in the best-supported explanations, assumptions regarding variation, heritability, and fitness in unseen ancestral populations are always present. These assumptions are accepted given how well they would explain the data if they were true. This means that some degree of “circularity” is present in all evolutionary explanations. Evolutionary explanation corresponds not to a deductive structure, as biologists usually assert, but instead to ones such as abduction or Bayesianism. 
With these structures in mind, we show the way to a healthier view of “circularity” in evolutionary biology and why integration across the comparative, populational, and optimality approaches is necessary.|000|adaptationism, adaptation, just-so stories, abduction, Bayesianism, biology 938|Morrison2015a|
==== ============================================================ ============================================================
Year Author                                                       Contribution
==== ============================================================ ============================================================
Evolutionary networks
----------------------------------------------------------------- ------------------------------------------------------------
1671 Georg Stiernhielm                                            Small language network
1750 Vitaliano Donati                                             Suggested net metaphor in biology
1751 Carl von Linné                                               Suggested map metaphor in biology
1755 Georges-Louis Leclerc, Comte de Buffon                       Intraspecies network
1766 Antoine Nicolas Duchesne                                     Intraspecies network
1792 Carl von Linné                                               Interspecies map
1800 Félix Gallet                                                 Language network
1832 Friedrich Wilhelm Ritschl                                    Small manuscript network
1863 Franz Martin Hilgendorf                                      Unpublished interspecies network
1888 Ferdinand Albin Pax                                          Interspecies network
==== ============================================================ ============================================================

==== ============================================================ ============================================================
Year Author                                                       Contribution
==== ============================================================ ============================================================
Evolutionary trees
----------------------------------------------------------------- ------------------------------------------------------------
1776 Peter Simon Pallas                                           Suggested tree metaphor in biology
1809 Jean-Baptiste Pierre Antoine de Monet, Chevalier de Lamarck  Small interspecies tree
1827 Carl Johan Schlyter                                          Manuscript tree
1852 Charles Naudin                                               Preferred tree to network or chain for biology relationships
1853 František Ladislav Čelakovský                                Language tree
1853 Auguste Schleicher                                           Language tree
1855 Alfred Russel Wallace                                        Suggested evolutionary tree metaphor
1859 Charles Robert Darwin                                        Theoretical evolutionary tree
1864 Johann Friedrich Theodor Müller                              Small interspecies tree
1865 St George Jackson Mivart                                     Large interspecies tree
1866 Franz Martin Hilgendorf                                      Large interspecies tree
==== ============================================================ ============================================================
|2|phylogenetic network, phylogenetic tree, history, history of science 939|Francis2015|A binary phylogenetic network may or may not be obtainable from a tree by the addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily tested criterion (based on “2-SAT”) that efficiently determines whether or not any given network can be realized in this way. Moreover, the proof provides a polynomial-time algorithm for finding one or more trees (when they exist) on which the network can be based. A number of interesting consequences are presented as corollaries; these lead to some further relevant questions and observations, which we outline in the conclusion. |000|phylogenetic network, phylogenetic tree, algorithms 940|Dodds2015|Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. 
We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.|000|positivity, human language, corpus studies 941|Gusfield1997|:comment:`Minimum spanning tree and maximum parsimony:` The unweighted Steiner tree problem on hypercubes has been shown to be NP-hard [165]. Usually, the NP-hardness of an unweighted problem immediately implies the NP-hardness on the weighted version of the problem. After all, the unweighted case is simply the weighted case with all the weights set to one. However, the case of the Steiner tree problem on hypercubes is different, and the result in [165] does not imply the NP-hardness of the weighted Steiner tree problem on hypercubes. We leave it to the reader to puzzle out why this might be. The answer, along with a proof that the weighted version is in fact NP-hard, is contained in [198]. Although the weighted Steiner tree problem on hypercubes is NP-hard, an efficient algorithm to approximate it to within a factor of less than two does exist. This follows because the weighted Steiner tree problem on any graph can be efficiently approximated within an error bound of 11/6 [62,484]. Prior to that result, it was known that the minimum spanning tree can be used to obtain a Steiner tree whose total weight is less than twice the weight of the optimal Steiner tree [284]. Specialized to the case of the hypercube, the method in brief is the following: a. compute the distance d(i, j) in the hypercube between each pair of input objects i and j in set X (in the case of binary characters, this is just the Hamming distance); b. form a complete, undirected graph K_X with one node for each object in X and one edge between each pair of nodes, and assign weight d(i, j) to each edge (i, j) in K_X; c. compute a minimum spanning tree T of K_X (where each edge in T corresponds to a path in the hypercube); d. 
re-expand each edge in T to its original corresponding path in the hypercube; e. superimpose the paths found in step d to form graph G', and then find any spanning tree of G'. The result is a Steiner tree of X with total weight less than twice that of the optimal. The above description is only for conceptual purposes, and in the case of binary character data, this computation can be done without explicitly embedding the problem in an actual hypercube. That is quite important for efficiency. The better approximations given in [62] and [484] can also be adapted for the maximum-parsimony problem without needing the hypercube explicitly. We leave that as an exercise for the reader.|471|minimum spanning tree, parsimony, maximum parsimony, approximation 942|Swofford2009|Methods for inferring evolutionary trees can be divided into two broad categories: those that operate on a matrix of discrete characters that assigns one or more attributes or character states to each taxon (i.e. sequence or gene-family member); and those that operate on a matrix of pairwise distances between taxa, with each distance representing an estimate of the amount of divergence between two taxa since they last shared a common ancestor (see Chapter 1). The most commonly employed discrete-character methods used in molecular phylogenetics are **parsimony and maximum likelihood** methods. For molecular data, the character-state matrix is typically an aligned set of DNA or protein sequences, in which the states are the nucleotides A, C, G, and T (i.e. DNA sequences) or symbols representing the 20 common amino acids (i.e. 
protein sequences); however, other forms of discrete data such as restriction-site presence/absence and gene-order information also may be used.|267|PAUP, software, maximum parsimony, parsimony 943|Howard2006|This is an interesting book introducing the basics of maximum parsimony as well as specific optimization techniques, such as branch and bound and the removal or pre-calculation of uninformative sites.|000|maximum parsimony, parsimony, branch and bound, ordered character states 944|Woolley2008|**Background**: We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median joining and minimum spanning network) compared to standard tree approaches, (neighbor-joining and maximum parsimony) in the presence and absence of recombination. **Principal Findings**: In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods and no method could accurately estimate branch lengths. **Conclusions**: Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. 
Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and novel algorithms developed in the future.|000|network, phylogenetic network, networks, phylogenetic reconstruction, evaluation 945|Forster2004|Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, that the data may not need to be modelled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. 
It is shown that when composition is not accommodated, then the model does not fit, and incorrect trees are found; but when composition is accommodated, the model then fits, and the known correct phylogenies are obtained.|000|compositional heterogeneity, data modeling, evolutionary model, phylogenetic reconstruction 946|Roch2010|Interesting paper (from a lecture) that gives a proof of why a minimum spanning tree approximates the ideal length of the most parsimonious tree in a dataset.|000|maximum parsimony, minimum spanning tree, approximation 947|Felsenstein1978|A simple method of counting the number of possible evolutionary trees is presented. The trees are assumed to be rooted, with labelled tips but unlabelled root and unlabelled interior nodes. The method allows multifurcations as well as bifurcations. It makes use of a simple recurrence relation for T(n,m), the number of trees with n labelled tips and m unlabelled interior nodes. A table of the total number of trees is presented up to n = 22. There are 282,137,824 different trees having 10 tip species, and over 8.87 × 10²³ different trees having 20 tip species. The method is extended to count trees some of whose interior nodes may be labelled. The principal uses of these numbers will be to double-check algorithms and notation systems, and to frighten taxonomists. |000|phylogenetic tree, enumeration, combinatorics 948|Labi2007|Interesting summary on counting in phylogenetics (number of possible trees, and the like), neat explanation of counting different kinds of trees.|000|phylogenetic tree, combinatorics 949|Felsenstein1981|The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed. A computationally feasible method for finding such maximum likelihood estimates is developed, and a computer program is available. 
This method has advantages over the traditional parsimony algorithms, which can give misleading results if rates of evolution differ in different lineages. It also allows the testing of hypotheses about the constancy of evolutionary rates by likelihood ratio tests, and gives rough indication of the error of the estimate of the tree.|000|pruning algorithm, maximum likelihood, phylogenetic reconstruction, phylogenetic tree 950|Felsenstein1973|The general maximum likelihood approach to the statistical estimation of phylogenies is outlined, for data in which there are a number of discrete states for each character. The details of the maximum likelihood method will depend on the details of the probabilistic model of evolution assumed. There are a very large number of possible models of evolution. For a few of the simpler models, the calculation of the likelihood of an evolutionary tree is outlined. For these models, the maximum likelihood tree will be the same as the “most parsimonious” (or minimum-steps) tree if the probability of change during the evolution of the group is assumed a priori to be very small. However, most sets of data require too many assumed state changes per character to be compatible with this assumption. Farris (1973) has argued that maximum likelihood and parsimony methods are identical under a much less restrictive set of assumptions. It is argued that the present methods are preferable to his, and a counterexample to his argument is presented. An algorithm which enables rapid calculation of the likelihood of a phylogeny is described. |000|pruning algorithm, phylogenetic tree, phylogenetic reconstruction, maximum likelihood 951|Dotti2015|Some knowledge of historical linguistics is expected of students of linguistics. This area presents a big challenge to most students of linguistics because of either a lack of sufficient practice or a lack of a solid foundation in phonology and phonetics. 
In order to alleviate this problem, a teaching tool was developed to assist in the practice of phonological and orthographical evolution since Indo-European until Present Day English. Previous tools were aimed mostly at comparison, such as LingPy (@List2012) or Iberochange (Eastlack, 1977) and many are not functional today, such as PHONO (Hartman, 2003). Previous approaches geared specifically at teaching this subject were not found. The goal is not only to help in the acquisition of knowledge in historical linguistics, but also to foster scientific thought and to serve as a first approach towards scientific linguistic research. With this in mind, the tool can be used in two different ways: teachers can preset the rules that they want students to practice, or students can be instructed to deduce and create their own rules, and then simulate the evolution. The rules are defined in terms of patterns, such as whether changes took place in all vowels or only long vowels or in a student-defined group of characters/phonemes, in what period the change occurred, whether shifts were involved, whether there were exceptions, the phonetic/orthographic context in which the rule applies, etc. The tool is currently being improved towards dialect identification, and it can interact with large corpora of English dialectal writing. The program also makes use of statistical techniques to identify and suggest the most prominent features of English dialects, if provided with a sufficient number of texts corresponding to that dialect.|000|LingPy, historical linguistics, computer-aided approaches 952|Vos2003|The existence of multiple likelihood maxima necessitates algorithms that explore a large part of the tree space. However, because of computational constraints, stepwise addition-based tree-searching methods do not allow for this exploration in reasonable time. Here, I present an algorithm that increases the speed at which the likelihood landscape can be explored. 
The iterative algorithm combines the computational speed of distance-based tree construction methods to arrive at approximations of the global optimum with the accuracy of optimality criterion based branch-swapping methods to improve on the result of the starting tree. The algorithm moves between local optima by iteratively perturbing the tree landscape through a process of reweighting randomly drawn samples of the underlying sequence data set. Tests on simulated and real data sets demonstrated that the optimal solution obtained using stepwise addition-based heuristic searches was found faster using the algorithm presented here. Tests on a previously published data set that established the presence of tree islands under maximum likelihood demonstrated that the algorithm identifies the same tree islands in a shorter amount of time than that needed using stepwise addition. The algorithm can be readily applied using standard software for phylogenetic inference.|000|maximum likelihood, heuristics, methods 953|Sagart2015|This paper shows that the putative ‘East Formosan’ subgroup proposed in Blust (1999) on the basis of a sound change allegedly turning a voiced palatalized velar stop into an alveolar nasal, pays no attention to the geographical principle of conservation by the periphery, and to the need for phonetically natural sound changes in historical interpretations. It argues that the phoneme known as *j was a palatal nasal rather than a voiced palatalized velar stop, and proposes for it the new label *nʸ. With support from the history of the Chinese palatal nasal, it argues that *nʸ evolved to a sound combining nasality and friction, and that denasalisation of that sound occurred for lack of a prenasalized series in which it could be integrated. 
The paper also shows, based on an earlier proposal by Dahl, that the putative phoneme known as PAN *ñ, which competes with *n y for the palatal nasal slot in the PAN consonant system, arose no earlier than PMP, with an independent parallel development in the Formosan language Kanakanabu, when the outcome of the merger of PAN *niV and *NiV became palatalized. It concludes that reconstructing PAN by validating Dempwolff’s PMP is a counter-productive strategy and advocates reconstructing PAN directly on Formosan evidence.|000|conservation by periphery, Austronesian, phylogeny, subgrouping 954|Sagart2015|The paper presents the very interesting idea of the "principle of conservation by periphery". This principle, first mentioned by Gilliéron (paper by Gilliéron and Roques in 1912), states that we can have the conservative form of a sound change in the periphery, with many alternative versions in the center.|000|phylogenetic reconstruction, Austronesian, conservation by periphery, subgrouping 955|Chambers1994|Instead, in this pattern one finds a particular isogloss delimiting areas in more than one part of the survey region, with no continuity. In other words, a linguistic feature exists in two or more parts of the region but those parts are separated from one another by an area in which a different, or opposing, feature occurs. Such a pattern indicates a late stage in the displacement of a formerly widespread linguistic feature by an innovation. In earlier times, the feature which now occurs in isolated areas was also found in the in-between areas. Its status is now that of a relic feature, and the in-between areas show the progress of the innovation.|94|conservation by periphery, dialectology, dialect geography, conservation, 956|Sagart2015|Misidentification of the conservative side in such a geographical pattern fairly automatically leads to the positing of a bizarre sound change trying to capture the putative evolution from an innovated form into its precursor. 
This in turn may lead to the following argumentative sequence (observe how an error transforms itself into a winning argument !): because the likelihood of a bizarre sound change taking place convergently in separate regions is low, the change will be said to have occurred only once. A subgroup will be posited. A migration transporting the putative change’s output to the other relic areas will be proposed. The proposed subgroup will of course not be supported by any other innovations: but the bizarre character of the sound change said to define it will be felt to provide sufficient evidence of its existence.|3|conservation by periphery, shared innovation, phylogenetic reconstruction, subgrouping 957|Blench2015|The primary object of this paper has not been to put forward a definitive phylogenetic proposal, but instead to suggest that for too long a bundle of ideas and assumptions has been repeated in the literature without any serious evidential base. “Reconstructions” have been proposed which have failed to take many languages of high phyletic significance into account; these forms have been repeatedly quoted without remark in the literature, in the process gaining a lustre they hardly deserve. Sino-Tibetan has no agreed internal structure, and yet its advocates have been happy to propose dates for its origin, expansion and homeland in stark contradiction to the known archaeological evidence. A focus on “high cultures” (Chinese, Tibetan, Burmese) has led to an emphasis on these languages and their written records, something wholly inappropriate for a phylum where an overwhelming proportion of its members speak unwritten languages. Standard handbooks have ignored minority languages whose lexicon and grammar do not fit with prevailing stereotypes. 
This paper is intended as a contribution towards redressing this balance.|000|Sino-Tibetan, subgrouping 958|Clemente2009|**Background:** Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated to transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n^2) in the number of states, making it impractical for large values of n. **Results:** In this study we introduce an optimization of Sankoff parsimony for the reconstruction of ancestral states when ultrametric or additive cost matrices are used. We analyzed its performance for randomly generated matrices, Jukes-Cantor and Kimura's two-parameter models of DNA evolution, and in the reconstruction of elongation factor-1a and ancestral metabolic states of a group of eukaryotes, showing that in all cases the execution time is significantly less than with the original implementation. **Conclusion:** The algorithms here presented provide a fast computation of Sankoff parsimony for a given phylogeny. Problems where the number of states is large, such as reconstruction of ancestral metabolism, are particularly adequate for this optimization. Since we are reducing the computations required to calculate the parsimony cost of a single tree, our method can be combined with optimizations in the number of taxa that aim at finding the most parsimonious tree.|000|parsimony, maximum parsimony, Sankoff parsimony, ancestral states, ancestral state reconstruction, optimization 959|Woodard2008|This book, derived from the acclaimed Cambridge Encyclopedia of the World’s Ancient Languages, describes the ancient languages of Asia and the Americas, for the convenience of students and specialists working in that area. 
Each chapter of the work focuses on an individual language or, in some instances, a set of closely related varieties of a language. Providing a full descriptive presentation, each of these chapters examines the writing system(s), phonology, morphology, syntax, and lexicon of that language, and places the language within its proper linguistic and historical context. The volume brings together an international array of scholars, each a leading specialist in ancient language study. While designed primarily for scholars and students of linguistics, this work will prove invaluable to all whose studies take them into the realm of ancient language. Roger D. Woodard is the Andrew Van Vranken Raymond Professor of the Classics at the University of Buffalo. His chief research interests lie generally within the areas of Greek and Roman myth and religion, Indo-European culture and linguistics, the origin and development of writing among the Greeks, and the interaction between Greece and the ancient Near East. His other books include The Cambridge Companion to Greek Mythology (2007), Indo-European Sacred Space (2006), The Cambridge Encyclopedia of the World’s Ancient Languages (2004), Ovid’s Fasti (with A. J. Boyle, 2000), Greek Writing from Knossos to Homer: A Linguistic Interpretation of the Origins of the Greek Alphabet (1997), and On Interpreting Morphological Change (1990). He has also published numerous articles and served as President of the Society for the Study of Greek and Latin Language and Linguistics from 1992 to 2001.|000|South-East Asian languages, American languages 960|Blench2015a|Historical linguistics inevitably contains many uncertainties due to gaps in the data, many of which are contingent and will never be filled. In particular, the interpretation of the results depends on data from other disciplines, although the links cannot be established according to any standard methodological criteria. 
But the conclusion should not be drawn that this excludes a science-like fashion of proceeding. This should involve; * a) Careful compilation of complete datasets, using the best-transcribed material * b) Identification of cognates through application of plausible phonological laws * c) Identification of borrowing similarly, although with the additional of cultural/sociological insights where appropriate * d) Comparison with archaeological data where available, including specialised disciplines such as palaeoclimatology, archaeobotany etc. * e) Comparison with historical anthropology as well as established generalisations in the field Modern methods may allow time to be saved in data compilation, but there are no shortcuts on the analysis side. Proposals for explanation of the datasets must be based on linguistic plausibility, but also match known chronology and other corroboratory datasets. There is a school of historical linguistics which asserts that the second part of this is unnecessary, that reconstructions are abstract forms that explain linguistic data. That if we reconstruct ‘iron’ to a time and place when iron was unknown, an unexplained semantic shift has occurred. Assuming this view is rejected then palaeosociolinguistics must meet criteria for both linguistic and chronological plausibility. But what is being advanced is a model and to that extent it cannot meet the classic test of falsifiability, it can only be made to seem less plausible by new analysis. Similarly, as a model it can be applied in other parts of the world which seem at first sight similar, and if plausibility continues to be maintained, a nascent generalisation occurs. But historical specificity cannot be ironed out; the conjunction of the Roman limes and Berber livestock nomads and traders can only occur once. 
Working out the narrative and convincing the scientific community should be challenge enough.|000|language levelling, computer-based approaches, critics, phylogenetic reconstruction, subgrouping 961|Zwick1978|[...] We consider first the hierarchy of spoken language [2]: phoneme, morpheme, word, sentence, utterance, discourse; or of written language: letter, syllable, word, sentence, paragraph, section, chapter, book. These lists are straightforward up to and including the "sentence," beyond which they are somewhat arbitrary. The items, “utterance," "paragraph," etc., are meant only to illustrate more complex units and to suggest that linguistic hierarchies consist typically of a small number of levels. This is characteristic of the organizational and/or operational structure of many "concrete" [3] systems; more "abstract" hierarchies, e.g. those which specify gradations of some attribute, often consist of a greater number of elements. For example, the system of cosmological entities, ordered by gravitational forces, ranges from aggregates of galaxies to planetary satellites, meteors, etc., but the hierarchy of the structure of matter extends through additional levels down to elementary particles and the like. Similarly, the organizational or "line" structure of command in the military has fewer levels than might be suggested by existing gradations of rank. Concrete systems are typically limited to a small number of levels, say five to ten, after which the hierarchy often becomes consolidated in a stable and coherent whole (which may become a base unit for still higher levels). We shall refer to the range between base and terminal levels as a "period." For example, a library consists of two periods: letter to book and book to library. The hierarchy of structures from atom to cell is, in our opinion, also such an interval, the intermediary levels of which include small molecules, polymers, macromolecular aggregates, and the like. 
We could continue on to tissues, organs, organ systems, organisms, populations, etc., but suggest that the intervals from atom to cell and cell to organism represent natural divisions. In general, there may be some uncertainty about which level should begin and/or end a period, but the range of uncertainty is usually not great, and the principle of this distinction, the idea of periods, is not purely subjective. For example, it would be inappropriate to start the cellular hierarchy at the level of protons and neutrons or quarks, because the domain of influence of the cell as a system does not penetrate down to these levels. Or we might extend the biological hierarchy to include tissues, organs, and so on up to the level of the individual organism, but at this point, biological structure most typically passes into social organization, as in families, populations, and communities. The biological hierarchy we shall actually discuss is the following: atom (or atomic ion); chemical group (or small molecule or molecular ion); amino acid; (monomeric) protein; (multimeric) enzyme; multienzyme complex; organelle (or membrane system); cell. That is, we trace out only one "path" (of many) in the set of structures between single atoms and whole cells, focusing mainly on proteins, the principal dynamic agents in the cell, and especially on those entities of the protein hierarchy most general in function [4]. Thus, for example, above the level of polypeptide chain or monomeric protein, we choose the "spherical" multimer rather than the helical polymer, as the latter typically has a more specialized function. Much of the discussion, which follows, will pertain also to the nucleic acids and to some deeper and more general properties of the cellular period.|000|biological parallels, analogy, biology, linguistics, hierarchies, hierarchical order 962|Zwick1978|.. raw:: html
8  cell                          book
7  membrane system, organelle    chapter
6  multienzyme complex           section
5  (multimeric) enzyme           paragraph
4  (monomeric) protein           sentence
3  amino acid                    word
2  chemical group                syllable/morpheme
1  atom                          letter/phoneme
|2|biological parallels, cell-analogy, analogy 963|Zwick1978|Interactions between parts of a protein cause the linear sequence of amino acids (the "primary structure") to fold up into a three-dimensional structure, which performs a specific catalytic function at a locus on the molecule known as the "active site." Similarly, the syntactic or surface structure of a sentence is the result of its primary structure folding up, as it were, in some cognitive space, so as to bring certain words, not neighbors in the linear sequence, into syntactic proximity and perhaps also conferring upon the sentence an active site or principal locus of function.|4|analogy, biological parallels 964|Zwick1978|One might speak of such a "deep structure" for a protein, embodying the functional essence of the molecule in its most economical form. Despite variations in the sequence and structure of a particular protein across different species and within populations of the same species, some amino acid sites will be essential for the catalytic function and will therefore be invariant, and additional sites may show restricted variability.|4|biological parallels, analogy 965|Hill2012|Gong Hwang-Cherng in two papers (1980, 1995) collected a number of cognate sets among Chinese, Tibetan, and Burmese. This paper reexamines these cognate sets (base on Gong 1995) using a six vowel version of Old Chinese, specifically the Baxter-Sagart system. In light of six vowel theory it is possible both to be more confident about some cognate sets and possible to reject or revise others.|000|six-vowel-hypothesis, Burmish languages, Old Burmese, linguistic reconstruction 966|Yan2003|The problem of finding a phylogeny with maximum parsimony is one of the main problems in computational biology. While it is impossible to search the possible tree space exhaustively for large data sets, most heuristic approaches try to search in the neighborhood of sub-optimal trees. The speed of computing a score for each tree (e.g. 
tree length or total number of character changes) is as important as the different tree search strategies. Some techniques include short-cuts that have not been proven to be exact, and an incremental character optimization approach by which the speedup gains depends on data sets and is hardly analyzed in theory. This paper describes an exact and fast algorithm to compute tree length and our algorithm can obtain great speedup for any data sets. We discuss the algorithm for Fitch-parsimony, but it can also be applied to Wagner-parsimony.|000|maximum parsimony, approximation, algorithms, phylogenetic reconstruction, optimization 967|Grand2013|The cladistic literature does not always specify the kind of multistate character treatment that is applied for an analysis. Characters can be treated either as unordered transformation series or as rooted [three-item analysis (3ia)] or unrooted state trees (ordered characters). We aimed to measure the impact of these character treatments on phylogenetic inference. Discrete characters can be represented either as rows or columns in matrices (e.g. for parsimony) or as hierarchies for 3ia. In the present study, we use simulated and empirical examples to assess the relative merits of each method considering both the character treatment and representation. We measure two parameters (resolving power and artefactual resolution) using a new tree comparison metric, ITRI (inter-tree retention index). Our results suggest that the hierarchical character representation not only results (with our simulation settings) in the greatest resolving power, but also in the highest artefactual resolution. Our empirical examples provide equivocal results. Parsimony unordered states yield less resolving power and more artefactual resolutions than parsimony ordered states, both with our simulated and empirical data. 
Relationships between three operational taxonomic units (OTUs), irrespective of their relationships with other OTUs, are called three-item statements (3is). We compare the intersection tree (which reconstructs a single tree from all of the common 3is of source trees) with the traditional strict consensus and show that the intersection tree retains more of the information contained in the source trees.|000|multi-state models, ordered character states, maximum parsimony 968|Slowinski1993|Multistate morphological characters have generally been treated as either "unordered" or "ordered" in phylogenetic analyses using parsimony. Because ordering relations do not apply to the states of characters treated under these methods, I prefer "maximally connected" character to "unordered" character and "minimally connected" character to "ordered" character. This paper formally defines the two character types, compares their properties, and considers the consequences of the two methods for both resolution and congruence. The results demonstrate that minimally connected characters increase resolution relative to maximally connected characters. Minimally connected characters do not, however, necessarily increase congruence among data sets. Because both methods produce nonrandom congruence among data sets, both character types constitute valid phylogenetic methods. A mixed-parsimony approach is advocated, wherein multistate characters are treated as minimally connected whenever reasonable but treated as maximally connected otherwise.|000|ordered character states, maximum parsimony, multi-state models 969|Xu2012|For more than a century scholars have proposed laws of semantic change that characterize how words change in meaning over time. Two such laws are the law of differentiation, which proposes that near-synonyms tend to differentiate in meaning over time, and the law of parallel change, which proposes that related words tend to undergo parallel changes in meaning. 
Researchers have identified a handful of changes that are consistent with each proposed law, but there are no systematic evaluations that assess the validity and generality of these competing laws. Here we evaluate these laws by using a large corpus to assess how thousands of related words changed in meaning over the twentieth century. Our analyses show that the law of parallel change applies more broadly than the law of differentiation, and thereby illustrate how large-scale computational analyses can place laws of semantic change on a more secure footing.|000|semantic change, computer-based approaches 970|Flynn2014|A number of consonant shifts in the history of Athabaskan languages are considered. The goal is to better explain examples of ‘auditorily based substitution’ by invoking ‘phonetic features’ as is required by the sound change theory of Blevins (2004). We argue that the shifts are better understood as instances of Blevins’s change process involving the phonetic features ⟦grave⟧ and ⟦flat⟧. These features are defined acoustically in accord with recent phonetic studies of obstruents. It is crucial that these and other phonetic features are scalar-valued, and thus are part of a phonetics-phonology interface component which is separate from the distinctive phonological feature system.|000|sound change, Athabaskan languages, speech acoustics 971|Adkins2010|This thesis offers a minimum spanning tree framework for inferring phylogenies. It may be quite interesting to read. 1. Introduction 1 2. Molecular Phylogeny 3 3. Learning Quartets 19 4. Gene Order Phylogeny 24 |000|minimum spanning tree, phylogenetic reconstruction 972|Adkins2010|This dissertation discusses how to write efficient, deterministic programs to infer phylogenetic trees. These programs are based on a theoretically optimal polynomial time algorithm. The programs are practical and efficient. The problem is to infer phylogenetic trees from sequence data at the leaves. 
Likelihood approaches attempt to solve an NP-complete optimization problem, and are computationally intense. Our approach is to apply efficient, polynomial time algorithms which are provably accurate to within a constant factor of optimal. We present a program which is practical and computationally efficient.|000|minimum spanning tree, phylogenetic reconstruction 973|Murawaki2015|Thus, a suitable compromise involves supplementing the reference tree with edges representing horizontal transfer. This approach was originally proposed for prokaryote evolutionary models but has also been applied to languages [24, @NelsonSathi2011].|2/15|reference tree, phylogenetic reconstruction, phylogenetic network 974|Bonny2015|Sequence alignment is an essential tool in almost any computational database research such as Biology, linguistics, social science, etc. It processes large database sequences and therefore, it consumes high computation time (may take hours of mainframe time to get the optimum solution). Here, we introduce our ”Frequency-Deviation Technique”, we call it ”FDT”, which may be combined with any alignment algorithm (optimal or heuristic) to accelerate the alignment process. The FDT reduces the range of the searched database by excluding the sequences which their code frequency plus deviation is far from the query sequence. Using the FDT, we explicitly accelerate the database sequencing alignment up to 63% in comparison to the traditional alignment algorithms.|000|sequence alignment, optimization, BLAST, database search, heuristics 975|Kolaczkowski2004|All inferences in comparative biology depend on accurate estimates of evolutionary relationships. Recent phylogenetic analyses have turned away from maximum parsimony towards the probabilistic techniques of maximum likelihood and bayesian Markov chain Monte Carlo (BMCMC). 
These probabilistic techniques represent a parametric approach to statistical phylogenetics, because their criterion for evaluating a topology—the probability of the data, given the tree—is calculated with reference to an explicit evolutionary model from which the data are assumed to be identically distributed. Maximum parsimony can be considered nonparametric, because trees are evaluated on the basis of a general metric—the minimum number of character state changes required to generate the data on a given tree—without assuming a specific distribution1. The shift to parametric methods was spurred, in large part, by studies showing that although both approaches perform well most of the time2, maximum parsimony is strongly biased towards recovering an incorrect tree under certain combinations of branch lengths, whereas maximum likelihood is not3, 4, 5, 6. All these evaluations simulated sequences by a largely homogeneous evolutionary process in which data are identically distributed. There is ample evidence, however, that real-world gene sequences evolve heterogeneously and are not identically distributed7, 8, 9, 10, 11, 12, 13, 14, 15, 16. Here we show that maximum likelihood and BMCMC can become strongly biased and statistically inconsistent when the rates at which sequence sites evolve change non-identically over time. Maximum parsimony performs substantially better than current parametric methods over a wide range of conditions tested, including moderate heterogeneity and phylogenetic problems not normally considered difficult.|000|maximum parsimony, maximum likelihood, evaluation, performance, phylogenetic reconstruction 976|Rindal2010|The use of model-based methods to infer a phylogenetic tree from a given data set is frequently motivated by the truism that under certain circumstances the parsimony approach (MP) may produce incorrect topologies, while explicit model-based approaches are believed to avoid this problem. 
In the realm of empirical data from actual taxa, it is not known (or knowable) how commonly MP, maximum-likelihood or Bayesian inference are inaccurate. To test the perceived need for “sophisticated” model-based approaches, we assessed the degree of congruence between empirical phylogenetic hypotheses generated by alternative methods applied to DNA sequence data in a sample of 1000 recently published articles. Of 504 articles that employed multiple methods, only two exhibited strongly supported incongruence among alternative methods. This result suggests that the MP approach does not produce deviant hypotheses of relationship due to convergent evolution in long branches. Our finding therefore indicates that the use of multiple analytical methods is largely superfluous. We encourage the use of analytical approaches unencumbered by ad hoc assumptions that sap the explanatory power of the evidence.|000|maximum parsimony, maximum likelihood, evaluation, performance, phylogenetic reconstruction 978|Schleicher1863|:comment:`Parallels between languages and species according to Schleicher` Dir und Deinen Collegen kann ich gleichnissweise die Wurzeln als einfache Sprachzellen bezeichnen, bei welchen für die Function als Nomen, Verbum u. s. f. noch keine besonderen Organe vorhanden sind und bei denen diese Functionen (die grammatischen Beziehungen) noch eben so wenig geschieden sind, als bei einzelligen Organismen oder im Keimbläschen höherer lebender Wesen Athmen und Verdauen. |23|root words, cell-analogy, biological parallels 979|Schleicher1863|:comment:`before talking on the loss (dying out) of species according to Darwin:` Aber das gänzliche Erlöschen einer Arten-Gruppe mag oft ein sehr langsamer Prozess sein, wenn einzelne Arten in geschützten oder abgeschlossenen Standorten kümmernd noch eine Zeit lang fortleben können [bei Sprachen pflegt diess in Gebirgen der Fall zu sein, ich erinnere z. B. 
an das Baskische der Pyrenäen, den Rest einer nachweislich früher weit verbreiteten Sprache; ähnlich verhält es sich im Kaukasus und sonst]. Ist eine Gruppe einmal untergegangen, so kann sie nie wieder erscheinen, weil ein Glied aus der Generationen-Reihe zerbrochen ist.|28|niche, mountains, language death, extinction, species death 980|Schleicher1863|Begreiflicher Weise konnten es nur die Grundzüge der Darwinschen Anschauungen sein, die auf die Sprachen Anwendung finden. Das Reich der Sprachen ist von dem der Pflanzen und Thiere zu verschieden, als dass die Gesammtheit der Darwinschen Ausführungen mit ihren Einzelheiten für dasselbe Geltung haben könnte. Desto unbestreitbarer ist aber auf sprachlichem Gebiete die Entstehung der Arten durch allmähliche Differenzierung und die Erhaltung der höher entwickelten Organismen im Kampfe ums Dasein. |29|biological parallels, biological evolution, language evolution 981|Lyell1830|In order to set this in a clear light, let the reader suppose himself acquainted with just one-tenth part of the words of some living language, and that he is presented with several books purporting to be written in the same tongue ten centuries ago. If he should find that he comprehends a tenth part of the terms in the ancient volumes, and that he cannot divine the meaning of the other nine-tenths, would he not be strongly disposed to believe that, for a thousand of years, the language has remained *unaltered*? Could he, without great labour and study, interpret the greater part of what is written in the antique documents, he must feel at once convinced that, in the interval of ten centuries, a great revolution in the language had taken place. 
[...]|461|uniformitarianism, language change 982|Lyell1863|It would therefore be a waste of time to speculate on the number of original monads or germs from which all plants and animals were subsequently evolved, more especially as the oldest, fossiliferous strata known to us may be the last of a long series of antecedent formations, which once contained organic remains. [...] and the question now at issue, whether the living species are connected with the extinct by a common bond of descent, will best be cleared up by devoting ourselves to the study of the actual state of the living world, and to those monuments of the past in which the relics of the animate creation of former ages are best preserved and least mutilated by the hand of time.|471|monophyly, origin of life, origin of language 983|Gamov1954|In a communication in Nature of May 30, p. 964, J. D. Watson and F. H. C. Crick showed that the molecule of deoxyribonucleic acid, which can be considered as a chromosome fibre, consists of two parallel chains formed by only four different kinds of nucleotides. These are either (1) adenine, or (2) thymine, or (3) guanine, or (4) cytosine with sugar and phosphate molecules attached to them. Thus the hereditary properties of any given organism could be characterized by a long number written in a four-digital system. On the other hand, the enzymes (proteins), the composition of which must be completely determined by the deoxyribonucleic acid molecule, are long peptide chains formed by about twenty different kinds of amino-acids, and can be considered as long ‘words’ based on a 20-letter alphabet. Thus the question arises about the way in which four-digital numbers can be translated into such ‘words’.|000|biological parallels, DNA sequence, code 984|Crick1959|It is widely assumed that the amino acid sequence of a particular protein is in some way determined by the sequence of the bases in some particular length of nucleic acid. 
While the indirect evidence in favor of some relationship of this type is very suggestive, the direct evidence is fragmentary in the extreme, and nothing whatever is known about the actual mechanisms involved. It is possible, however, to consider the problem in an abstract way as that of translating from one language to another; that is, from the 4-letter language of the nucleic acids to the 20-letter language of the protein, without any detailed consideration of the chemical processes involved. This approach is often referred to as the coding problem.|000|code, translation, biological parallels, DNA sequence 985|Szathmary1995|There is no theoretical reason to expect evolutionary lineages to increase in complexity with time, and no empirical evidence that they do so. Nevertheless, eukaryotic cells are more complex than prokaryotic ones, animals and plants are more complex than protists, and so on. This increase in complexity may have been achieved as a result of a series of major evolutionary transitions. These involved changes in the way information is stored and transmitted.|000|biological parallels, evolutionary theory, language evolution, biological evolution, 986|Szathmary1995|Major transitions in evolution: 1 Replicating molecules to populations of molecules in compartments 2 Unlinked replicators to chromosomes 3 RNA as gene and enzyme to DNA and protein (genetic code) 4 Prokaryotes to eukaryotes 5 Asexual clones to sexual populations 6 Protists to animals, plants and fungi (cell differentiation) 7 Solitary individuals to colonies (non-reproductive castes) 8 Primate societies to human societies (language)|000|evolutionary theory, evolutionary transitions, biological parallels 987|Szathmary1995|The emergence of human language with a universal grammar and unlimited semantic representation. 
Grammar enables a speaker with a finite vocabulary to convey an indefinitely large number of meanings, just as the genetic code enables DNA to specify an indefinitely large number of proteins. We accept Chomsky's argument that grammatical competence is unique, both in the sense of being peculiar to humans, and of being special to language, and not merely an aspect of general learning ability. But we are puzzled by the reluctance of many linguists, including Chomsky himself, to think about the evolution of this competence. The objection takes the form of asserting, not only that human language is different in kind from animal communication, but that no intermediate is possible between the two. |231|language evolution, evolutionary transitions, biological parallels 988|Szathmary1995|It is argued that any rudimentary form of grammar would not allow one to generate some types of sentence. This is true but irrelevant: by analogy, it is better to have some light-sensitive cells than none at all; a perfect eye is not the only useful solution to the problem. |231|language evolution, language origin, evolutionary transitions 989|Szathmary1995|It is in fact rather easy to think of intermediates between protolanguage and true language. There remains the question of the evolutionary origin of grammatical novelties. It is reasonable to assume that this happened by genetic assimilation, new rules being made up by individuals as non-genetic innovations, then learnt by members of the community, then hard-wired into the 'language organ' subsequently. 
It has been demonstrated that learning and selection can lead to such an assimilation in extreme cases when the latter alone could not get anywhere: learning can transform an initially flat fitness landscape with a needle-like peak into a well-behaved Fujiyama-like one.|231|language origin, language evolution, evolutionary transitions 990|Milo2004|Complex biological, technological, and sociological networks can be of very different sizes and connectivities, making it difficult to compare their structures. Here we present an approach to systematically study similarity in the local structure of networks, based on the significance profile (SP) of small subgraphs in the network compared to randomized networks. We find several superfamilies of previously unrelated networks with very similar SPs. One superfamily, including transcription networks of microorganisms, represents “rate-limited” information-processing networks strongly constrained by the response time of their components. A distinct superfamily includes protein signaling, developmental genetic networks, and neuronal wiring. Additional superfamilies include power grids, protein-structure networks and geometric networks, World Wide Web links and social networks, and word-adjacency networks from different languages. |000|evolutionary networks, network methods, linguistics, biology 991|Chomsky1959|A grammar can be regarded as a device that enumerates the sentences of a language. We study a sequence of restrictions that limit grammars first to Turing machines, then to two types of system from which a phrase structure description of the generated language can be drawn, and finally to finite state Markov sources (finite automata). These restrictions are shown to be increasingly heavy in the sense that the languages that can be generated by grammars meeting a given restriction constitute a proper subset of those that can be generated by grammars meeting the preceding restriction. 
Various formulations of phrase structure description are considered, and the source of their excess generative power over finite state sources is investigated in greater detail.|000|formal grammar, grammatical hierarchy, 992|Searls2013|Polymeric macromolecules, when viewed abstractly as strings of symbols, can be treated in terms of formal language theory, providing a mathematical foundation for characterizing such strings both as collections and in terms of their individual structures. In addition this approach offers a framework for analysis of macromolecules by tools and conventions widely used in computational linguistics. This article introduces the ways that linguistics can be and has been applied to molecular biology, covering the relevant formal language theory at a relatively nontechnical level. Analogies between macromolecules and human natural language are used to provide intuitive insights into the relevance of grammars, parsing, and analysis of language complexity to biology.|000|formal language theory, grammar, molecular biology, macromolecules 993|Brochhagen2015|Meaning conveyance is bottlenecked by the linguistic conventions shared among interlocutors. One possibility to convey non-conventionalized meaning is to employ known expressions in such a way that the intended meaning can be abduced from them. This, in turn, can give rise to ambiguity. We investigate this process with a focus on its use for semantic coordination and show it to be conducive to fast agreement on novel meaning under a mutual expectation to exploit semantic structure. 
We argue this to be a motivation for the crosslinguistic pervasiveness of systematic ambiguity.|000|ambiguity, meaning, meaning potential, 994|Sagart2004|This paper presents a new higher phylogeny for the Austronesian family, based on three independent lines of evidence: the observation of a hierarchy of implications among the numerals from 5 to 10 in the languages of Formosa and in PMP; the finding that the numerals *pitu '7', *walu '8' and *Siwa '9' can be derived from longer additive expressions meaning 5+2, 5+3 and 5+4, preserved in Pazeh, using only six sound changes; and the observation that the phylogeny which can be extracted from these and other innovations -mostly changes in the basic vocabulary- evinces a coherent spatial pattern, whereby an initial Austronesian settlement in NW Taiwan expanded unidirectionally counterclockwise along the coastal plain, circling the island in a millennium or so. In the proposed phylogeny, Malayo-Polynesian is a branch of Muic, a taxon which also includes NE Formosan (Kavalan plus Ketagalan). The ancestor language: Muish, is deemed to have been spoken in or near NE Formosan. Further evidence that the Tai-Kadai languages, contrary to common sense, are a subgroup of Austronesian (specifically: a branch of Muic, coordinate with PMP and NE Formosan) is presented. |000|Austronesian, phylogenetic reconstruction, subgrouping, language family 995|Hsu2010|We start by noting the rule whereby the original character of particular word is already known for a dialect, the literary and vernacular pronunciations of that character evidence phonetic correspondence. Likewise, the basic assumption for determining the original character for a word in a particular dialect is that if the literary pronunciation and the original character are already known or determined, the phonetic position of the vernacular pronunciation and the original character will be identical. 
Next, we focus on one of the most basic words in Chinese – da – and its two historical pronunciations. Interestingly, the word da is an exception to the above rule insofar as its literary and vernacular pronunciations may be either, or neither, of the two readings given above. We continue by considering the fact that sometimes it is necessary to search for evidence for which character is best suited to represent a word either internally, through correspondences between literary and vernacular pronunciations, or externally, from words common to different dialects or from cognatic relations among related dialect words. Once this distinction is made, research into Chinese historical linguistics, the history of dialects, or the history of dialect regions will be able to provide valuable historical/cultural evidence that could also be of use in resolving the big problem of the word da.|000|Chinese dialects, Chinese, Benzi, original dialect characters 996|Bertalanffy1972|The article presents a history of general systems theory and discusses several of its various aspects. According to the author, the notion of general systems theory first stemmed from the pre-Socratic philosophers, and evolved throughout the ages through different philosophic entities until it was eventually formally structured in the early 1900s. The theory has three main aspects. The first is called “systems science,” or the scientific exploration and theory of systems in various sciences. The second is called “systems technology,” or the problems arising in modern technology and society. The third aspect is called “systems philosophy” and refers to the reorientation of thought and world view. 
|000|complex systems, systems theory, general systems theory 997|Bertalanffy1972|Major functions are to: (1) investigate the isomorphy of concepts, laws, and models in various fields, and to help in useful transfers from one field to another; (2) encourage the development of adequate theoretical models in the fields which lack them; (3) minimize the duplication of theoretical effort in different fields; (4) promote the unity of science through improving communication among specialists.|413|complex systems, interdisciplinary research, analogy 999|Milo2002|Complex networks are studied across many fields of science. To uncover their structural design principles, we defined “network motifs,” patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. We found such motifs in networks from biochemistry, neurobiology, ecology, and engineering. The motifs shared by ecological food webs were distinct from the motifs shared by the genetic networks of Escherichia coli and Saccharomyces cerevisiae or from those found in the World Wide Web. Similar motifs were found in networks that perform information processing, even though they describe elements as different as biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This approach may uncover the basic building blocks of most networks. |000|complex network, complex systems, network motifs 1000|Milo2002|In information-processing networks, the motifs may have specific functions as elementary computational circuits (11). More generally, they may be interpreted as structures that arise because of the special constraints under which the network has evolved (27). 
It is of value to detect and understand network motifs in order to gain insight into their dynamical behavior and to define classes of networks and network homologies.|827|network motifs, complex network 1001|Morrison2015b|Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. 
In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.|000|sequence alignment, multiple sequence alignment, homology 1002|Lukes2011|Complex cellular machines and processes are commonly believed to be products of selection, and it is typically understood to be the job of evolutionary biologists to show how selective advantage can account for each step in their origin and subsequent growth in complexity. Here, we describe how complex machines might instead evolve in the absence of positive selection through a process of "presuppression," first termed constructive neutral evolution (CNE) more than a decade ago. If an autonomously functioning cellular component acquires mutations that make it dependent for function on another, pre-existing component or process, and if there are multiple ways in which such dependence may arise, then dependence inevitably will arise and reversal to independence is unlikely. Thus, CNE is a unidirectional evolutionary ratchet leading to complexity, if complexity is equated with the number of components or steps necessary to carry out a cellular process. CNE can explain "functions" that seem to make little sense in terms of cellular economy, like RNA editing or splicing, but it may also contribute to the complexity of machines with clear benefit to the cell, like the ribosome, and to organismal complexity overall. 
We suggest that CNE-based evolutionary scenarios are in these and other cases less forced than the selectionist or adaptationist narratives that are generally told.|000|constructive neutral evolution, ratchet-like process, complexity, 1003|Lakkaraju2008|Over their evolutionary history, languages most likely increased in complexity from simple signals to protolanguages to complex syntactic structures. This paper investigates processes for increasing linguistic complexity while maintaining communicability across a population. We assume that higher linguistic communicability (more accurate information exchange) increases participants’ effectiveness in coordination-based tasks. Interaction, needed for learning others’ languages and for converging to communicability, bears a cost. There is a threshold of interaction (learning) effort beyond which (the coordination payoff of) linguistic convergence either doesn’t pay or is pragmatically impossible. Our central findings, established mainly through simulation, are: 1) There is an effort-dependent “frontier of tractability” for agreement on a language that balances linguistic complexity against linguistic diversity in a population. To remain below some specific bound on collective convergence effort, either a) languages must be simpler, or b) their initial average communicability must be higher. To stay below such a pragmatic effort limit, even agents who have the ultimate capability for complex languages must not invent them from the start or they won’t be able to communicate; they must start simple and grow complexity in a staged process. 2) Such a staged approach to increasing complexity, in which agents initially converge on simple languages and then use these to “scaffold” greater complexity, can outperform initially-complex languages in terms of overall effort to convergence. This performance gain improves with more complex final languages. 
|000|scaffolding, linguistic complexity, language evolution 1004|Bisang2010|The present paper shows that constructions are the driving force of grammaticalization in Chinese. It will be argued that this is due to two typological properties: the relative freedom with which a lexical item can be assigned to different grammatical functions (precategoriality in Late Archaic Chinese) and the ease with which one and the same surface structure can be subject to different syntactic analyses (Bisang 2009 on “hidden complexity”). The constructions that are relevant for grammaticalization in Chinese consist of slots that are associated with certain grammatical categories. Processes of reanalysis take place within these slots – a given lexical item is assigned the function associated with the syntactic slot in which it occurs. Such a construction-based account excludes continuity because the occurrence in a particular slot always leads to a discrete interpretation that is determined by the function associated with that slot. Continuity is only possible if two or more constructions are combined into a larger structure or if a new construction emerges. The former case will be illustrated by verbs in adpositional functions (coverbs), the latter by the resultative construction as it emerged in the 1st centuries AD. Finally, the constructional approach will also show that Aarts’s (2004, 2007) distinction between subsective and intersective gradience cannot be maintained.|000|grammaticalization, Chinese, scaffolding 1005|Bisang2010|The basic notion that underlies this framework is Langacker's scaffolding metaphor, in which "component structures are seen as scaffolding [pb] erected for the construction of a complex expression" (@Langacker1987:461). 
In Croft's own words, "what appears to be the coding of syntactic relations is in fact scaffolding to help the hearer to identify which element of the construction symbolizes which component in the semantic structure of the construction" (@Croft2001:238).|248f|scaffolding, grammaticalization, Chinese 1006|Bisang2010|Croft's (@2001:238) idea that the syntactic structure of a construction is a "scaffolding" for identifying the semantic components of a construction also serves as a link to the concept of coercion in terms of Michaelis (@2004). In her account, the syntactic structure provides the pattern that has the power to coerce a lexical item into a particular function. Transferred to the situation in Late Archaic Chinese, this means that a lexical item occurring in a certain slot of a given construction is "coerced" into a particular semantic interpretation associated with that slot. The notion of coercion is no doubt adequate in the sense that a certain syntactic slot leads to a particular interpretation of a lexical item and thus excludes other potential interpretations. If it is used in a more strict sense that implies that a lexical item acquires a new function which is not foreseen in its lexicon, coercion is problematic in some instances. In the case of precategoriality, the occurrence of a lexical item in a certain slot only highlights or specifies a particular function that is underspecified in the lexicon -- the lexical item is not forced into a new function. This is different with processes of grammaticalization in which a lexical item gets a new function as in the case of verbs that are reanalysed as prepositions [...]. For the understanding of grammaticalization in Chinese, both concepts are important. The more general mechanism of scaffolding is important in spite of hidden complexity and the high syntactic flexibility of lexical items. 
The more specific mechanism of coercion is necessary for understanding processes by which a lexical item is reanalysed in terms of a grammatical function it did not mark before, a process to which lexical items seemed to have been more open in Late Archaic Chinese than in languages of the English type.|249|scaffolding, grammaticalization, coercion, Chinese 1007|GodfreySmith2015|This paper develops a conceptual framework for addressing questions about reproduction, individuality, and the units of selection in symbiotic associations, with special attention to the origin of the eukaryotic cell. Three kinds of reproduction are distinguished, and a possible evolutionary sequence giving rise to a mitochondrion-containing eukaryotic cell from an endosymbiotic partnership is analyzed as a series of transitions between each of the three forms of reproduction. The sequence of changes seen in this “egalitarian” evolutionary transition is compared with those that apply in “fraternal” transitions, such as the evolution of multicellularity in animals.|000|scaffolded evolution, symbiosis, eukaryotes, reproduction, 1009|Darwin1859|I am well aware that this doctrine of natural selection, exemplified in the above imaginary instances, is open to the same objections which were at first urged against Sir Charles Lyell's noble views on 'the modern changes of the earth, as illustrative of geology;' but we now very seldom hear the action, for instance, of the coast-waves, called a trifling and insignificant cause, when applied to the excavation of gigantic valleys or to the formation of the longest lines of inland cliffs. 
Natural selection can act only by the preservation and accumulation of infinitesimally small inherited modifications, each profitable to the preserved being; and as modern geology has almost banished such views as the excavation of a great valley by a single diluvial wave, so will natural selection, if it be a true principle, banish the belief of the continued creation of new organic beings, or of any great and sudden modification in their structure. |IV|Darwin, uniformitarianism, biology 1010|Maddison2006|It is now well known that incomplete lineage sorting can cause serious difficulties for phylogenetic inference, but little attention has been paid to methods that attempt to overcome these difficulties by explicitly considering the processes that produce them. Here we explore approaches to phylogenetic inference designed to consider retention and sorting of ancestral polymorphism. We examine how the reconstructability of a species (or population) phylogeny is affected by (a) the number of loci used to estimate the phylogeny and (b) the number of individuals sampled per species. Even in difficult cases with considerable incomplete lineage sorting (times between divergences less than 1 Ne generations), we found the reconstructed species trees matched the “true” species trees in at least three out of five partitions, as long as a reasonable number of individuals per species were sampled. We also studied the tradeoff between sampling more loci versus more individuals. Although increasing the number of loci gives more accurate trees for a given sampling effort with deeper species trees (e.g., total depth of 10 Ne generations), sampling more individuals often gives better results than sampling more loci with shallower species trees (e.g., depth = 1 Ne). Taken together, these results demonstrate that gene sequences retain enough signal to achieve an accurate estimate of phylogeny despite widespread incomplete lineage sorting. 
Continued improvement in our methods to reconstruct phylogeny near the species level will require a shift to a compound model that considers not only nucleotide or character state substitutions, but also the population genetics processes of lineage sorting.|000|incomplete lineage sorting, phylogenetic reconstruction, 1011|Maddison2006|Just as the incorporation of explicit models of evolutionary [pb] character change, whether stochastic (e.g., Felsenstein, 1981) or not (e.g., Hennig, 1966), was vital to the development of phylogenetic methods, incorporation of explicit models of lineage sorting will be needed for continued development of phylogenetic inference near the species level.|21f|incomplete lineage sorting, biological evolution, evolutionary model 1012|Axelsen2014|Global linguistic diversity (LD) displays highly heterogeneous distribution patterns. Though the origin of the latter is not yet fully understood, remarkable parallelisms with biodiversity distribution suggest that environmental variables should play an essential role in their emergence. In an effort to construct a broad framework to explain world LD and to systematize the available data, we have investigated the significance of 14 variables: landscape roughness, altitude, river density, distance to lakes, seasonal maximum, average and minimum temperature, precipitation and vegetation, and population density. Landscape roughness and river density are the only two variables that universally affect LD. Overall, the considered set accounts for up to 80% of African LD, a figure that decreases for the joint Asia, Australia and the Pacific (69%), Europe (56%) and the Americas (53%). Differences among those regions can be traced down to a few variables that permit an interpretation of their current states of LD. Our processed datasets can be applied to the analysis of correlations in other similar heterogeneous patterns with a broad spatial distribution, the clearest example being biological diversity. 
The statistical method we have used can be understood as a tool for cross-comparison among geographical regions, including the prediction of spatial diversity in alternative scenarios or in changing environments.|000|linguistic diversity, linguistic complexity, mountains, diversification 1013|Rahat1990|Is there a systematic relation between what we intuitively call the metaphorical meaning of metaphoric expressions, and the literal meaning of those expressions? This, I believe, is the most crucial issue underlying a more general question, as to what metaphors mean. |143|metaphor, meaning, semantic change 1014|Rahat1990|If we consider the history of the language devices comprising a metaphorical utterance, what it should mean is what we would take it to mean literally. An utterance of "love is oxygen", for example, is an indicative sentence that has the proper function of creating in a hearer a true belief to the effect that love is oxygen. |149|metaphor, meaning, semantic change 1015|Rahat1990|My basic claim is that metaphors are the linguistic analogue of biological mutations, and other forms of variation. In the following section, the question of whether biological variations have a proper function will be discussed, and later on the conclusions of this discussion will be applied to the question of systematicity and metaphor.|150|mutation, metaphor, semantic change 1016|Wichmann2009|Previous empirical studies of population size and language change have produced equivocal results. We therefore address the question with a new set of lexical data from nearly one-half of the world's languages. We first show that relative population sizes of modern languages can be extrapolated to ancestral languages, albeit with diminishing accuracy, up to several thousand years into the past. We then test for an effect of population against the null hypothesis that the ultrametric inequality is satisfied by lexical distances among triples of related languages. 
The test shows mainly negligible effects of population, the exception being an apparently faster rate of change in the larger of two closely related variants. A possible explanation for the exception may be the influence on emerging standard (or cross-regional) variants from speakers who shift from different dialects to the standard. Our results strongly indicate that the sizes of speaker populations do not in and of themselves determine rates of language change. Comparison of this empirical finding with previously published computer simulations suggests that the most plausible model for language change is one in which changes propagate on a local level in a type of network in which the individuals have different degrees of connectivity.|000|population size, language change, rate of change 1017|Haspelmath2004|a. *Survival of the Frequent ("Unmarked")* (e.g. Winter 1971, Wurzel 1994) When a grammatical distinction is given up, it is the more frequent category that survives. (E.g. plural forms survive when dual/plural distinction is lost.) b. *Sound Alternations Result from Sound Change* (phonetics > phonology; morphology > phonology) c. *From Space to Time (e.g. Haspelmath 1997b)* (spatial > temporal marker; temporal > spatial marker) d. *From Something to Nothing (Haspelmath 1997a:230)* 'something' > 'nothing' ('nothing' > 'something') e. *From Esses to Aitches: s > h (h > s) (Ferguson 1990)* |2|directionality, language change, universality 1018|Haspelmath2004|Now when we look at reasonably robust universals of language change, we see that many of them take the form of directionality constraints. Of the five examples in (1) four have the form "X can change into Y, but Y cannot change into X". Especially in phonology, it is easy to find cases of this type, and I list a few more in (2). 
:comment:`mentions some interesting cases, keep this as a potential reference for a paper on ordered character state models`|3|sound change, directionality, language change, universals, 1019|Ferguson1990|One of the most powerful tools in the armamentarium of linguists engaged in the study of diachronic phonology is the often implicit notion that some changes are phonetically more likely than others. Thus if a linguist finds a systematic correspondence between [g] and [dʒ] in two related language varieties, it will be reasonable to assume that the stop is the older variant and the affricate the younger one until strong counter evidence is found. The linguist makes such an assumption because experience with many languages has shown that the change of [g] to [dʒ] is fairly common and tends to occur under certain well-documented conditions whereas the reverse change is unusual and problematic. :comment:`quoted from Haspelmath2002`|59f|sound change, directionality, language change, universals, 1020|Rogers2014|Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for various primate species, and analyses of several others are underway. Whole-genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other non-human primates offer valuable insights into genetic similarities and differences among species that are used as models for disease-related research. This Review summarizes current knowledge regarding primate genome content and dynamics, and proposes a series of goals for the near future.|000|incomplete lineage sorting, biology, 1021|Rogers2014|Incomplete lineage sorting (ILS). 
The process by which, as a result of segregation of an ancestral polymorphism, the evolutionary relationships among a series of homologous DNA sequences in a set of distinct populations do not match the phylogenetic relationships among the overall populations; that is, the gene trees do not match the population trees.|351|incomplete lineage sorting, definition 1022|Galtier2008|Incongruence between gene trees is the main challenge faced by phylogeneticists in the genomic era. Incongruence can occur for artefactual reasons, when we fail to recover the correct gene trees, or for biological reasons, when true gene trees are actually distinct from each other, and from the species tree. Horizontal gene transfers (HGTs) between genomes are an important process of bacterial evolution resulting in a substantial amount of phylogenetic conflicts between gene trees. We argue that the (bacterial) species tree is still a meaningful scientific concept even in the case of HGTs, and that reconstructing it is still a valid goal. We tentatively assess the amount of phylogenetic incongruence caused by HGTs in bacteria by comparing bacterial datasets to a metazoan dataset in which transfers are presumably very scarce or absent. We review existing phylogenomic methods and their ability to return to the user, both the vertical (speciation/extinction history) and horizontal (gene transfers) phylogenetic signals.|000|incomplete lineage sorting, phylogenetic incongruence, biology, 1023|Galtier2008|Three major evolutionary mechanisms potentially resulting in true phylogenetic discordance between genes are known: incomplete lineage sorting; hidden paralogy; and horizontal gene transfer (HGT). |4023|incomplete lineage sorting, phylogenetic incongruence, paralogy, lateral gene transfer 1024|Galtier2008|Incomplete lineage sorting occurs when an ancestral species undergoes several speciation events in a short period of time. 
If, for a given gene, the ancestral polymorphism is not fully resolved into two monophyletic lineages when the second speciation occurs, then with some probability the gene tree will be different from the species tree (Tajima 1983; Pamilo & Nei 1988). |4023|incomplete lineage sorting, definition 1025|Galtier2008|A very different reason why gene trees can be truly incongruent is hidden paralogy. If a dataset includes paralogous copies, then the true phylogeny will partly reflect the duplication history of the gene that is independent of species divergence history.|4023|phylogenetic incongruence, paralogy 1026|Galtier2008|The third mechanism is HGT. If genetic exchanges occur between species, then the phylogeny of individual genes will be influenced by the number and nature of transfers they have undergone.|4023|phylogenetic incongruence, lateral gene transfer 1027|Roy2009|Comparative genomics has revealed the ubiquity of gene and genome duplication and subsequent gene loss. In the case of gene duplication and subsequent loss, gene trees can differ from species trees, thus frequent gene duplication poses a challenge for reconstruction of species relationships. Here I address the case of multi-gene sets of putative orthologs that include some unrecognized paralogs due to ancestral gene duplication, and ask how outgroups should best be chosen to reduce the degree of non-species tree (NST) signal. 
Consideration of expected internal branch lengths supports several conclusions: (i) when a single outgroup is used, the degree of NST signal arising from gene duplication is either independent of outgroup choice, or is minimized by use of a maximally closely related post-duplication (MCRPD) outgroup; (ii) when two outgroups are used, NST signal is minimized by using one MCRPD outgroup, while the position of the second outgroup is of lesser importance; and (iii) when two outgroups are used, the ability to detect gene trees that are inconsistent with known aspects of the species tree is maximized by use of one MCRPD, and is either independent of the position of the second outgroup, or is maximized for a more distantly related second outgroup. Overall, these results generalize the utility of closely-related outgroups for phylogenetic analysis.|000|gene duplication, paralogy 1028|Mufwene2001|languages are analogs of parasitic species|179|biological parallels, language, analogy 1029|Mayr2004|[Biological species are] groups of interbreeding natural populations that are reproductively (genetically) isolated from other such groups. |177|definition, species, biology 1030|Croft2000|SIBLING LANGUAGES are two linguistic varieties that are so similar that they are considered to be “dialects of the same language”, yet are perceived by the speakers ... as distinct languages.|16|sibling languages, terminology, linguistic varieties, diasystem 1031|Croft2000|[...] language “speciation” is more like plant speciation than animal speciation.|8|speciation, plants, biological parallels, divergence 1032|Mayr2004|[...] the BSC is inapplicable to asexual organisms, which form clones, not populations.|182|species, biological parallels, asexual organisms 1033|Mufwene2001|[...] 
the linguistic species need not be a clone of any biological species, despite the fact that it shares several properties with the parasitic species |145|language as species, parasites, analogy, biological parallels 1034|Mufwene2001|[...] there is no particular reason why every structural notion applicable to a biological species should be applicable to a linguistic species. |30|biological parallels, language as species, species, 1035|Mufwene2001|[...] I gave up unsuccessful attempts to clone the linguistic species on the biological species ... and developed my own notion of a linguistic species. |xiv|analogy, species, language as species, biological parallels 1036|Lyell1863|Progressive improvement in language is a necessary consequence of the progress of the human mind from one generation to another. As civilisation advances, a greater number of terms are required to express [...] ideas and things, which a single word had before signified, though somewhat loosely and imperfectly.|chapter 23|language change, progression, 1037|McMahon1994|:comment:`rejects the notion of either decay or progress in language change` |324f|language change, progress, decay 1038|Labov1972|[...] the same mechanisms which operated to produce the large-scale changes of the past may be observed operating in the current changes taking place around us.|161|uniformitarianism, linguistics, language history 1039|Lass1997|Nothing that is now impossible *in principle* was ever the case.|26|uniformitarianism, historical linguistics, language change, language history 1040|Croft2003|[...] the languages of the past [...] are not different in nature from those of the present.|233|language history, uniformitarianism, historical linguistics 1041|Gould1987|[...] the history of our earth ... follows no vector of progress in any inexorable direction. Our planet always looked and behaved just about as it does now. 
:comment:`Gould on uniformitarianism: (1) law across time and space, (2) uniformity of process, (3) uniformity of rate, (4) uniformity of state.`|123|uniformitarianism, history of science, biology 1042|Lass2003|The fundamental issue is whether the terms of the metaphor actually have referents, or at least can point to some ontologically specifiable domain.|48|language evolution, evolution, biological parallels 1043|Lewens2007|[...] memetics merely offers a cosmetic re-packaging of a familiar set of stories about cultural change.|???|meme, memetics, cultural evolution, cultural selection, biological parallels 1044|Kirby2000|We must [...] be careful of analogies such as these [...] whereas grammars have to be reconstructed every generation through learning or acquisition, DNA sequences do not (they are physically passed on and copied).|FN2|biological parallels, language acquisition, reproduction 1045|Croft2000|It is difficult to describe the language learning process as a replication of the grammar of the parent by the child [because the process is] very indirect.|45|biological parallels, language acquisition, reproduction 1046|Atkinson2005|
=============================== ==============================
Biological evolution            Linguistic evolution
=============================== ==============================
Discrete characters             Lexicon, syntax, and phonology
Homologies                      Cognates
Mutation                        Innovation
Drift                           Drift
Natural selection               Social selection
Cladogenesis                    Lineage splits
Horizontal gene transfer        Borrowing
Plant hybrids                   Language Creoles
Correlated genotypes/phenotypes Correlated cultural terms
Geographic clines               Dialects/dialect chains
Fossils                         Ancient texts
Extinction                      Language death
=============================== ==============================
|???|biological parallels, linguistics, biology, analogy 1047|Zuckerkandl1965|Of all natural systems, living matter is the one which, in the face of great transformations, preserves inscribed in its organization the largest amount 
of its own past history.|357|molecules, biological parallels, historical linguistics, analogy 1048|Ringe2002|Languages replicate themselves (and thus ‘survive’ from generation to generation) through a process of native-language acquisition by children. Importantly for historical linguistics, that process is tightly constrained.|61|biological parallels, language acquisition, reproduction 1049|Zuckerkandl1965|Different types of molecules are discussed in relation to their fitness for providing the basis for a molecular phylogeny. Best fit are the “semantides”, i.e. the different types of macromolecules that carry the genetic information or a very extensive translation thereof. The fact that more than one coding triplet may code for a given amino acid residue in a polypeptide leads to the notion of “isosemantic substitutions” in genic and messenger polynucleotides. Such substitutions lead to differences in nucleotide sequence that are not expressed by differences in amino acid sequence. Some possible consequences of isosemanticism are discussed.|000|molecules, documents, biological parallels, analogy 1050|Walkden2012|In this paper I question the Inertial Theory of language change put forward by Longobardi (2001), which claims that syntactic change does not arise unless caused and that any such change must originate as an ‘interface phenomenon’. It is shown that these two claims and the contention that ‘syntax, by itself, is diachronically completely inert’ (Longobardi, 2001:278), if construed as a substantive, falsifiable theory of diachrony, make predictions that are too strong, and that they cannot be reduced (as seems desirable) to properties of language acquisition. 
I also express doubt as to the utility and necessity of a methodological/heuristic principle of Inertia.|000|inertial theory, language evolution, syntax, syntactic reconstruction 1051|Laubichler2007|Francois Jacob's article 'Evolution and Tinkering' published in Science in 1977 is still the locus classicus for the concept of tinkering in biology. It first introduced the notion of tinkering to a wide audience of scientists. Jacob drew on a variety of different sources ranging from molecular biology to evolutionary biology and cultural anthropology. The notion of tinkering, or more accurately, the concept of bricolage, are conceptual abstractions that allow for the theoretical analysis of a wide range of phenomena that are united by a shared underlying process--tinkering, or the opportunistic rearrangement and recombination of existing elements. This paper looks at Jacob's analysis as itself an example of conceptual tinkering. It traces the history of some of its elements and sketches how it has become part of an inclusive discourse of theoretical biology and evolutionary developmental biology that emerged over the last 30 years. I will argue that the theoretical power of Jacob's analysis lies in the fact that he captured a widespread phenomenon. His conceptual analysis is thus an example of an interdisciplinary synthesis that is based on a shared process rather than a shared object.|000|tinkering, evolution, biological parallels 1052|GodfreySmith2015|Third, a scaffolded reproducer is an entity that reproduces (or is reproduced) in a way highly dependent on resources external to itself.|10121|scaffolded evolution, definition 1053|Shiro1961|:comment:`Indicates different ways of coding cognacy in Swadesh lists.` `+` indicates perfect correspondence between the two forms compared. This symbol is also extended to items which have been subjected to analogical modification. With regard to 'verbs' and 'adjectives', however, only roots are taken into consideration. 
`÷` indicates that differences are due to an affix or other modifying element in one of the forms compared, or are due to morphophonemic change. (These cases are rated + by Swadesh and others.) `-` indicates that the two forms compared are etymologically different. `×` indicates that + or ÷ might be appropriate except for the fact that the form in one or both dialects compared is a borrowing from the Standard Japanese. `=` indicates that - or 0 might be appropriate, but with the same reservation as in the preceding paragraph: the form in one of the dialects is a borrowing from the Standard Japanese. `O` indicates that an etymology is obscure, `( )` indicates loan words from Chinese which, though rated + or ÷, cannot be derived from proto-Japanese. `—` indicates that there is no appropriate form in the dialect for that test list item. A blank space indicates that the appropriate form was not obtained. |54|homology, cognacy, morphological change, partial cognacy 1054|OSC2015|Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. 
Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams. |000|reproducibility, psychology, research, science 1055|DeGroot1956|Adrianus Dingeman de Groot (1914–2006) was one of the most influential Dutch psychologists. He became famous for his work “Thought and Choice in Chess”, but his main contribution was methodological — De Groot co-founded the Department of Psychological Methods at the University of Amsterdam (together with R. F. van Naerssen), founded one of the leading testing and assessment companies (CITO), and wrote the monograph “Methodology” that centers on the empirical-scientific cycle: observation–induction–deduction–testing–evaluation. Here we translate one of De Groot's early articles, published in 1956 in the Dutch journal Nederlands Tijdschrift voor de Psychologie en Haar Grensgebieden. This article is more topical now than it was almost 60 years ago. De Groot stresses the difference between exploratory and confirmatory (“hypothesis testing”) research and argues that statistical inference is only sensible for the latter: “One ‘is allowed’ to apply statistical tests in exploratory research, just as long as one realizes that they do not have evidential impact”. De Groot may have also been one of the first psychologists to argue explicitly for preregistration of experiments and the associated plan of statistical analysis. The appendix provides annotations that connect De Groot's arguments to the current-day debate on transparency and reproducibility in psychological science.|000|significance, reproducibility, science, 1056|Jaeger2015|Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. 
So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily. |000|mass comparison, deep genetic relations, phylogenetic reconstruction, ASJP 1057|Andreopoulos2009|Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.|000|clustering, partitioning, bioinformatics 1058|Andreopoulos2009|This article gives an overview on different cluster and partitioning algorithms. 
It distinguishes hierarchical from graph-based cluster algorithms. In detail, the following types are distinguished: (1) partitioning (k-means), (2) hierarchical, (3) density-based (DBSCAN), (4) model-based, (5) graph-based.|000|clustering, partitioning, bioinformatics 1059|Namboodiri2013|SpecP is an open-source Python module that performs Spectral Partitioning on Protein Contact Graphs. Protein Contact Graphs are graph theory based representation of the protein structure, where each amino acid forms a ‘vertex’ and spatial contact of any two amino acids is an ‘edge’ between them. Spectral partitioning is carried out in SpecP based on the second smallest spectral value (eigen value) of the Protein Contact Graph. The eigen vector corresponding to the second smallest spectral value are partitioned into two clusters based on the sign of the corresponding vector entry. Spectral Partitioning algorithm is repeatedly carried out until the desired numbers of partitions are obtained. SpecP visualizes the spectrally partitioned clusters of protein structure along with the Protein Contact Map and Protein Contact Graph which can be saved for later use. It also possesses an interactive mode whereby the user has the ability to zoom, pan, resize and save these raster images in various image formats (.eps, .jpg, .png) manually. SpecP is a stand-alone extensible tool useful for structural analysis of proteins.|000|spectral partitioning, partitioning, protein contact graph, graph theory, bioinformatics 1060|Namboodiri2013|Spectral partitioning is a graph partition algorithm which partitions data represented in the form of a graph G = (V,E), with V vertices and E edges, into smaller components with specific properties [1]. Spectral partitioning has gained momentum in recent times due to its simplicity and better performance. 
They have been successfully applied in protein science [2].|545|spectral partitioning, partitioning, definition, bioinformatics 1061|GodfreySmith2015|A simple reproducer is something that can give rise to more objects of the same kind largely through the operation of resources internal to it—through its own biological machinery, in a broad sense—and, further, is not made of smaller parts that also have this capacity. A paradigm case is a bacterial cell.|10121|reproduction, simple reproduction 1062|GodfreySmith2015|A collective reproducer is a reproducing object that has parts that are themselves simple or collective reproducers. A paradigm case is a multicellular organism such as a human, which is made of cells that also can reproduce.|10121|reproduction, sexual reproduction, collective reproduction 1063|GodfreySmith2015|Third, a scaffolded reproducer is an entity that reproduces (or is reproduced) in a way highly dependent on resources external to itself. Paradigm cases are viruses and also genes; the copying of genes is a form of reproduction, but it is dependent on the machinery of a whole cell. The photocopying of a piece of paper also is scaffolded reproduction in this sense.|10121|scaffolded reproduction, reproduction 1064|GodfreySmith2015|All three forms of reproduction—simple, collective, and scaffolded—are sufficient to generate parent–offspring lineages in a population of objects, but they have different requirements and different kinds of borderline cases.|10121|reproduction 1065|GodfreySmith2015|A reproducer of one kind contains reproducers of other kinds. A simple reproducer need not be self-contained or simple in a more general sense; it may need a great deal of environmental support, and it might be a biologically complicated object. The term simple applies only to its mode of reproduction. 
Simple reproducers need not be the lowest-level reproducing entities in a hierarchy: A bacterial cell contains scaffolded reproducers but still qualifies as a simple reproducer. Cell reproduction works in part through the copying (reproduction) of genetic material.|10121|reproduction 1066|GodfreySmith2015|One further category comprises objects or structures that recur without reproducing. An example is a heart. Hearts are recurring biological objects, seen in generation after generation, but new hearts are not brought into existence by preexisting hearts in the way new cells are brought into existence by preexisting cells.|10122|reconstructed reproducers, reproduction 1067|Pinker1994|I have no problem with Greenberg’s use of many loose correspondences, or even with the fact that some of his data contains random errors. What bothers me more is his reliance on gut feelings of similarity rather than on actual statistics that control for the number of correspondences that might be expected by chance…Though I am willing to be patient with Nostratic and similar hypotheses pending the work of a good statistician with a free afternoon, I find the Proto-World hypothesis especially suspect. :comment:`Secondary quote from a blog-post, book from "The Language Instinct"`|255|genetic relationship, proof of relationship, chance resemblance 1068|Merrell2001|[there is no] necessary natural link [...], or a link due to some resemblance or similarity [...].|31|arbitrariness, linguistic sign, semantics, 1069|Liao1999|A surprisingly large fraction of the eukaryotic genome is repetitive. More than one third of the human genome consists of interspersed repetitive DNA, and tandemly repeated DNA sequences may occupy as much as 10% of the human genome. The majority of the repetitive sequences are nongenic; the rest encode multigene families. 
The genomic organization of repetitive DNA sequences takes different forms: these repetitive sequences either disperse throughout the genome, as with short interspersed sequences (SINEs), long interspersed sequences (LINEs), and transposable elements, or, like tRNA genes and human histone genes, they may cluster in one or a few chromosomal regions. Multigene families, including those for ribosomal RNAs (rRNAs) and small nuclear RNAs (snRNAs) (Pavelitz et al. 1995), as well as noncoding sequences such as satellite DNA, minisatellite sequences, and microsatellite sequences (Charlesworth et al. 1994), are often arranged in tandem arrays. Despite the abundance of repetitive DNA in the genomes of eukaryotic organisms, the biological functions, if any, of noncoding repetitive sequences remain elusive. However, most repetitive sequences, whether coding or noncoding, exhibit an unexpected property: they evolve in a concerted fashion.|000|concerted evolution, evolutionary processes, 1070|Murphy1999|Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10–12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made. 
|000|amino acid alphabets, protein folding, sound classes, biological parallels 1072|Haspelmath1999|In this programmatic paper, I argue that the universal constraints of Optimality Theory (OT) and the functional explanations of functionalists need to be complemented by a theory of diachronic adaptation. OT constraints are traditionally stipulated as part of Universal Grammar, but this misses the generalization that the grammatical constraints normally correspond to constraints on language use. As in biology, observed adaptive patterns in language can be explained through diachronic evolutionary processes, as the unintended cumulative outcome of numerous individual intentional actions. The theory of diachronic adaptation also provides a solution to the teleology problem, which has often been used as an argument against functional explanations. Finally, I argue against the view that the grammatical constraints could be due to accident, and I conclude that an explanatory theory of grammatical structure needs a theory of adaptation.|000|optimality theory, universal grammar, diachronic adaptation 1073|Langacker1977|I believe we can isolate a number of broad categories of linguistic optimality. Languages will tend to change so as to maximize optimality in each of these categories... The tendencies toward these various types of optimality will often conflict with one another. :comment:`quoted from` @Haspelmath1999|102|optimality theory, naturalness, language change, 1074|Haspelmath1999|With the advent of structuralism and its rigid synchrony/diachrony separation, this kind of thinking went out of fashion, as the focus was now on explicit and elegant descriptions of individual languages, rather than on highly general (if often vague) explanatory principles. 
But after several decades of abstention, linguists again began to become interested in highly general principles; and since principles can be formulated in a more general way if they are violable, this meant that the idea of conflicting preferences resurfaced. Within one tradition, such competing preferences were called *naturalness principles* in conflict (e.g. Dressler 1977:13, Dressler et al. 1987:7,93); in another, competing motivations (Haiman 1983:812, Du Bois 1985, Croft 1990: §7.4). @Langacker<1977> (1977:102) used the term *optimality*.|181|structuralism, language change, naturalness, optimality theory, motivation 1075|Kirby2000|In this excellent paper, Haspelmath highlights the importance of understanding the mechanism that links, on the one hand, functional pressures on language use, and on the other, patterns of grammaticality within and across languages. The author argues that there is a close mapping between these two disparate aspects of language and demonstrates this within the framework of Optimality Theory. This is particularly interesting since OT might be considered to be a typical generative theory of language. As such, there is no theory-internal reason to expect OT constraints to map well onto pressures from language use. Haspelmath argues convincingly, however, not only that OT constraints do indeed match functional pressures, but even that OT theorists actually justify particular constraints with reference to function. 
|000|optimality theory, adaptation, diachronic adaptation, language change 1076|Kirby2000|This paper references the idea of diachronic adaptation brought up by @Haspelmath<1999> (1999).|000|diachronic adaptation, language change, grammaticalization, optimality theory 1077|Tynjanow1928|Der Gegensatz von Synchronie und Diachronie war der von Systembegriff und Evolutionsbegriff; er verliert grundsätzlich seinen Sinn, sofern wir anerkennen, daß jedes System notwendigerweise als Evolution auftritt, und andererseits die Evolution zwangsläufig Systemcharakter trägt. :translation:`The distinction between synchrony and diachrony was the same as the distinction between system and evolution. It loses its sense completely, if only we acknowledge that each system necessarily manifests as evolution, while, on the other hand, evolution necessarily bears systemic character.` |68|complex systems, structuralism, systemic evolution, systemic processes 1078|Schmidt1872|Man mag sich also drehen und wenden wie man will, so lange man an der anschauung fest hält, dass die in historischer zeit erscheinenden sprachen durch merfache gabelungen aus der ursprache hervorgegangen seien, d.h. so lange man einen stammbaum der indogermanischen sprachen annimmt, wird man nie dazu gelangen alle die hier in frage stehenden tatsachen wissenschaftlich zu erklären. :translation:`No matter how we look at it, as long as we stick to the assumption that today's languages originated from their common proto-language via multiple furcation, we will never be able to explain all facts in a scientifically adequate way.`|17|wave theory, family tree 1079|Schmidt1872|Ich möchte an seine [des Baumes] stelle das bild der welle setzen, welche sich in concentrischen mit der entfernung vom mittelpunkte immer schwächer werdenden ringen ausbreitet. 
:translation:`I want to replace [the tree] by the image of a wave that spreads out from the center in concentric circles becoming weaker and weaker the farther they get away from the center.`|27|wave theory, family tree 1080|Schmidt1872|Nimmt man nun eine graecoitalische grundsprache an, so wird man diese sämmtliche griechischen und lateinischen worte, welche sich im arischen widerfinden, zusprechen müssen. Soll es nun reiner zufall sein, dass von disen 123 worten im italischen nur 24, im griechischen aber 103 erhalten sind? Wer die geographischen verhältnisse in betracht zieht, wird an solchen zufall schwer glauben. :translation:`If one assumes a Graeco-Italic proto-language, one must attribute to it all those Greek and Latin words that recur in Aryan. Is it then supposed to be pure chance that of these 123 words only 24 are preserved in Italic, but 103 in Greek? Anyone who takes the geographical conditions into account will find it hard to believe in such a coincidence.`|24|wave theory, patchy data 1081|Brezina2013|The current study presents a New General Service List (new-GSL), which is a result of robust comparison of four language corpora (LOB, BNC, BE06, and EnTenTen12) of the total size of over 12 billion running words. The four corpora were selected to represent a variety of corpus sizes and approaches to representativeness and sampling. In particular, the study investigates the lexical overlap among the corpora in the top 3,000 words based on the average reduced frequency (ARF), which is a measure that takes into consideration both frequency and dispersion of lexical items. The results show that there exists a stable vocabulary core of 2,122 items (70.7%) among the four corpora. Moreover, these vocabulary items occur with comparable ranks in the individual wordlists. In producing the new-GSL, the core vocabulary items were combined with new items frequently occurring in the corpora representing current language use (BE06 and EnTenTen12). 
The final product of the study, the new-GSL, consists of 2,494 lemmas and covers between 80.1 and 81.7 per cent of the text in the source corpora.|000|basic vocabulary, core vocabulary, general service list 1082|Fitzpatrick2013|This article argues that, across different psychological contexts, the methods of data collection, treatment, and analysis in word association tests have hitherto been inconsistent. We demonstrate that this inconsistency has resulted from inadequate control, in previous studies, of certain important variables including the basis of norm comparisons, and we present a principled method for collecting, scoring, and analysing association responses, to address these issues. The method is evaluated using test and retest data sets from 16-year-old and over-65-year-old twins (n = 636), which enable us to (a) compare samples matched for key environmental variables, (b) assess the transferability of norming information between age cohorts, and (c) evaluate the reliability of the scoring protocols. We find systematic differences in the association behaviour of the two age cohorts, indicating the importance of evaluating data only against norms lists that are matched to the target population. Individual association behaviour is found to be consistent across test times, both in terms of response stereotypy and response type.|000|word association data, semantic similarity 1083|Bonin2004|This paper concerns the influence of age of acquisition (AoA) in word reading and other tasks, and attempts to develop a number of issues raised by Zevin and Seidenberg (2002). Analyses performed on both rated and objective measures of AoA show that the frequency trajectory of words is a reliable predictor of their order of acquisition, which validates its use as a variable to examine age-limited learning effects. 
We report a large-scale multiple regression study of French word reading which shows that controlling for cumulative frequency (derived from child and adult frequency counts) does not result in the removal of an effect of AoA in reading aloud French words, but there was no effect of frequency trajectory. We also report some re-analyses of previous published data which show that frequency trajectory has a reliable influence on spoken and written object naming latencies and lexical decision times, but not on spelling-to-dictation or word reading latencies. Cumulative frequency has a reliable effect in all tasks. The methodological and theoretical implications of these findings are discussed.|000|age of acquisition, word reading task, memory 1084|Coltheart1986|Evidence is presented from two experiments and from a reanalysis of data published by Christian, Bickley, Tarka, and Clayton (1978) that the chronological age at which a word is acquired does not affect free recall or recognition memory. Morris's (1981) report, that late acquired words are better recalled than early acquired words, was not replicated and appears to be attributable to a difference in the emotionality value of his lists. Although the data are consistent with an interpretation in terms of semantic, but not episodic, memory tasks' being sensitive to word age of acquisition, it is suggested that a more fine-grained analysis is necessary.|000|age of acquisition, word imagery, memory, speech norms 1085|Hauser2014|Understanding the evolution of language requires evidence regarding origins and processes that led to change. In the last 40 years, there has been an explosion of research on this problem as well as a sense that considerable progress has been made. We argue instead that the richness of ideas is accompanied by a poverty of evidence, with essentially no explanation of how and why our linguistic computations and representations evolved. 
We show that, to date, (1) studies of nonhuman animals provide virtually no relevant parallels to human linguistic communication, and none to the underlying biological capacity; (2) the fossil and archaeological evidence does not inform our understanding of the computations and representations of our earliest ancestors, leaving details of origins and selective pressure unresolved; (3) our understanding of the genetics of language is so impoverished that there is little hope of connecting genes to linguistic processes any time soon; (4) all modeling attempts have made unfounded assumptions, and have provided no empirical tests, thus leaving any insights into language’s origins unverifiable. Based on the current state of evidence, we submit that the most fundamental questions about the origins and evolution of our linguistic capacity remain as mysterious as ever, with considerable uncertainty about the discovery of either relevant or conclusive evidence that can adjudicate among the many open hypotheses. We conclude by presenting some suggestions about possible paths forward.|000|language evolution, Chomsky, language origin 1086|MendivilGiro2014|The goal of the present contribution is to explore what kinds of objects languages are from a biolinguistic point of view. I define the biolinguistic point of view as a naturalistic study of languages and I show that from this point of view, languages are human language organs, that is, they are natural objects. However, languages change over time; therefore, they are also historically modified objects. Considering that natural organisms are historically modified natural objects, I look for inspiration in evolutionary theory to better specify what kinds of objects languages are and how they change and diversify. I conclude that every language is a ‘unique evolutionary history’ within a restricted space of design. 
This conclusion means that although the structure of languages reveals aspects of formal elegance and aspects of functional efficiency, there are no arguments to state that these aspects are manifested more or less intensely in some languages than in others. Then their formal and functional aspects are part of what is common to all languages, while variable parts of language are a reflection of the essentially historical nature of the lexical interface between the components of our language organs.|000|language evolution, language model, biological parallels 1087|Bast2015|Evolutionary patterns of languages and organisms have surprising similarity, as Darwin famously captured by what he termed as “Curious Parallelism.” While traditional comparative and historical linguistic methods such as detailed analysis of cognate correspondences reveal similarities between languages and group them into linguistic families like how Carl Linnaeus grouped organisms into taxonomical hierarchies based on overall similarity-an approach known as phenetic clustering, this will not help to answer such as “when did Proto-Dravidian split to Proto-South Dravidian and Proto-South-Central Dravidian?” and so on. Conventional methods for dating linguistic trees such as glottochronology are severely flawed such that these have now been largely discredited. Proposed in this invited editorial is the direct extension of the molecular clock hypothesis and time-calibration techniques of molecular phylogenetics to the field of phylolinguistics. For ‘calibration checkpoints’, ancient dated texts, as well as dated and reliable historical information (such as Cro-Magnon migration to Europe, etc.), can be employed. 
Also deliberated here is a call to make use of Maximum Parsimony-based approaches for the ancient character-state reconstruction, for reconstructing long-lost languages.|000|time calibration, glottochronology, historical linguistics, lexicostatistics 1088|Hadikin2015|In this paper I discuss similarities and differences between a potential new model of language development - lexical selection, and its biological equivalent - natural selection. Based on Dawkins' (1976) concept of the meme I discuss two units of language and explore their potential to be seen as linguistic replicators. The central discussion revolves around two key parts - the units that could potentially play the role of replicators in a lexical selection system and a visual representation of the model proposed. I draw on work by Hoey (2005), Wray (2008) and Sinclair (1996, 1998) for the theoretical basis; Croft (2000) is highlighted as a similar framework. Finally brief examples are taken from the free online corpora provided by the corpus analysis tool Sketch Engine (Kilgarriff, Rychly, Smrz and Tugwell 2004) to ground the discussion in real world communicative situations. The examples highlight the point that different situational contexts will allow for different units to flourish based on the local social and linguistic environment. The paper also shows how a close look at the specific context and strings available to a language user at any given moment has potential to illuminate different aspects of language when compared with a more abstract approach.|000|lexical change, lexical replacement, lexical evolution 1089|RamonCasas2010|Early in development infants can discriminate many phonetic contrasts, either present or not in their native language, while after a few months of exposure to the language in their environment, perceptual reorganization begins to take place. 
These perceptual processes reflect increasing sensitivity towards the sound categories of their native language and a perceptual decline for contrasts that are not present in the ambient language.|000|bilingualism, cognates, perception, phonology, phonetic contrast 1090|Wheeler2015|**Background** Many problems in comparative biology are, or are thought to be, best expressed as phylogenetic “networks” as opposed to trees. In trees, vertices may have only a single parent (ancestor), while networks allow for multiple parent vertices. There are two main interpretive types of networks, “softwired” and “hardwired.” The parsimony cost of hardwired networks is based on all changes over all edges, hence must be greater than or equal to the best tree cost contained (“displayed”) by the network. This is in contrast to softwired, where each character follows the lowest parsimony cost tree displayed by the network, resulting in costs which are less than or equal to the best display tree. Neither situation is ideal since hard-wired networks are not generally biologically attractive (since individual heritable characters can have more than one parent) and softwired networks can be trivially optimized (containing the best tree for each character). Furthermore, given the alternate cost scenarios of trees and these two flavors of networks, hypothesis testing among these explanatory scenarios is impossible. **Results** A network cost adjustment (penalty) is proposed to allow phylogenetic trees and soft-wired phylogenetic networks to compete equally on a parsimony optimality basis. This cost is demonstrated for several real and simulated datasets. In each case, the favored graph representation (tree or network) matched expectation or simulation scenario. **Conclusions** The softwired network cost regime proposed here presents a quantitative criterion for an optimality-based search procedure where trees and networks can participate in hypothesis testing simultaneously. 
|000|phylogenetic network, phylogenetic reconstruction, parsimony, genetic classification 1091|Vogt2010|The present article discusses the need for standardization in morphology in order to increase comparability and communicability of morphological data. We analyse why only morphological descriptions and not character matrices represent morphological data and why morphological terminology must be free of homology assumptions. We discuss why images only support and substantiate data but are not data themselves. By comparing morphological traits and DNA sequence data we reveal fundamental conceptual shortcomings of the former that result from their high average degree of individuality. We argue that the delimitation of morphological units, of datum units, and of evidence units must be distinguished, each of which involves its own specific problems. We conclude that morphology suffers from the linguistic problem of morphology that results from the lack of (i) a commonly accepted standardized morphological terminology, (ii) a commonly accepted standardized and formalized method of description, and (iii) a rationale for the delimitation of morphological traits. Although this is not problematic for standardizing metadata, it hinders standardizing morphological data. We provide the foundation for a solution to the linguistic problem of morphology, which is based on a morphological structure concept. We argue that this structure concept can be represented with knowledge representation languages such as the resource description framework (RDF) and that it can be applied for morphological descriptions. We conclude with a discussion of how online databases can improve morphological data documentation and how a controlled and formalized morphological vocabulary, i.e. 
a morphological RDF ontology, if it is based on a structure concept, can provide a possible solution to the linguistic problem of morphology.|000|standardization, biology, morphology, 1092|Searls1997|Biologists have long made use of linguistic metaphors in describing and naming cellular processes involving nucleic acid and protein sequences. Indeed, it is very natural to view the genetic ‘text’ and its sequential transliterations in these terms. However, a metaphor is not a tool, and it is necessary to ask whether the techniques used in analyzing other kinds of languages, such as human and computer languages, can in fact be of any use in tackling problems in molecular biology. This paper reviews the work of the author and others in applying the methods of computational linguistics to biological sequences.|000|biological parallels, grammar, biology, RNA folding, human language, 1093|Regier2007|The nature of color categories in the world's languages is contested. One major view holds that color categories are organized around universal focal colors, whereas an opposing view holds instead that categories are defined at their boundaries by linguistic convention. Both of these standardly opposed views are challenged by existing data. Here, we argue for a third view based on a proposal by Jameson and D'Andrade [Jameson KA, D'Andrade RG (1997) in Color Categories in Thought and Language, eds Hardin CL, Maffi L (Cambridge Univ Press, Cambridge, U.K.), pp 295–319]: that color naming across languages reflects optimal or near-optimal divisions of an irregularly shaped perceptual color space. We formalize this idea, test it against color-naming data from a broad range of languages and show that it accounts for universal tendencies in color naming while also accommodating some observed cross-language variation. 
|000|cognition, language evolution, color terms, color categories, 1094|Kay1999|Various revisions of the Berlin and Kay (1969) model of the evolution of basic color term systems have been produced in the last thirty years, motivated by both empirical and theoretical considerations. On the empirical side, new facts about color naming systems have continually come to light, which have demanded adjustments in the descriptive model. On the theoretical side, there has been a sustained effort to find motivation in the vision science literature regarding color appearance for the synchronic and diachronic constraints observed to govern color terminology systems. The present paper continues the pursuit of both of these goals. A new empirical question is addressed with data from the World Color Survey (WCS), and a revised model is proposed, which both responds to recently raised empirical questions and provides new motivation from the field of color vision for the observed constraints on color naming.|000|color terms, color categories, language evolution, perception, semantic change, semantic universals 1095|Syrjanen2013|Encouraged by ongoing discussion of the classification of the Uralic languages, we investigate the family quantitatively using Bayesian phylogenetics and basic vocabulary from seventeen languages. To estimate the heterogeneity within this family and the robustness of its subgroupings, we analyse ten divergent sets of basic vocabulary, including basic vocabulary lists from the literature, lists that exclude borrowing-susceptible meanings, lists with varying degrees of borrowing-susceptible meanings and a list combining all of the examined items. The results show that the Uralic phylogeny has a fairly robust shape from the perspective of basic vocabulary, and is not dramatically altered by borrowing-susceptible meanings. 
The results differ to some extent from the ‘standard paradigm’ classification of these languages, such as the lack of firm evidence for Finno-Permian.|000|Uralic languages, lexicostatistics, basic vocabulary 1096|Percival1987|«Languages are natural organisms that arose independently of human volition, grew in accordance with definite laws, and developed and in turn grew old and died. They are also characterized by the set of phenomena we are accustomed to understand by the term ''life''.» :comment:`Quote translated from` @Schleicher1863 |8|organism, language change, language evolution, August Schleicher 1097|Haak2015|We generated genome-wide data from 69 Europeans who lived between 8,000–3,000 years ago by enriching ancient DNA libraries for a target set of almost 400,000 polymorphisms. Enrichment of these positions decreases the sequencing required for genome-wide ancient DNA analysis by a median of around 250-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that the populations of Western and Far Eastern Europe followed opposite trajectories between 8,000–5,000 years ago. At the beginning of the Neolithic period in Europe, ~8,000–7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ~24,000-year-old Siberian. By ~6,000–5,000 years ago, farmers throughout much of Europe had more hunter-gatherer ancestry than their predecessors, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. 
Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~75% of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for a steppe origin of at least some of the Indo-European languages of Europe.|000|expansion, migration, Indo-European, Steppe Hypothesis 1098|Pagel1999|Phylogenetic trees describe the pattern of descent amongst a group of species. With the rapid accumulation of DNA sequence data, more and more phylogenies are being constructed based upon sequence comparisons. The combination of these phylogenies with powerful new statistical approaches for the analysis of biological evolution is challenging widely held beliefs about the history and evolution of life on Earth.|000|molecular evolution, ancestral state reconstruction 1099|Chacon2014|This paper presents a reconstruction of Proto-Tukanoan consonants and the classification of the family based on shared phonological innovations. The proposed reconstruction contrasts with previous comparative studies of the family by proposing a series of laryngealized stops (instead of voiced stops) and a different set of sounds for the alveolar and palatal points of articulation. Methodologically, it considers lexicostatistical methods to play a secondary role in the classification of languages in the Tukanoan family. It offers a detailed discussion of contact vs. 
inherited linguistic traits in the Tukanoan family and in the Vaupes region, well known for its multilingualism and system of linguistic exogamy, and concludes with an interpretation of the evolution of the Tukanoan family in historical terms.|000|Proto-Tukano, Tukano languages, linguistic reconstruction, phylogenetic reconstruction, Southern American languages 1100|Hansson2008|Phonological systems show clear signs of being shaped by phonetics. Sound patterns are overwhelmingly phonetically ‘natural’, in that they reflect the influence of physical constraints on speech production and perception, and categorical phonological processes often mirror low-level gradient phonetic effects. The question of how best to explain and model the influence of phonetics on phonology has been approached in different ways, one of which situates the locus of explanation in the diachronic domain of language change, in particular sound change. On this view, recurrent sound patterns merely reflect recurrent sound changes with phonetic origins, typically in speech perception. Explicit models of sound change are reviewed and illustrated, in particular Ohala's listener-based model and Blevins’ Evolutionary Phonology framework, and the relevance of exemplar-based models of speech production and perception is also noted. Current issues of controversy regarding the adequacy of diachronic vs. synchronic explanations for the typology of sound patterns are surveyed.|000|sound change, sound patterns, evolutionary phonology 1101|Natale2000|The immunology of human papillomavirus (HPV) infections has peculiar characteristics. The long latency for cervical cancer development after primary viral infection suggests mechanisms that may aid the virus in avoiding the host immunosurveillance and establishing persistent infections. 
In order to understand whether molecular mimicry phenomena might explain the ability of HPV to avoid a protective immune response by the host cell, sequence similarity between HPV16 E7 oncoprotein and human self-proteins was examined by computer-assisted analysis. Data were obtained showing that the HPV16 E7 protein has high and widespread similarity to several human proteins involved in a number of critical regulatory processes. In addition, multiple identical and different E7 peptide motifs are present in the same human protein. Thus, sharing of common motifs between viral oncoproteins and molecules of normal cells may be one cause underlying the scarce immunogenicity of HPV infections. The hypothesis is advanced that synthetic peptides harbouring viral motifs not and/or scarcely represented in the host's cellular proteins may represent a valuable immunotherapeutic approach for cervical cancer treatment.|000|computer-aided approaches, computer-assisted analysis 1102|Greenhill2011|The Levenshtein distance is a simple distance metric derived from the number of edit operations needed to transform one string into another. This metric has received recent attention as a means of automatically classifying languages into genealogical subgroups. In this article I test the performance of the Levenshtein distance for classifying languages by subsampling three language subsets from a large database of Austronesian languages. Comparing the classification proposed by the Levenshtein distance to that of the comparative method shows that the Levenshtein classification is correct only 40% of the time. Standardizing the orthography increases the performance, but only to a maximum of 65% accuracy within language subgroups. The accuracy of the Levenshtein classification decreases rapidly with phylogenetic distance, failing to discriminate homology and chance similarity across distantly related languages. 
This poor performance suggests the need for more linguistically nuanced methods for automated language classification tasks.|000|Levenshtein distance, automatic linguistic reconstruction, automatic phylogenetic reconstruction, blackbox methods 1103|Barrachina2008|Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. A way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator activity is included in the loop: In each iteration, a prefix of the translation is validated (accepted or amended) by the human and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario affects mainly the search process, allowing a great reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) in two real tasks: The translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions): English–Spanish, English–German, and English–French.|000|computer-assisted translation, CAT, computer-aided approaches 1104|Rogozin2006|The Dollo parsimony method is based on the assumption that a complex character that has been lost during evolution of a particular lineage cannot be regained. 
When applicable, this principle leads to a substantial simplification of evolutionary analysis and provides for unambiguous reconstruction of evolutionary scenarios, which may not be attainable with other methods. In this chapter, applications of Dollo parsimony are described for the quantitative analysis of the dynamics of genome evolution. Dollo parsimony is the method of choice for reconstructing evolution of the gene repertoire of eukaryotic organisms because although multiple, independent losses of a gene in different lineages are common, multiple gains of the same gene are improbable. This contrasts with the situation in prokaryotes where the widespread occurrence of horizontal gene transfer makes multiple gains possible, thereby invalidating the Dollo principle. The chapter applies Dollo parsimony to reconstruct the scenario of evolution for the genomes of crown-group eukaryotes by assigning the loss of genes and emergence of new genes to the branches of the phylogenetic tree, and delineate the minimal gene sets for various ancestral forms. A similar analysis, with rather unexpected results, was performed to infer gain versus loss of introns in conserved eukaryotic genes. The applicability of the Dollo principle for these and other problems in evolutionary genomics is discussed.|000|Dollo parsimony, Dollo model, parsimony, maximum parsimony 1105|Koehnlein2015| The production of a contour tone requires a longer duration than the production of a level tone. This paper demonstrates that this durational relationship becomes considerably more complex when tones are realized on bimoraic sonorant units that can support both level tones and contour tones. Evidence comes from diachronic processes in which pitch and duration interact. 
In languages where (intrinsic) durational differences between two groups of bimoraic units lead to tonal contrasts, the longer units commonly receive a contour tone, and the shorter ones a level tone; yet over time, the units with the fully developed contour tone tend to shorten, and those with the level tone tend to lengthen. Ultimately, this can even lead to durational reversals between the units in question. The discussion focuses primarily on Franconian tone accent dialects but also incorporates data from Estonian, Hup, Las Norias Piman and North Low Saxon. |000|tone, tone language, tone change, tonogenesis, sound change 1106|Kondrak2009|Identification of cognates and recurrent sound correspondences is a component of two principal tasks of historical linguistics: demonstrating the relatedness of languages, and reconstructing the histories of language families. We propose methods for detecting and quantifying three characteristics of cognates: recurrent sound correspondences, phonetic similarity, and semantic affinity. The ultimate goal is to identify cognates and correspondences directly from lists of words representing pairs of languages that are known to be related. The proposed solutions are language independent, and are evaluated against authentic linguistic data. The results of evaluation experiments involving the Indo-European, Algonquian, and Totonac language families indicate that our methods are more accurate than comparable programs, and achieve high precision and recall on various test sets. The results also suggest that combining various types of evidence substantially increases cognate identification accuracy.|000|sound correspondences, cognate detection, phonetic alignment 1107|Klukas2005|The analysis of patterns in graphs has applications in many fields of science. 
We propose a new method for analyzing graph patterns consisting of a user-friendly and flexible mechanism to specify patterns, an algorithm to recognize multiple appearances of patterns in a target graph, a pattern preserving layout algorithm, and a navigation technique to explore the underlying structure of the graph given by the patterns. This method has been implemented in a tool called PatternGravisto. We demonstrate the utility of our approach with the example graphs from the Graph Drawing Contest 2003 which cover problems from biology and sociology.|000|graph analysis, pattern analysis, network analysis, complex systems, networks 1108|Kemp2012|Languages vary in their systems of kinship categories, but the scope of possible variation appears to be constrained. Previous accounts of kin classification have often emphasized constraints that are specific to the domain of kinship and are not derived from general principles. Here, we propose an account that is founded on two domain-general principles: Good systems of categories are simple, and they enable informative communication. We show computationally that kin classification systems in the world’s languages achieve a near-optimal trade-off between these two competing principles. We also show that our account explains several specific constraints on kin classification proposed previously. Because the principles of simplicity and informativeness are also relevant to other semantic domains, the trade-off between them may provide a domain-general foundation for variation in category systems across languages. |000|kinship terms, semantics, language variation 1109|Jones2012|Phylogenetic models have recently been proposed for data that are best represented as a mathematical function (i.e. function valued). Such methods can be used to model the change over time in function-based descriptions of various data of interest to evolutionary biologists, including the sound of speech. 
This approach to phylogenetic inference and analysis is challenging, both in terms of modeling the phylogenetics of functions and in engaging with previously existing evidence for character-state change. Nevertheless, it is both a real and exciting prospect. Our approach could provide those interested in investigating a greater range of evolutionary processes with the ability to use statistical hypothesis-testing procedures and to create estimates of the states of function-valued characteristics (e.g. speech sounds) at earlier historical times.|000|phylogenetic reconstruction, speech sounds, speech acoustics 1110|Gilchrist2004|Paracrine factors secreted by oocytes play a pivotal role in promoting early ovarian follicle growth and in defining a morphogenic gradient in antral follicles, yet the exact identities of these oocyte factors remain unknown. This study was conducted to determine the extent to which the mitogenic activity of mouse oocytes can be attributed to growth differentiation factor 9 (GDF9). To do this, specific anti-human GDF9 monoclonal antibodies were generated. Based on epitope mapping and bioassays, a GDF9 neutralizing antibody, mAb-GDF9-53, was characterized with very low cross-reactivity with related transforming growth factor (TGF)beta superfamily members, including BMP15 (also called GDF9B). Pep-SPOT epitope mapping showed that mAb-GDF9-53 recognizes a short 4-aa sequence, and three-dimensional peptide modeling suggested that this binding motif lies at the C-terminal fingertip of mGDF9. As predicted by sequence alignments and modeling, the antibody detected recombinant GDF9, but not BMP15 in a Western blot and GDF9 protein in oocyte extract and oocyte-conditioned medium. In a mouse mural granulosa cell (MGC) bioassay, mAb-GDF9-53 completely abolished the mitogenic effects of GDF9, but had no effect on TGFbeta1 or activin A-stimulated MGC proliferation. An unrelated IgG at the same dose had no effect on GDF9 activity. 
This GDF9 neutralizing antibody was then tested in an established oocyte-secreted mitogen bioassay, where denuded oocytes cocultured with granulosa cells promote cell proliferation in a dose-dependent manner. The mAb-GDF9-53 dose dependently (0-160 microg/ml) decreased the mitogenic activity of oocytes but only by approximately 45% at the maximum dose of mAb. Just 5 microg/ml of mAb-GDF9-53 neutralized 90% of recombinant mGDF9 mitogenic activity, but only 15% of oocyte activity. Unlike mAb-GDF9-53, a TGFbeta pan-specific neutralizing antibody did not affect the mitogenic capacity of the oocyte, but completely neutralized TGF beta 1-induced DNA synthesis. This study has characterized a specific GDF9 neutralizing antibody. Our data provide the first direct evidence that the endogenous GDF9 protein is an important oocyte-secreted mitogen, but also show that GDF9 accounts for only part of total oocyte bioactivity.|000|alignment editor, GoCore, computer-aided approaches, computer-assisted analysis 1111|Donati2015|**BACKGROUND** Phylogenetic tree reconciliation is the approach of choice for investigating the coevolution of sets of organisms such as hosts and parasites. It consists in a mapping between the parasite tree and the host tree using event-based maximum parsimony. Given a cost model for the events, many optimal reconciliations are however possible. Any further biological interpretation of them must therefore take this into account, making the capacity to enumerate all optimal solutions a crucial point. Only two algorithms currently exist that attempt such enumeration; in one case not all possible solutions are produced while in the other not all cost vectors are currently handled. The objective of this paper is two-fold. The first is to fill this gap, and the second is to test whether the number of solutions generally observed can be an issue in terms of interpretation. 
**RESULTS** We present a polynomial-delay algorithm for enumerating all optimal reconciliations. We show that in general many solutions exist. We give an example where, for two pairs of host-parasite trees having each less than 41 leaves, the number of solutions is 5120, even when only time-feasible ones are kept. To facilitate their interpretation, those solutions are also classified in terms of how many of each event they contain. The number of different classes of solutions may thus be notably smaller than the number of solutions, yet they may remain high enough, in particular for the cases where losses have cost 0. In fact, depending on the cost vector, both numbers of solutions and of classes thereof may increase considerably. To further deal with this problem, we introduce and analyse a restricted version where host switches are allowed to happen only between species that are within some fixed distance along the host tree. This restriction allows us to reduce the number of time-feasible solutions while preserving the same optimal cost, as well as to find time-feasible solutions with a cost close to the optimal in the cases where no time-feasible solution is found. **CONCLUSION** We present EUCALYPT, a polynomial-delay algorithm for enumerating all optimal reconciliations which is freely available at http://eucalypt.gforge.inria.fr/. |000|gene tree reconciliation, reconciliation, EUCALYPT, parsimony, 1112|Greenhill2015|The island of New Guinea has the world’s highest linguistic diversity, with more than 900 languages divided into at least 23 distinct language families. This diversity includes the world’s third largest language family: Trans-New Guinea. However, the region is one of the world’s least well studied, and primary data is scattered across a wide range of publications and more often than not hidden in unpublished “gray” literature. 
The lack of primary research data on the New Guinea languages has been a major impediment to our understanding of these languages, and the history of the peoples in New Guinea. TransNewGuinea.org aims to collect data about these languages and place them online in a consistent format. This database will enable future research into the New Guinea languages with both traditional comparative linguistic methods and novel cutting-edge computational techniques. The long-term aim is to shed light into the prehistory of the peoples of New Guinea, and to understand why there is such major diversity in their languages.|000|New Guinea languages, database, lexical database 1113|DeLancey2013|A persistent problem in Sino-Tibetan linguistics is that Chinese is characterized by a mix of lexical, phonological, and syntactic features, some of which link it to the Tibeto-Burman languages, others to the Tai-Kadai, Hmong-Mien, and Mon-Khmer families of Southeast Asia. It has always been recognized that this must reflect intense language contact. This paper develops a hypothesis about the nature of that contact. The language of Shang was a highly-creolized lingua franca based on languages of the Southeast Asian type. Sinitic is a result of the imposition of the Sino-Tibetan language of the Zhou on a population speaking this lingua franca, resulting in a language with substantially Sino-Tibetan lexicon and relict morphology, but Southeast Asian basic syntax.|000|Sino-Tibetan, Chinese, Sinitic, genetic classification 1115|Boc2010|Interesting approach, later also used in @Willems2016; an early attempt at gene tree reconciliation methods in historical linguistics.|000|phylogenetic network, lexical borrowing, borrowing detection, 1116|Matisoff2003|Compounding has been a pervasive morphological process for at least the past two millennia of the history of the ST family, as part of the languages’ response to the ever-present danger of homophony among their monosyllabic morphemes. 
Once a dissyllabic compound has been created, however, it is subject to phonological reduction of its first syllable, a process which is readily observable synchronically throughout the family [...].|153f|compounding, partial cognacy, cognacy, word formation 1117|Benedict1976|The main findings of another look (after Conspectus [1972] - reviews noted) are that Sino-Tibetan is now a well-established family; Tai and Miao-Yao must still be excluded, although each has made early borrowings (especially numerals) from Chinese dialects or related languages; lexical analysis (Swadesh 100-word list) supports the taxonomic arrangement (Conspectus) setting Chinese apart from Tibeto-Burman, but the position of Karen remains indeterminate; the Sino-Tibetan reconstruction (Conspectus) remains largely unchanged despite some refinements, but recent studies have uncovered an extensive prefixation pattern (mainly s-, also ?- and m-) for Archaic Chinese, radically altering the 'look' of the language in the direction of Tibetan and other Tibeto-Burman languages; finally, a review of comparative Sino-Tibetan studies reveals that data (sources) are less often at fault than scholars|000|Sino-Tibetan, Swadesh list, lexicostatistics, subgrouping 1118|Handel2008|As elaborated in LaPolla 1994, such developments are attributable in part to the tendencies for parallel ‘drift’ that inhere in genetically related languages because of the perseverance of typological similarities – an idea originally put forward by Sapir.|430|drift, linguistic drift, systemic evolution, language change 1119|Campbell1999|"chaque mot a son histoire" (Jules Gilliéron, 1854 – 1926) :translation:`every word has its history`|189|language history, mosaic history, word history 1120|Swadesh1954|There are also ways of controlling the factor contributed by the semantics of comparison. One method, which is feasible for time depths within certain limits, is to count only exact equivalences. 
The word for 'water' in language A is compared, for statistical purposes, only with the word for 'water' in language B. Any similarities with the word for 'rain' or 'river' or 'drink' are disregarded in the calculation. A second approach, which can be applied when it is felt necessary to bring different meanings together, is to do so within stated limits and the effect of this taken into account. Thus, if 'water' in language A can be compared either with 'water' or 'drink' in language B, this doubles the chances of finding cognates, and by the time 'drink' of language A is compared with both in B, the [pb] chances are quadrupled. Comparisons made only because of the eagerness of the comparativist to prove his case thus are canceled out in the final analysis of proof. |314f|cross-semantic cognacy, semantic change, cognate detection, cross-semantic cognates 1121|Swadesh1954| Were vocabularies formed of all the languages spoken in North and South America, preserving their appellation of the most common objects in nature, of those which must be present to every nation, barbarians or civilized, with the inflections of their names and verbs, their principles of regimen and concord, and these deposited in all the public libraries, it would furnish opportunities to those skilled in the languages of the world to compare them with these, now or at any future time, and hence to construct the best evidence of the derivation of this part of the human race. -- Thomas Jefferson (in Notes on the State of Virginia) |000|comparative method, lexicostatistics, cognate detection, comparative wordlist 1122|Swadesh1954|A simple way to reduce the sound-imitative factor to a negligible minimum is to omit from consideration all such words as 'blow, breathe, suck, laugh' and the like, that is all words which are known to lean toward sound imitation. The borrowing factor can be held down to a very small percentage by sticking to non-cultural words. 
It is possible to observe the extensiveness of loan intrusions in historical cases and thereby measure the weight of this factor. The ninety-seven item test list used in the present paper has so far never been found [pb] to contain more than two historical borrowings, even in languages which have been under strong and constant Spanish influence for four centuries. In most cases there is not even one borrowing among the 97 items.|313f|basic vocabulary, borrowability, sound symbolism 1123|Hill2011|Widespread agreement prevails that Tibetan o is the result of the merger of several distinct sounds in proto-Tibeto-Burman. Here I attempt to reconcile Matisoff and Gong’s presentations of the origins of Written Tibetan o , making fuller use of philological evidence than Matisoff and taking advantage of a more recent version of Old Chinese than Gong. A number of sound laws are proposed to explain the relevant vowel correspondences among Tibetan, Burmese, and Chinese|000|Tibeto-Burman, Sino-Tibetan, Tibetan, vowel system, linguistic reconstruction, subgrouping 1124|Schleicher1866|Im vor ligenden werke ist der versuch gemacht worden die erschloßene indogermanische ursprache neben ire wirklich vorhandenen ältesten tochtersprachen zu stellen. Außer dem vorteile, den dise einrichtung dadurch bietet, daß sie dem lernenden sofort die lezten ergebnisse der forschung in concreter anschaulichkeit vor augen stelt und im so die einsicht in das wesen der einzelnen indogermanischen sprachen erleichtert, wird noch ein zweiter, wie mich bedünkt, nicht ganz unwichtiger zweck durch die selbe sicher erreicht. Es wird nämlich so der augenfällige beweis gelifert für die völlige grundlosigkeit der noch immer nicht ganz verschollenen anname, daß auch die nicht indischen indogermanischen sprachen vom altindischen (sanskrit) ab stammen. 
|8|August Schleicher, linguistic reconstruction, history of science 1125|Starostin2010|The article discusses the basic methodology that underlies the construction of a global lexicostatistical database for all of the world’s languages, currently one of the main tasks of the Evolution of Human Languages project at the Santa Fe Institute. The author presents several important modifications of the traditional lexicostatistical procedure, such as: replacing the traditional 100-item wordlist with a more compact list of 50 “ultra-stable” items; use of low-level protolanguage reconstructions as primary construction nodes; a combination of the comparative-historical method and principles of phonetic similarity as the basis for the cognate scoring procedure; and, most importantly, a heavy emphasis on semantic precision and severe restrictions on the use of synonyms.|000|Global Lexicostatistical Database, preliminary lexicostatistics, cognate detection, comparative method 1126|Shao1991|中古匣、云两母当合而为一,音韵学界的看法比较一致。上古两母的关系如何,则意见颇为分歧。到目前为止,提出的不同看法大致有下列几种: 1.匣母并于群母读g‘,云母也是塞音读g。此说出于高本汉。 2.匣、云两母合一,跟中古一样读γ。此说出于曾运乾,完成于董同龢。 3.匣母并于群母读g,云母仍同中古读γ。此说出于周法高。 4.匣、云、群三母合而为一,读g(中古开口)和gw(中古合口)。此说出于李方桂。|000|Old Chinese, xiámǔzì, sound change, splits and mergers 1127|Park2010| **Background** Maximum parsimony is one of the most commonly used criteria for reconstructing phylogenetic trees. Recently, Nakhleh and co-workers extended this criterion to enable reconstruction of phylogenetic networks, and demonstrated its application to detecting reticulate evolutionary relationships. However, one of the major problems with this extension has been that it favors more complex evolutionary relationships over simpler ones, thus having the potential for overestimating the amount of reticulation in the data. 
An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold. **Results** In this paper, we address this problem in a more systematic way, by proposing a nonparametric bootstrap-based measure of support of inferred reticulation events, and using it to determine the number of those events, as well as their placements. A number of samples is generated from the given sequence alignment, and reticulation events are inferred based on each sample. Finally, the support of each reticulation event is quantified based on the inferences made over all samples. **Conclusions** We have implemented our method in the NEPAL software tool (available publicly at http://bioinfo.cs.rice.edu/), and studied its performance on both biological and simulated data sets. While our studies show very promising results, they also highlight issues that are inherently challenging when applying the maximum parsimony criterion to detect reticulate evolution. |000|bootstrap, lateral gene transfer, phylogenetic network, LGT detection 1128|Driem2005|2. THE DEFAULT HYPOTHESIS: TIBETO-BURMAN. The first rigorous polyphyletic exposition of Asian linguistic stocks was presented in Paris by the German scholar Julius Heinrich von Klaproth in 1823. His Asia Polyglotta was more comprehensive, extended beyond the confines of the Russian Empire and included major languages of East Asia, Southeast Asia and Polar America. Based on a systematic comparison of lexical roots, Klaproth identified and distinguished twenty-three Asian linguistic stocks, which he knew did not represent an exhaustive inventory. Yet he argued for a smaller number of phyla because he recognised the genetic affinity between certain of these stocks and the distinct nature of others. 
:comment:`Graphic on second page shows that it is Tibetan, Burmese and Chinese which Klaproth groups together, nothing else`|291f|Sino-Tibetan, genetic classification, history of science 1129|LaPolla2010|The Sino-Tibetan language family is one of the largest language families in the world, both in terms of number of speakers and in terms of geographic distribution. It includes the majority languages of China and Myanmar, plus minority languages in China, Myanmar, Thailand, Vietnam, and Northeast India. Three main factors have been involved in the formation of the present-day Sino-Tibetan language family: a shared genetic origin, divergent population movements (i.e. innovations appearing in the different groups after their split), and language contact (among themselves and with non-Sino-Tibetan languages). Population movements and language contact have in fact generally been two aspects of a single phenomenon. This paper looks at the history of the development of the Sinitic branch of the Sino-Tibetan language family from the point of view of population movements and language contact, to show the role language contact has had in the formation of the branch as we know it today. 
These factors have been an important part of the development of the branch from its origin in the central plains of what is now north China, in the valley of the Yellow River, some 6,500 years ago, right up to the present, and are still the main factors in language change today.|000|language contact, Sino-Tibetan, Chinese, Chinese dialects, 1130|Kressing2015|The scope of this paper is to highlight models of reticulate evolution in a dual sense: (1) by stressing the importance of early models of horizontal/lateral transfer instead of models of unilinear vertical transfer in biology, linguistics, anthropology and related disciplines, and (2) by demonstrating that the acceptance of evolutionism as leitmotif in the nineteenth century was only possible by intense and repeated networks between scholars of different academic realms which lead to the assumption that the development of biological species and human cultures could be perceived as part of the same co-evolutionary process. Contrary to these widely popularized models of unilinear evolution, I would like to draw attention to alternative theories emphasizing the horizontal transfer of words, phenotypes/genotypes, and culture traits. Examples are the method of areal typology in linguistics, the theory of endosymbiosis in biology, and the anti-evolutionist attitude in Boasian anthropology, combined with an emphasis on the diffusion of culture traits. 
Further, it shall be pointed out that, even when—after the general dismissal of evolutionist ideas in the beginning of the twentieth century—the idea of co-evolutionary processes in the development of human populations and languages was again forwarded in the late twentieth century, this ‘modern synthesis’ of genetics, linguistics and archeology relied largely on interdisciplinary reticulations between sciences and humanities and serves as another example of reticulate evolution.|000|phylogenetic network, reticulate evolution, anthropological evolution, language history, language evolution, biological evolution, biological parallels 1131|Bradie2015|Darwinian models of cultural change have been motivated, in part, by the desire to provide a framework for the unification of the biological and the human sciences. In this paper, drawing upon a distinction between the evolution of enabling mechanisms for the acquisition and dissemination of knowledge (EEM) and the evolution of epistemic theses as cultural products (EET), we propose a model of how culture emerges as a product of biological evolution on the basis of the concept of reaction norms. The goal of this model is to provide a means for conceptualizing how the biological and the cultural realms are connected, when they start to disconnect, and what the key transitions are. We then assess the viability of a Darwinian approach to cultural change. 
We conclude that the prospects of producing a Darwinian model of cultural change that unifies the human sciences in a way that mirrors the unification of the biological sciences in the light of Darwin’s theory are rather dim.|000|cultural evolution, Darwin, biological evolution, biological parallels 1132|Pappas2011|:comment:`six-point scale of language change`

====== ====================== =============================================
Value  Type of Change         Example
====== ====================== =============================================
0      Sound change           *skʰizo* > *skizo* "I tear"
1      Levelling              *megas* > *meɣalos* "great"
2      Four Part Analogy      *plynoː* > *pleno* "I wash"
3      Syntactic Reanalysis   *ei* "if" > *an* via *ei an* "if only"
4      Semantic Change        *heːpar* "liver" > *sikoti* via *siki*, "fig"
5      Borrowing              *melas* > *mavros* via Latin *maurus* "black"
====== ====================== =============================================

|212|analogy, sound change, paradigm levelling, reanalysis, semantic change, lexical borrowing 1133|Pappas2011|We review and assess the different ways in which research in evolutionary-theory-inspired biology has influenced research in historical linguistics, and then focus on an evolutionary-theory inspired claim for language change made by Pagel et al. (2007). They report that the more Swadesh-list lexemes are used, the less likely they are to change across 87 Indo-European languages, and posit that frequency-of-use of a lexical item is a separate and general mechanism of language change. We test a corollary of this conclusion, namely that current frequency-of-use should predict the amount of change within individual languages through time. We devise a scale of lexical change that recognizes sound change, analogical change and lexical replacement and apply it to cognate pairs on the Swadesh list between Homeric and Modern Greek. 
Current frequency-of-use only weakly predicts the amount of change within the history of Greek, but amount of change does predict the number of forms across Indo-European. Given that current frequency-of-use and past frequency-of-use may be only weakly correlated for many Swadesh-list lexemes, and given previous research that shows that frequency-of-use can both hinder and facilitate lexical change, we conclude that it is premature to claim that a new mechanism of language change has been discovered. However, we call for more in-depth comparative study of general mechanisms of language change, including further tests of the frequency-of-use hypothesis.|000|lexical change, analogy, paradigm levelling 1134|Pappas2011|Second, in order to devise a scale by which to measure change within a particular language, we arranged known types of change in a hierarchy that depicts distance from the original form of the word. The first attempt was a six-point scale as seen in Table 1 . The cognate pairs were then checked and corrected by a senior researcher in Greek historical linguistics and, subsequently, two other researchers in the field were asked to apply the scale to the Swadesh list. The inter-scorer correlations are quite robust (all above 0.7), but, on the basis of the scorers’ reports that the scale may be too fine-grained, we decided to construct a three point scale, that distinguishes changes on the basis of phonetics (0), morphosyntax (1, 2, and 3) and semantics (4). [pb] Borrowing was eliminated due to the small number of tokens (5). The new scale has a stronger correlation among scorers (>0.8). 
:comment:`Note that this is exactly the scale by` @Gevaudan2007 :comment:`, but with sound change included into the schema!`|211f|lexical change, morphological change, semantic change, sound change, lexical borrowing 1135|Labov1994|Historical linguistics can then be thought of as the art of making the best use of bad data.|11|sparse data, bad data, historical linguistics, big data 1136|Smith1989|"We hold a damn dim candle over a damn dark abyss." :comment:`Quote attributed to the American scholar Charles Beard.`|1247|bad data, anthropology, prehistory, human prehistory 1137|Auer2015|This article introduces the new Journal of Historical Sociolinguistics by situating it in the developing field of historical sociolinguistics. The landmark paper of Weinreich et al. (1968), which paid increased attention to extralinguistic factors in the explanation of language variation and change, served as an important basis for the gradual development and expansion of historical sociolinguistics as a separate (sub)field of inquiry, notably since the influential work of Romaine (1982). This article traces the development of the field of historical sociolinguistics and considers some of its basic principles and assumptions, including the uniformitarian principle and the so-called bad data problem. Also, an overview is provided of some of the directions recent research has taken, both in terms of the different types of data used, and in terms of important approaches, themes and topics that are relevant to many studies within the field. 
The article concludes with considerations of the necessarily multidisciplinary nature of historical sociolinguistics, and invites authors from various research traditions to submit original research articles to the journal, and thus help to further the development of the fascinating field of historical sociolinguistics.|000|sociolinguistics, historical linguistics, historical sociolinguistics, 1138|Auer2015|The important role of Weinreich et al.’s (1968) seminal paper has already been pointed out. Interested in a more profound understanding of language change, the paper is centered around five central problems to be solved: (i) identifying the (crosslinguistic) constraints on linguistic change; (ii) studying the transition of features from one speaker to another; (iii) uncovering the embedding of changes, both in the linguistic and in the social structure; (iv) taking into account speakers’ evaluations of linguistic forms; and (v) delving into the actuation of language change, with causes for change originating from “stimuli and constraints both from society and from the structure of the language” (@Weinreich<1968> et al. 1968: 186).|4|challenges, language change, historical linguistics, methodology, sociolinguistics 1139|Auer2015|To answer these questions, historical sociolinguists can draw on insights and principles from modern-day sociolinguistics, on the working assumption that the fundamental principles and mechanisms of language variation and change are valid across time. This uniformitarian principle finds its origin in the premise of uniformitarianism in natural sciences such as geology and is described by Labov (@Labov1972: 275) as the idea that “the forces operating to produce linguistic change today are of the same kind and order of magnitude as those which operated five or ten thousand years ago” (cf. also @Joseph2011 for further discussion). 
This principle [pb] certainly holds true for basic assertions, such as “the fact that language must always have been variable, that different social groups and genders had different ways of speaking, and that people have always been aware of these differences” (@Bergs2012: 96).|4f|uniformitarianism, historical linguistics, language change 1140|Labov1972|Current difficulties in achieving intersubjective agreement in linguistics require attention to principles of methodology which consider sources of error and ways to eliminate them. The methodological assumptions and practices of various branches of linguistics are considered from the standpoint of the types of data gathered: texts, elicitations, intuitions and observations. Observations of the vernacular provide the most systematic basis for linguistic theory, but have been the most difficult kinds of data for linguists to obtain; techniques for solving the problems encountered are outlined. Intersubjective agreement is best reached by convergence of several kinds of data with complementary sources of error.|000|methodology, historical linguistics, uniformitarianism 1141|Auer2015|However, applying the idea of uniformitarianism beyond such basic assumptions holds the danger of “ideational anachronism” (@Bergs2012), whereby we transpose modern concepts such as social class, gender or prestige to historical settings, the applicability and validity of which is largely constricted to modern Western societies. To avoid the pitfalls of anachronism, it is the task of historical sociolinguists to reconstruct a broad picture of the social context in which the language varieties under investigation were used, drawing on the inductive method to identify the social conditions of language variation and change, ensuring empirical, social and historical validity (cf. @Nevalainen2006). 
In fact, this challenge lies at the heart of the so-called historical paradox: we know that the past was different, but what we do not know exactly is how different it was (@Labov1994: 11; cf. @Nevalainen2010). For this reason, historical sociolinguistics needs to transcend the mere application of modern-day sociolinguistic methods and questions to historical settings: part of the endeavor lies exactly in finding out how different the past was, and thus “every language period and every linguistic community must be investigated independently and in its own right” (@Bergs2012: 96).|5|uniformitarianism, historical linguistics, methodology 1142|Auer2015|The requirement to come by enough data is of course aggravated by the bad data problem. As pointed out earlier, written sources – especially from times when literacy was not common – have a strong bias toward the educated classes, which excludes large parts of the population from available textual records. This makes it very hard to get a complete picture of the social distribution of any linguistic feature at a given time in the past. Since education is connected to class and gender, the literacy bias entails a social bias, which is why certain sectors of society, such as women or the lower classes, have been under-represented in conventional language histories. One of the core concerns of historical sociolinguistics, therefore, is the effort to overcome the social bias connected to class, education and literacy inherent in written sources that has afflicted language historiography.|6|bad data, sparse data, historical linguistics 1143|Bergs2012|The Uniformitarian Principle (UP), sometimes also referred to as the Principle of Uniformity, very simply claims that the processes which we observe in the present can help us to gain knowledge about processes in the past. 
The reasoning behind this is that we must assume that whatever happens today must also have been possible in the past; whatever is impossible today must have been impossible in the past. If we observe today that water (on earth...) boils at around 100 degrees Celsius, we can assume that it also did so at any given point in the past. This means that when we analyze a historical phenomenon we should first look at known causes in order to explain it, before we turn to unknown causes. [...] In the humanities, however, the Uniformitarian Principle must be taken with a pinch of salt since there is no clear and simple correlate to the laws of nature. The aim of this chapter is to evaluate critically the usefulness of the Uniformitarian principle in historical linguistics in general and in historical sociolinguistics in particular. To this end, I will first discuss the history of this principle in linguistics and of its traditional uses in historical linguistics. I will then proceed with a discussion of its usefulness in sociohistorical linguistics and the risks associated with it, in [pb] particular, the danger of anachronism. These case studies on the central and almost ubiquitous concepts of social class, gender, and social networks conclude this chapter.|000|uniformitarianism, historical linguistics, historical sociolinguistics 1144|Whitney1867|The nature and uses of speech [...] cannot but have been essentially the same during all periods of history [...] there is no way in which its unknown past can be investigated, except by the careful study of its living present and recorded past, and the extension and application to remote conditions of laws and principles deduced by that study. :comment:`Quoted after `@Bergs2012 :comment:` (81)`|24|uniformitarianism, historical linguistics, history of science 1145|Lass1997|*General Uniformity Principle* No linguistic state of affairs (structure, inventory, process, etc.) can have been the case only in the past. 
*Uniform Probabilities Principle* The (global, cross-linguistics) likelihood of any linguistic state of affairs (structure, inventory, process, etc.) has always been roughly the same as it is now. :comment:`quoted after `@Bergs2012 :comment:` (82)`|28|uniformitarianism, historical linguistics 1146|Bergs2012|Anachronism in the literal sense translates as 'against time' and usually means an error in chronology, typically placing some state or event earlier in history than it can actually have occurred. If you are shooting a movie about the Roman Empire you certainly do not want any of the actors to wear wrist watches on the set -- simply because these did not exist. And the Romans should [pb] not go to war with handguns -- gunpowder did not reach mainland Europe or even the Islamic world before about 1200. Similarly, potatoes probably did not figure in the diet of people in medieval Europe, because they come from South America and can only have been imported from there after 1492. If you want to claim that some John Doe in Worcestershire in 1275 had jacket potatoes for lunch, you may of course do that, but you need to show convincingly how those potatoes came to England before 1492.|82f|anachronism, uniformitarianism 1147|Bergs2012|In (historical) linguistics, the concept of anachronism has not been extensively discussed -- which is surprising, given that it is a key issue in history as a scientific discipline. Still, one particularly obvious anachronism may be committed in historical linguistics and is occasionally an issue in discussions: that is, positing or assuming language structures which are neither attested nor plausible at the relevant point in time (like the watch in the movie on the Roman Empire). So, for example, Old English simply did not have *do*-periphrasis or a fully-fledged system of modal verbs, nor did it have voiced and/or voiceless labio-dental fricative phonemes. 
|83|anachronism, uniformitarianism 1148|Bergs2012|It seems very plausible to assume that speakers have always had individual social networks, based on ties to other members of their speech community. And it seems equally plausible to assume that these networks and ties -- despite different features on the network strength scales -- were basically characterized by very similar factors: the number of ties a given speaker has, and the quality of those ties (good friends, loose friends, socioeconomic dependency, etc.). This is certainly a way in which the uniformity principle makes a lot of sense, even though data problems can make the analysis of historical networks very difficult (see Bergs 2005: 45-42). Now, it is very tempting to think that networks have also always operated in the same fashion: loose-knit networks facilitate change, tight-knit networks are norm-enforcing, central members are more conservative than bridges, and so on. This idea, however, is problematic. Once again, the whole concept of conservative versus innovative language use in relation to linguistic norms (such as a language standard) crucially depends on the existence of those overtly prescribed linguistic norms and the idea of a language standard as such.|94|social networks, anachronism, uniformitarianism, historical linguistics 1149|Langer2012|Linguistic Purism is one of the most noticeable areas of historical sociolinguistics since it very publicly deals with what speakers think of (particular) language use. It thus touches on the field of folk linguistics – as defined by Niedzielski and Preston (2000) – which places great importance on the perception of language varieties, rather than just the sociological significance of particular linguistic variables and variants. There is no agreement amongst academic linguists as to what counts as linguistic purism and what does not. 
The principal divisions lie, on the one hand, between those for whom an attempt to rid a language of any undesirable elements constitutes purism and those who define it more narrowly as an attempt to rid a language only of foreign elements, and on the other, between those who see linguistic purism as a completely unacceptable academic activity and those who feel that purism is sometimes a subject worthy of study for academic linguists, such as with regard to the protection of regional or minority languages, or the process of standardizing and codifying a language. As it stands, the study of purism is connected to several important aspects of historical linguistics, including the process of standardizing languages, the use of language as a building block in the creation of nations, and the stigmatization of linguistic varieties or cultures as undesirable, or even a threat to one’s identity. There are a number of publications on this subject, of which three in particular provide fairly recent and comprehensive overviews and case studies from different languages and historical periods: van der Sijs (1999), Brincat, Boeder, and Stolz (2003), and Langer and Davies (2005). In this chapter we will outline the key metalinguistic motivations, actors, and principal concerns in the area of linguistic purism.|000|linguistic purism, purification, language change 1150|Hoey2005|Collocations -- recurrent combinations of words -- are [pb] both pervasive and subversive. Their pervasiveness is widely recognised in corpus linguistics; probably all lexical items have collocations (Sinclair 1991; Stubbs 1996). The notion is usually attributed to Firth ([1951]1957), and certainly his discussion of the concept underpins all that has followed on the subject.
Interestingly, though, Doyle (2003) draws attention to the fact that the word *collocation* was being used in linguistic discourse prior to Firth; in this connection he draws attention to a citation from 1940 in the *Oxford English Dictionary* (1995). |2f|collocations, lexicon, language structure 1151|Hoey2005|The statistical definition of collocation is that it is 'the relationship a lexical item has with items that appear with greater than random probability in its (textual) context' (Hoey 1991a: 6-7). This definition, though better, confuses method with goal. It is true that to discover collocations one needs to examine the statistical distribution of words and that those that occur in each other's company more often than can be accounted for by the mechanisms of random distribution can be said to collocate. But the definition says nothing interesting [pb] about the phenomenon; it gives no clue as to why collocation should exist in the first place. |3f|collocations, language structure, lexicon 1152|Hoey2005|So our definition of collocation is that it is a psychological association between words (rather than lemmas) up to four words apart and is evidenced by their occurrence together in corpora more often than is explicable in terms of random distribution. This definition is intended to pick up on the fact that collocation is a psycholinguistic phenomenon, the evidence for which can be found statistically in computer corpora. It does not pick up on the causal relationship identified by Leech, but only because that will be attended to separately.|5|collocations, definition, language structure 1153|Hoey2005|:comment:`Nice example for the importance of collocations as a special structure in language.` .. pull-quote:: In winter Hammerfest is a thirty-hour ride by bus from Oslo, though why anyone would want to go there in winter is a question worth considering. ..
pull-quote:: Through winter, rides between Oslo and Hammerfest use thirty hours up in a bus, though why travellers would select to ride there then might be pondered. [pb] One of these sentences is drawn from Bill Bryson's travel book *Neither Here Nor There* (1991) about his trips around Europe and is indeed, in some respects the first sentence of the book. [...] The other is best seen as a translation from Bill Bryson's English into my altogether less fluent English. [...] However, according to the theories of the lexicon that have dominated linguistic thought for the past 200 years there is no reason to regard the naturalness or clumsiness of the sentences as being of any importance. Both sentences are, after all, grammatical. [...] Both sentences are textually appropriate as well; there is no apparent reason why either should not begin a text. :comment:`He exaggerates here, since there is a clear breaking of what Coseriu calls the NORM in the second sentence!` |5f|collocations, definition, language structure, lexicon 1154|Shen2014|The reconstruction of Chi 翅 has been a long puzzle to students of Chinese historical phonology. The character was used as a form with initial *k- in early Indo-Chinese transliteration, while it was recorded with initial [cj] in Middle Chinese. In this paper, I proposed a hypothesis that the graph Chi 翅 was initially invented for a word for ‘wing’ with initial *k-. When this word was replaced by another word with initial [cj], the graph remained the same. Therefore, a mismatch between word and character prevent us from recognize the early form which is kept as a literary reading of the graph ji 踃. Furthermore, the early form still exist in modern Chinese dialects such as Northern Canton Vernacular 粤北土话 and Northern Wu 北部吴语.
The modern forms with initial *tcj- are also derived from a further palatalized form of the ‘wing’ with initial *k-.|000|wing, Old Chinese, linguistic reconstruction 1155|Shen2014|口语中‘翅’替代‘翼’当不晚于汉末到六朝后期,‘翅’在书面文学语言中也已取代‘翼’而占据主导地位 :translation:`In the vernacular language, chì 翅 replaced yì 翼 no later than the late Han; by the late Six Dynasties, chì had also replaced yì in the written literary language and occupied the dominant position.` :comment:`quoted after` @Wang2000 |50|wing, Old Chinese, linguistic reconstruction, semantic reconstruction 1156|Shen2014b|This paper describes the tonal system of Shibei Min, a northern Min variety by an explanation of the measurement procedure, it is argued that the so-called 'voice initials' are not distinguished by voicing. Rather, the main phonetic differences lie in the phonation in word-initial position and duration in word-medial position. The nine tones in Shibei fall into two natural classes and could be represented better in the RL model proposed in Zhu (2012).|000|Northern Mǐn, voiced initial, phonation type 1157|Viti2015|This paper discusses the problem of degrammaticalization, that is, the exceptions to the unidirectionality of grammaticalization. After analyzing the criteria that allow us to distinguish between various instances of counter-directional change, two principles underlying degrammaticalization are identified; one is related to the type of language and the other to the type of target structures in which degrammaticalization occurs. Firstly, the targets of degrammaticalization are usually closed-class parts of speech with an abstract semantic component. Secondly, the languages in which counter-directional grammatical changes occur turn out to be deprived of an elaborate fusional morphology. These findings may also have an impact on the theoretical conception of grammaticalization, some of whose definitional properties are discussed.
The paper ends with a discussion of a more controversial point, namely, counter-directional changes by folk etymology rather than by etymology proper.|000|grammaticalization, directionality 1158|Good2012|Much work within digital linguistics has focused on the problem of developing concrete methods and general principles for encoding data structures designed for non-digital media into digital formats. This work has been successful enough that the field is now in a position to move past “retrofitting” digital solutions onto analog structures and to consider how new technologies should actually change linguistic practice. The domain of grammaticography is looked at from this perspective, and a traditional descriptive grammar is reconceptualized as a database of linked data, in principle curated from distinct sources. Among the consequences of such a reconceptualization is the potential loss of two valued features of traditional descriptive grammars, here termed coverage and coherence. The nature of these features is examined in order to determine how they can be integrated into a linked data model of digital descriptive grammars, thereby allowing us to benefit from new technology without losing important features intrinsic to the structure of the traditional version of the resource.|000|RDF, linked data, grammar 1159|Viti2014|This paper discusses some general problems of etymology and lexicology in the ancient Indo-European (IE) languages, taking into account both theoretical and empirical aspects. Theoretically, our aim is to revise the negative reception that etymology as cultivated by ancient grammarians has usually encountered in modern linguistics, by considering the broad cultural context in which ancient folk etymologies were produced. Empirically, we investigate the taxonomies underlying lexicon in the ancient IE languages, which turns out to be less hierarchical than in many modern IE languages. 
As an example of this, we will consider the lexicalization of colour in antiquity. We will see that this flat categorization also influences the substantially synchronic type of etymology practiced in the ancient world, which was based on a series of similarities among lexemes placed on the same cognitive level.|000|linguistic reconstruction, Indo-European, etymology, history of science 1160|Malkiel1993|In different times and at different places, etymology has meant slightly or entirely different things to the few or many people who, under varying sets of circumstances, have used that word, applying it to their own spheres of interests. Basically, etymology always meant something approximating to the paraphrase ‘original meaning, or use, of a given lexical unit or proper name’. But the cultural implications of this lame descriptive statement can be entirely different. The core meaning of a word can be imagined as something wholly independent of the passage of time and endowed with magic messages or mystic overtones. :comment:`quoted after` @Viti2014 :comment:`(3f)`|1|etymology, semantic reconstruction, linguistic reconstruction 1161|Viti2014|Although lexicon is more variable and sensitive than other linguistic domains to the material and cultural environment of a speech community, various scholars ranging from the framework of glottochronology to Wierzbicka’s (@1996) semantic primes have attempted to identify a core of meanings that are prone to be lexicalized in all languages, as for example the names for body parts and for basic family relationships or the verbs for meteorological events and for elementary activities such as ‘eat’, ‘drink’ or ‘sleep’.
We think, however, that even the most basic and apparently stable concepts may be variously lexicalized in languages, and that before assuming any universal status for a certain lexical meaning one should first take into consideration the taxonomy of the vocabulary it takes part in.|23|lexicalization, basic vocabulary, lexical change, systematic aspect of evolution 1162|Durkin2009|Etymology can be a very demanding area of linguistic research, drawing on many different aspects of linguistics. It also draws at times on a good deal of non-linguistic information, about the transmission of texts or other sources of data, or about developments in social or cultural history. For this very reason it can also be extremely rewarding. Few areas of study offer points of contact with so many other fields. A discovery in etymology often depends upon insights drawn from many different areas of research, and often has the potential to illuminate questions in many linguistic sub-disciplines or beyond. :comment:`quoted after` @Viti2014 :comment:`(28f)`|287|cultural history, human prehistory, etymology, 1163|Atkinson2005a|
================================ ===============================
Biological Evolution             Linguistic Evolution
================================ ===============================
Discrete characters              Lexicon, syntax, and phonology
Homologies                       Cognates
Mutation                         Innovation
Drift                            Drift
Natural selection                Social selection
Cladogenesis                     Lineage splits
Horizontal gene transfer         Borrowing
Plant hybrids                    Language Creoles
Correlated genotypes/phenotypes  Correlated cultural terms
Geographic clines                Dialects/dialect chains
Fossils                          Ancient texts
Extinction                       Language death
================================ ===============================
|54|biological parallels, analogy, biology, historical linguistics 1164|Taylor2004|Over 35 years ago, Susumu Ohno stated that gene duplication was the single most important factor in evolution (97).
He reiterated this point a few years later in proposing that without duplicated genes the creation of metazoans, vertebrates, and mammals from unicellular organisms would have been impossible. Such big leaps in evolution, he argued, required the creation of new gene loci with previously nonexistent functions (98). Bold statements such as these, combined with his proposal that at least one whole-genome duplication event facilitated the evolution of vertebrates, have made Ohno an icon in the literature on genome evolution. However, discussion on the occurrence and consequences of gene and genome duplication events has a much longer, and often neglected, history. Here we review literature dealing with the occurrence and consequences of gene duplication, beginning in 1911. We document conceptual and technological advances in gene duplication research from this early research in comparative cytology up to recent research on whole genomes, “transcriptomes,” and “interactomes.”|000|gene duplication, gene speciation 1165|Zhang2003|The importance of gene duplication in supplying raw genetic material to biological evolution has been recognized since the 1930s. Recent genomic sequence data provide substantial evidence for the abundance of duplicated genes in all organisms surveyed. But how do newly duplicated genes survive and acquire novel functions, and what role does gene duplication play in the evolution of genomes and organisms? Detailed molecular characterization of individual gene families, computational analysis of genomic sequences and population genetic modeling can all be used to help us uncover the mechanisms behind the evolution by gene duplication.|000|gene duplication, genome evolution, history of science 1166|Doolittle1999|More than 20 complete prokaryotic genome sequences are now publicly available, each by itself an unparalleled resource for understanding organismal biology.
Collectively, these data are even more powerful: they could force a dramatic reworking of the framework in which we understand biological evolution. It is possible that a single universal phylogenetic tree is not the best way to depict relationships between all living and extinct species. Instead a web- or net-like pattern, reflecting the importance of horizontal or lateral gene transfer between lineages of organisms, might provide a more appropriate visual metaphor. Here, I ask whether this way of thinking is really justified, and explore its implications.|000|lateral gene transfer, history of science 1167|Fitch1970|This work provides a means by which it is possible to determine whether two groups of related proteins have a common ancestor or are of independent origin.|000|homology, paralogy, history of science 1168|Studer2009|Homologous genes are classified into orthologs and paralogs, depending on whether they arose by speciation or duplication. It is widely assumed that orthologs share similar functions, whereas paralogs are expected to diverge more from each other. But does this assumption hold up on further examination? We present evidence that orthologs and paralogs are not so different in either their evolutionary rates or their mechanisms of divergence. We emphasize the importance of appropriately designed studies to test models of gene evolution between orthologs and between paralogs. Thus, functional change between orthologs might be as common as between paralogs, and future studies should be designed to test the impact of duplication against this alternative model.|000|homology, paralogy, similarity 1169|Fitch1970|Where the homology is the result of gene duplication so that both copies have descended side by side during the history of an organism, (for example, α and β hemoglobin) the genes should be called *paralogous* (para = in parallel).
Where the homology is the result of speciation so that the history of the gene reflects the history of the species (for example α hemoglobin in man and mouse) the genes should be called *orthologous* (ortho = exact).|113|orthology, paralogy, history of science 1170|Gray1983|The kanamycin resistance gene from Staphylococcus aureus has been sequenced and its structure compared with similar genes isolated from Streptomyces fradiae and from two transposons, Tn5 and Tn903, originally isolated from Klebsiella pneumoniae and Salmonella typhimurium, respectively. The genes are all homologous but, since their common ancestor, have undergone extensive divergence, with more than 43% divergence between the closest pair. The phylogeny of the genes cannot be made congruent to the phylogeny of the taxa from which they were isolated without requiring rather improbable differences in rates. One is therefore led to conclude that there have been multiple occurrences of gene transfer between these species. Thus, although they are homologous, they are neither orthologous nor paralogous. It is suggested that homologous genes of this type be called xenologous.|000|xenology, history of science, homology 1171|Starostin2013c|“Proper” lexicostatistics derives from etymological judgements made by historical linguists, but etymological judgements are often questionable. Some of them may be completely false, being based on erroneous phonetic correspondences; some may be mistaking areal contacts for cognacies; in quite a few cases, “cognacies” may be etymologically correct, but reflect [pb] the result of unilateral independent semantic developments (see below) rather than direct descent of the “form/meaning” pair from the nearest common ancestor.
All of these problems influence the outcome of the calculations and sometimes result in significant errors.|132f|paralogy, unilateral semantic development, lexicostatistics, homology 1172|Starostin2013c|“Lexicostatistics”, a method originally proposed by Morris Swadesh to build relative genetic classifications of languages based on percentages of related items in their basic lexicon, and “glottochronology”, used to assign absolute dates of splitting to language groups based on the assumption of a regular rate of change, have not been overtly popular with mainstream comparative linguists, after an early set of critical works had undermined their general credibility. Since then, however, significant progress has been achieved in understanding and correcting the flaws of the original method. The current paper focuses on drawing attention to some of these corrections, such as (a) distinguishing between externally and internally triggered lexical change, and (b) factoring out independent semantic innovation. This improved methodology, without significantly cluttering up the formal apparatus, consistently yields results that are not only more credible than Swadesh’s original procedure, but are also much more in line with standard comparative-historical linguistics.|000|lexicostatistics, homology, historical linguistics 1173|Starostin2013c|It is absolutely imperative that the essence of lexicostatistics should not be reduced to discussions on its mathematical representation. Unlike genetics, where researchers operate with huge numbers of characters that can only be properly assessed within the framework of general models, Swadesh-type wordlists are generally small (100–200 items), stimulating individual case studies of the evolution of particular meanings in particular languages. “Proper” lexicostatistics derives from etymological judgements made by historical linguists, but etymological judgements are often questionable.
Some of them may be completely false, being based on erroneous phonetic correspondences; some may be mistaking areal contacts for cognacies; in quite a few cases, “cognacies” may be etymologically correct, but reflect [pb] the result of unilateral independent semantic developments (see below) rather than direct descent of the “form/meaning” pair from the nearest common ancestor. All of these problems influence the outcome of the calculations and sometimes result in significant errors.|132f|paralogy, homology, cognacy, unilateral semantic development 1174|Starostin2013c|But here is the catch: careful etymological analysis of the evidence shows us that Irish *cluas* and Tocharian *klots* do not preserve the original Indo-European word for 'ear'. That word is almost certainly represented by an entirely different root, the one found today in such forms as Russian *yx-o*, Lithuanian *aus-is*, English *ear*, French *or-eille* (← Latin *aur-iculum*), etc., going back to Proto-Indo-European *`*`ous-*. Unlike the Irish and Tocharian forms, this root is found in a much larger number of branches, and, most importantly, it is unmotivated, i. e. represents an original non-derived nominal stem, whereas both *cluas* and *klots* may uncontroversially be regarded as nominal derivatives from Proto-Indo-European *`*`kleu̯-* 'to hear'.
:comment:`Outgoing example is an illustration of reflexes for "to kill" and "ear" in Indo-European languages.`|139|unilateral semantic development, paralogy, etymology 1175|Starostin2013c|
====== ========= ======== ========== ===========
Number Meaning   Hindi    Irish      Tocharian A
====== ========= ======== ========== ===========
(1)    'to kill' mār- [A] maraim [A] ko [B]
(2)    'ear'     kān [C]  cluas [D]  klots [D]
====== ========= ======== ========== ===========

:comment:`This is an example for paralogous processes which are wrong for the lexicostatistical model of language evolution.`|138|unilateral semantic development, paralogy, homology, cognacy 1176|Starostin2013c|There is only one other solution: assume that the shift from `*`ous- to `*`kleu̯- took place *independently* of each other in Irish and Tocharian. How high is the probability of that assumption? If we remember what has already been mentioned above on the issue of semantic change typology – namely, that some types of meaning shifts happen far more frequently than others – it must be quite high, since the semantic shift from ‘hearʼ to ‘earʼ is one of the most commonly encountered shifts connected with these meanings all over the world (curiously, the opposite shift, from ‘earʼ to ‘hearʼ, is extremely rare in comparison).|139|unilateral semantic development, directionality of semantic change, semantic change 1177|Starostin2013c|The importance of this process, which we may call unilateral independent semantic development (UISD for short), should not be underestimated. For some reason, it seems to be ignored in most works on lexicostatistics (or, at least, is never paid all the attention that it deserves).
However, examination of the Swadesh wordlist for Indo-European languages alone shows that UISD may be reliably postulated for many more cases – coincidentally, the Hindi-Irish isogloss for the word ‘to killʼ (Hindi mār- : Irish maraim), listed above, also happens to be one such case, reflecting an old Indo-European causative form of the verb `*`mer- ‘to dieʼ, which was certainly not the default equivalent for the meaning ‘to killʼ in Old Indian.|140|unilateral semantic development, semantic change, parallel development, parallel evolution 1178|Starostin2013c|But what about *cluas* and *klautso*? Their phonemic structures also coincide (at least, as far as the root is concerned), their meanings are identical, but the Indo-European word that they go back to must have, by all accounts, had a different meaning. Neither of them continues a “form/meaning” pair that goes back to the same common ancestor; in all likelihood, they represent the results of randomly coinciding paths of development, and, what is most important, we have at our disposition a real instrument to show that this is the most likely situation – distributional analysis of the various forms for ‘earʼ in Indo-European languages, to which we may add knowledge of the typology of semantic change|140|unilateral semantic development, cognacy, Swadesh cognacy, strict cognacy 1179|Starostin2013c|We may distinguish between “etymological cognacy” of the items, whose forms go back to one and the same protoform, and “lexicostatistical cognacy”, when their meanings go back to the same meaning in the protolanguage as well.|140|paralogy, homology, orthology, etymological cognacy 1180|Owen1843|Homologue. (Gr. *homos; logos,* speech.) The same organ in different animals under every variety of form and function.|379|homology, history of science, definition 1181|Owen1843|A part or organ in one animal which has the same function as another part or organ in a different animal.
|374|homology, analogy, definition, history of science 1182|Webber2004|Richard @Owen<1843> (1804–1892) defined homology as “the same organ under every variety of form and function”. Owen conceived of homologous structures as those that, while differing in detail, were derived from the same body plan, or ‘archetype’. By contrast, analogous structures were those that performed similar functions but did not appear to be derived from the same archetype. After Darwin, homologous morphologies were reinterpreted as having derived by divergence from a common ancestral structure. Meanwhile, analogous morphologies were thought to have arisen by convergence, such as the independent invention of wings during bird and bat evolution. So now, homology describes descent from a common evolutionary origin: two genes are homologous if they derive from the same ancestral gene. Differentiating between homology and analogy is not mere pedantry: homology allows Darwinian evolutionary theory to be applied accurately across the biosciences. And, as Theodosius Dobzhansky (1900–1975) famously remarked, “Nothing in biology makes sense except in the light of evolution”.|R332|homology, definition, history of science 1183|Webber2004|Definitely not. Sequence similarity is a quantity that is agnostic of evolution. In contrast, homology is a property that describes evolutionary history. Just as with bird wings and bat wings, perceived similarities between sequences need not be due to a common evolutionary origin. Research papers sometimes wrongly quote values of ‘percent homology’. In these cases ‘percent identity’ is meant, as two genes either have a common ancestor or they do not. 
The only appropriate use of ‘percent homology’ is when separate portions of a gene have distinct evolutionary histories, for example as a result of a gene fusion event.|R332|homology, sequence similarity 1184|Webber2004|Using statistics you can estimate how likely it is that randomly composed sequences yield alignment scores that are at least as high as that obtained between the real sequences in question. For example, the BLAST program reports an Expect (or E) value for each alignment (with score x), which is the number of times sequences are expected, with scores ≥x, to crop up in a search just by chance. As E gets closer to zero, the more confident one should be in a prediction of homology. Many users cautiously consider only those alignments with E-values lower than 10⁻³ as substantiating evidence for homology.|R332|sequence similarity, homology, definition 1185|Webber2004|These are relationships between genes best visualized in a phylogenetic tree. Orthologs are genes resulting from the splitting of different lineages – speciation. Paralogous genes arise from duplications within the same genome. Lastly, genes that have been acquired via horizontal – or ‘lateral’ – transfer between different species are referred to as xenologues. These relationships are clearly illustrated in Figure 1. However, lineage-specific gene deletion, pseudogenisation, duplication, conversion and rapid sequence divergence can all confuse phylogenetic tree reconstruction. For example, the loss of genes A2 and B1 in Figure 1 may cause duplication event DP1 to go undetected, and hence an erroneous assignment of paralogous genes A1 and B2 as orthologs. Gene conversion fuses sequences with contrasting heritages. It can result in a gene in one species being both orthologous and paralogous to a gene in another. Horizontal gene transfers can lead to incongruencies between gene-based and taxon-based trees which often assist the detection of xenology relationships.
Note that these relationships are defined with respect to evolution, and not function. Nevertheless, they are useful in predicting function as the more recently two genes shared a common ancestor, the more likely it is that they have retained similar functions. Moreover, orthologous genes that have been spared by natural selection from deletion or duplication over many millions of years are also likely to share overlapping functions.|R333|orthology, paralogy, xenology, homology, definition 1186|Webber2004|No: the same terms can be used for genomic regions encompassing several genes, and even single nucleotide sites. For example, large chromosomal segments that arose by an intra-genome duplication are paralogous genomic regions, which some call ‘paralogons’. Similarly, sequences that have persisted essentially intact in two species since their common ancestor may be termed orthologous genomic regions.|R333|orthology, paralog, xenology, homology, 1187|Doolittle1994|Convergence as a phenomenon in molecular evolution is an issue that confuses many discussions. Often the problem is that not enough care is taken to state exactly what kind of convergence one has in mind. Functional and mechanistic convergence are both common, and some structural convergence has probably occurred, but a convincing case for genuine sequence convergence has yet to be made. |000|convergent evolution, convergence, definition, terminology 1188|Petsko2001|The creation of new terms seems to be an irreversible trend, but I wish it could be stopped. Genomics is best carried out by multidisciplinary teams, but meaningful communication between scientists of different backgrounds is not aided by the use of jargon words that are not easy to understand from their context. 
Medicine is famous for this, of course, but at least physicians have the excuse of wanting to build a wall of mystery around their profession to provide themselves with the distance and authority they believe they need to deal with patients effectively. Scientists have no such justification; in fact, they should eschew anything that separates them from the public, who, after all, pay for their research. The rationale I hear most often is that of economy of expression, and I concede that brevity is often desirable, but not at the expense of ease of understanding. Do we really lose so much time and word-space by substituting 'programmed cell death' for 'apoptosis', a word no one is even sure how to pronounce? |1|terminology, homology 1189|Makarenkov2006|This chapter presents a review of the mathematical techniques available to construct phylogenies and to represent reticulate evolution. Phylogenies can be estimated using distance-based, maximum parsimony, or maximum likelihood methods. Bayesian methods have recently become available to construct phylogenies. Reticulate evolution includes horizontal gene transfer between taxa, hybridization events, and homoplasy. Genetic recombination also creates reticulate evolution within lineages. Several methods are now available to construct reticulated networks of various kinds. Twelve such methods and the accompanying software are described in this review chapter.|000|phylogenetic network, phylogenetic reconstruction 1190|Forni2013|This article provides phonetic, lexical and grammatical evidence that Basque is an Indo-European language. 
It provides a brief history of previous research into the origins of Basque; a short description of the genesis of this article; a description of the methodology adopted for the present research; an overview of Michelena’s internal reconstruction of Pre-Basque; 23 sets of chronologically arranged sound laws linking Proto-Indo-European to Pre-Basque; Indo-European etymologies for 75% of the Basque native basic lexicon, with systematic cross-references to regular sound laws; Indo-European etymologies of some Basque bound morphemes, including case markers; a discussion of the findings; and Indo-European etymologies of 40 additional, non-basic lexical items.|000|Basque language, Indo-European, distant relationship 1191|Gontier2015|Since the 1990s, results coming in from molecular phylogenetics necessitate us to recognize that Horizontal Gene Transfer (HGT) occurs massively across all three domains of life. Nonetheless, many of the mechanisms whereby genes can become transferred laterally have been known from the early twentieth century onward. The temporal discrepancy between the first historical observations of the processes, and the rather recent general acceptance of the documented data, poses an interesting epistemological conundrum: Why have incoming results on HGT been widely neglected by the general evolutionary community and what causes for a more favorable reception today? Five reasons are given: (1) HGT was first observed in the biomedical sciences and these sciences did not endorse an evolutionary epistemic stance because of the ontogeny/phylogeny divide adhered to by the founders of the Modern Synthesis. (2) Those who did entertain an evolutionary outlook associated research on HGT with a symbiotic epistemic framework. (3) That HGT occurs across all three domains of life was demonstrated by modern techniques developed in molecular biology, a field that itself awaits full integration into the general evolutionary synthesis. 
(4) Molecular phylogenetic studies of prokaryote evolution were originally associated with exobiology and abiogenesis, and both fields developed outside the framework provided by the Modern Synthesis. (5) Because HGT brings forth a pattern of reticulation, it contrasts the standard idea that evolution occurs solely by natural selection that brings forth a vertical, bifurcating pattern in the “tree” of life. Divided into two parts, this chapter first reviews current neo-Darwinian “tree of life” versus reticulate “web of life” polemics as they have been debated in high-profile academic journals, and secondly, the historical context of discovery of the various means whereby genes are transferred laterally is sketched. Along the way, the reader is introduced to how HGT contradicts some of the basic tenets of the neo-Darwinian paradigm.|000|lateral gene transfer, history of science 1192|Morrison2015|The classic example is the comparison of bird wings and bat wings. These are homologous as forelimbs (structures), which are general throughout the tetrapods, but they are not homologous as wings (functions), because they represent independent modifications of those forelimbs in the ancestors of birds and bats.|50|homology, terminology 1193|Kassian2015b|In this paper we discuss the results of an automated comparison between two 50-item groups of the most generally stable elements on the so-called Swadesh wordlist as reconstructed for Proto-Indo-European and Proto-Uralic. Two forms are counted as potentially related if their first two consonantal units, transcribed in simplified consonantal class notation (a rough variant of the Levenshtein distance method), match up with each other. Next to all previous attempts at such a task (Ringe 1998; Oswalt 1998; Kessler & Lehtonen 2006; Kessler 2007), our automated algorithm comes much closer to emulating the traditional procedure of cognate search as employed in historical linguistics. 
“Swadesh slots” for protolanguages are filled in strict accordance with such principles of reconstruction as topology (taking into consideration the structure of the genealogical tree), morphological transparency, typology of semantic shifts, and areal distribution of particular items. Altogether we have counted 7 pairs where Proto-Indo-European and Proto-Uralic share the same biconsonantal skeleton (the exact same pairs are regarded as cognates in traditional hypotheses of Indo-Uralic relationship). To verify the probability of arriving at such a result by chance we have applied the permutation test, which yielded a positive result: the probability of 7 matched pairs is equal to 1.9% or 0.5%, depending on the constituency of the consonantal classes, which is lower than the standard 5% threshold of statistical significance or even lower than the strong 1% level. Standard methodology suggests that we reject the null hypothesis (accidental resemblance) and offer a more plausible explanation for the observed similarities. Since the known typology of language contacts does not speak in favor of explaining the observed Indo-Uralic matches as old lexical borrowings, the optimal explanation is seen in the hypothesis of an Indo-Uralic genetic relationship, with the 7 matching pairs in question representing archaic retentions, left over from the original Indo-Uralic protolanguage. 
|000|probability, probability of sound change patterns, sound classes 1194|Kassian2015b|:comment:`Authors mention principles of semantic reconstruction.` * Topological principle * External etymology principle * Internal etymology principle * Semantic plausibility principle * Areal effect exclusion principle :comment:`Note that these principles are all principles which are also used in automated reconstruction, or which at least could be used.`|304-306|semantic reconstruction 1195|Kassian2015b|:comment:`Explicit table of GLD consonant classes.` |308|sound classes, Global Lexicostatistical Database 1196|Kassian2015b|:comment:`Detailed set of consonant classes, divided into more symbols, 13 members`|309|sound classes, Global Lexicostatistical Database 1197|Kassian2015c|It is true that consonant classes are not based on formal statistic data (as we have explicitly stated), since modern linguistics still lacks a representative universal database of diachronic sound shifts.|387|sound change, database, 1198|Kassian2015c|The basic idea behind consonant classes is as follows: sounds are arranged in classes in such a manner that for any given sound X its shift to sound Y is typologically more frequent than its shift to Z if X and Y belong to the same class and Z belongs to a different class. We assume that most historical linguists would agree that this condition works for the majority of pairs of consonant sounds within the GLD classes.|387|sound classes, Global Lexicostatistical Database 1199|Fisler2013|The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. 
We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-characters matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a ‘‘tree of trees.’’ Then, we categorize schools of tree-representations. Classical schools like ‘‘cladists’’ and ‘‘pheneticists’’ are recovered but others are not: ‘‘gradists’’ are separated into two blocks, one of them being called here ‘‘grade theoreticians.’’ We propose new interesting categories like the ‘‘buffonian school,’’ the ‘‘metaphoricians,’’ and those using ‘‘strictly genealogical classifications.’’ We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made for showing who is sharing what with whom, but also heterobathmy and homoplasy of characters. The present cladogram is not modelling processes of transmission of ideas about trees, and here it is mostly used to test for proximity of ideas of the same age and for categorization.|000|family tree, history of science 1200|Arcodia2007|The lexicon of Old Chinese (ca. 1200 BC – 300 AD) has a strong tendency towards monosyllabicity; all the character / morphs in the quoted passage are actually words.|83|compounding, Chinese, Chinese dialects 1201|Arcodia2007|In the evolution towards the modern language, the lexicon has undergone a massive process of disyllabification: whereas before 200 BC disyllabic words represented roughly 20% of the lexicon (at least in the written style), in the modern language estimates are above 80% (Shi, 2002: 70-72), and the disyllable is now regarded as the preferred word-form in the Modern Chinese lexicon. 
Given the fact that, as said in section 3, the syllable in Chinese largely coincides with the morpheme, and, therefore, almost all disyllables are made up of two lexical morphemes, it is not surprising that Chinese has been defined as a “language of compound words”, as quoted in the introduction. However, I insist that disyllabification and compounding are two distinct phenomena, albeit related and indeed interacting with each other; we shall now examine the mechanisms which led to this peculiar configuration of the Chinese lexicon.|83|Chinese, Chinese dialects, language history, compounding 1202|Arcodia2007|In the literature about Chinese word formation, Mandarin has often been described as a "language of compounded words" (see Lin 2001, Dong 2004), with compounding as the main process in Chinese word formation. This paper proposes that the coinage of a great number of compounded words was caused by several factors: the morphology of the language, the near-absence of agglutinating or inflectional morphological markers (which can act as "word boundaries"), and the stability of the phonological (and orthographical) shape of Chinese morphemes, together with the scarcity of grammatical morphemes compared to lexical ones. The paper then provides a functional motivation for the development of different kinds of compounds. From the structural point of view, this is reflected in zero-marking in phrases which designate an easily identifiable and frequent referent, and zero-marking for coordination in syntax. Finally, the paper proposes that there may be a correlation between the synthetic character of word formation in a language and the abundance of compounds in it, whereas in languages with a tendency to analyticity in word formation, such as Romance languages, compounding is expected to be a less prominent phenomenon. 
|000|Chinese, compounding 1203|Arcodia2007|According to the first (Cheng, 1992: 44, Packard, 2000: 265-266), in the transition from a primitive society to a feudal one (between 1000 and 300 B.C.), there was a pragmatic need to create new words for new referents: “the shift towards the use of disyllabic words occurred when free monosyllabic words combined into new disyllabic words both through compounding (...) and through abbreviation of longer phrases. The newly juxtaposed morphemes subsequently often lost their status as free words, undergoing semantic shift or reduction due to the general effects of lexicalization (...)” (Packard, 2000: 365). According to the ‘phonological explanation’, which is indeed very common in the literature and seems to enjoy a widespread consensus (see e.g. Wang, 1998, Lin, 2001, Shi, 2002), the simplification of the phonology of the language caused the loss of many distinctions, and many syllables which were once separated became homophonous: to avoid ambiguity, the size of the word was enlarged, by adding an extra syllable.|84|disyllabification, Chinese, language history 1204|Wurzel1985|Eine der interessanten Fragen der historischen Linguistik ist zweifelsohne, ob und wieweit Sprachveränderungen vorhersagbar sind. Will man diese Frage beantworten, so ist zu klären, * (a) unter welchen Bedingungen welcher Wandel eintritt oder eintreten kann, * (b) unter welchen Bedingungen welcher Wandel ausgeschlossen ist. Die Vorhersagbarkeit von Sprachveränderungen setzt verständlicherweise deren N i c h t Z u f ä l l i g k e i t voraus; wenn jegliche Sprachveränderung grundsätzlich zufällig ist, dann gibt es auch keine Möglichkeit der Vorhersage. Vorhersagbarkeit erfordert also zumindest einen bestimmten Grad an D e t e r m i n i e r t h e i t . 
|1985|predictability, naturalness, morphological change 1205|Wurzel1985|The paper deals with natural processes of morphological change.|000|morphological change, naturalness 1206|Wurzel1985|:comment:`criteria for naturalness` Grammatische Erscheinungen werden dann als n a t ü r l i c h angesehen, wenn sie — in den verschiedenen Sprachen weit verbreitet sind, — durch Sprachwandel häufig entstehen, aber selbst gegenüber Sprachwandel relativ resistent sind, — bei der Sprachaneignung durch das Kind verhältnismäßig früh erworben werden, — von Sprachstörungen in vergleichsweise geringem Maße betroffen sind usw. :translation:`Grammatical phenomena are considered natural if they a) are widespread across different languages, b) frequently arise through language change but are themselves relatively resistant to it, c) are acquired comparatively early by children, and d) are comparatively little affected by language disorders.`|588|naturalness, typology, grammaticalization 1207|Wurzel1985|Zu den morphologischen Natürlichkeitsprinzipien gehören — das Prinzip des konstruktionellen Ikonismus, — das Prinzip der Uniformität und Transparenz ('eine Funktion — eine Form'), — das Prinzip des phonetischen Ikonismus sowie einige weitere Prinzipien, deren Wirkung in der Literatur verhältnismäßig häufig diskutiert worden ist :translation:`The morphological principles of naturalness comprise a) the principle of constructional iconicism, b) the principle of uniformity and transparency (one function -- one form), c) the principle of phonetic iconicism, and further principles whose effects have been discussed relatively frequently in the literature.`|589|naturalness, morphological change, morphology 1208|Wurzel1985|Jedes Flexionssystem ist durch eine Reihe von typologischen Parametern charakterisiert; die wichtigsten davon sind: — ein Inventar an Kategoriengefügen und Kategorien, — das Auftreten von Grundformflexion oder Stammflexion, [pb] — die separate oder kombinierte Symbolisierung von Kategorien, — die Anzahl und Ausprägung der formalen Distinktionen in den Paradigmen, — die 
auftretenden Markertypen bezogen auf die einzelnen Kategoriengefüge, — das Vorhandensein oder Nichtvorhandensein von Flexionsklassen. :translation:`Every inflexion system is characterized by a set of typological parameters. The most important ones are: a) an inventory of categories, b) the occurrence of base-form inflexion or stem inflexion, c) the separate or combined symbolisation of categories, d) the number and states of formal distinctions in the paradigms, e) the marker types regarding the categories, f) the presence or absence of inflexion classes.` |589f|inflexion, systemic processes 1209|Wurzel1985|In uneinheitlich aufgebauten Systemen entsprechen die einzelnen morphologischen Erscheinungen, d. h. Flexionsparadigmen, Flexionsformen, Marker sowie die entsprechenden Flexionsregeln, den systemdefinierenden Struktureigenschaften oder weichen von ihnen ab. Sie sind systemangemessen oder mehr oder weniger nichtsystemangemessen. Die S y s t e m a n g e m e s s e n heit morphologischer Erscheinungen bemißt sich am Übereinstimmungsgrad mit den systemdefinierenden Struktureigenschaften. Die Sprecher empfinden systemangemessene morphologische Strukturen als normal, nichtsystemangemessene als mehr oder weniger unnormal. :translation:`In inhomogeneous systems the morphological features, that is, the inflexion paradigms, the inflexion forms, the markers, and the inflexion rules, are in concordance with the system or not. The system adequacy of morphological phenomena can be measured by their degree of agreement with the system-defining structural properties. 
The speakers will treat structures which are in concordance with the system as normal; structures which are not will be treated as more or less abnormal.`|590|complex systems, morphology, inhomogeneity, systemic processes 1210|Wurzel1985|Wenn also in einem bestimmten Bereich der Flexionsmorphologie, in dem Erscheinungen unterschiedlicher Systemangemessenheit miteinander konkurrieren, morphologisch bedingte Veränderungen eintreten, dann immer zugunsten der jeweils systemangemessenen Erscheinungen, also in einer festgelegten und damit v o r h e r s a g b a r e n R i c h t u n g . :translation:`If morphologically induced changes occur in a certain area of inflexional morphology in which phenomena of different degrees of system adequacy compete with each other, then they always favor the system-adequate phenomena, and thus occur in a pre-defined and predictable direction.` |590|inhomogeneity, directionality, systemic processes, morphological change 1211|Hudson2007|Word Grammar (Hudson 1984, 1990, 2007) is a theory of language which touches on almost all aspects of synchronic linguistics and unifies them all through a single very general claim (Hudson 1984: 1): :pull-quote: The Network Postulate: Language is a conceptual network This claim is hardly contentious in Cognitive Linguistics, where it is often taken for granted that language as a whole is a network in contrast with the more traditional view of language as a grammar plus a dictionary—a list of rules or principles and a list of lexical items. However, it is particularly central to Word Grammar, in which each of the main traditional areas of language is a subnetwork within the total network of language.|00|word grammar, language, conceptual network, cognitive linguistics, network 1212|Hudson2007|Most obviously, ‘‘the lexicon’’ is a network of: * a. Forms * b. Meanings * c. Lexemes (The scare-quotes round ‘‘the lexicon’’ anticipate section 8, which argues that the lexicon is not an identifiable part of the total language.) 
This is a network rather than a simple list because the elements among the parts are in a many-to-many relation. There are lexemes which have more than one meaning (polysemy); there are meanings which are shared by more than one lexeme (synonymy); and there are lexemes which have more than one form (inherent variability).|509|language change, language model, network 1213|Hamed2006|Some meanings are represented by a root with appended affixes and modifiers. The root can stay constant while the affix/modifier can change, or both can vary. The first case can be simply resolved by coding only the variable part, but the latter case is more problematic. In fact, the root and the affix/modifier cannot be split into two characters, since they have a linked evolution within a semantic slot. We thus coded each combination root/affix or modifier as a new state. There are also cases of optional affixation, which we coded as distinct states from the stable form. For instance, the meaning “head” is represented in Nanchang (Gan) by the root 头 (tou2), state one, which can appear either systematically suffixed as in Zhangping (Min) 头壳 (ke4), state two, or optionally suffixed as in Guangzhou (Yue) 头 [壳], state three. The duplication of a root or the combination of roots is also treated as a suffixation. In the latter case, if each root and their combined form are semantic variants of the same meaning, then they constitute three different states.|38|multi-state models, Chinese dialects, partial cognacy 1214|Zhang2014|As father of modern linguistics, Ferdinand de Saussure has been known in China for more than eighty years. A historical review of Saussurean studies in China indicates that some misunderstandings on Saussure need to be cleared up. This paper mainly focuses on how Chinese scholars misread Saussure from the perspective of the arbitrary nature of language and points out that motivation or iconicity is not contradictory to arbitrariness in Saussurean linguistics. 
It also shows that Chinese historical developments have had profound effects on the reception of Saussure. To some extent, politics, culture, and education in China are responsible for the misinterpretations of Saussure. Misreading Saussure just reflects that there is still a long way to go for the study of Saussure in the Chinese linguistics world.|000|Chinese, Ferdinand de Saussure, history of science 1215|Wright2014|Despite the introduction of likelihood-based methods for estimating phylogenetic trees from phenotypic data, parsimony remains the most widely-used optimality criterion for building trees from discrete morphological data. However, it has been known for decades that there are regions of solution space in which parsimony is a poor estimator of tree topology. Numerous software implementations of likelihood-based models for the estimation of phylogeny from discrete morphological data exist, especially for the Mk model of discrete character evolution. Here we explore the efficacy of Bayesian estimation of phylogeny, using the Mk model, under conditions that are commonly encountered in paleontological studies. Using simulated data, we describe the relative performances of parsimony and the Mk model under a range of realistic conditions that include common scenarios of missing data and rate heterogeneity.|000|Bayesian approaches, maximum parsimony, morphological data, 1216|Schwenk2001|This paradox is at the heart of our ignorance of how phenotypes evolve and it manifests a growing philosophical schism in the field. At one extreme we have the traditional, atomistic approach as noted--organisms viewed as little more than "bags of characters," each character available for individual honing by environmental selection to create adapted phenotypes (Rieppel, 1986). Accordingly, the phenotype is held to be highly responsive to the exigencies of an ever-changing environment, virtually protean through evolutionary time, with diversity the inevitable outcome. 
Change, though inevitable, occurs incrementally, character by character. Phenotypic stasis is seen as exceptional and explicable only by reference to unusually persistent environmental conditions (Simpson, 1953). In short, the organism can be understood as the sum of its parts in relation to the environment. Yet this view of the phenotype is transparently false. Characters are not diffused through space, but are grouped and bounded within organisms where they exist in ordered relations with one another to form tissues, organs, and systems (Whyte, 1965). These character associations are further related temporally through development and growth, and dynamically through functional interaction. As historical entities, organisms transmit not only their morphological features from one generation to the next, but their unique set of organizational properties, as well--the patterns of interaction among their characters. Thus, by virtue of being organismal attributes, characters must evince associations with other characters. Organisms thus manifest complexly nested webs of character interaction and integration, such that a phenotypic change in one character will almost certainly have an impact on others.|166|functional units, character states, character concept, biological evolution, systemic processes 1218|Hoehna2014|Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. 
Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution.|000|RevBayes, software, graphical models, Bayesian approaches 1219|McElham1967|To reduce the effect of borrowings in the diagnostic vocabulary list the writer has employed the basic vocabulary list of 215 items proposed by Swadesh (1952, 1955). For a number of reasons 75 of these items were omitted. Some were obviously non-cultural in all the languages (e.g., ice, freeze and snow). Others were non-cultural in many of the languages (e.g., fish, sea, salt and swim) and consequently involved borrowings. 
A large number of items were omitted because they involved repetitions of the same vernacular term (e.g., dirty-black, far-long, near-short, feather-hair, fog-cloud, narrow-thin-little, wide-thick-big, river-water, sharp-tooth, here-this, there-that, wife-woman, husband-man, lie-sleep, wipe-wash, hear-know and kill-hit). The last term in each of the above sets was retained in the [pskip] diagnostic list. Furthermore, a number of items were omitted either because of difficulty in elicitation or because compounds or phrases were involved. In order not to skew the final results the writer has decided not to incorporate further "basic vocabulary" since the choice could determine the precise position of Dedua in the overall classification. |6-8|character dependencies, Swadesh list, concept list, lexical borrowing 1220|McElham1967|In counting probable cognates the writer followed those principles set forth by Gudschinsky (1956). Regular correspondences were given primary value in determining cognate forms. After this a number of criteria relating to phonetic similarity were employed. A general figure of 50 per cent similarity in form was accepted in identifying forms as probable cognates. Consonants were weighted more heavily than vowels as also were initial syllables over final syllables. An example of the latter is that Kosorong appears to exhibit a noun class marker -kota, so its occurrence was treated as a suffixial morpheme and given little value in cognate identification. Pronouns posed a special problem inasmuch as all first person singular forms begin with n and second person singular forms begin with g. Rather than treat all the forms as cognate the writer weighted vowel quality more heavily and separated cognates on the basis of front or back vowels. Cognate chains are common and quite frequently cross family divisions. In general it could be said that the writer attempted to be conservative in the identification of cognates. 
A careful comparative study and reconstruction would probably yield slightly different results.|8|cognate detection, cognate judgments, phonetic similarity 1221|Fitch1971|A method is presented that is asserted to provide all hypothetical ancestral character states that are consistent with describing the descent of the present-day character states in a minimum number of changes of state using a predetermined phylogenetic relationship among the taxa represented. The character states used as examples are the four messenger RNA nucleotides encoding the amino acid sequences of proteins, but the method is general.|000|Fitch parsimony, Fitch algorithm, maximum parsimony, methodology 1222|Bohl2003|A frequently used model in phylogenetics uses a discrete Markov process to model molecular drift as a cause of evolutionary diversification. Predictions are often based on eigenvalue computations which, we argue, only make sense if the process is reversible. Since there is no evidence (to our knowledge) to support the reversible nature of the process, a re-examination is made of the model and necessary algorithms with emphasis on irreversible processes. The paper includes careful discussion of the underlying Markov processes, careful distinction between the properties of reversible and irreversible processes (including earlier widely accepted analysis) all in the language of linear algebra, a new least squares approach to data adjustments, and numerical examples.|000|irreversible markov processes, directionality, 1223|Williams2015|The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. 
We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made|000|irreversible markov processes, phylogenetic reconstruction, Bayesian approaches 1224|Williams2015|There is therefore a clear need for alternative rooting approaches, both to corroborate results from outgroup rooting and for use in cases where outgroup rooting is problematic.|2|rooting of phylogenetic trees, irreversible markov processes, directionality 1225|Williams2015|Here, we consider another approach to rooting phylogenetic trees: the use of non-reversible or non-stationary substitution models, in which the likelihood of the tree depends on the position of the root. These models allow the tree to be rooted as an integral part of the analysis, without the need for outgroups. Despite the potential of these approaches for addressing questions in deep phylogeny, there has been surprisingly little work on the subject. @Barry<1987> & Hartigan [19] developed a non-reversible and non-stationary substitution model that was implemented by @Jayaswal<2005> et al. 
[20] and has been applied to the inference of rooted trees [21 @Jayaswal2011b]; however, the large number of parameters involved has limited the general applicability of the model. Yang & Roberts [22] proposed a non-stationary model which allowed a change in the composition vector at each speciation event. They fitted their model to small-subunit ribosomal RNA (rRNA) sequences from across the tree of life, and recovered an eocyte tree in which the root was placed between E. coli—the single bacterial representative—and the remaining sequences. The NDCH model of @Foster<2004> [23] and the BP model of Blanquart & Lartillot [24,25] are similar except a fixed, but unknown, number of composition vectors apply to different parts of the tree. While these models have the potential to offer a more parsimonious description of the data, their unknown dimension makes model-fitting computationally difficult. Finally, @Huelsenbeck<2002> et al. [26] investigated the ability of a non-reversible model to correctly identify the root position on simulated data and found that the approach worked best when the data contained a high degree of non-reversibility. |2|irreversible markov processes, directionality, rooting of phylogenetic trees, Bayesian approaches 1226|Huelsenbeck2002|Phylogenetic trees can be rooted by a number of criteria. Here, we introduce a Bayesian method for inferring the root of a phylogenetic tree by using one of several criteria: the outgroup, molecular clock, and nonreversible model of DNA substitution. We perform simulation analyses to examine the relative ability of these three criteria to correctly identify the root of the tree. The outgroup and molecular clock criteria were best able to identify the root of the tree, whereas the nonreversible model was able to identify the root only when the substitution process was highly nonreversible. 
We also examined the performance of the criteria for a tree of four species for which the topology and root position are well supported. Results of the analyses of these data are consistent with the simulation results.|000|rooting of phylogenetic trees, irreversible markov processes, directionality 1227|Foster2004|Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, that the data may not need to be modelled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, then the model does not fit, and incorrect trees are found; but when composition is accommodated, the model then fits, and the known correct phylogenies are obtained.|000|irreversible markov processes, compositional heterogeneity, directionality, Bayesian approaches 1228|Jayaswal2011b|The selection of an optimal model for data analysis is an important component of model-based molecular phylogenetic studies. 
Owing to the large number of Markov models that can be used for data analysis, model selection is a combinatorial problem that cannot be solved by performing an exhaustive search of all possible models. Currently, model selection is based on a small subset of the available Markov models, namely those that assume the evolutionary process to be globally stationary, reversible, and homogeneous. This forces the optimal model to be time reversible even though the actual data may not satisfy these assumptions. This problem can be alleviated by including more complex models during the model selection. We present a novel heuristic that evaluates a small fraction of these complex models and identifies the optimal model.|000|irreversible markov processes, general markov process, evolutionary model, Bayesian approaches, directionality 1229|Jayaswal2011|The general Markov model (GMM) of nucleotide substitution does not assume the evolutionary process to be stationary, reversible, or homogeneous. The GMM can be simplified by assuming the evolutionary process to be stationary. A stationary GMM is appropriate for analyses of phylogenetic data sets that are compositionally homogeneous; a data set is considered to be compositionally homogeneous if a statistical test does not detect significant differences in the marginal distributions of the sequences. Though the general time-reversible (GTR) model assumes stationarity, it also assumes reversibility and homogeneity. We propose two new stationary and nonhomogeneous models--one constrains the GMM to be reversible, whereas the other does not. The two models, coupled with the GTR model, comprise a set of nested models that can be used to test the assumptions of reversibility and homogeneity for stationary processes. The two models are extended to incorporate invariable sites and used to analyze a seven-taxon hominoid data set that displays compositional homogeneity. 
We show that within the class of stationary models, a nonhomogeneous model fits the hominoid data better than the GTR model. We note that if one considers a wider set of models that are not constrained to be stationary, then an even better fit can be obtained for the hominoid data. However, the methods for reducing model complexity from an extremely large set of nonstationary models are yet to be developed.|000|general markov process, directionality, irreversible markov processes, Bayesian approaches 1230|Jayaswal2005|The non-homogeneous model of nucleotide substitution proposed by Barry and Hartigan (Stat Sci, 2: 191-210) is the most general model of DNA evolution assuming an independent and identical process at each site. We present a computational solution for this model, and use it to analyse two data sets, each violating one or more of the assumptions of stationarity, homogeneity, and reversibility. The log likelihood values returned by programs based on the F84 model (J Mol Evol, 29: 170-179), the general time reversible model (J Mol Evol, 20: 86-93), and Barry and Hartigan's model are compared to determine the validity of the assumptions made by the first two models. In addition, we present a method for assessing whether sequences have evolved under reversible conditions and discover that this is not so for the two data sets. Finally, we determine the most likely tree under the three models of DNA evolution and compare these with the one favoured by the tests for symmetry.|000|general markov process, irreversible markov processes, directionality, Bayesian approaches 1231|Barry1987|The core data of molecular biology consists of DNA sequences. We will show how DNA sequences may be used to infer the evolution of the primates, human, chimpanzee, ape, orangutan and gibbon. The underlying probability models are taken to be Markov processes on trees. 
Some dependencies along the sequence due to the genetic code are also considered.|000|molecular evolution, irreversible markov processes, general markov process, DNA sequence, 1232|Barry1987|Many of the probability models used in constructing evolutionary trees assume that the sites on the DNA molecule are independent and identically distributed (iid) over the set of bases {A, C, G, T}. However, some patterns are observable in DNA sequences. For example, purines tend to follow purines and pyrimidines tend to follow pyrimidines. Certain subsequences tend to occur more frequently than others and, as will be seen in Section 7, some sites are more prone to change than others. |195f|DNA sequence, probability models, irreversible markov processes 1233|Barry1987|We considered a generalization of the iid model to a Markov model. The probability of the whole sequence is determined by the conditional distribution of the next base given the preceding sequence of bases. It is assumed that this conditional distribution is determined by just a few different subsequences which we call the effectives. |196|irreversible markov processes, probability models 1234|Barry1987|A change from one purine to the other or from one pyrimidine to the other is said to be a *transition*; other changes are said to be *transversions*. Thus a transition in the third position has no effect on the protein: the third position is said to be silent. [...] Empirical evidence suggests treating silent sites differently from the others -- indeed, since they do not affect the protein, they offer a selection-neutral history of change.|-|silent site, confusion probability, transition, transversion 1235|Barry1987|In order to determine the expected number of substitutions, it is necessary to assume that the process is *reversible*, that is, the transition matrix in going from 0 to *t* equals the transition matrix in going from *t* to 0. 
|199|probability models, reversible process 1236|Bohl2006|A recently developed mathematical model for the analysis of phylogenetic trees is applied to comparative data for 48 species. The model represents a return to fundamentals and makes no hypothesis with respect to the reversibility of the process. The species have been analysed in all subsets of three, and a measure of reliability of the results is provided. The numerical results of the computations on 17,296 triples of species are made available on the Internet. These results are discussed and the development of reliable tree structures for several species is illustrated. It is shown that, indeed, the Markov model is capable of considerably more interesting predictions than has been recognized to date.|000|irreversible markov processes, general markov process, phylogenetic reconstruction, Bayesian approaches 1237|Klopfstein2015|Directional evolution has played an important role in shaping the morphological, ecological, and molecular diversity of life. However, standard substitution models assume stationarity of the evolutionary process over the time scale examined, thus impeding the study of directionality. Here we explore a simple, nonstationary model of evolution for discrete data, which assumes that the state frequencies at the root differ from the equilibrium frequencies of the homogeneous evolutionary process along the rest of the tree (i.e., the process is nonstationary, nonreversible, but homogeneous). Within this framework, we develop a Bayesian approach for testing directional versus stationary evolution using a reversible-jump algorithm. Simulations show that when only data from extant taxa are available, the success in inferring directionality is strongly dependent on the evolutionary rate, the shape of the tree, the relative branch lengths, and the number of taxa. 
Given suitable evolutionary rates (0.1-0.5 expected substitutions between root and tips), accounting for directionality improves tree inference and often allows correct rooting of the tree without the use of an outgroup. As an empirical test, we apply our method to study directional evolution in hymenopteran morphology. We focus on three character systems: wing veins, muscles, and sclerites. We find strong support for a trend toward loss of wing veins and muscles, while stationarity cannot be ruled out for sclerites. Adding fossil and time information in a total-evidence dating approach, we show that accounting for directionality results in more precise estimates not only of the ancestral state at the root of the tree, but also of the divergence times. Our model relaxes the assumption of stationarity and reversibility by adding a minimum of additional parameters, and is thus well suited to studying the nature of the evolutionary process in data sets of limited size, such as morphology and ecology. |000|directionality, general markov process, Bayesian approaches, irreversible markov processes 1238|Williams2015|The assumptions of stationarity, reversibility and across-branch homogeneity are largely made for mathematical convenience. However, these assumptions come at an inferential cost, with stationary and reversible models yielding likelihood functions that do not depend on the position of the root. As such, topological inference is limited to unrooted trees. We consider two recently developed Bayesian hierarchical models that relax some of these standard assumptions, and thereby allow the models to be used to make inference about root positions.|2|irreversible markov processes, phylogenetic reconstruction, rooting of phylogenetic trees, Bayesian approaches, probability models 1239|Williams2015|The first of these models, NR [28], is branch-homogeneous and stationary, but non-reversible. In a non-reversible model, the direction of time is important. 
The structure of the Bayesian hierarchical model is based on the simple HKY85 model, but allows the parameters of the instantaneous rate matrix to depart from this structure through two perturbations: the first to allow a more general GTR structure, and the second to allow the most general non-reversible form. The size of the perturbations is controlled by standard deviation parameters σ_R and σ_N, respectively, whose values are unknown. By fitting the model to data, we learn which values are more or less plausible, with large values of σ_R providing evidence of reversible departures from the HKY85 model, and large values of σ_N providing evidence of non-reversibility. Indeed, it is this evidence of non-reversibility that drives inference about the root position.|2|rooting of phylogenetic trees, irreversible markov processes, phylogenetic reconstruction 1240|Williams2015|Despite their potential for addressing key questions in early cellular evolution, non-reversible and non-stationary substitution models are still an under-explored area of phylogenetics. In this article, we have explored two recently developed models for inferring rooted trees from nucleotide data on modest but reasonable numbers of taxa—up to 30 in the case of the archaeal dataset. These models make root [pb] inferences that are consistent with independent phylogenomic analyses and anciently duplicated genes and may provide a useful alternative to outgroup rooting. 
Our results therefore show that real sequence alignments can contain useful information about the position of the root that is implicit both in changes in sequence composition over time as well as—in at least some cases—evidence for directionality in the process of substitution.|6f|irreversible markov processes, general markov process, Bayesian approaches, directionality 1241|Heaps2014|In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and each branch of the tree is associated with its own composition vector whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. Therefore a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. 
Furthermore, unlike some other related models from the literature, inference in the model we propose can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.|000|heterogeneity, Bayesian approaches, probability models 1242|Rivas2008|A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.|000|phylogenetic reconstruction, ancestral state reconstruction, insertion, deletion, irreversible markov processes 1243|Kitchen2009|The evolution of languages provides a unique opportunity to study human population history. 
The origin of Semitic and the nature of dispersals by Semitic-speaking populations are of great importance to our understanding of the ancient history of the Middle East and Horn of Africa. Semitic populations are associated with the oldest written languages and urban civilizations in the region, which gave rise to some of the world's first major religious and literary traditions. In this study, we employ Bayesian computational phylogenetic techniques recently developed in evolutionary biology to analyse Semitic lexical data by modelling language evolution and explicitly testing alternative hypotheses of Semitic history. We implement a relaxed linguistic clock to date language divergences and use epigraphic evidence for the sampling dates of extinct Semitic languages to calibrate the rate of language evolution. Our statistical tests of alternative Semitic histories support an initial divergence of Akkadian from ancestral Semitic over competing hypotheses (e.g. an African origin of Semitic). We estimate an Early Bronze Age origin for Semitic approximately 5750 years ago in the Levant, and further propose that contemporary Ethiosemitic languages of Africa reflect a single introduction of early Ethiosemitic from southern Arabia approximately 2800 years ago.|000|Bayesian phylogenetics, Semitic languages, language tree, dataset 1244|Lee2011|Languages, like genes, evolve by a process of descent with modification. This striking similarity between biological and linguistic evolution allows us to apply phylogenetic methods to explore how languages, as well as the people who speak them, are related to one another through evolutionary history. Language phylogenies constructed with lexical data have so far revealed population expansions of Austronesian, Indo-European and Bantu speakers. 
However, how robustly a phylogenetic approach can chart the history of language evolution and what language phylogenies reveal about human prehistory must be investigated more thoroughly on a global scale. Here we report a phylogeny of 59 Japonic languages and dialects. We used this phylogeny to estimate time depth of its root and compared it with the time suggested by an agricultural expansion scenario for Japanese origin. In agreement with the scenario, our results indicate that Japonic languages descended from a common ancestor approximately 2182 years ago. Together with archaeological and biological evidence, our results suggest that the first farmers of Japan had a profound impact on the origins of both people and languages. On a broader level, our results are consistent with a theory that agricultural expansion is the principal factor for shaping global linguistic diversity.|000|Bayesian phylogenetics, language tree, dataset, Japanese 1245|Augst2009|Die Wortfamilie stellt eine komplexe Struktur dar. Ausgehend von einem Kernwort baut sich modellhaft ein vielliniges Netz auf, dessen Knoten und Endpunkte Ableitungen und Zusammensetzungen verschiedenen Grades sind. In Schulbüchern und Einführungsveranstaltungen bildet der Baum mit Wurzel, Stamm und sich vielfältig verzweigenden Ästen die bildliche Veranschaulichung. Wenn es nur um die graphische Präsentation solcher Wortfamilienstrukturen ginge, dann ließen sich für alle ca. 8.000 Wortfamilien solche „Stammbäume" mit 1 bis ca. 500 Wörtern als Knoten- und Endpunkte zeichnen.|XII|word formation, word family, morphology, lexicon, German 1247|Wright2014|The Mk model assumes a Markov process for character change, allowing for multiple character-state changes along a single branch. The probability of change in this model is symmetrical; in other words, the probability of changing from one state to another is the same as change in the reverse direction. 
This assumption can be relaxed in Bayesian implementations through the use of a hyperprior allowing variable change probabilities among states|1|Bayesian approaches, character evolution, morphological characters 1248|Yiu2014|On the basis of the degree of subtopic prominence, Liu (2001a) suggests that Wu and Min are weak VO dialects, while Cantonese is a strong VO dialect. The present study examines the word orders exhibited by the directional verb/the directional complement and the theme object/the locative object in Wu, Min and Cantonese, when denoting self-agentive and agentive motion events. The findings of the study show that the word orders exhibited in the three Min dialects studied, namely Fuqing, Hui’an and Chao’an, closely resemble those in Cantonese, but differ significantly from the ones exemplified in Wu. The contrast between Min/Cantonese and Wu is further supported by their differences in the tolerance of postverbal object and in the use of preposition or postposition. The findings of the present study suggest that Min and Cantonese are strong VO dialects while Wu is a weak VO dialect in the sense of Liu (2001a).|000|dialect classification, Mǐn, Wú, Cantonese, Chinese dialects 1249|Wiens2001|Many aspects of morphological phylogenetics are controversial in the theoretical systematics literature and yet are often poorly explained and justified in empirical studies. In this paper, I argue that most morphological characters describe variation that is fundamentally quantitative, regardless of whether they are coded qualitatively or quantitatively by systematists. Given this view, three fundamental problems in morphological character analysis (definition, delimitation, and ordering of character states) may have a common solution: coding morphological characters as continuous quantitative traits. 
A new parsimony method (step-matrix gap-weighting, a modification of Thiele's approach) is proposed that allows quantitative traits to be analyzed as continuous variables. The problem of scaling or weighting quantitative characters relative to qualitative characters (and to each other) is reviewed, and three possible solutions are described. The new coding method is applied to data from hoplocercid lizards, and the results show the sensitivity of phylogenetic conclusions to different scaling methods. Although some authors reject the use of continuous, overlapping, quantitative characters in phylogenetic analysis, quantitative data from hoplocercid lizards that are coded using the new approach contain significant phylogenetic structure and exhibit levels of homoplasy similar to those seen in data that are coded qualitatively. |000|morphological data, morphological characters, maximum parsimony, ordered character states, 1250|Wiens2004|We live in the age of comparative genomics, and it may seem that there is not much point in reconstructing phylogenies using morphological data anymore. As more and more genes and genomes are being sequenced, the possibility that thousands or even millions of informative, independently evolving molecular characters can be brought to bear on a given phylogenetic problem is quickly becoming a reality (e.g., Rokas et al., 2003). Given the rate that new sequence data are being added, and the rate at which new innovations continue to accelerate this process, it seems possible that in the not-too-distant future we will be able to have a perfectly accurate and well-supported phylogeny of most living species on earth using molecular data alone. So why bother with morphology? A recent paper by Scotland et al. (2003; SEA hereafter) offered a reappraisal of the role of morphology in phylogeny reconstruction. This is certainly an important and timely topic to discuss, and their main thesis is bold and controversial. 
They state that “We view any attempt to include more morphological data in phylogeny reconstruction as inherently problematic” (p. 545). Unfortunately, most of their arguments are based on unsupported speculation, and they fail to mention numerous studies that clearly contradict their conclusions. Given that many of their comments are written as responses to book chapters written by my collaborators and myself (e.g., Hillis and Wiens, 2000; Poe and Wiens, 2000; Wiens, 2000a), I feel obligated to elucidate some of these problems. Many of the issues raised are central to how systematics is done and will be conducted in the future. I will argue that, despite many undeniable advantages of molecular data, it is still absolutely necessary that we continue to collect additional morphological data for phylogenetic analysis, and continue to improve our methods for morphology-based phylogenetics. Note that Jenner (2004) has provided an independent rebuttal of the SEA paper, and he describes a large number of substantive criticisms which show only limited overlap with my own. |000|morphological characters, morphology, biology, phylogenetic reconstruction, 1251|Jacques2016a|This paper documents a case of analogy from the first person to the third in the Dhegiha languages. It discusses the significance of this example for historical linguistics in general, and proposes that higher frequency in discourse of the first person form in the case of cognition verbs explains why the direction of analogical change was reversed in the case of this verb.|000|analogy, directionality, language change, Dhegiha languages, frequency 1252|Bux1999|Computer-assisted language comparison (original: Computergestützter Sprachvergleich). This paper describes alignment algorithms, and ideas for proto-language reconstruction for the purpose of language comparison in Indo-European languages. 
The paper contains, interestingly, explicit alignments of orthographies, along with further points that are worth reading and that might eventually be translated in part from the German.|000|computer-assisted analysis, computer-aided approaches, Indo-European, phonetic alignment, language comparison, automatic linguistic reconstruction 1253|Packard2000|In modern Chinese, the coining of new words still overwhelmingly yields forms that are bisyllabic rather than monosyllabic: less than 1 per cent (from 0.13 per cent to 0.32 per cent) of new words were monosyllabic in a study done by @Sawer<1995> (1995). Furthermore, such word creation rarely occurs anew, ‘out of thin air’ as it were, but rather derives from the combination of extant morphemes using existing word structure rules.|267|word formation, neologism, Chinese, Mandarin, compounding 1254|Trask2015|These five languages (and all the other Romance languages) are thus genetically related: they are all descended from a common ancestor. The words for ‘100’ in these five languages are cognate: that is, they are descended from the same single ancestral word in that common ancestor, and of course the other sets of words are also cognate. Given the abundant systematic correspondences linking these four words and hundreds of other words in the Romance languages, we could be sure of this conclusion even if we had no Latin data to confirm it.|193|cognacy, definition, terminology, etymological relations 1255|Trask2015|In the first place we must recognize that language-internal explanations seem to be favoured by many commentators on these phenomena. Edward Sapir (1921) suggested that closely related languages tended to change in roughly the same direction, a phenomenon he termed drift. There is a longstanding and ongoing scholarly debate about how separate languages can ‘know’ to go in the same way, a view some linguists would consider unfortunately mystical. 
Nevertheless, it is certainly the case that all of the Germanic languages|301|drift, historical linguistics, language change, systemic processes, directionality 1256|Trask1996|The linguist Edith Moravcsik (1978) has proposed some universal principles applying to grammatical borrowing; the most interesting of these are given below. (At the cost of some precision, I have reworded her principles for the sake of clarity; Moravcsik is not responsible for my rewordings.) *Universal 1: Grammatical morphemes cannot be borrowed until after some lexical items have been borrowed.* [...] *Universal 2: Bound morphemes can be borrowed only as parts of complete words.* [...] *Universal 3: Verbs cannot be borrowed directly.* [...] Nevertheless, this claim appears to be simply false. English has borrowed innumerable verbs from French [...] *Universal 4: Inflectional morphemes cannot be borrowed until after some derivational (word-forming) morphemes have been borrowed.* [...] *Universal 5: A preposed grammatical item may not be borrowed as a post-posed one, and vice versa.* |314|lexical borrowing, grammatical borrowing, language contact, restrictions 1258|Sun2010|This study conducts a psycholinguistic experiment using the concept-formation paradigm to probe perception of tones across different syllable types. Native-speaking subjects were trained to perceive tonal identities in Taipei Taiwanese and then tested with novel tokens comprising checked versus non-checked pairs to see if comparable tonal values across distinct syllable types are categorized as ‘having the same tone’. The results of our experiment indicate that ‘tonal identity’, like other concepts formed by human categorization, has its fuzzy edges. 
Extracting and establishing tonal identity becomes increasingly difficult as syllable structures become more different.|000|tone, tone language, Taiwanese, perception, tone perception 1259|Sun2003|The author discusses to which degree tonal development is an indicator of genetic relationship. Taking Tibetan dialects as an example, he claims that independent development of tone is just as possible as that it was developed once. So in these terms, one can say that tone, tonogenesis, and tone patterns are not necessarily a good indicator for genetic subgrouping in historical linguistics.|000|tone, tonogenesis, Tibetan, Tibetan dialects, shared innovation, subgrouping 1260|Ardia2007|Developing a cross-linguistic naming test has represented a challenge in language evaluation. In this paper, it is proposed that a cross-linguistic naming test should fulfill at least the following three criteria: (1) include only “universal” words found across different languages. The basic cross-linguistic core vocabulary is usually referred as the “Swadesh word list”; (2) include different semantic categories (e.g., living and nonliving elements); and (3) avoid the confounding of perceptual difficulties. Departing from the Swadesh word list, a cross-linguistic naming test was developed, including six different semantic categories: (a) body-parts (10 words), (b) natural phenomena (non-touchable) (5 words), (c) external objects (potentially known through the sight and the touch) (5 words), (d) animals (5 words), (e) colors (5 words), and (f) actions (10 words). A total of 40 color pictures were selected to represent these basic words. It is emphasized that this test has two major advantages: on one hand, it is readily available in hundreds of different languages; and, on the other hand, it is not a “fixed” test, but it includes photographs that can be replaced. Theoretically, norms are not required, and it represents a low-ceiling test. 
Word frequency can be used as a criterion of the level of difficulty. The next step will be to find the performance profile in different language pathologies, as well as the decline pattern in cases of dementia.|000|basic vocabulary, naming test, aphasia, neurology, cognition, lexicon 1261|Gonzalez2016|The traditional view in the philosophy of language has been that linguistic competence is constituted by rule-governed, algorithmic skills, which underlie the learnable, creative nature of language. However, increasing evidence about the contextual sensitivity of linguistic understanding has put this traditional view under pressure. I argue that researchers should turn to anti-computationalist proposals in the cognitive sciences in order to develop non-algorithmic views of language use that do not make communication and learning mysterious. Approaches like ecological psychology, enactivism or dynamical systems theory have shown how complex, open-ended competences can be explained without positing underlying rule-guided processes, but rather focusing on the complex interactions and couplings among agents and their environment. These alternative views of linguistic competences make it possible to address metaphor and other areas of speech that, because of their non-algorithmic nature, have been considered derivative and have tended to be excluded from the domain of linguistic meaning. Understanding metaphors requires being able to perceive relevant similarities and correlations between different subjects. This is a highly context-sensitive, embodied ability, which often relies heavily on interpersonal coordination|000|metaphor, semantic change, 1262|Ramscar2016|Historically, linguists and psychologists have generally assumed that language is a combinatoric process, thereby taking the idea that language users have access to inventories of discrete, combinable units (phonemes, morphemes, words, etc.) 
for granted, despite the fact that these units have tended to resist formal definitions. We propose a new approach to language understanding based on the psychological mechanisms that underpin context-sensitive processing. This new method is surprisingly simple, in large part because it embraces a view of learning that has been developed from studies of animal behavior and neuroscience. From this perspective, learning is seen as a systematic, discriminative process that seeks to reduce a learner’s uncertainty in making moment-to-moment predictions. We suggest that language processing employs all the information available to the listener at any given moment to predict what will happen in the next moment, in the next couple of sentences, etc. This approach does not rely on any of the ambiguous traditional linguistic units because continuous-time processing simply acts to reduce a hearer’s uncertainty about an actual message in relation to possible messages, rather than building up an interpretation out of elemental components. From this perspective, the conventional units of language – phonemes, morphemes, words – can be seen as idealizations of patterns that evolved for communicative efficiency that can serve the purposes of orthographic (and linguistic) description, rather than psychologically ‘real’ elements that are essential to language processing.|000|language processing, parsing, psycholinguistics, morpheme, language evolution, language history, phoneme, communicative efficiency 1263|Zwaan2014|The debate on whether language comprehension involves the manipulation of abstract symbols or is grounded in perception and action has reached an impasse, with authors from different theoretical persuasions unable to agree on the diagnostic value of empirical findings. To escape this impasse, I propose a pluralist view of cognition that encompasses abstract and grounded symbols.
The contributions of these symbol types to language comprehension vary as a function of the degree to which language use is embedded in the environment. I distinguish five levels of embeddedness: demonstration, instruction, projection, displacement, and abstraction. Only through a closer analysis of context will we make significant progress toward understanding language comprehension and cognition in general.|000|embodiment, language processing, language comprehension, semantics, semantic change, semantic similarity 1264|Lopez2015|**Background** Microbial genetic diversity is often investigated via the comparison of relatively similar 16S molecules through multiple alignments between reference sequences and novel environmental samples using phylogenetic trees, direct BLAST matches, or phylotypes counts. However, are we missing novel lineages in the microbial dark universe by relying on standard phylogenetic and BLAST methods? If so, how can we probe that universe using alternative approaches? We performed a novel type of multi-marker analysis of genetic diversity exploiting the topology of inclusive sequence similarity networks. **Results** Our protocol identified 86 ancient gene families, well distributed and rarely transferred across the 3 domains of life, and retrieved their environmental homologs among 10 million predicted ORFs from human gut samples and other metagenomic projects. Numerous highly divergent environmental homologs were observed in gut samples, although the most divergent genes were over-represented in non-gut environments. In our networks, most divergent environmental genes grouped exclusively with uncultured relatives, in maximal cliques. Sequences within these groups were under strong purifying selection and presented a range of genetic variation comparable to that of a prokaryotic domain. 
**Conclusions** Many genes families included environmental homologs that were highly divergent from cultured homologs: in 79 gene families (including 18 ribosomal proteins), Bacteria and Archaea were less divergent than some groups of environmental sequences were to any cultured or viral homologs. Moreover, some groups of environmental homologs branched very deeply in phylogenetic trees of life, when they were not too divergent to be aligned. These results underline how limited our understanding of the most diverse elements of the microbial world remains, and encourage a deeper exploration of natural communities and their genetic resources, hinting at the possibility that still unknown yet major divisions of life have yet to be discovered.|000|metagenomics, gene similarity network, divergence, homolog detection 1265|Orzechowska2015|The goal of this paper is to discuss phonotactic constraints that govern the formation of word-initial consonant clusters in two different languages, German and Polish, and to establish a general procedure for a rank-ordering of clusters. The description of clusters is based on corpus and dictionary data. We define several dimensions in cluster description, namely (i) complexity of clusters, (ii) place of articulation, (iii) manner of articulation and (iv) voicing of segments, for which a total of 15 parameters is derived, each expressing a structural preference for cluster formation true to various degrees in the two languages. These preferences can be seen in part as extensions of the frequently used measure of sonority based mainly on manner of articulation features, and can be considered to provide a more detailed description of clusters than one based on sonority. In addition, a cluster analysis for the two sets of clusters lends support to the conclusions on the fundamental difference between Polish and German word-initial phonotactics. 
For each cluster, values obtained for every parameter are summed up to calculate an empirically justified rank ordering of the clusters in both languages.|000|phonotactics, rank ordering, German, Polish 1266|Bentz2015|Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact – the number of non-native speakers a language has – on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language.|000|correlational studies, morphological complexity, word family, linguistic complexity 1267|Jacques2013|The sound change *s > n in initial position in Arapaho is unparalleled in the world’s languages, and previous attempts at explaining it have failed to produce a convincing account of its intermediate stages. This article presents two hypotheses to account for the correspondence between PA *s‐ and Arapaho n‐, taking into account not only the individual steps of this particular proto-phoneme, but the evolution of the whole consonant system.
It shows that the change *s > n in initial position only appears to reflect an unnatural development: it can in fact be explained in terms of a sequence of natural changes and mergers.|000|sound change, rare change processes 1268|Kazmierski2015|Looking at the fate of the concept of exaptation in historical linguistics, this article attempts an extension of exaptation from morphosyntactic change to phonological change. It argues that explicit recognition of the links between language change and other manifestations of Darwinian evolution can provide a context in which the use of this concept might be justified. First, an overview of the applications of exaptation in linguistics is provided (Section 2). Next, the historical data, that is the raisings of the close–mid long vowels as part of the Great Vowel Shift, as well as the lowerings of the short vowels as part of the Short Vowel Shift, adduced in this paper to verify the usefulness of exaptation in studying sound change are presented (Section 3). Consequently, two ways in which exaptation can be applied in the analysis of these data are presented: first (Section 4.1), a superficially evolutionary approach, which treats exaptation as a biologically inspired metaphorical label, and second (Section 4.2), a strictly evolutionary approach, which goes beyond metaphorical extensions of biological terms to linguistics, and which instead treats languages as truly evolutionary systems.|000|exaptation, sound change, phonological change, biological parallels 1269|Joseph2014|I discuss here various ways in which one might devise a counting heuristic for grammaticalization with an eye to testing the quantificational claims that have been made against specific implementations of such a heuristic. More specifically, I address the question of grammaticalization as a phenomenon of individuals versus a phenomenon of speech communities versus a phenomenon of languages.
Similarly, I hope to show, once the individual versus group issue is dealt with, that by adopting Haspelmath’s (2004) definition of grammaticalization as the tightening of internal dependencies, and thus a weakening of boundaries, between elements, we are in a better position to undertake a census since linguists have developed a reasonable idea of the sort of grammatical boundaries that need to be posited (word boundaries, clitic boundaries, morpheme boundaries, phoneme-to-phoneme transitions, etc.). Further, this view generalizes to offer a solution to the problematic notion of gradience in grammaticalization – cf. Kuryłowicz’s famous definition of grammaticalization as taking in movement from “less” to “more” grammatical – since linguists have long posited a hierarchy of boundary strength that can be appealed to.|000|grammaticalization, Greek, gradience, 1270|Huening2014|The rise of new derivational affixes can be analyzed adequately as a case of “constructionalization” within the framework of Construction Morphology as developed by Booij (2010). We review some aspects and problems of previous accounts that view the emergence of derivational affixes as a case of grammaticalization or as a case of lexicalization, respectively. In line with recent developments in grammaticalization research, not the isolated element (word or affix) is viewed as the locus of change, but the complex word as a whole – seen as a “construction” in the sense of Construction Grammar – and its relation with other constructions. Morphological change can be conceived as constructional change at the word level.|000|compounding, derivation, word formation, 1271|Cabrera2012|Sound symbolism seems to be a good candidate for a linguistic universal. It can be observed in many languages of the world.
In addition, no known present or past language seems to be completely based on sound–symbolic associations, since such associations are very limited and cannot be used to create a large vocabulary. Nevertheless, the persistence of this feature in the languages of the world and its alleged naturalness could be easily explained if we assume that sound-symbolism is a remnant of some earlier stage of human language. In this paper, I will first sketch the role of sound symbolism in the origins of vocabulary following the proposals made by M. Swadesh in the early seventies and then I will provide linguistic and archaeological data suggesting the important role of sound symbolism in the final stages of human symbolic mind evolution. In order to do this I will reinterpret some of the so-called global etymologies proposed by Bengtson and Ruhlen 1994 as instances of sound-symbolic associations, and I will relate the entities denoted by these etymologies to some of the engraved or painted symbols characterizing the artistic activity of the Upper Paleolithic period.|000|sound symbolism, Swadesh list 1272|Kilani2015|This paper presents an extension of Baxter & Manaster-Ramer’s (2000) approach to the problem of false cognates in the determination of relationships between languages. Their approach uses a Monte Carlo simulation to estimate how many lexical similarities we can expect to be due to chance between two lexical lists from different languages, and consequently how many are too many to be all false cognates. Although very efficient, their model has the shortcoming of being applicable only to simple lexical lists such as the Swadesh list, with one-to-one semantic correspondences between the individual terms. Here I present a new model that can be applied to any kind of word list, and can include comparisons between multiple terms sharing the same semantic field.
After a theoretical description, a controlled test and a contra-test, I finally apply the method to a real test case, investigating the probability of relation between Pre-Greek, the non‐Indo-European substrate of classical Greek, and Proto-Basque, Proto-Uralic and ‘Proto-Altaic’.|000|sound classes, significance, genetic relationship, proof of relationship 1273|Kilani2015|In particular, taking the two English and Hindi lists from Table 2, the likelihood of a relationship between them can be tested with a procedure comparing the number of phonetic matches between terms belonging to the same semantic fields in the real lists with the number of matches obtained in 1000 trials with randomly generated lists. The lists used in the trials can be generated by randomly shuffling the terms of the two original lists and redistributing them into the various semantic fields. Note that the semantic proportions of the original lists have to be respected, so if a specific semantic field contains 2 terms in the original list, it will contain only 2 terms also in its randomly generated counterpart.|338|Monte-Carlo permutation, sound classes, proof of relationship 1274|Pelkey2015b|Language varieties undergo constant evolution, as do varieties of life. Both language and life, in other words, unfold by semiosis – pervasive processes of growth in which relationships shared between the inherited past, the unstable present and the virtual future are organically intertwined. Although many recent attempts have been made to reunite biotic and linguistic evolution, contemporary treatments are mired in unexamined presuppositions inherited from 20th century biological theory. Chief among these is the denial of implicit end-directed processes, that which biosemiotics finds to be the necessary condition of living systems – thereby providing semiotic foundations for human inquiry.
After reviewing the history and problems of dialogue between linguistics and biology, I make two primary arguments in this essay, one a critique using historical evidence, the other a suggestion using empirical evidence. My critical argument is that crucial features of semiosis are missing from contemporary linguistic-biotic proposals. Entangled with these missing accounts is an analogous form of neglect, or normative blindness, apparent in both disciplines: the role of ontogeny in biological evolution and the role of diagrammatization in linguistic evolution. This linguistic-biotic analogy points to a deeper congruence with the third (and most fundamental) mode of evolution in Peirce’s scientific ontology: “habit taking” or “Agapasm”. My positive argument builds on this linguistic-biotic analogy to diagram its corollary membership in light of Peirce’s “three modes of evolution”: Chance (Tychasm), Law (Anancasm) and Habit Taking (Agapasm). The paper ends with an application involving complex correspondence patterns in the Muji language varieties of China followed by an appeal for a radically evolutionary approach to the nature of language(s) in general, an approach that not only encompasses both linguistic and biotic growth but is also process-explicit.|000|biological parallels, language evolution, biological evolution 1275|Pelkey2015b|
=================================== ==============================
Biology                             Linguistics
=================================== ==============================
Discrete characters                 Lexicon, syntax, and phonology
Homologies                          Cognates
Mutation                            Innovation
Drift                               Drift
Natural selection                   Social selection
Cladogenesis                        Lineage splits
Horizontal gene transfer            Borrowing
Vegetative hybrids                  Language Creoles
Correlated genotypes / phenotypes   Correlated cultural terms
Geographic clines                   Dialects / dialect chains
Fossils                             Ancient texts
Extinction                          Language death
=================================== ==============================
:comment:`The page number is
not original, but taken from the manuscript`|6|biological parallels, analogy, language evolution, biological evolution 1276|Huelsenbeck2002|Phylogenetic trees can be rooted by a number of criteria. Here, we introduce a Bayesian method for inferring the root of a phylogenetic tree by using one of several criteria: the outgroup, molecular clock, and nonreversible model of DNA substitution. We perform simulation analyses to examine the relative ability of these three criteria to correctly identify the root of the tree. The outgroup and molecular clock criteria were best able to identify the root of the tree, whereas the nonreversible model was able to identify the root only when the substitution process was highly nonreversible. We also examined the performance of the criteria for a tree of four species for which the topology and root position are well supported. Results of the analyses of these data are consistent with the simulation results.|000|rooting of phylogenetic trees, molecular clock, outgroup, phylogenetic reconstruction 1277|Love2004|There is arguably a parallel between recent ideas within cognitive science about the distributed mind and the development within linguistics known as integrationism, turning on similarities between the critique offered by the former of the ‘classical’ view of mind and by the latter of the ‘classical’ view of language. However, at the heart of the integrationist attack on the classical view of language is rejection of the idea that natural languages are codes. This idea appears to be taken for granted by certain cognitive scientists as the basis for explaining not only how language is mentally apprehended by the individual, but also how it facilitates ‘second-order cognition’.
It is suggested that the language-as-code idea, although prima facie endowed with the attractiveness of common sense, is untenable, and should not figure, at least in the role usually assigned to it, in any inquiry into either language or human cognition in general.|000|language as code, code, theory of mind, cognition, language and mind 1278|Vylder2008|A lot of conventions emerge in gradual stages without being centrally imposed. The most significant and complex example in our human society is undoubtedly human language which evolved according to our need for communication. Also in artificial multi-agent systems, e.g. mobile robots or software agents, it is often desirable that agents can reach a convention in a distributed way. To make this possible, it is important to have a sound grasp of the mechanism by which conventions arise. In this thesis we define a theoretical framework that enables us to examine this process carefully. We make a strict distinction between the description of the convention problem on the one hand and the solution to this problem in terms of an agent design on the other. A convention problem specifies the preconditions any type of agent must comply with. This includes (i) the space of alternatives from which the convention is to be chosen, (ii) the interaction model between the agents, which determines which agents interact at what time and (iii) the amount, nature and direction of information transmitted between the agents during an interaction. A particular agent design solves a convention problem if a population of such agents will reach an agreement in a reasonable time, under the given restrictions. We focus on the class of convention problems with a global interaction model: every agent is equally likely to interact with any other agent. We argue that for these convention problems the performance of an agent can be predicted by inspecting the properties of the agent’s response function. 
This response function captures the average behavior of an agent when interacting with agents from a non-changing population. We apply this analytical technique to different sorts of convention problems. For the more simple convention problems we define general, sufficient properties which guarantee that a convention will arise after a certain amount of time when an agent possesses these. For the more difficult convention problems we confine ourselves to the construction of agents who, we can show, will solve the problem. Finally, our framework is applied to the problem of language evolution in artificial agents. This is a complicated domain for which precise mathematical results are very difficult to obtain. We will focus on the naming game, a relatively simple instance in the paradigm of language games. In certain instances our analysis will surface problems of convergence that have not been noticed before. This shows on the one hand that it is important to theoretically substantiate computer experiments in language evolution and on the other that the framework introduced in this thesis is very suitable to this extent.|000|multi-agent system, artificial agents, language origin, language model 1279|Tomasello1999|Human beings are biologically adapted for culture in ways that other primates are not, as evidenced most clearly by the fact that only human cultural traditions accumulate modifications over historical time (the ratchet effect). The key adaptation is one that enables individuals to understand other individuals as intentional agents like the self. This species-unique form of social cognition emerges in human ontogeny at approximately 1 year of age, as infants begin to engage with other persons in various kinds of joint attentional activities involving gaze following, social referencing, and gestural communication.
Young children’s joint attentional skills then engender some uniquely powerful forms of cultural learning, enabling the acquisition of language, discourse skills, tool-use practices, and other conventional activities. These novel forms of cultural learning allow human beings to, in effect, pool their cognitive resources both contemporaneously and over historical time in ways that are unique in the animal kingdom.|000|ratchet-like process, human language, cultural evolution, language learning, language acquisition 1280|Vieu2015|In this paper we report the results of four experiments conducted to extract lists of nouns that exhibit inherent polysemy from corpus data following semiautomatic and automatic procedures. We compare the methods used and the results obtained. We argue that quantitative methods can be used to distinguish different classes of polysemous nouns in the language on the basis of the variability of copredication contexts.|000|polysemy, polysemy detection, semantic change 1281|Barbieri2008|Biosemiotics is the idea that life is based on semiosis, i.e., on signs and codes. This idea has been strongly suggested by the discovery of the genetic code, but so far it has made little impact in the scientific world and is largely regarded as a philosophy rather than a science. The main reason for this is that modern biology assumes that signs and meanings do not exist at the molecular level, and that the genetic code was not followed by any other organic code for almost four billion years, which implies that it was an utterly isolated exception in the history of life. These ideas have effectively ruled out the existence of semiosis in the organic world, and yet there are experimental facts against all of them. If we look at the evidence of life without the preconditions of the present paradigm, we discover that semiosis is there, in every single cell, and that it has been there since the very beginning. This is what biosemiotics is really about.
It is not a philosophy. It is a new scientific paradigm that is rigorously based on experimental facts. Biosemiotics claims that the genetic code (1) is a real code and (2) has been the first of a long series of organic codes that have shaped the history of life on our planet. The reality of the genetic code and the existence of other organic codes imply that life is based on two fundamental processes—copying and coding—and this in turn implies that evolution took place by two distinct mechanisms, i.e., by natural selection (based on copying) and by natural conventions (based on coding). It also implies that the copying of genes works on individual molecules, whereas the coding of proteins operates on collections of molecules, which means that different mechanisms of evolution exist at different levels of organization. This review intends to underline the scientific nature of biosemiotics, and to this purpose, it aims to prove (1) that the cell is a real semiotic system, (2) that the genetic code is a real code, (3) that evolution took place by natural selection and by natural conventions, and (4) that it was natural conventions, i.e., organic codes, that gave origin to the great novelties of macroevolution. Biological semiosis, in other words, is a scientific reality because the codes of life are experimental realities. The time has come, therefore, to acknowledge this fact of life, even if that means abandoning the present theoretical framework in favor of a more general one where biology and semiotics finally come together and become biosemiotics.|000|systemic evolution, biosemiotics, semiotics, code, language as code 1282|Barbieri2011|Modern biology has not yet come to terms with the presence of many organic codes in Nature, despite the fact that we can prove their existence. 
As a result, it has not yet accepted the idea that the great events of macroevolution were associated with the origin of new organic codes, despite the fact that this is the most parsimonious and logical explanation of those events. This is probably due to the fact that the existence of organic codes in all fundamental processes of life, and in all major transitions in the history of life, has enormous theoretical implications. It requires nothing less than a new theoretical framework, and that kind of change is inevitably slow. There are too many facts to reconsider, too many bits of history to weave together in a new mosaic. But this is what science is about, and the purpose of the present paper is to show that it can be done. More precisely, it is shown that the whole natural history of the brain can be revisited in the light of the organic codes. What is described here is only a bird’s-eye view of brain macroevolution, but it is hoped that the extraordinary potential of the organic codes can nevertheless come through. The paper contains also another message. The organic codes prove that life is based on semiosis, and are in fact the components of organic semiosis, the first and the most diffused form of semiosis on Earth, but not the only one. It will be shown that the evolution of the brain was accompanied by the development of two new types of sign processes. More precisely, it gave origin first to interpretive semiosis, mostly in vertebrates, and then to cultural semiosis, in our species.|000|cultural semiosis, semiotics, biosemiotics, evolution, evolution of the brain, system modelling 1283|Galik2013|Biosemiotics is a new approach to the explanation of living. The central thesis of biosemiotics, “life is semiosis”, is a basis for a new science of living which should replace contemporary (or traditional) biology. 
The reason is that biosemiotics reveals new qualities of living, which are unaccessible through the methods of contemporary, pure empirical biology. The paper outlines basic theses of biosemiotics, distinguishes two main approaches, and challenges the central thesis with the focus upon its interpretation in “scientific” biosemiotics.|000|biosemiotics, semiotics, analogy, biological parallels 1284|Franco2015|In this paper, we investigate the influence of semantic concept features on lexical geographical variation. More specifically, we take an onomasiological approach to inquire into the effect of concept vagueness, salience, affect and semantic field. We use quantitative operationalizations of these features as predictors in a linear regression analysis. Our response variable is a composite variable that takes into account the number of variants per concept and the degree to which the concepts are scattered across geographical space in a heterogeneous way. Our model reveals that vaguer, less salient and non-neutral concepts show significantly more variation and that the lexical variants for these concepts are scattered across geographical space in a less homogeneous way. We also find differences between semantic fields.|000|semantics, concepts, lexical variation, geographic variation, onomasialogical approach, onomasiology 1285|Griffiths2015|Zellig Harris proposed a method for grouping phonemes in an utterance into morphemes by simply using counts of each of the phonemes in a corpus relative to their position in sequences contained in the data set. Thus, using an n-gram model, one can model this process and see whether a computational model can actually group representations of phonemes into segments which correspond to morphemes. Here, we use a general n-gram modelling tool created for melodic grouping in music corpora and apply it to a natural language data set. 
We show that this method which approximates Harris’s can indeed find morphemes in a given language corpus by calculating the distributions of phonemes across a corpus.|000|n-gram model, phoneme detection, phoneme, phoneme sequence, morpheme, morpheme detection, automatic approach, language model 1286|Barbieri2008b|The existence of different types of semiosis has been recognized, so far, in two ways. It has been pointed out that different semiotic features exist in different taxa and this has led to the distinction between zoosemiosis, phytosemiosis, mycosemiosis, bacterial semiosis and the like. Another type of diversity is due to the existence of different types of signs and has led to the distinction between iconic, indexical and symbolic semiosis. In all these cases, however, semiosis has been defined by the Peirce model, i.e., by the idea that the basic structure is a triad of ‘sign, object and interpretant’, and that interpretation is an essential component of semiosis. This model is undoubtedly applicable to animals, since it was precisely the discovery that animals are capable of interpretation that allowed Thomas Sebeok to conclude that they are also capable of semiosis. Unfortunately, however, it is not clear how far the Peirce model can be extended beyond the animal kingdom, and we already know that we cannot apply it to the cell. The rules of the genetic code have been virtually the same in all living systems and in all environments ever since the origin of life, which clearly shows that they do not depend on interpretation. Luckily, it has been pointed out that semiosis is not necessarily based on interpretation and can be defined exclusively in terms of coding. According to the ‘code model’, a semiotic system is made of signs, meanings and coding rules, all produced by the same codemaker, and in this form it is immediately applicable to the cell. 
The code model, furthermore, allows us to recognize the existence of many organic codes in living systems, and to divide them into two main types that here are referred to as manufacturing semiosis and signalling semiosis. The genetic code and the splicing codes, for example, take part in processes that actually manufacture biological objects, whereas signal transduction codes and compartment codes organize existing objects into functioning supramolecular structures. The organic codes of single cells appeared in the first three billion years of the history of life and were involved either in manufacturing semiosis or in signalling semiosis. With the origin of animals, however, a third type of semiosis came into being, a type that can be referred to as interpretive semiosis because it became closely involved with interpretation. We realize in this way that the contribution of semiosis to life was far greater than that predicted by the Peirce model, where semiosis is always a means of interpreting the world. Life is essentially about three things: (1) it is about manufacturing objects, (2) it is about organizing objects into functioning systems, and (3) it is about interpreting the world. The idea that these are all semiotic processes, tells us that life depends on semiosis much more deeply and extensively than we thought. We realize in this way that there are three distinct types of semiosis in Nature, and that they gave very different contributions to the origin and the evolution of life.|000|semiosis, semiotics, biological parallels, biosemiotics 1287|Hoffmeyer2015|A central idea in biosemiotic writings has been the idea of growth in semiotic freedom as a persistent trend in evolution (Hoffmeyer 1992). By semiotic freedom we mean the capacity of species or organisms to derive useful information by help of semiosis or, in other words, by processes of interpretation in the widest (Peircean) sense of this term. 
While even bacteria have a certain very limited ability to interpret cues in the medium this ability obviously becomes more developed in more complex organisms, and is typically most developed in big-brained animals that are late arrivals at the evolutionary scene. The evolution of a richer semiotic capacity is of course only one among many strategies available in the evolutionary game. Yet, this particular strategy potentially ignites a self-perpetuating evolutionary dynamics, since each step taken by a species along this route potentially opens new agendas for further change: the more capable some species are of anticipating and interpreting complex and fast-changing situations or events, the more will evolution favor the development in other species of a well-adjusted set of semiotic tools.|000|semiotics, scaffolding, biosemiotics, biological parallels 1288|Boettiger2014|As computational work becomes more and more integral to many aspects of scientific research, computational reproducibility has become an issue of increasing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straight forward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes being able to reproduce and extend such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research - such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a ‘DevOps’ philosophy, to address these challenges. 
I illustrate this with several examples of Docker use with a focus on the R statistical environment.|000|reproducibility, research, Docker, software 1289|Clark2014|Embodied agents use bodily actions and environmental interventions to make the world a better place to think in. Where does language fit into this emerging picture of the embodied, ecologically efficient agent? One useful way to approach this question is to consider language itself as a cognition-enhancing animal-built structure. To take this perspective is to view language as a kind of self-constructed cognitive niche: a persisting but never stationary material scaffolding whose crucial role in promoting thought and reason remains surprisingly poorly understood. It is the very materiality of this linguistic scaffolding, I suggest, that gives it some key benefits. By materializing thought in words, we create structures that are themselves proper objects of perception, manipulation, and (further) thought|000|embodiment, language, artificial agents, language origin 1290|Jacques2016|This paper (blogpost) is very interesting, since it deals with the question of how one can choose from the set of all languages of a recognized family for a smaller subset that is useful for reconstruction. |000|linguistic reconstruction, language selection, methodology, comparative method 1291|Stern2015|Historical Linguistics studies language change over time. If a group of languages derives from changes to a common ancestor language (proto-language) then they are said to be related. Whenever there exists a lack of written records for an ancestor language, a relevant question in Historical Linguistics is to determine whether two languages are related. The gold standard for finding these relationships is the Comparative Method. Despite the success of the Comparative Method in finding language relationships, it suffers from at least two limitations. 
First, the Comparative Method involves the manual comparison of various features from a group of languages. Second, the Comparative Method doesn’t provide a numerical measure of evidence for how much the database under consideration corroborates an hypothesis. Given the above limitations, the field of Computational Historical Linguistics is presented as a complement to the Comparative Method. This field has experienced a recent expansion with the adaptation of methods from biological phylogenetics. Nevertheless, there is debate whether the evolutionary models used in phylogenetics also incorporate valid linguistical assumptions. In this thesis, I propose a new (probability) model for the evolution of the phonology of languages. A relevant innovation of this model is that it captures the regularity of sound changes. I also describe the software that I created in order to assist in incorporating into the model a linguist’s expert knowledge. I show that the knowledge obtained in this way agrees with qualitative statements known to linguists. Finally, I present a new algorithm used to compute the probability of linguistic hypotheses regarding language relationships and the occurrence of regular sound changes. The main problem that this algorithm overcomes is that it efficiently explores the possible regular sound changes, mutations in languages that simultaneously affect several words. In order to overcome this challenge, I present a new variant of Nested Sequential Monte Carlo that is used to explore the large space of language relationships and regular sound changes. To the best of my knowledge, this is the first algorithm that can perform joint inference on regular sound changes and evolutionary trees. I show that this algorithm is a special case of Sequential Importance Resampling. 
|000|historical linguistics, computational approaches, Monte-Carlo permutation, probability models, Kessler's approach 1292|MacklinCordes2015|Typological datasets for quantitative historical-linguistic inquiry are growing in breadth, but a challenge is also to increase their depth, since advanced methods often ideally require many hundreds of traits per language. Using biphone transition probabilities from phonemicized vocabulary data, we extract several hundred high-definition phonotactic traits per language, for 17 languages in the Ngumpin-Yapa and Yolngu subgroups of the Pama-Nyungan family, Australia. We detect phylogenetic signal at a significant level (p < 0.001 for both subgroups), measured against a reference phylogeny inferred from basic vocabulary cognacy data. This contrasts with simpler, binary coding of biphones’ occurrence, which provides insufficient detail for the detection of phylogenetic signal. Thus, we demonstrate the viability of a new method in quantitative historical linguistics, and emphasize the inferential power to be harnessed from high-definition, trait-rich datasets for comparative research. :comment:`Published in the QITL proceedings`|000|phonotactics, phylogenetic reconstruction, sound inventories 1293|Ng2015|This dissertation identifies three previously unexplained typological differences between creoles, other types of language contact, and ‘normal’ sound change. (1) The merger gap: French /y/ merges with /i/ in all creoles worldwide, whereas merger with /u/ is also well-attested in other forms of language contact. The rarity of /u/ outcomes in French creoles is unexplained, especially because they are well attested in French varieties spoken in West Africa. (2) The assimilation gap: In creoles the quality of the stressed vowel often spreads to unstressed vowels, e.g. Spanish dedo > Papiamentu /dede/ ‘finger’. Strikingly, we do not find the opposite in creoles, but it is well attested among non-creoles, e.g. 
German umlaut and Romance metaphony. (3) The epenthesis gap: Word-final consonants are often preserved in language contact by means of vowel insertion (epenthesis), e.g. English big > Sranan bigi, but in normal language transmission this sound change is said not to occur word-finally. These three case studies make it possible to test various theories of sound change on new data, by relating language contact outcomes to the phonetics of non-native perception and second language speech production. I also explore the implications of social interactions and historical developments unique to creolisation, with comparisons to other language contact situations. Based on the typological gaps identified here, I propose that sociohistorical context, e.g. age of learner or nature of input, is critical in determining linguistic outcomes. Like phonetic variation, it can be biased in ways which produce asymmetries in sound change. Specifically, in language contact dominated by adult second language acquisition, we find transmission biases towards phonological rather than perceptual matching, overcompensation for perceptual weakness, and overgeneralisation of domain-final prominence. |000|creolisation, creole languages, sound change, 1294|Alva2015|The seemingly limitless diversity of proteins in nature arose from only a few thousand domain prototypes, but the origin of these themselves has remained unclear. We are pursuing the hypothesis that they arose by fusion and accretion from an ancestral set of peptides active as co-factors in RNA-dependent replication and catalysis. Should this be true, contemporary domains may still contain vestiges of such peptides, which could be reconstructed by a comparative approach in the same way in which ancient vocabularies have been reconstructed by the comparative study of modern languages. 
To test this, we compared domains representative of known folds and identified 40 fragments whose similarity is indicative of common descent, yet which occur in domains currently not thought to be homologous. These fragments are widespread in the most ancient folds and enriched for iron-sulfur- and nucleic acid-binding. We propose that they represent the observable remnants of a primordial RNA-peptide world.|000|domains, partial homology, gene fusion, biological parallels, analogy 1295|Alva2015|Today, for instance, it is known that hundreds of words in European languages contain the conserved Semitic root qnw (*qanaw-) meaning ‘reed’ (Huehnergard, 2011), despite it having diversified into a wide range of functional forms by the same processes as already familiar from biological evolution (e.g., orthology, paralogy, horizontal transfer).|1-3|biological parallels, paralogy, orthology, homology, cognacy, word formation, word family 1296|Bentz2016|The quantitative measurement of language complexity has witnessed a recent rise of interest, not least because language complexities reflect the learning constraints and pressures that shape languages over historical and evolutionary time. Here, an information-theoretic account of measuring language complexity is presented. Based on the entropy of word frequency distributions in parallel text samples, the complexities of overall 646 languages are estimated. A large-scale finding of this analysis is that languages just above the equator exhibit lower complexity than languages further away from the equator. This geo-spatial pattern is here referred to as the Low-Complexity-Belt (LCB). The statistical significance of the positive latitude/complexity relationship is assessed in a linear regression and a linear mixed-effects regression, suggesting that the pattern holds between different families and areas, but not within different families and areas. 
The lack of systematic within-family effects is taken as potential evidence for a phylogenetically “deep” explanation. The pressures shaping language complexities probably pre-date the expansion of language families from their proto-languages. Large-scale prehistoric contact around the equator is tentatively given as a possible factor involved in the evolution of the LCB.|000|linguistic complexity, AUTOTYP, expansion, ancestral languages, pre-modern languages 1297|Naccache2016|The field of Language Evolution is at a stage where its speed of growth and diversification is blurring the image of the origin of language, the “prime problem” at its heart. To help focus on this central issue, we take a step back in time and look at the logical analysis of it that Edward Sapir presented nearly a century ago. Starting with Sapir’s early involvement with the problem of language origin, we establish that his analysis of language is still congruent with today’s thinking, and then show that his insights into the origin of language still carry diagnostic and heuristic value today.|000|origin of language, Edward Sapir, 1298|Longobardi2016|Since its launch in 2007, the Automated Similarity Judgment Program has collected basic vocabulary lists from more than 6,000 languages and dialects, covering close to two thirds of the world’s languages. Using these data and phylogenetic techniques from computational biology, such as weighted sequence alignment and distance-based phylogenetic inference, we computed a phylogenetic language tree covering all continents and language families. Our method relies on word lists in phonetic transcription only, i.e. it does not rely on expert cognacy judgments. This decision enabled us to perform inference across the boundaries of language families. The world tree of languages thus obtained largely recaptures the established classification of languages into families and their sub-groupings. 
Additionally it reveals intriguing large-scale patterns pointing at a statistical signal from deep time.|000|ASJP, T-Coffee, phonetic alignment, grammar, phylogenetic reconstruction 1299|Jaeger2016a|Since its launch in 2007, the Automated Similarity Judgment Program has collected basic vocabulary lists from more than 6,000 languages and dialects, covering close to two thirds of the world’s languages. Using these data and phylogenetic techniques from computational biology, such as weighted sequence alignment and distance-based phylogenetic inference, we computed a phylogenetic language tree covering all continents and language families. Our method relies on word lists in phonetic transcription only, i.e. it does not rely on expert cognacy judgments. This decision enabled us to perform inference across the boundaries of language families. The world tree of languages thus obtained largely recaptures the established classification of languages into families and their sub-groupings. Additionally it reveals intriguing large-scale patterns pointing at a statistical signal from deep time.|000|tree of language, phylogenetic reconstruction, ASJP 1300|Zou2016|Morphological decomposition is an important part of complex word processing. In Chinese, this requires a comprehensive consideration of phonological, orthographic and morphemic information. The left inferior frontal gyrus (L-IFG) has been implicated in this process in alphabetic languages. However, it is unclear whether the neural mechanisms underlying morphological processing in alphabetic languages would be the same in Chinese, a logographic language. To investigate the neural basis of morphological processing in Chinese compound words, an fMRI experiment was conducted using an explicit auditory morphological judgment task. Results showed the L-IFG to be a core area in Chinese morphological processing, consistent with research in alphabetic languages. 
Additionally, a broad network consisting of the L-MTG, the bilateral STG and the L-FG that taps phonological, orthographic, and semantic information was found to be involved. These results provide evidence that the L-IFG plays an important role in morphological processing even in languages that are typologically different.|000|morphological decomposition, morphological processing, morpheme, 1301|Nettle1998|The six and a half thousand languages spoken by humankind are very unevenly distributed across the globe. Language diversity generally increases as one moves from the poles toward the equator and is very low in arid environments. Two belts of extremely high language diversity can be identified. One runs through West and Central Africa, while the other covers South and South-East Asia and the Pacific. Most of the world’s languages are found in these two areas. This paper attempts to explain aspects of the global distribution of language diversity. It is proposed that a key factor influencing it has been climatic variability. Where the climate allows continuous food production throughout the year, small groups of people can be reliably self-sufficient and so populations fragment into many small languages. Where the variability of the climate is greater, the size of social network necessary for reliable subsistence is larger, and so languages tend to be more widespread. A regression analysis relating the number of languages spoken in the major tropical countries to the variability of their climates is performed and the results support the hypothesis. 
The geographical patterning of languages has, however, begun to be destroyed by the spread of Eurasian diseases, Eurasian people, and the world economy.|000|language diversity, typology, correlational studies 1302|Pelkey2013|Cross-linguistic evidence from widespread modes of language variation and change demonstrate that language evolution proceeds (at least in part, perhaps in whole) by breaking and renewing symmetrical patterns. Since this activity is identified with semiosis (Nöth 1994, 1998), these patterns-in-process establish further grounds for insisting that the science of language be more adequately situated within semiotic understanding as “an ideoscopic science and sub-discipline under the general doctrine of signs” (Deely 2012: 334). After summarizing the theoretical context of my thesis, including relationships between analogy, symmetry, and linguistic diagrammatization, I present supporting comparative data in successive stages of complexity, ranging from simple reversals of linguistic diagrams through time to the emergence of more involved linguistic mirror patterns, to the emergence of intertwining diagrams and linguistic fractal symmetries. I then point back to the embodied and psychological sources of these patterns in the primary modeling system of the human Umwelt. The essay ends with gestures toward further unexplored sources of evidence and a summary proposal for understanding language evolution as non-linear process, qua semiosis.|000|symmetricity, chiasmus, semiosis, language variation, language change, directionality 1303|Pelkey2013|:comment:`Sturtevant's Paradox` "Sound change is regular and creates irregularities (in the morphology); Analogy is irregular and creates regularity." 
:comment:`author quotes after` @Anttila1989 :comment:`94f, and` @Bybee2007 :comment:`958`|59|regular sound change, regularity, analogy, Sturtevant 1304|Pelkey2011|This monograph presents a comparative lexicon of five representative Phula languages: Phola [ypg], Phuza [ypz], Hlepho Phowa [yhl], Southern Muji [ymc] and Azha [aza]. These languages belong to the Southeastern Ngwi branch of Burmic in the Tibeto-Burman family and are spoken in southeastern Yunnan Province, China. Following a brief introduction to the ethnohistory, social geography, linguistic typology and genetic lineage of these languages and their next-of-kin, the lexicon provides over 1,100 comparative entries for each representative lect with Chinese and English glosses organized by semantic domain. Footnotes follow each set of 25 entries page-by-page for the clarification of semantic field ambiguities, usage idiosyncrasies, subtle dialect distinctions and other notes of interest gleaned during elicitation sessions. The primary comparative list is followed by a transposed 660-item list sorted according to Ngwi protoforms (Bradley 1979) for diachronic comparison. These combined wordlists constitute a sampling of the data collected by the author from 2005-2006 in cooperation with the Honghe Nationalities Research Institute, Yuxi Normal University, the Wenshan Zhuang Studies Council, La Trobe University and SIL-International, East Asia Group. The work is intended to serve as a companion to Pelkey (2011), in which historical dialectology is undertaken to operationalize these languages, along with 19 others—validating them in the process as ontogenetic representatives of their respective macro-clades.|000|Phula languages, Tibeto-Burman, Sino-Tibetan, etymology, word list 1305|Moore2012|The wealth of available genomic data presents an unrivaled opportunity to study the molecular basis of evolution. 
Studies on gene family expansions and site-dependent analyses have already helped establish important insights into how proteins facilitate adaptation. However, efforts to conduct full-scale cross-genomic comparisons between species are challenged by both growing amounts of data and the inherent difficulty in accurately inferring homology between deeply rooted species. Proteins, in comparison, evolve by means of domain rearrangements, a process more amenable to study given the strength of profile-based homology inference and the lower rates with which rearrangements occur. However, adapting to a constantly changing environment can require molecular modulations beyond reach of rearrangement alone. Here, we explore rates and functional implications of novel domain emergence in contrast to domain gain and loss in 20 arthropod species of the pancrustacean clade. Emerging domains are more likely disordered in structure and spread more rapidly within their genomes than established domains. Furthermore, although domain turnover occurs at lower rates than gene family turnover, we find strong evidence that the emergence of novel domains is foremost associated with environmental adaptation such as abiotic stress response. The results presented here illustrate the simplicity with which domain-based analyses can unravel key players of nature’s adaptational machinery, complementing the classical site-based analyses of adaptation.|000|domains, domain loss, domain emergence, composite genes 1306|Moore2012|A domain arrangement is defined as the linear combination of domains in a protein. To avoid overestimating the number of unique arrangements an emerging domain can be found in, we collapsed repeats to a single instance as copy number variation in repeats can occur between even closely related species.|789|domains, domain arrangement 1307|Moore2012|We used Dollo parsimony (Farris 1977) for prediction of ancestral domain contents. 
The assumption underlying the use of Dollo parsimony is that domains are gained only once and that number of losses required to explain domain contents at nodes is minimized. Under Dollo, domain gain events will tend to occur early and will be offset by a large number of domain loss events. However, we consider Dollo parsimony as used here sufficiently robust. First, in this study we do not consider copy number variation; we consider only the binary state, presence or absence, of a given domain in any given node. Hence, a domain can only be lost along a branch if 1) it has been gained at an ancestral node to the branch considered and 2) not a single copy is present in the descendant node (or its subtree). Second, in most cases, domains represent the functional unit within a given protein. As horizontal transfer of genetic material within eukaryotes can at least be considered rare, gain events of such functional modules would imply de novo formation.|788|domains, Dollo model, Dollo parsimony, domain emergence, domain loss 1308|Nasir2014|Domains are modules within proteins that can fold and function independently and are evolutionarily conserved. Here we compared the usage and distribution of protein domain families in the free-living proteomes of Archaea, Bacteria and Eukarya and reconstructed species phylogenies while tracing the history of domain emergence and loss in proteomes. We show that both gains and losses of domains occurred frequently during proteome evolution. The rate of domain discovery increased approximately linearly in evolutionary time. Remarkably, gains generally outnumbered losses and the gain-to-loss ratios were much higher in akaryotes compared to eukaryotes. 
Functional annotations of domain families revealed that both Archaea and Bacteria gained and lost metabolic capabilities during the course of evolution while Eukarya acquired a number of diverse molecular functions including those involved in extracellular processes, immunological mechanisms, and cell regulation. Results also highlighted significant contemporary sharing of informational enzymes between Archaea and Eukarya and metabolic enzymes between Bacteria and Eukarya. Finally, the analysis provided useful insights into the evolution of species. The archaeal superkingdom appeared first in evolution by gradual loss of ancestral domains, bacterial lineages were the first to gain superkingdom-specific domains, and eukaryotes (likely) originated when an expanding proto-eukaryotic stem lineage gained organelles through endosymbiosis of already diversified bacterial lineages. The evolutionary dynamics of domain families in proteomes and the increasing number of domain gains is predicted to redefine the persistence strategies of organisms in superkingdoms, influence the make up of molecular functions, and enhance organismal complexity by the generation of new domain architectures. This dynamics highlights ongoing secondary evolutionary adaptations in akaryotic microbes, especially Archaea.|000|domains, domain loss, domain emergence 1309|Nasir2014|Proteins are made up of well-packed structural units referred to as domains. Domain structure in proteins is responsible for protein function and is evolutionarily conserved. Here we report global patterns of protein domain gain and loss in the three superkingdoms of life. We reconstructed phylogenetic trees using domain fold families as phylogenetic characters and retraced the history of character changes along the many branches of the tree of life. Results revealed that both domain gains and losses were frequent events in the evolution of cells. However, domain gains generally overshadowed the number of losses. 
This trend was consistent in the three superkingdoms. However, the rate of domain discovery was highest in akaryotic microbes. Domain gains occurred throughout the evolutionary timeline albeit at a non-uniform rate. Our study sheds light into the evolutionary history of living organisms and highlights important ongoing mechanisms that are responsible for secondary evolutionary adaptations in the three superkingdoms of life.|2|definition, domains, domain emergence, domain loss 1310|Tordai2005|Originally the term ‘protein module’ was coined to distinguish mobile domains that frequently occur as building blocks of diverse multidomain proteins from ‘static’ domains that usually exist only as stand-alone units of single-domain proteins. Despite the widespread use of the term ‘mobile domain’, the distinction between static and mobile domains is rather vague as it is not easy to quantify the mobility of domains. In the present work we show that the most appropriate measure of the mobility of domains is the number of types of local environments in which a given domain is present. Ranking of domains with respect to this parameter in different evolutionary lineages highlighted marked differences in the propensity of domains to form multidomain proteins. Our analyses have also shown that there is a correlation between domain size and domain mobility: smaller domains are more likely to be used in the construction of multidomain proteins, whereas larger domains are more likely to be static, stand-alone domains. 
It is also shown that shuffling of a limited set of modules was facilitated by intronic recombination in the metazoan lineage and this has contributed significantly to the emergence of novel complex multidomain proteins, novel functions and increased organismic complexity of metazoa.|000|domains, domain structure, 1311|Tordai2005|The average size of a protein domain of known crystal structure is about 175 residues; proteins that are larger than 200–300 residues usually consist of multiple protein folds [1]. The individual structural domains of such multidomain proteins are defined as compact folds that are relatively independent inasmuch as the interactions within one domain are more significant than with other domains. The individual domains of multidomain proteins usually fold independently of the other domains.|5064|domains, proteins, protein structure 1312|Tordai2005|Some multidomain proteins contain multiple copies of a single type of structural domain, indicating that internal duplication of a gene segment encoding a domain has given rise to such proteins.|5064|domains, protein structure 1313|Tordai2005|@Wolf<1999> et al. [4] have counted the number of different folds in each protein of proteomes of archaea, bacteria and eukarya and the average fraction of the proteins with each given number of domains was calculated. It has been concluded from these analyses that distributions of single-, two-, three-domain, etc., proteins in archaea, bacteria and eukarya is such that each next class (e.g. two-domain proteins vs. single-domain proteins, three-domain proteins vs. two-domain proteins, etc.) contains significantly fewer entries than the previous one. More recent mathematical analyses of the distribution of multidomain proteins according to the number of different constituent domains have revealed that their distribution follows a power law, i.e. 
single-domain proteins are the most abundant, whereas proteins containing larger numbers of domain-types are increasingly less frequent. This type of distribution is consistent with a random recombination (joining and breaking) model of evolution of multidomain architectures [6].|5065|power law, Zipf's law, domain structure, 1314|Wolf1999|A sensitive protein-fold recognition procedure was developed on the basis of iterative database search using the PSI-BLAST program. A collection of 1193 position-dependent weight matrices that can be used as fold identifiers was produced. In the completely sequenced genomes, folds could be automatically identified for 20%-30% of the proteins, with 3%-6% more detectable by additional analysis of conserved motifs. The distribution of the most common folds is very similar in bacteria and archaea but distinct in eukaryotes. Within the bacteria, this distribution differs between parasitic and free-living species. In all analyzed genomes, the P-loop NTPases are the most abundant fold. In bacteria and archaea, the next most common folds are ferredoxin-like domains, TIM-barrels, and methyltransferases, whereas in eukaryotes, the second to fourth places belong to protein kinases, beta-propellers and TIM-barrels. The observed diversity of protein folds in different proteomes is approximately twice as high as it would be expected from a simple stochastic model describing a proteome as a finite sample from an infinite pool of proteins with an exponential distribution of the fold fractions. Distribution of the number of domains with different folds in one protein fits the geometric model, which is compatible with the evolution of multidomain proteins by random combination of domains. [Fold predictions for proteins from 14 proteomes are available on the World Wide Web at. 
The FIDs are available by anonymous ftp at the same location.]|000|protein structure 1315|Basu2009|A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or ‘promiscuous’). These promiscuous domains are typically involved in protein–protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently, we have begun to understand the diversity of protein domain architectures and the role the promiscuous domains play in evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain shrouded in mystery. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.|000|mobile domain, domains, proteins, domain mobilty 1316|Basu2009|Protein domains are the structural and functional units of proteins. It is now well established that proteins carry out their functions primarily through their constituent domains. They can be gained by proteins to acquire new function. Domains are, therefore, considered to be the units through which proteins evolve. In structural biology, domains are defined as independent folding units in a protein. However, domains are generally identified as highly conserved regions of the protein sequence. 
This apparent contradiction in definition of protein domain disappears upon scrutiny: domains identified by sequence conservation alone have been shown to have distinct structural identity [1, 2].|205|domains, proteins, protein structure, definition 1317|Basu2009|The number of unique domains in an organism is roughly proportional to its genome size. In unicellular eukaryotes, such as apicomplexans, diplomonads and protozoans, the unique number of domains is ~1000, whereas in plants, fungi and animals, the numbers can be as high as ~3000. The average size of domain is ~100 amino acids [11]. The number of domains per gene (modularity) follows the power-law (see below) distribution [12], and it has been shown that tissue-specific genes have higher modularity [12, 13].|206|domains, frequency, power law, statistics 1318|Basu2009|Given the large number of domains present in an organism, the possible combinatorial arrangements are enormous. However, in eukaryotic genomes domains are present only in a limited set of arrangements in multidomain proteins. This suggests that evolutionary constraints play an important role in the selection of domain architectures observed in multidomain proteins [2]. Indeed, domain arrangements, even the domain ordering in multidomain proteins, determine their three dimensional arrangements, and therefore, might affect function [25]. In earlier studies, it was shown that most of the domain combinations in multidomain proteins have been formed only once in the evolution, and the domain combinations are inherited rather than formed through convergent evolution [14, 26].|206|domain emergence, parallel evolution, homoplasy 1319|Basu2009|However, in a recent study, Forslund and co-workers claimed that convergent evolution is more prevalent than previously thought [27]. 
They investigated the prevalence of domain architecture reinvention in 96 genomes with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. They detected multiple origins for 12.4% of the architectures. This result indicates that domain architecture reinvention is a much more common phenomenon than previously thought [27]. Thus, it is possible that the process of convergent domain architecture evolution is driven by functional necessity.|206f|domain emergence, parallel evolution, homoplasy 1320|Basu2009|Domains are present in various combinations in multidomain proteins. While some domains are present in stable configuration, others are present in many different domain milieus. Promiscuous or mobile domains are domains that reside in many different domain combinations [20, 24, 28]. The term promiscuity carries several connotations when applied to a protein domain. In scientific literature, promiscuity can signify domains with higher degree of mobility (as described above), or domains that physically interact with many other domains (protein–protein interactions), or domains that bind different types of molecules. In this article, the term promiscuous domain will be used to mean mobile domains.|207|definition, mobile domain, promiscuous domain 1321|Basu2009|The power-law has been identified in numerous biological, physical and social contexts, [pb] such as hypertext links in Internet, population distribution in towns, number of reactions in which a particular metabolite is involved, number of pseudogenes in a particular gene family, and many others [31–39]. Two very common versions of the power-law are Zipf’s law, which describes the frequency distribution of words in a text [40] and the Pareto distribution, which describes the distribution of people by wealth [41]. Pareto distribution also led to the famous Pareto principle, which says ‘few contain many and most contain few’ or the so called 80-20 rule. 
Examples of such rule are 20% of product from a company determines 80% of the return, 20% of the defects caused 80% of the problems, and many others.|207f|domains, statistics, power law 1322|Basu2009|Domain co-occurrence networks also fall under the scale-free category [21]. These networks are graphs in which each node represents a domain, and two nodes are connected by an edge only if they are present in a single protein sequence [20, 21, 43, 44]. In a scale-free network, there are few nodes that are highly connected, but majority of them have low connectivity. Additionally, in a scale-free network, the features of the network and the underlying distribution do not change with the increasing number of nodes. In a protein domain co-occurrence network, promiscuous mobile domains are highly connected nodes or hubs. [...] This type of distribution of connectivity is very different from random network where the connectivity is largely uniform. Moreover, the scale-free nature of such a network is largely assumed to exist due to ‘preferential attachment’, which dictates that the probability of a node acquiring new connections is proportional to its degree (the number of nodes to which a given node is connected). Thus the implication of such connectivity for a domain co-occurrence network is important in showing that domain combinations in proteins are not random and that promiscuous domains have a tendency to become more promiscuous during evolution.|208|domain co-occurrence network, network, co-occurrence, 1323|Basu2009|Although the biological mechanisms that give rise to new domain combinations are largely unknown, several mechanisms have been proposed with anecdotal evidence. Examples of such mechanisms are gene fusion and fission, de novo creation of genes from non-coding elements, and recruitment of the mobile genetic elements [45]. Domains are frequently gained by proteins through insertions at the N or C terminus [46, 47]. 
Repeated domains can also arise through duplication [48]. Novel structure can also arise due to circular permutation of existing domains [49].|208|domain emergence 1324|Basu2009|To identify promiscuous domain one needs to consider several parameters. Some of these parameters are as follows: (a) other domains that [pb] co-occur with a particular domain in one protein sequence, (b) number of different multidomain architectures in which a domain participates and (c) the abundance of a domain in the genome. Earlier work relied on the parameter (a) to find promiscuous domains. These works made use of the connectivity parameter of domain co-occurrence network to find out promiscuous domains [21, 44].|209f|promiscuous domain, domain co-occurrence network 1325|Basu2009|Note that by definition promiscuous domains co-occur more with other domains, and therefore, are highly connected nodes or hubs in domain-occurrence network. Works that relied on connectivity parameters simply identified these highly connected nodes. But relying solely on the connectivity parameters is largely misleading, because it is known that many domains, though participating in large multidomain architectures, in fact exist in fewer local contexts [20]. It is, therefore, necessary to consider immediate domain neighbors (domains adjacent to a given domain on a polypeptide sequence) to correctly identify promiscuous domains.|210|promiscuous domain, domain co-occurrence network 1326|Basu2009|In a later study, Tordai and co-workers [20] took this fact into account to identify promiscuous domains by considering ‘domain triplets’, three domains next to each other on a protein sequence. This study identified promiscuous domains as those who participate in many of these triplets. This is akin to using parameter (b). 
But even this study, which took local environment into account, largely ignored the abundance of domain in the genome [20], a very important criterion to determine domain promiscuity correctly.|210|promiscuous domain, domain co-occurrence network 1327|Basu2009|Recently, we developed a method to objectively measure mobility/promiscuity of a protein domain [28], taking the abundance of a domain into consideration. The method uses techniques from computational linguistics to measure promiscuity from domain co-occurrence. The method, called ‘bigram analysis’, is generally used to find words with more semantic importance in any language [58]. It has also been employed in finding words that are semantically linked to each other. The idea is to count the number of times a pair of words (bigram) occurs in a text (corpus). If a pair occurs less frequently from the background distribution, it carries more semantic information than the others. Additionally, this analysis also points out the words that, by nature, tend to participate in many bigrams and are, therefore, promiscuous.|210|bigram analysis, promiscuous domain, domain co-occurrence network 1328|Basu2009|Promiscuity values of the protein domains can be used as an evolutionary character in eukaryotes. Using parsimony, we reconstructed the evolutionary scenario of promiscuity in the major eukaryotic lineage [28]. We found that promiscuity is a volatile character in evolution. Some evolutionary conserved combinations of domains act as a reservoir from which new lineage-specific domain combinations are created [28]. Over all, very few domains have retained their promiscuity status during evolution.|211|promiscuous domain, phylogenetic reconstruction, 1329|Basu2009| 1. Protein domain promiscuity is a volatile feature in evolution and plays specific functional roles in different phylogenetic lineages. 2. 
Promiscuous domains are, typically, involved in protein-protein interactions and play crucial roles in interaction networks, particularly those that contribute to signal transduction.  3. Genetic mechanism(s) shaping domain promiscuity is largely unknown, but we have strong evidence of natural selection shaping promiscuity. :comment:`summary taken from authors`|000|protein structure, domains, promiscuous domain, biological parallels, bigram analysis 1330|Alves2009|**Background** Protein domains represent the basic units in the evolution of proteins. Domain duplication and shuffling by recombination and fusion, followed by divergence are the most common mechanisms in this process. Such domain fusion and recombination events are predicted to occur only once for a given multidomain architecture. However, other scenarios may be relevant in the evolution of specific proteins, such as convergent evolution of multidomain architectures. With this in mind, we study glutaredoxin (GRX) domains, because these domains of approximately one hundred amino acids are widespread in archaea, bacteria and eukaryotes and participate in fusion proteins. GRXs are responsible for the reduction of protein disulfides or glutathione-protein mixed disulfides and are involved in cellular redox regulation, although their specific roles and targets are often unclear. **Result** In this work we analyze the distribution and evolution of GRX proteins in archaea, bacteria and eukaryotes. We study over one thousand GRX proteins, each containing at least one GRX domain, from hundreds of different organisms and trace the origin and evolution of the GRX domain within the tree of life. **Conclusion** Our results suggest that single domain GRX proteins of the CGFS and CPYC classes have, each, evolved through duplication and divergence from one initial gene that was present in the last common ancestor of all organisms. 
Remarkably, we identify a case of convergent evolution in domain architecture that involves the GRX domain. Two independent recombination events of a TRX domain to a GRX domain are likely to have occurred, which is an exception to the dominant mechanism of domain architecture evolution.|000|domains, domain combination, gene fusion, gene shuffling 1331|Bashton2007|During evolution, many new proteins have been formed by the process of gene duplication and combination. The genes involved in this process usually code for whole domains. Small proteins contain one domain; medium and large proteins contain two or more domains. We have compared homologous domains that occur in both one-domain proteins and multidomain proteins. We have determined (1) how the functions of the individual domains in the multidomain proteins combine to produce their overall functions and (2) the extent to which these functions are similar to those in the one-domain homologs. We describe how domain combinations increase the specificity of enzymes; act as links between domains that have functional roles; regulate activity; combine within one chain functions that can act either independently, in concert or in new contexts; and provide the structural framework for the evolution of entirely new functions.|000|domains, domain combination, protein structure, protein functions 1332|Bashton2007|In the previous sections of the paper, we have described in some detail the extent to which the functions found in one- domain proteins are conserved, modified, or changed in homologous domains found in multidomain proteins.|96|domains, protein structure, protein functions 1333|Bashton2007|Here, we have discussed how different combinations of domains from different superfamilies produce new functions. In some ways, this process is analogous to how, in language, word combinations function. 
|97|biological parallels, protein structure, domains, analogy 1334|Bashton2007|Here, we frequently find that the general function of the homolog of the one-domain protein in the multidomain protein has been conserved but has been modified or made more specific. This is achieved by placing the homologous domain into a new domain context or ‘‘syntax,’’ in which an additional domain serves to expand, alter, or modulate its functionality. Syntax governs the arrangements in a sentence of words that individually have particular meanings and taken together make ‘‘sense’’; this can be modified further by the replacement or the addition of other suitable words. The addition of unsuitable words will produce nonsense.|97|biological parallels, domain combination, syntax, analogy, word combination 1335|Bashton2007|In the small number of cases in which the functions of the domain are totally changed, we find that the common scaffold of the protein domain has been adapted to carry out a different, unrelated reaction or bind a different, unrelated ligand. This is a redesign of the protein’s function through progressive mutation of the domain itself to generate quite different functions. It can be thought of as a change in semantics, i.e., that the function or ‘‘meaning’’ of the domain itself has changed. This is found in words that have quite different meanings according to their context: e.g., ‘‘she is a red’’ (i.e., a communist); ‘‘the pillar-box is red’’ (i.e., is painted red).|97|semantic change, protein functions, domain combination, functional shift 1336|Bashton2007|These features—change in syntax and semantics and the generation of discreteness—are all properties of a natural grammar of domain combination that determines the assembly of functionally coherent combinations of domains and gives rise to more complex protein functions. 
This grammar is a consequence of the selection of combinations of domains that make ‘‘sense’’ functionally and deletion of those that are ‘‘nonsense.’’|97|grammar, domain combination, syntactic change, semantic change, biological parallels, analogy 1337|Sinsheimer2012|Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) and its extensions, we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root.|000|rooting of phylogenetic trees 1338|Yap2005|**Background:** We compared two methods of rooting a phylogenetic tree: the stationary and the nonstationary substitution processes. These methods do not require an outgroup. **Methods:** Given a multiple alignment and an unrooted tree, the maximum likelihood estimates of branch lengths and substitution parameters for each associated rooted tree are found; rooted trees are compared using their likelihood values. Site variation in substitution rates is handled by assigning sites into several classes before the analysis. 
**Results:** In three test datasets where the trees are small and the roots are assumed known, the nonstationary process gets the correct estimate significantly more often, and fits data much better, than the stationary process. Both processes give biologically plausible root placements in a set of nine primate mitochondrial DNA sequences. Conclusions: The nonstationary process is simple to use and is much better than the stationary process at inferring the root. It could be useful for situations where an outgroup is unavailable.|000|rooting of phylogenetic trees, nonstationary substitution process, stationary substitution process 1339|Barddal2012|That is, in order for a reconstruction to be possible, the input for the relevant correspondence sets must consist of a form side and a meaning side, as only (lexical) items that are inherited from an earlier stage can be cognates, i.e. inherited items with the same form and the same meaning. Therefore, the existence of cognates – as instantiations of form–meaning pairings across related languages – is the foundation of the Comparative Method and all reconstruction.|257|cognacy, syntax, definition 1340|Barddal2012|Moreover, the term cognate, which literally means ‘of common descent, blood relative’ in Latin, has been used in historical-comparative research about structures which “descend from” a corresponding structure in an ancestor language common to two or more daughter languages. Although traditionally this applies to lexical and morphological structures, nothing prevents it from also applying to syntactic structures, such as case and argument structure constructions|266|syntax, cognacy, definition 1341|Walkden2013|While considerable swathes of the phonology and morphology of proto-languages have been reconstructed using the comparative method, syntax has lagged behind. 
Jeffers (1976) and Lightfoot (2002a), among others, have questioned whether syntax can be reconstructed at all, claiming that a fundamental problem exists in applying the techniques of phonological reconstruction to syntax. Others, such as Harris & Campbell (1995) and, following them, Barðdal & Eythórsson (2012), have claimed that the problem does not arise in their frameworks. This paper critically examines the isomorphism between phonological and syntactic reconstruction, made possible by an ‘item-based’ view of syntactic variation as assumed within Minimalist theories of syntax as well as Construction Grammar and others. A case study dealing with the ‘middle voice’ suffix -sk in early North Germanic is presented in support of the approach. While the conclusion drawn is not as pessimistic as that of Lightfoot (2002a), it is argued that the ‘correspondence problem’ is real and that reconstruction of syntax is therefore necessarily more difficult, and speculative, than that of phonology.|000|syntax, syntactic reconstruction, cognacy, correspondences, 1342|Walkden2013|[I]t is uncontroversial that the comparative method in phonological-lexical reconstruction involves hypothesizing correspondence sets in which both the lexical item and the sounds that constitute its phonological form are cognate, in the traditional sense of diachronic identity between those items and a single item in the proto-language through transmission across generations.|101|cognacy, definition 1343|Campbell2002|Cognates of any kind are related by virtue of descent from a common ancestor; cognate words, for example, are descended from the same word in the proto-language. 
Cognate sentences cannot, of course, be descended from a shared sentence (except in formulaic language, as in legal codes, proverbs, etc.); they are examples of shared patterns descended from a pattern in the proto-language.|605|syntax, cognacy, definition 1344|Campbell2002|Lightfoot mistakenly asserts that Harris & Campbell (1995) can reconstruct only when the daughters have identical patterns. Very often the historical linguist is able to explain non-identical corresponding forms, and the weight of the comparative evidence from sister languages often supports a reconstruction in spite of non-identical correspondences. As Meillet (1967: 42) affirmed, 'the initial linguistic unity is not always recognized by a retention pure and simple; it is often shown by DIVERGENT INNOVATIONS' [our emphasis, LC&ACH]. In this section we show that reconstruction in syntax is indeed possible without identity, a process well known in historical linguistics generally|608|cognate, cognate detection, syntax, syntactic reconstruction 1345|Posada2009|Phylogenetic reconstruction is a problem of statistical inference. Since statistical inferences cannot be drawn in the absence of probabilities, the use of a model of nucleotide substitution or amino acid replacement – a **model of evolution** – becomes indispensable when using DNA or protein sequences to estimate phylogenetic relationships among taxa. Models of evolution are sets of assumptions about the process of nucleotide or amino acid substitution (see Chapters 4 and 9). They describe the different probabilities of change from one nucleotide or amino acid to another along a phylogenetic tree, allowing us to choose among different phylogenetic hypotheses to explain the data at hand. 
Comprehensive reviews of models of evolution are offered elsewhere (Swofford et al., 1996; Liò & Goldman, 1998).|345|DNA sequence, evolutionary model, probability models 1346|Posada2009|As discussed in the previous chapters, phylogenetic methods are based on a number of assumptions about the evolutionary process. Such assumptions can be implicit, like in **parsimony** methods (see Chapter 8), or explicit, like in distance or **maximum likelihood** methods (see Chapters 5 and 6, respectively). The advantage of making a model explicit is that the parameters of the model can be estimated. Distance methods can only estimate the number of substitutions per site. However, maximum likelihood methods can estimate all the relevant parameters of the model of evolution.|345|maximum parsimony, maximum likelihood, evolutionary model, probability models 1347|Posada2009|In general, models that are more complex will fit the data better than simpler ones just because they have more parameters. An *a priori* attractive procedure to select a model of evolution would be the arbitrary use of the most complex, parameter-rich model available. However, when using complex models a large [pb] number of parameters need to be estimated, and this has several disadvantages. First, the analysis becomes computationally difficult, and requires a large amount of time. Second, as more parameters need to be estimated from the same amount of data, more error is included in each estimate. Ideally, it would be advisable to incorporate as much complexity as needed, i.e. 
to choose a model that is intricate enough to explain the data, but not that complicated that requires impractical long computations or large data sets to obtain accurate estimates.|346f|evolutionary model, probability models, model complexity 1348|Rodriguez2015|This paper introduces the word list collections which were carried out under the lead of Jekaterina the Great and others, especially in Russia, but apparently also by Spanish missionaries. The article clarifies the relationships and presents the lists which were used to compare the languages. It seems, unfortunately, that the article is wrong in claiming that the original list by Pallas was only 285 items, since an earlier list of about 441 items was already published in 1786, and as a comparison with the 285 item list reveals, it is the predecessor of this later list.|000|word list, concept list, basic vocabulary 1349|Willis2010|Lightfoot (2002) argues that syntactic reconstruction is rendered impossible by the lack of any analogue in syntax to the traditional notion of the phonological ‘correspondence set’ of the Comparative Method and by the radical discontinuity caused by reanalysis between successive grammars. Alice Harris and Lyle Campbell, in various works, have defended the notion of ‘syntactic pattern’ as the analogue of the correspondence set, arguing that patterns can be compared across languages, with innovations being stripped away to reveal aspects of the protolanguage. In this article, I argue that syntactic reconstruction can be carried out while maintaining and indeed utilizing core notions in generative approaches to syntactic change such as the central role of reanalysis and child language acquisition and the distinction between the abstract grammatical system and the surface output of that system. Reanalysis itself is constrained by the fact that both pre- and post-reanalysis grammars must be acquirable on the basis of the same primary linguistic data. 
This imposes limits on the possible hypotheses that can be entertained (‘local directionality’) even in the absence of any crosslinguistic generalizations about patterns of change (‘universal directionality’). This approach is then applied to aspects of the syntax of free relative clauses and negation in the early Brythonic Celtic languages (Welsh, Breton and Cornish), showing that non-trivial reconstructions can be achieved even where the daughter languages manifest significant differences.|000|syntax, syntactic reconstruction, cognacy, definition, methodology 1350|Vermeulen2013|This article presents the work of Peter Simon Pallas (1741-1811), who collected word lists for different languages and did some other interesting stuff. |000|Peter Simon Pallas, biography 1351|Larrucea1984|It is well known that the Empress Catherine II of Russia had a personal interest in linguistics. She began to collect words and she even prepared a word list of different languages of her Empire and of the world. It is also known that she asked the King of Spain Charles III for information on the American languages. But even in the most recent bibliography many details are missing. A research in the archives of Madrid and Bogotá has permitted the discovery of many unknown details. We could find documented the petition of St.-Petersburg, the sending of documents of the Empress to Madrid, and the King's orders, particularly to the Viceking of Nueva Granada, with their realisation, in which the great naturalist Mutis has had an important part. The Empress also asked for Spanish books on the languages of Philippines and Japan|000|word lists, Catherine the Great, history of science, concept list 1352|Rodriguez2015|Following a long tradition of compiling words to serve as a means of comparison, at the end of the 18th century, Russian Empress Catherine II started her own lexical compilation. 
She ordered Peter Simon Pallas to publish a book containing almost 300 words in around 200 languages ordered by semantic fields. She wanted to include as many languages as possible and so she asked Spanish King Charles III to send them books and to translate a longer list, one of 445 words, into the languages spoken in America and the Philippines. These two lexical compilations, the one by Pallas but especially the Matrix List and its translation into around fifty languages, can describe the early stages of the languages and their relationships. The Matrix List in itself is very important, since it was one of the first large scale projects in comparative linguistics — although it was never completed. But in their selection criteria, structure and methodology they were pioneers. Pallas and Mirievo took previous word lists and added the words that were thought to be basic at their time. As for the translations, a comparative study can show: (a) whether the languages share material which is extremely unlikely to have arisen by chance, and thus, that it is most likely they share a common origin; or (b) whether some of the languages have undergone the same changes. The translations can also denote cultural concepts. The relevance of this word list and translations is outstanding [pb] for different research fields. In the future I will study contrastively and thoroughly the translation of the matrix list into the fifty languages mentioned above in order to answer the previous questions and to establish the aims and methods of compilation focusing on these lists and in comparison with other contemporary and non-contemporary lexical compilations, older and recent.|328f|Peter Simon Pallas, word list, concept list, history of science 1353|Pellard2009|This dissertation is a linguistic description of the Ōgami dialect of Miyako Ryukyuan, an endangered Japonic language of the Southern Ryukyus. The data have been collected by the author during fieldwork. 
After having described the linguistic, geographic and historical background of the Ōgami dialect, a synchronic description of the language is given. Ōgami has a relatively small-sized phoneme inventory. It has the very rare characteristic of allowing syllables and words without vowels nor any other voiced sound. Previous analyses positing devoiced vowels in the nucleus of such syllables are refuted on the basis of acoustic and morphophonological evidence. The different word classes, case markers and pragmatic role markers, as well as their morphology, are described in several chapters. An overview of the syntax of the noun phrase and the sentence is given. Special attention is paid to the clause chaining system and desubordination of converbs: some converbs have acquired the ability to head independent sentences, with, for example, a past tense value. This is followed by several chapters on diachronic issues. First, a phylogenetic classification of the Ryukyuan languages and then of the Miyako dialects is proposed. The phonology, noun morphology and verbal system of proto-Miyako are then reconstructed on the basis of data from five dialects. Proto-Miyako is compared to Japanese and several contributions to the reconstruction of proto-Japonic are proposed. |000|Ryukyu languages, Japonic, Japanese, phylogenetic reconstruction, linguistic reconstruction 1354|Pellard2009| :comment:`This chapter uses innovations (potential innovations) and directional processes to reconstruct the phylogeny of Japanese and Ryukyu languages.`|249-294|Ryukyu languages, phylogenetic reconstruction, directionality, maximum parsimony, maximum compatibility 1355|Boothby2015|Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. 
Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes. |000|lateral gene transfer, tardigrade, phylogenetic reconstruction, phylogenetic network 1356|Boothby2015|This is an interesting case, since they attribute the patchiness in the data to lateral gene transfer, while another group (@Koutsovoulos2015) attribute it to data errors.|000|lateral gene transfer, phylogenetic reconstruction, tardigrade 1357|Koutsovoulos2015|Background Tardigrades are meiofaunal ecdysozoans that may be key to understanding the origins of Arthropoda. Many species of Tardigrada can survive extreme conditions through adoption of a cryptobiotic state. A recent high profile paper suggested that the genome of a model tardigrade, Hypsibius dujardini, has been shaped by unprecedented levels of horizontal gene transfer (HGT) encompassing 17% of protein coding genes, and speculated that this was likely formative in the evolution of stress resistance. We tested these findings using an independently sequenced and assembled genome of H. dujardini, derived from the same original culture isolate. 
Results Whole-organism sampling of meiofaunal species will perforce include gut and surface microbiotal contamination, and our raw data contained bacterial and algal sequences. Careful filtering generated a cleaned H. dujardini genome assembly, validated and annotated with GSSs, ESTs and RNA-Seq data, with superior assembly metrics compared to the published, HGT-rich assembly. A small amount of additional microbial contamination likely remains in our 135 Mb assembly. Our assembly length fits well with multiple empirical measurements of H. dujardini genome size, and is 120 Mb shorter than the HGT-rich version. Among 23,021 protein coding gene predictions we found 216 genes (0.9%) with similarity to prokaryotes, 196 of which were expressed, suggestive of HGT. We also identified ~400 genes (<2%) that could be HGT from other non-metazoan eukaryotes. Cross-comparison of the assemblies, using raw read and RNA-Seq data, confirmed that the overwhelming majority of the putative HGT candidates in the previous genome were predicted from scaffolds at very low coverage and were not transcribed. Crucially much of the natural contamination in both projects was non-overlapping, confirming it as foreign to the shared target animal genome. Conclusions We find no support for massive horizontal gene transfer into the genome of H. dujardini. Many of the bacterial sequences in the previously published genome were not present in our raw reads. In construction of our assembly we removed most, but still not all, contamination with approaches derived from metagenomics, which we show are very appropriate for meiofaunal species. We conclude that HGT into H. 
dujardini accounts for 1-2% of genes and that the proposal that 17% of tardigrade genes originate from HGT events is an artefact of undetected contamination.|000|lateral gene transfer, tardigrade, phylogenetic reconstruction, data problems, sequencing error, errors 1358|Koutsovoulos2015|This is a good example of the impact of data errors in biology: here, the authors identify sequencing errors, as David Morrison mentioned in his blog: * http://phylonetworks.blogspot.com/2016/02/tardigrades-and-phylogenetic-networks.html But @Boothby2015 attribute the data patchiness to lateral transfer.|000|lateral gene transfer, tardigrade, sequencing error, errors, data problems 1359|Kroeber1955|When we come to these very low percentages of cognates, 1000 stems compared would give much solider assurance than 200; but with the added 800 we get increasingly outside of the vocabulary that is really basic and most stable, so that as the stems dealt with increase, the proportion of preserved cognates presumably goes down. |97|concept list, size of concept lists, lexicostatistics, glottochronology 1360|Gudschinsky1956|Swadesh is at present experimenting with the use of a list of only 100 items (see @Swadesh1955 for a detailed analysis of the 200 word list and the suggested revision to 100 words). The reasons given for eliminating some of the items (e.g. the repetition of some roots in such pairs as woman-wife, the non-universality of such words as ice and snow, etc.) seem valid to this author. The gain in quality of test items, however, is balanced by some loss in terms of statistical accuracy. 
@Kroeber<1955> (1955: 97) has suggested that a list of 1000 items would be preferable, and doubts that deep time depths can be explored by use of a list as small as 200 words.|179|basic vocabulary, concept list, size of concept lists, lexicostatistics 1361|Gudschinsky1956|If there is an equal choice of two or more expressions, one should be chosen purely at random (by flipping a coin if necessary) to avoid any bias in the direction of choosing known cognates, since non-random choice could considerably skew the final results.|179|lexicostatistics, translation of basic words, concept list, synonyms 1362|Kuerschner2014|The handbook is connected with a kind of linguistic aid that Gabelentz calls “Collectaneen”. These are compilations of linguistic materials of various kinds that were collected more or less systematically. [pb] Their historical precursors are word lists, also called vocabularies. Some have become very famous, for instance the Linguarum totius orbis vocabularia comparativa by Peter Simon Pallas (1741–1811), which were published in St. Petersburg in 1786–89. The word list (1787) by Lorenzo Hervás (1735–1809), the author of the Catálogo de las lenguas de las naciones conocidas (1800–05), also belongs here. Gabelentz’s father, Hans Conon von der Gabelentz (1807–1874), should also be mentioned: in the first half of the 19th century he published a considerable number of grammars of various languages, relying on collectanea that he himself and others had compiled (see below).|239f|Gabelentz, collectaneas, language comparison, concept list 1363|Youn2016|How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. 
Semantics, or meaning expressed through language, provides indirect access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here, we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries to translate words to and from languages carefully selected to be representative of worldwide diversity. These translations reveal cases where a particular language uses a single “polysemous” word to express multiple concepts that another language represents using distinct words. We use the frequency of such polysemies linking two concepts as a measure of their semantic proximity and represent the pattern of these linkages by a weighted network. This network is highly structured: Certain concepts are far more prone to polysemy than others, and naturally interpretable clusters of closely related concepts emerge. Statistical analysis of the polysemies observed in a subset of the basic vocabulary shows that these structural properties are consistent across different language groups, and largely independent of geography, environment, and the presence or absence of a literary tradition. The methods developed here can be applied to any semantic domain to reveal the extent to which its conceptual structure is, similarly, a universal attribute of human cognition and language use.|000|semantic network, polysemy, quantitative analysis 1364|Youn2016|We propose a principled method to construct semantic networks linking concepts via polysemous words identified by cross-linguistic dictionaries. Based on the method, we found overwhelming evidence that the semantic networks for different groups share a large amount of structure in common across geographic and cultural differences. 
Indeed, our results are consistent with the hypothesis that cultural and environmental factors have little statistically significant effect on the semantic network of the subset of basic concepts studied here.|4|semantic network, polysemy, quantitative analysis 1365|Comtet2011|The first edition of Pallas's multilingual dictionary, published in Saint Petersburg in 1787, offered the translation of a series of nouns into no fewer than 200 different languages in a Cyrillic transcription. Polish occupied the tenth place within the group of Slavic languages placed at the head of the work. Starting from the first volume, which comprises 130 words, we propose to study the Cyrillization of Polish in relation to the phonology and spelling of that language and those of Russian, which appears as a kind of target language of the operation, all of this in the context of the period. In doing so, we will try to determine which prevailed, a simple transliteration or a phonetic transcription; on the point under discussion, Pallas's dictionary foreshadows a series of attempts made in Russia in the 19th century to Cyrillize the Latin alphabet of Polish with the aim of assimilating a Polish nation that stubbornly refused to lose its identity after the partitions of Poland. The comparison highlights the merits of Pallas's dictionary which, in this particular case, and given the initial constraints that prohibited the use of non-Cyrillic diacritics, managed to achieve a clever synthesis and does not deserve the criticisms generally levelled at the dictionary taken as a whole.|000|Peter Simon Pallas, concept list, collectaneas, history of science 1366|Plank2003|This little text of 17 pages gives some interesting descriptions of some early questionnaires in linguistics. 
It's just a draft, but it gives valid hints on concept lists and the role that questionnaires play in historical linguistics and typology.|000|questionnaire, concept list, collectaneas, history of science 1367|Voloshina2012|The article deals with the first experience of making a comparative vocabulary of several languages, «The Comparative Dictionary of All Languages» by Academician Pallas. In this paper, the structure of the dictionary-thesaurus is examined and an example of the dictionary entry is given. This dictionary is compared with its later edition where a different principle of material presentation was proposed - the words were given in alphabetical order. Both dictionaries are compared in terms of the structure of the language material and the format of dictionary entries. Представлен первый опыт составления сравнительного словаря нескольких языков – «Сравнительный словарь всех языков» академика Палласа. Исследуется структура словаря-тезауруса, приводится пример словарной статьи. Данный словарь сравнивается с его переизданием, в котором был предложен иной принцип подачи материала – слова располагались в алфавитном порядке. Оба словаря сравниваются с точки зрения структуры, языкового материала и устройства словарной статьи.|000|Peter Simon Pallas, concept list, collectaneas, dictionary, comparative dictionary, history of science 1368|Cross1964|Linguistic scientists do not need to be adherents of the philosophy of numerology and the religion of digitology may lead us astray into the paths of unrighteousness. By their intensive and extensive borrowings from the disciplines of physics, chemistry, biology, and engineering, linguists have paid ample tribute to all their colleagues' efforts to uncover, unravel, and codify the ways of nature. 
But linguistic researchers should not be satisfied with the mere germ of truth, nor a formulation that has wide gaps and the facile quality of elasticity.|488|nice quote, lexicostatistics, quantitative analysis, validity 1369|Cross1964|Lexicostatistics is diverting, but not very useful and it is even misleading when employed to set up a taxonomy of language. I experimented with it immediately after the publication of an article by Kroeber and Chrétien in *Language* (1937). After a painful experience I found the alluring method very much of a failure.|488f|lexicostatistics, quantitative analysis, nice quote 1370|Buck1949|The words given in the lists are intended to be the most usual expressions of the given notion in the accepted written and spoken language. To try to include all obsolete and dialectal forms would be folly, though such as come to one's attention and offer interesting parallels in semantic development may be mentioned.|xii|etymology, concept list, data preparation 1371|Buck1949|A "Dictionary of Ideas" (a title that would suggest to laymen the point of such study) in a truly comprehensive sense (history of words for all ideas in all known languages) is, of course, an idle dream. Even for the Indo-European field anything like a complete semantic dictionary is beyond probable realization at present.|x|nice quote, dictionary of ideas, concept list, semantic change 1372|Hunemann2007|As I wanted to establish, the idea of intention itself, in the case of artefacts, has been misconceived, and that implies some unfortunate consequences concerning the answer proposed. My point is that some major elements of the parallel between the biological and the artefact cases have been neglected and must be highlighted. 
Once those elements are understood and the parallel is strengthened, then we can grasp an idea of the ontological specificity of artefacts.|16|artefacts, evolution, biological parallels, cultural evolution 1373|Hunemann2007|If the function of an artefact is not a property of this artefact alone but a relational property of an object within definite cultural system, then our consideration of artefacts comes closer to the biological cases of function ascriptions. In effect, biologists have always been sensitive to the context-dependency of any evolutionary relevant property, and to the environmental character of each functional ascription. Natural selection is a local process, such that for an organism, changes in other organisms with which it will never be in any interaction can nevertheless bring some fitness changes.|16|fitness, evolutionary fitness, biological evolution 1374|Hunemann2007|Briefly said, indeed: if fitness of an organism of a species depends not only on the ones it preys on, or its predators, but on the complex relationships between all the species in the ecosystem, fitness does not rely on the mere direct interactions between some organisms, but on a web of interactions into which the focal species is embedded (Solé and Goodwin, 2000)—and this exactly parallels the [pb] shift we made from considering the function of an artifact through a dual user-designer relation, towards embedding this function in its context of use (the parallel being all the more compelling if we hold a selected effect (Neander, 1991) or “etiological” theory of functions according to which the function of a trait is based on fitness value or natural selection). |16f|systemic processes, fitness, biological parallels, artefacts, cultural evolution 1375|Hunemann2007|Yet in the case of artefacts, we spontaneously see only the artefact itself, its effects, the user and the designer’s intention. 
If we look at its entanglement with all other artefacts in the context of use, the high context-dependency of functional ascriptions within artefacts will be emphasized, like in the biological case.|17|artefacts, complex systems, systemic perspective, cultural evolution 1376|Hunemann2007|In other words, there is no equivalent of the measure of this general property of surviving, etc., which in biology is called fitness. Fitness, no matter how it is defined, has a necessary relationship to the number of offspring. However, artefacts have no offspring, no heredity, so at first blush they have no fitness.|18|fitness, artefacts 1377|Feit2016|This paper revisits the present understanding of typing, which originates mostly from studies of trained typists using the ten-finger touch typing system. Our goal is to characterise the majority of present-day users who are untrained and employ diverse, self-taught techniques. In a transcription task, we compare self-taught typists and those that took a touch typing course. We report several differences in performance, gaze deployment and movement strategies. The most surprising finding is that self-taught typists can achieve performance levels comparable with touch typists, even when using fewer fingers. Motion capture data exposes 3 predictors of high performance: 1) unambiguous mapping (a letter is consistently pressed by the same finger), 2) active preparation of upcoming keystrokes, and 3) minimal global hand motion. We release an extensive dataset on everyday typing behavior.|000|typing, keyboard 1378|Owen2012|This paper presents an analysis of the tones of five contemporary Tai Khuen varieties in order to investigate the differences in the number of distinctive tones reported in the literature. The present study shows that while some contemporary speakers have a tone system with six tones, most speakers have only five tones. 
Comparison of the distribution and phonetic characteristics of the tones in the contemporary varieties with previous studies shows that the five tone system was derived by the coalescence of two tones in the six-tone system. Investigating the factors that determine which tone system a particular contemporary speaker uses leads to the conclusion that language contact with five-tone Shan was the cause of the change.|000|tone perception, tone change, tonogenesis, borrowing, Tai languages 1379|Sujaritlak2015|Palaung belongs to the Palaungic branch of the Austroasiatic language family. Although at least three main Palaung dialects are generally recognized, namely, Ta-ang, Rumai, and Darang; as many as 13 are recognised by Mak (2012) (according to a combination of language, clothing, and culture, etc.). This paper presents the results of a lexical study using a 100 word data list (chosen following Mann 2004) collected from 16 sites in China, Myanmar, and Thailand. For the lexical analysis, the data were classed into cognates groups, and then analysed using the lexicostatistical package GLOTTO and SplitsTree4 (version 4.13.1) for computing phylogenetic networks. The results are compared with those groupings categorized by names used by the Palaung people in China and by outsiders (Deepadung, 2011); and those classified by the criteria of historical phonology (Mitani, 1977; Ostapirat, 2009).|000|Austro-Asiatic, Palaungic, lexicostatistics, concept list, word list 1380|Keipert2014|“The Church-Slavonic Lines in the Vocabularia comparativa (St. Petersburg, 1787-89)” Edited in 1787 and 1789 by Peter Simon Pallas, the two-volume dictionary Linguarum totius orbis vocabularia comparativa, by Catherine II, contains equivalents in 200 languages (among them twelve Slavic ones) for 285 elementary Russian words. 
The present article discusses the entries written for Church Slavonic, a Slavic language without native speakers and without complete dictionaries (not to mention bilingual Russian-Church Slavonic ones). The author offers a detailed comparison of the Church Slavonic lines with the Russian head-words, distinguishes several types of head-words in the dictionary, and identifies what might be a French subtext for some of the Russian head-words in the second volume of the Vocabularia. The article offers a characterization of the lexicographic skills of the still-unknown, late-eighteenth-century compiler of the Church Slavonic word-list and suggests that his obvious familiarity with the vocabulary of Church Slavonic texts could not make up for his ignorance of the theoretical problems posed by a translation between two languages regarded by many contemporaries as one and the same.|000|Peter Simon Pallas, word list, concept list, history of science 1381|Bochnakowa2013|The paper discusses an 18th c. dictionary entitled Linguarum totius orbis Vocabularia comparativa Augustissimae cura collecta (...), which contains the equivalents (in ca. 200 languages and dialects) of more than two hundred Russian entry words. The multilingual counterparts are a transliteration (and at times what might be called a phonetic transcription) in the Russian alphabet. The principal objective of the paper is to describe the French (and Old French!) equivalents of the entry words, at the same time not only indicating the simplifications and inaccuracies, but also the historical value of the recorded forms.|000|Peter Simon Pallas, word list, concept list, history of science 1382|Mariscal2015|In the half century since the formulation of the prokaryote : eukaryote dichotomy, many authors have proposed that the former evolved from something resembling the latter, in defiance of common (and possibly common sense) views. 
In such ‘eukaryotes first’ (EF) scenarios, the last universal common ancestor is imagined to have possessed significantly many of the complex characteristics of contemporary eukaryotes, as relics of an earlier ‘progenotic’ period or RNA world. Bacteria and Archaea thus must have lost these complex features secondarily, through ‘streamlining’. If the canonical three-domain tree in which Archaea and Eukarya are sisters is accepted, EF entails that Bacteria and Archaea are convergently prokaryotic. We ask what this means and how it might be tested.|000|convergent evolution, terminology, homology, similarity 1383|Mariscal2015|As an example, consider the genetic code. Crick [48] once argued that the only reason the genetic code was universal was because all life shares common ancestry; the code is a ‘frozen accident’. His reasoning was based on two factors: the specificity of the code and (discussed later) the difficulty (indeed ‘lethality’) of evolving alternative codes once one had been established.|6|convergent evolution, inheritance, similarity 1384|Mariscal2015|Specificity also applies to structures or processes not so easily enumerated as are possible codes. As an example, consider blubber, the subcutaneous layer of fat used by many endothermic animals to regulate their body heat. Relative to the sheer number of possible ways to regulate body heat, blubber is actually a fairly specific trait. An alternative way to describe this feature, ‘insulation’, would be less specific, but cover more cases, including fur, feathers and so on. An even broader description, ‘thermoregulation’, would include not only blubber, fur and feathers, but even radically different features, such as behaviour. The less specific a trait, the more likely it is to convergently evolve, all else being equal. 
But very unspecific traits, such as thermoregulation, are weak examples of either convergence or retained similarity, because any number of evolved features might be included as convergent, even if the evolutionary pressures and underlying structures were quite different!|6|specificity, convergent evolution, convergence, inheritance, explanative force 1385|Corel2016|The tree model and tree-based methods have played a major, fruitful role in evolutionary studies. However, with the increasing realization of the quantitative and qualitative importance of reticulate evolutionary processes, affecting all levels of biological organization, complementary network-based models and methods are now flourishing, inviting evolutionary biology to experience a network-thinking era. We show how relatively recent comers in this field of study, that is, sequence-similarity networks, genome networks, and gene families–genomes bipartite graphs, already allow for a significantly enhanced usage of molecular datasets in comparative studies. Analyses of these networks provide tools for tackling a multitude of complex phenomena, including the evolution of gene transfer, composite genes and genomes, evolutionary transitions, and holobionts.|000|phylogenetic network, gene similarity network, similarity networks, review 1386|Fortunato2010|The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i. e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a similar role like, e. g., the tissues or the organs in the human body. 
Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.|000|community detection, network analysis, complex network 1387|Alzahrani2016|There is increasing motivation to study bipartite complex networks as a separate category and, in particular, to investigate their community structure. We outline recent work in the area and focus on two high-performing algorithms for unipartite networks, the modularity-based Louvain and the flow-based Infomap. We survey modifications of modularity-based algorithms to adapt them to the bipartite case. As Infomap cannot be applied to bipartite networks for theoretical reasons, our solution is to work with the primary projected network. We apply both algorithms to four projected networks of increasing size and complexity. Our results support the conclusion that the clusters found by Infomap are meaningful and better represent ground truth in the bipartite network than those found by Louvain.|000|bipartite network, community detection, complex network 1388|Lancichinetti2009|Uncovering the community structure exhibited by real networks is a crucial step toward an understanding of complex systems that goes beyond the local organization of their constituents. 
Many algorithms have been proposed so far, but none of them has been subjected to strict tests to evaluate their performance. Most of the sporadic tests performed so far involved small networks with known community structure and/or artificial graphs with a simplified structure, which is very uncommon in real systems. Here we test several methods against a recently introduced class of benchmark graphs, with heterogeneous distributions of degree and community size. The methods are also tested against the benchmark by Girvan and Newman and on random graphs. As a result of our analysis, three recent algorithms introduced by Rosvall and Bergstrom (@2007), and Ronhovde and Nussinov have an excellent performance, with the additional advantage of low computational complexity, which enables one to analyze large systems.|000|community detection, comparison, algorithms, performance, Infomap 1389|Rosvall2007|To comprehend the multipartite organization of large-scale biological and social systems, we introduce an information theoretic approach that reveals community structure in weighted and directed networks. We use the probability flow of random walks on a network as a proxy for information flows in the real system and decompose the network into modules by compressing a description of the probability flow. The result is a map that both simplifies and highlights the regularities in the structure and their relationships. We illustrate the method by making a map of scientific communication as captured in the citation patterns of >6,000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. 
Along the backbone of the network—including physics, chemistry, molecular biology, and medicine—information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences|000|Infomap, community detection, network analysis 1390|Kuppermann2012|Our AoA ratings are available as supplementary materials for this article. For each word, we report the number of times that it occurs in the trimmed data (OccurTotal). For most words, the count is about 19. However, for the ten calibration words and the 52 control words, this amounts to more than 1,900 presentations. Next, we provide the mean AoA ratings (in years of age) and standard deviations (Rating.Mean and Rating.SD). We also present the number of responders who gave numeric ratings to the word, rather than rated it as unknown (OccurNum). This information is useful, because it helps to avoid using unknown words in psychological experiments and indicates the degree of reliability of the mean AoA ratings. Finally, we add word frequency counts from the 50-million-word SUBTLEX-US corpus (Brysbaert & New, 2009). Words are presented in decreasing order of frequency of occurrence. The 574 words that were not present in the SUBTLEX-US frequency list were assigned the frequency of 0.5.|988|age of acquisition, dataset 1391|Kuppermann2012|We present age-of-acquisition (AoA) ratings for 30,121 English content words (nouns, verbs, and adjectives). For data collection, this megastudy used the Web-based crowdsourcing technology offered by the Amazon Mechanical Turk. Our data indicate that the ratings collected in this way are as valid and reliable as those collected in laboratory conditions (the correlation between our ratings and those collected in the lab from U.S. students reached .93 for a subsample of 2,500 monosyllabic words). 
We also show that our AoA ratings explain a substantial percentage of the variance in the lexical-decision data of the English Lexicon Project, over and above the effects of log frequency, word length, and similarity to other words. This is true not only for the lemmas used in our rating study, but also for their inflected forms. We further discuss the relationships of AoA with other predictors of word recognition and illustrate the utility of AoA ratings for research on vocabulary growth.|000|age of acquisition, ranked concept list, concept list, language acquisition 1392|Reesink2012|Similarities between languages can be due to 1) homoplasies because of a limited design space, 2) common ancestry, and 3) contact-induced convergence. Typological or structural features cannot prove genealogy, but they can provide historical signals that are due to common ancestry or contact (or both). Following a brief summary of results obtained from the comparison of 160 structural features from 121 languages (Reesink, Singer & Dunn 2009), we discuss some issues related to the relative dependencies of such features: logical entailment, chance resemblance, typological dependency, phylogeny and contact. This discussion focusses on the clustering of languages found in a small sample of 11 Austronesian and 8 Papuan languages of eastern Indonesia, an area known for its high degree of admixture.|000|convergent evolution, language evolution, chance resemblance, inheritance, homoplasy, discriminative power 1393|Reesink2012|This article is actually similar (though from a linguistic perspective) to @Mariscal2015 and it might be interesting to compare the arguments in there.|000|- 1394|Playfoot2013|In spite of their unusual orthographic and phonological form, acronyms (e.g., BBC, HIV, NATO) can become familiar to the reader, and their meaning can be accessed well enough that they are understood. The factors in semantic access for acronym stimuli were assessed using a word association task. 
Two analyses examined the time taken to generate a word association response to acronym cues. Responses were recorded more quickly to cues that elicited a large proportion of semantic responses, and those that were high in associative strength. Participants were shown to be faster to respond to cues which were imageable or early acquired. Frequency was not a significant predictor of word association responses. Implications for theories of lexical organisation are discussed.|000|age of acquisition, correlational studies, response time, abbreviations 1395|Playfoot2013|High-frequency words are recognized, produced, and recalled faster and with greater accuracy than low-frequency words (Connine, Mullinex, Shernoff, & Yelen, 1990; Yonelinas, 2002). As studies have shown that most of the associative responses are semantically linked to the cue word (e.g., computer → screen), it is logical to think that word association requires word recognition, a process influenced by word frequency. It would be expected, therefore, that an influence of word frequency on word association responses would be observed.|1134|correlational studies, age of acquisition, word frequency, response time 1396|Playfoot2013|The literature is equivocal regarding the influence of cue word frequency on word association responses. Early investigations indicated that high-frequency cues elicited a stronger dominant response (i.e., produced by many individuals) than low-frequency items, and fewer different responses overall (Postman, 1964, 1970). However, de Groot (1989) found no significant effects of the frequency of the cue word on the speed with which an associated word was produced, contradicting the predictions of semantic network models. Furthermore, both de Groot (1989, Experiment 7) and Brysbaert, Van Wijnendaele, and De Deyne (2000) reported that high-frequency cues elicited more diverse responses than low-frequency cues. 
Interestingly, this was the inverse of the frequency effects reported by Postman (1964, 1970). :comment:`This is exactly the correlation we find with alternative data, where we observe a high negative correlation between weighted degree in association networks and word frequency; we also observe this (but weakly) in polysemy networks.`|1134|word frequency, word association, correlational studies 1397|Playfoot2013|One major tenet of semantic network models is that the link between two concepts should be strengthened by its retrieval. If encountering a word results in the automatic activation of all related words, then the connections stemming from high-frequency words will be accessed more often than for low-frequency words. The links between high-frequency words and their associates should therefore be particularly strong. De Groot’s study instead demonstrated that the imageability of a word was more important in the distribution and speed of word association responses. Imageability refers to the ease with which a word evokes a mental image (Paivio, Yuille, & Madigan, 1968).|1134|imageability, correlational studies, word frequency, word association 1398|Playfoot2013|In discrete word association tasks, a smaller number of different responses are elicited by words that are highly imageable. Correspondingly, the dominant response to a high-imageability cue word has a greater associative strength than that for a less imageable cue, and in addition, the responses are generated more quickly (Altarriba, Bauer, & Benvenuto, 1999; Brysbaert et al., 2000; de Groot, 1989). These findings were interpreted as evidence that the links between highly imageable nodes and related concepts were stronger than the links stemming from low-imageable nodes.|-|word association, imageability, correlational studies, word frequency 1399|Playfoot2013|Age of acquisition (AoA) refers to the moment in time in which words, objects, and faces are first learned. 
The common finding is that objects, faces, and words learned early in life are processed more quickly than those learned later (e.g., Brysbaert & Ghyselinck, 2006; Izura et al., 2011; Morrison & Ellis, 2000; Pérez, 2007; Richards & Ellis, 2009). A current explanation for the AoA effect is the arbitrary mappings hypothesis (Ellis & Lambon Ralph, 2000). It states that the AoA effect is a product of the connections created during learning. When the relationship between input and output is predictable, a late acquired word can draw on existing knowledge to facilitate processing. In regular words, for example, the relationship of spelling to sound is consistent with other similarly spelled words (e.g., sweet, feet). Thus, a newly learned word can map on to existing representations (e.g., tweet). However, in irregular words the pronunciation is less predictable and the mapping is arbitrary (e.g., yacht). Under these circumstances late acquired words do not benefit from existing word knowledge and processing is relatively slow.|1134|age of acquisition, word association, word frequency, correlational studies, arbitrary mappings hypothesis 1400|Playfoot2013|Word association responses have also been shown to be affected by age of acquisition. Van Loon-Vervoorn (1989, cited by Brysbaert et al., 2000) showed that responses in a discrete [pb] association task were recorded reliably faster (240 ms) when the cue was early-acquired. Brysbaert et al. (2000) replicated Van Loon-Vervoorn’s (1989) findings. They reported that early-acquired words elicited association responses 279 ms faster than late-acquired words. Further, Brysbaert et al. (2000) provided evidence that there is greater agreement among participants in the associations generated for early-acquired words. Brysbaert et al. 
(2000) pointed to the interpretation that the strength of the semantic connections from early-acquired word nodes is greater than from nodes for late-acquired words.|1134f|age of acquisition, word association, correlational studies 1401|Rossiter2013|Background: Previous research has highlighted psycholinguistic variables influencing naming ability for individuals with aphasia, including: familiarity, frequency, age of acquisition, imageability, operativity, and length (Nickels & Howard, 1995) and a potential link between typicality and generalisation to untreated items in intervention (Kiran, Sandberg, & Sebastian, 2011). However, the effect of concept typicality (the extent to which an item can be considered a prototype of a category) on naming in aphasia warrants further examination. Aims: To investigate first whether typicality can be reliably rated across a range of natural semantic categories and second whether, and if so in which direction, typicality influences naming performance for people with aphasia. To provide quantitative and qualitative information on typicality for a set of stimuli for use in future research. Methods & Procedures: Typicality ratings were obtained and the results compared with those in the existing literature. The influence of typicality on picture naming was investigated employing both matched sets (high and low typicality matched for other psycholinguistic variables) and logistic regression analyses for the group and individual participants with aphasia (n = 20). Outcomes & Results: Typicality rating correlated strongly with ratings obtained in previous research (Rosch, 1975: r = .798, N = 35, p < .001; Uyeda & Mandler, 1980: r = .844, N = 47, p < .001). Typicality was a significant predictor of picture naming for the group and some individuals, with generally better performance for typical items. This was demonstrated in both matched sets and regression analyses. 
However, other psycholinguistic variables proved more strongly related to naming success, particularly age of acquisition. Conclusions: Typicality can be rated reliably and should be considered alongside other psycholinguistic variables when investigating word retrieval and intervention in aphasia. Further research is necessary to accurately model the direction of typicality effects found in word retrieval. Finally, the differing nature, size, and internal structure of categories require further exploration when investigating typicality effects. |000|typicality, age of acquisition, word frequency, object naming, aphasia 1402|Meschyan2002|The combined contributions of word age of acquisition (AoA) and word frequency (rated and objective) to word retrieval speed and accuracy were investigated, using a picture-naming paradigm. Results from two fully factorial studies revealed that both AoA and word frequency reliably facilitate the speed and accuracy of word retrieval. Furthermore, word frequency and AoA interacted across delay (0, 750, 1,500, and 2,250 msec) in Experiment 2. This resulted in word frequency’s playing a stronger role for late-acquired words across delays. It is concluded that both AoA and word frequency play a fundamental role in lexical retrieval. The results are also consistent with the view that both factors affect the same processing stages.|000|age of acquisition, word frequency, correlational studies 1403|Uyeda1980|The extent to which an item is a prototypical exemplar of a category has been found to predict several experimental results (e.g., reaction times in category classification, free and cued recall of lists, release from proactive inhibition in recall). We present prototypicality ratings for 840 words, equally distributed over 28 categories. The categories were taken from Battig and Montague's (1969) normative tables; only those categories that contained "concrete" items in common usage were employed in the study. 
Intragroup reliability correlations were high for all categories tested, as were the correlations for prototypicality ratings between the present study and that of Rosch (1975). In addition, correlations between prototypicality ratings, production frequencies, and word frequencies of the items are given.|000|prototypicality, age of acquisition, word frequency, correlational studies 1404|Chiarello1999|Dissociations between noun and verb processing are not uncommon after brain injury; yet, precise psycholinguistic comparisons of nouns and verbs are hampered by the underrepresentation of verbs in published semantic word norms and by the absence of contemporary estimates for part-of-speech usage. We report herein imageability ratings and rating response times (RTs) for 1,197 words previously categorized as pure nouns, pure verbs, or words of balanced noun-verb usage on the basis of the Francis and Kučera (1982) norms. Nouns and verbs differed in rated imageability, and there was a stronger correspondence between imageability rating and RT for nouns than for verbs. For all word types, the image-rating-RT function implied that subjects employed an image generation process to assign ratings. We also report a new measure of noun-verb typicality that used the Hyperspace Analog to Language (HAL; Lund & Burgess, 1996) context vectors (derived from a large sample of Usenet text) to compute the mean context distance between each word and all of the pure nouns and pure verbs. For a subset of the items, the resulting HAL noun-verb difference score was compared with part-of-speech usage in a representative sample of the Usenet corpus. It is concluded that this score can be used to estimate the extent to which a given word occurs in typical noun or verb sentence contexts in informal contemporary English discourse. 
The item statistics given in Appendix B will enable experimenters to select representative examples of nouns and verbs or to compare typical with atypical nouns (or verbs), while holding constant or covarying rated imageability.|000|imageability, typicality, concept list 1405|Amancio2012|In this paper we have quantified the consistency of word usage in written texts represented by complex networks, where words were taken as nodes, by measuring the degree of preservation of the node neighborhood. Words were considered highly consistent if the authors used them with the same neighborhood. When ranked according to the consistency of use, the words obeyed a log-normal distribution, in contrast to Zipf’s law that applies to the frequency of use. Consistency correlated positively with the familiarity and frequency of use, and negatively with ambiguity and age of acquisition. An inspection of some highly consistent words confirmed that they are used in very limited semantic contexts. A comparison of consistency indices for eight authors indicated that these indices may be employed for author recognition. Indeed, as expected, authors of novels could be distinguished from those who wrote scientific texts. Our analysis demonstrated the suitability of the consistency indices, which can now be applied in other tasks, such as emotion recognition.|000|complex network, word association, word frequency 1406|Zevin2002|Recent studies have suggested that age of acquisition (AoA) has an impact on skilled reading independent of factors such as frequency. This result raises questions about previous studies in which AoA was not controlled and about current theories in which it is not addressed. Analyses of the materials used in previous studies suggest that the observed AoA effects may have been due to other factors. We also found little evidence for an AoA effect in computational models of reading that used words that exhibit normal spelling-sound regularities. 
An AoA effect was observed, however, in a model in which early and late learned words did not overlap in terms of orthography or phonology. The results suggest that with other correlated properties of stimuli controlled, AoA effects occur when what is learned about early patterns does not carry over to later ones. This condition is not characteristic of learning spelling-sound mappings but may be relevant to tasks such as learning the names for objects.|000|age of acquisition, correlational studies, word frequency, object naming 1407|Cuetos2011|Recent studies have shown that word frequency estimates obtained from films and television subtitles are better to predict performance in word recognition experiments than the traditional word frequency estimates based on books and newspapers. In this study, we present a subtitle-based word frequency list for Spanish, one of the most widely spoken languages. The subtitle frequencies are based on a corpus of 41M words taken from contemporary movies and TV series (screened between 1990 and 2009). In addition, the frequencies have been validated by correlating them with the RTs from two megastudies involving 2,764 words each (lexical decision and word naming tasks). The subtitle frequencies explained 6% more of the variance than the existing written frequencies in lexical decision, and 2% extra in word naming.|000|word frequency, Spanish, subtitles 1408|Haggarty2014|Defining homologous genes is important in many evolutionary studies but raises obvious issues. Some of these issues are conceptual and stem from our assumptions of how a gene evolves, others are practical, and depend on the algorithmic decisions implemented in existing software. Therefore, to make progress in the study of homology, both ontological and epistemological questions must be considered. 
In particular, defining homologous genes cannot be solely addressed under the classic assumptions of strong tree thinking, according to which genes evolve in a strictly tree-like fashion of vertical descent and divergence and the problems of homology detection are primarily methodological. Gene homology could also be considered under a different perspective where genes evolve as "public goods," subjected to various introgressive processes. In this latter case, defining homologous genes becomes a matter of designing models suited to the actual complexity of the data and how such complexity arises, rather than trying to fit genetic data to some a priori tree-like evolutionary model, a practice that inevitably results in the loss of much information. Here we show how important aspects of the problems raised by homology detection methods can be overcome when even more fundamental roots of these problems are addressed by analyzing public goods thinking evolutionary processes through which genes have frequently originated. This kind of thinking acknowledges distinct types of homologs, characterized by distinct patterns, in phylogenetic and nonphylogenetic unrooted or multirooted networks. In addition, we define "family resemblances" to include genes that are related through intermediate relatives, thereby placing notions of homology in the broader context of evolutionary relationships. 
We conclude by presenting some payoffs of adopting such a pluralistic account of homology and family relationship, which expands the scope of evolutionary analyses beyond the traditional, yet relatively narrow focus allowed by a strong tree-thinking view on gene evolution.|000|homology, partial homology, similarity, similarity networks, epistemology, ontology 1409|Haggarty2014|+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Term | Meaning | +==========================+=================================================================================================================================================================================================================================================================================+ | Homologs | Having a relationship through descent from at least one common ancestor | +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Family resemblance | Having an evolutionary relationship through intermediate sequences and common descent | +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Clique | A subgraph in a network where every member of the subgraph is connected to all other members | 
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | STT | Strong tree thinking: A perspective that sees homology statements as valid when the homologs have evolved down the branches of a bifurcating phylogenetic tree | +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | PNT | Phylogenetic network thinking: A perspective that sees homology statements as valid when the homologs have evolved through tree-like processes, but allowing for some homologous recombination, thereby making a phylogenetic network. | +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | GT | Goods thinking: A perspective that sees homology relationships encompass illegitimate recombination, fusion, and fission of evolving entities in addition to vertical descent. Gene evolution is expected at times to be very complex and involve merging of evolving entities. 
| +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | N-rooted fusion networks | A new kind of network that depicts rooted networks with at least one fusion node and at least two roots. | +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | TRIBES | Homologs that have a 1:1 correspondence in terms of being homologous for most or all their length. | +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | TribeMCL | One of the most successful approaches to finding communities in networks of gene similarity. | +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |507|homology, ontology, epistemology, partial homology 1410|Stark1972|From a common sense point of view most poeple would agree that languages consist of sounds, words, sentences, and meanings. And even though linguists tend to be uncomfortable with these everyday terms, it is nevertheless ture, that a theory of linguistics defines itself by the particular way in which it handles sounds, words, sentences, and meanings. 
The conceptual machinery that a theory sets up to represent these things not only defines the overall design of the machinery itself but also shapes our view of the objects that it deals with. A language looks very different depending upon which linguists you talk to, just as Shakespeare's *Hamlet* looks very different depending upon which critics you read.|000|Leonard Bloomfield, compositionality, double articulation, structure of language, morphotactics, phonotactics, syntax 1411|Stark1972|The principal vertical relationship was 'made up of' or 'composed of' so that a sentence was composed of constructions, constructions were made up of words, words were made up of morphemes, and morphemes were composed of phonemes. Because smaller units 'composed' larger ones, the fundamental difference between the units of different levels was the simple quantitative one of 'size', -- morphemes were simply bigger than phonemes. The units of these levels fitted together to give an extremely homogeneous if somewhat monolithic representation that consisted of just two basic parts (1) an inventory of units and (2) their tactic patterns or arrangements. 
|390|phonotactics, morphotactics, syntax, Leonard Bloomfield 1412|Stark1972|Very schematically, the overall picture of language structure that these Bloomfieldian assumptions yield looks like this:

+-----+-----------+---------------+----------------+------------+
|     |           | Units         | Relations      | Levels     |
+=====+===========+===============+================+============+
| (1) | meanings  | --            | --             | --         |
+-----+-----------+---------------+----------------+------------+
| (2) | sentences | constructions | constituency   | syntax     |
+-----+-----------+---------------+----------------+------------+
| (3) | words     | morphemes     | morpho-tactics | morphemics |
+-----+-----------+---------------+----------------+------------+
| (4) | sounds    | phonemes      | phono-tactics  | phonemics  |
+-----+-----------+---------------+----------------+------------+

|390|Leonard Bloomfield, phonotactics, morphotactics, syntax 1413|Clothia1992|Article describes how protein families evolve and states that secondary structure often implies homology, although there is not necessarily enough clear similarity left. Interestingly, the authors even assume that, given the weakness of homology detection based on sequence similarity, the number of protein families is just about 1000.|000|homology, protein structure, protein families 1414|Clothia1992|Thus a conservative view of the evidence that we have at present would be that the large majority of proteins come from no more than a thousand different families.|544|protein structure, protein families 1415|Gimona2006|The correspondence between biology and linguistics at the level of sequence and lexical inventories, and of structure and syntax, has fuelled attempts to describe genome structure by the rules of formal linguistics. But how can we define protein linguistic rules? 
And how could compositional semantics improve our understanding of protein organization and functional plasticity?|000|protein structure, proteins, biological parallels, syntax, grammar 1416|Gimona2006|Note that they do not mention the morphological complexity of languages, and the different areas, but only provide the comparison with syntax. |000|protein structure, biological parallels, grammar, syntax 1417|Decoene1993|The basic speech unit (phoneme or syllable) problem was investigated with the primed matching task. In primed matching, subjects have to decide whether the elements of stimulus pairs are the same or different. The prime should facilitate matching in as far as its representation is similar to the stimuli to be matched. If stimulus representations generate graded structure, with stimulus instances being more or less prototypical for the category, priming should interact with prototypicality because prototypical instances are more similar to the activated category than are low-prototypical instances. Rosch (1975a, 1975b) showed that, by varying the matching criterion (matching for physical identity or for belonging to the same category), the specific patterns of the priming x prototypicality interaction could differentiate perceptually based from abstract categories. By testing this pattern for phoneme and syllable categories, the abstraction level of these categories can be studied. After finding reliable prototypicality effects for both phoneme and syllable categories (Experiments 1 and 2), primed phoneme matching (Experiments 3 and 4) and primed syllable matching (Experiments 5 and 6) were used under both physical identity instructions and same-category instructions. The results make clear that phoneme categories are represented on the basis of perceptual information, whereas syllable representations are more abstract. The phoneme category can thus be identified as the basic speech unit. 
Implications for phoneme and syllable representation are discussed.|000|perception, speech unit, primed matching task, phoneme 1418|Decoene1993|The impetus for the investigation of the speech unit hypothesis came from a simple experiment. Savin and Bever (1970) had subjects monitor a sequence of meaningless syllables for either a complete syllable or a phoneme (the syllable-initial consonant or the medial vowel). Phoneme monitoring was significantly slower than syllable monitoring. Savin and Bever concluded that phonemes are identified after syllables of which they are parts, because phonemes are not perceptual, but abstract, nonsensory entities.|601|phoneme, perception, syllable, mental representation 1419|Decoene1993|Mental categories can be placed on a continuum from concrete to abstract, from entirely perceptually defined to entirely conceptual (Medin & Barsalou, 1987; Rosch, 1975a, 1975b): The color blue is a perceptually based category, whereas the concept "freedom" has a purely abstract characterization. [...] In the primed matching task (Beller, 1971), subjects are asked to decide as rapidly as possible whether or not elements of a stimulus pair are the same. The priming stimulus preceding the stimulus pair should facilitate the matching process insofar as the mental category representation it activates is similar to the stimuli to be matched. If stimuli are instances of a category that generates a prototypicality effect, then the prime will activate a category representation that is more similar to a good example than to a bad example of the category. Thus, the prime will facilitate judging similarity more in matching good-example pairs than in matching poor-example pairs. 
Primed matching will thus result in a priming X prototypicality level interaction (henceforth, a P x P interaction).|603|primed matching task, perceptional category, abstract category 1420|Decoene1993|In terms of the speech unit hypothesis, it can be concluded that phoneme categories are the basic units in speech perception. They are more perceptually based -- hence, less abstract -- than syllables. Moreover, because same-name pairs under same-name instructions showed no significant P X P interaction, phoneme representations seem to be entirely concrete. This in turn implies that phoneme categories form good patterns: The acoustic specification of a phonetic segment in the speech signal has to be such that a mental representation can be constructed from entirely perceptual characteristics. That is, the signal contains information allowing a stable perceptual description of phonetic segments by the perceptual system.|613|perceptional category, abstract category, phoneme, syllable 1421|Schroeder2012|The present study introduces the first substantial German database with norms for semantic typicality, age of acquisition, and concept familiarity for 824 exemplars of 11 semantic categories, including four natural (ANIMALS, BIRDS, FRUITS, and VEGETABLES) and five man-made (CLOTHING, FURNITURE, VEHICLES, TOOLS, and MUSICAL INSTRUMENTS) categories, as well as PROFESSIONS and SPORTS. Each category exemplar in the database was collected empirically in an exemplar generation study. For each category exemplar, norms for semantic typicality, estimated age of acquisition, and concept familiarity were gathered in three different rating studies. Reliability data and additional analyses on effects of semantic category and intercorrelations between age of acquisition, semantic typicality, concept familiarity, word length, and word frequency are provided. 
Overall, the data show high inter- and intrastudy reliabilities, providing a new resource tool for designing experiments with German word materials. The full database is available in the supplementary material of this file and also at www.psychonomic.org/archive.|000|prototypicality, age of acquisition, familiarity, speech norms, German, concept list 1422|Hartmann2014|This paper aims at accounting for the emergence and loss of constraints governing the formation of deverbal nominalizations in German from a cognitive point of view. Specifically, diachronic changes in the formation of derivatives in the suffix -ung are investigated on the basis of two large corpora of Middle High German (MHG, 1050-1350) and Early New High German (ENHG, 1350-1650) texts, respectively. Employing the key notions of construal (e.g. Verhagen, 2007) and mental scanning (e.g. Langacker, 1987) and adopting a usage-based perspective, this paper demonstrates that the diachronic change of word formation patterns can be explained in terms of basic principles of human cognition. It is shown that the emergence of word formation constraints affecting ung-nominalization can be attributed to an increase in (lexical-categorial) prototypicality: Numerous frequent word formation products in -ung adopt features of more prototypical nouns by means of lexicalization throughout the ENHG period. This change eventually affects the word formation pattern itself, blocking the formation of more “verby”, i.e. processual, ung-nominals and rendering a variety of previously felicitous derivatives ungrammatical. This development is paralleled by a loss of constraints affecting the competing word formation pattern of Infinitival Nominalization, which comes in as a “replacement process” (Barz, 1998) for ung-nominalization.|000|semantic change, nominalization, word formation, prototypicality 1423|Lieberherr2015|This paper presents a progress report on the historical phonology of Puroik. 
The first part lists recurrent correspondences between three Puroik dialects, including the two hitherto undescribed varieties of Kojo-Rojo and Bulu, and proposes reconstructions for Proto-Puroik, the hypothetical common ancestor of these languages. As an external control the reconstruction was further compared with Kuki-Chin (data from (VanBik 2009)). Based on these comparisons and a brief lexico-statistical evaluation, possible hypotheses for the phylogenetic affiliation of Puroik are evaluated.|000|comparative wordlist, Sino-Tibetan, dataset 1424|Sommerstein1973|The main thesis of this paper is that the grammars of natural languages contain an exhaustive set of conditions on the output of the phonological rules - in fact, a surface phonotactics. I shall show that, contrary to what is usually assumed in generative phonology, a surface phonotactics is not redundant in a generative grammar if the grammar is indeed intended as 'a theory of linguistic competence' (Chomsky, 1965: 3), and that if any set of rules in the phonological section of the grammar is redundant it is the morphophonotactic rules, better known as morpheme structure conditions. I shall propose a format for the statement of rules (including so-called 'conspiracies') which are 'motivated' by the phonotactics in the sense of Matthews (1972: 21~220). 
Finally, I shall present a set of phonotactic rules for consonant clusters in Latin, and show how the statement of certain rules of Latin phonology can be simplified by taking their phonotactic motivation into account.|000|phonotactics, generative grammar, Chomsky, Latin 1425|Sommerstein1973|This paper contains phonotactic rules for Latin, a set of distinctive features for Latin, and information that morphological rules seem to be redundant in generative grammar (also it's not really clear what that means).|000|phonotactics, Latin, distinctive features, generative grammar 1426|Sommerstein1973|In current generative phonology, tactic rules occupy a peculiar position. Since Stanley (1967), morpheme structure rules or conditions have played no role in the phonological interpretation of surface structures; if such rules or conditions are to be justified at all, it must be for other reasons. I have argued elsewhere (Sommerstein, 1973) that the justification for tactic rules of some kind in phonology is that native speakers can tell the difference between forms that on the evidence of their phonological shape 'belong' in their languages (even though the forms may not actually appear in the lexicon) and forms that do not; in the time-honoured illustration, between *blick and *bnick. |73|generative grammar, phonotactics, 1427|Meheust2016|The integration of foreign genetic information is central to the evolution of eukaryotes, as has been demonstrated for the origin of the Calvin cycle and of the heme and carotenoid biosynthesis pathways in algae and plants. For photosynthetic lineages, this coordination involved three genomes of divergent phylogenetic origins (the nucleus, plastid, and mitochondrion). Major hurdles overcome by the ancestor of these lineages were harnessing the oxygen evolving organelle, optimizing the use of light, and stabilizing the partnership between the plastid endosymbiont and host through retargeting of proteins to the nascent organelle. 
Here we used protein similarity networks that can disentangle reticulate gene histories to explore how these significant challenges were met. We discovered a previously hidden component of algal and plant nuclear genomes that originated from the plastid endosymbiont: symbiogenetic genes (S-genes). These composite proteins, exclusive to photosynthetic eukaryotes, encode a cyanobacterium-derived domain fused to one of cyanobacterial or another prokaryotic origin and have emerged multiple, independent times during evolution. Transcriptome data demonstrate the existence and expression of S-genes across a wide swath of algae and plants and functional data indicate their involvement in tolerance to oxidative stress, phototropism, and adaptation to nitrogen-limitation. Our research demonstrates the “recycling” of genetic information by photosynthetic eukaryotes to generate novel composite genes, many of which function in plastid maintenance.|000|similarity networks, protein network, symbiogenetic gene, endosymbiosis 1429|Gligorijevic2016|**Motivation:** Discovering patterns in networks of protein–protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionary conserved pathways, protein complexes and functional orthologs. However, the complexity of the multiple network alignment problem grows exponentially with the number of networks being aligned and designing a multiple network aligner that is both scalable and that produces biologically relevant alignments is a challenging task that has not been fully addressed. The objective of multiple network alignment is to create clusters of nodes that are evolutionarily and functionally conserved across all networks. 
Unfortunately, the alignment methods proposed thus far do not meet this objective as they are guided by pairwise scores that do not utilize the entire functional and evolutionary information across all networks. **Results:** To overcome this weakness, we propose Fuse, a new multiple network alignment algorithm that works in two steps. First, it computes our novel protein functional similarity scores by fusing information from wiring patterns of all aligned PPI networks and sequence similarities between their proteins. This is in contrast with the previous tools that are all based on protein similarities in pairs of networks being aligned. Our comprehensive new protein similarity scores are computed by Non-negative Matrix Tri-Factorization (NMTF) method that predicts associations between proteins whose homology (from sequences) and functioning similarity (from wiring patterns) are supported by all networks. Using the five largest and most complete PPI networks from BioGRID, we show that NMTF predicts a large number of protein pairs that are biologically consistent. Second, to identify clusters of aligned proteins over all networks, Fuse uses our novel maximum weight k-partite matching approximation algorithm. We compare Fuse with the state of the art multiple network aligners and show that (i) by using only sequence alignment scores, Fuse already outperforms other aligners and produces a larger number of biologically consistent clusters that cover all aligned PPI networks and (ii) using both sequence alignments and topological NMTF-predicted scores leads to the best multiple network alignments thus far.|000|multiple network alignment, alignment, data fusion, software 1430|Brysbaert2014|Concreteness ratings are presented for 37,058 English words and 2,896 two-word expressions (such as “zebra crossing” and “zoom in”), obtained from over four thousand participants by means of a norming study using internet crowdsourcing for data collection. 
Although the instructions stressed that the assessment of word concreteness would be based on experiences involving all senses and motor responses, a comparison with the existing concreteness norms indicates that participants, as before, largely focused on visual and haptic experiences. The reported dataset is a subset of a comprehensive list of English lemmas and contains all lemmas known by at least 85% of the raters. It can be used in future research as a reference list of generally known English lemmas.|000|concreteness, speech norms, word ratings, 1431|Calero2002|The purpose of this study was to validate a reduced version (15 items) of the Boston Naming Test (BNT) in a sample of 78 low-educational elderly persons with or without dementia, as determined by independent assessment with a battery of cognitive tests. The reduced version was found to be equivalent to the complete BNT, and to have criterion validity with respect to other measures of dementia. We conclude that the reduced version is a useful instrument for assessing patients who require shorter testing methods because of severe cognitive deterioration or their low level of education.|000|Boston Naming Test, concept list, dementia, concepts, 1432|Kato2008|Geba is one of the Karenic languages which is spoken in eastern Burma (Myanmar). This paper presents the phonemic system, some morphosyntactic characteristics, a basic vocabulary and a text of the Geba language.|000|Sino-Tibetan, Karenic, Geba, database, word list 1433|Valyou2014|Networks are used to represent interactions in a wide variety of fields, like biology, sociology, chemistry, and more. They have a great deal of salient information contained in their structures, which have a variety of applications. One of the important topics of network analysis is finding influential nodes. These nodes are of two kinds —leader nodes and bridge nodes. 
In this study, we propose an algorithm to find strong leaders in a network based on a revision of neighborhood similarity. This leadership detection is combined with a neighborhood intersection clustering algorithm to produce high quality communities for various networks. We also delve into the structure of a new network, the Houghton College Twitter network, and examine the discovered leaders and their respective followers in more depth than which is frequently attempted for a network of its size. The results of the observations on this and other networks demonstrate that the community partitions found by this algorithm are very similar to those of ground truth communities.|000|network leader, community detection, neighborhood similarity, networks, complex networks 1434|Valyou2014|An algorithm to determine high quality leaders in a network has more uses than merely community detection. The identities of influential nodes in the network are useful data in many other practical applications of the study of networks. Fields in which these data are useful include marketing, sociology, and virology. Influential nodes are generally divided into two categories, core nodes and bridge nodes. Core nodes, or leaders, are nodes at the heart of a community, around which communities are formed. Bridge nodes act as connections between two or more communities. Many measures of the importance of a node in a network, called centrality, have been developed. The three most frequently used centrality functions are degree centrality, betweenness centrality, and distance centrality.|52|network, network leader, node importance, neighborhood similarity 1435|Bordag2008|This paper presents a revised version of an unsupervised and knowledge-free morpheme boundary detection algorithm based on letter successor variety (LSV) and a trie classifier [1]. Additional knowledge about relatedness of the found morphs is obtained from a morphemic analysis based on contextual similarity. 
For the boundary detection the challenge of increasing recall of found morphs while retaining a high precision is tackled by adding a compound splitter, iterating the LSV analysis and dividing the trie classifier into two distinctly applied classifiers. The result is a significantly improved overall performance and a decreased reliance on corpus size. Further possible improvements and analyses are discussed.|000|morpheme detection, 1436|Benden2005|To simply take the distribution of linguistic elements as a basis for analysis was the methodological prime of researchers of the so-called “American Structuralism”. This paper deals with the detection of morphemes from a large corpus of German by simply applying a distributional procedure of counting the number of potential successors of a given sequence of letters of a word, a method reminiscent of proposals by Harris, Shannon and others. Morphemes can be heuristically read off by an increase in the potential successor count. Three different methods of identifying morpheme breaks are discussed and a proposal for improvement of the method by transforming graphemic to partial phonemic representation is put forward.|000|morpheme detection 1437|Benden2005|This method seems to be simple enough to be tested on small datasets. It can also be improved by finding first breaks in boundaries and then using these to split words, re-creating morphemes, etc. until all morphemes are detected.|000|morpheme detection 1438|Bagheri2005|In lexical semantics several meta-linguistic relations are used to model lexical structure. Their number and motivation vary from researcher to researcher. This article tries to show that one relation suffices to model the concept structure of the lexicon making use of intensional logic.|000|lexical semantics, semantics, linguistic sign, semantic relations 1439|Bagheri2005|Even though the relations are seen as primary they are not sufficient to distinguish different meanings. 
In addition so-called glosses are added which resemble very much customary definitions in defining dictionaries.|482|wordnet, semantic relations, 1440|Bagheri2005|This concept calculus goes back to G. W. Leibniz. Kauppi (1967) condensed and improved the intensional calculus and put it into a modern form in terms of relations. Leibniz distinguished between logic purely based on concepts (intension), and a logic based on objects (extension). A definition constitutes the fundamental relation between concepts: the defined concept contains the defining one. Usually a concept contains several concepts.|482|Gottfried Wilhelm Leibniz, concept calculus 1441|Bagheri2005|The text presents some rather simple-seeming system to denote relations among concepts, which may be interesting to test in application.|000|concept calculus, semantic relations 1442|Bagheri2005|How is the linguistic expression related to the concept? To stick to kinship relations: There are, for example, several expressions connected to the concept father: “father”, “dad”, “daddy”, “pop”, “old man”. These terms, extensionally interpreted, do not denote different kinds of persons. But they are used in different circumstances for the same kind of persons. The term “old man” might be used among adolescent people, when they talk about their fathers. But they would not use this term when they speak to their fathers. The term “daddy” is only used by girls (Schusky (1972), 13) not by boys in referring to their father. There are many more aspects which determine the choice of expression. Especially kinship relations of different cultures are truly a treasure trove for very subtle differences of lexical coding. This cannot and should not be incorporated into the same concept system.|488|context, semantic relations, denotation 1443|Bagheri2005|The application of the concepts of this kind of concept system is not what is commonly understood by objects, viz. persons, entities, or everyday situations. 
The application now involves the concepts of the concept system of the ‘real world’ itself as objects, and the linguistic expressions as objects.|488|concept relations, semantic relations, denotation 1444|Baxter2014|simplified notation can be generated by (1) removing all parentheses and what is inside them, and (2) omitting square brackets. In this simplified notation jī 奇 becomes \*kaj... For serious philological or comparative research, however, the full, less user-friendly version is more realistic and is to be preferred. :comment:`quoted after` @Schuessler2015|379f|IPA, Old Chinese, phonetic transcription 1445|Schuessler2015|Testing (falsifying) in natural sciences is different from recovering a dead language. Physics can predict exactly how a body moves, where only one immutable law needs to be understood. Sounds and meanings can change in unpredictable ways (*t- > d, th, ts, s, ʔ, etc. depending on language, time and place) and can only be explained ex post facto. Explanations are often matters of interpretation, judgment and plausibility. Then, in the hard sciences hypotheses can be falsified with repeatable tests; drop an object to falsify gravity. But dead languages have no native speakers. Predicting the behavior of rhymes in Old Chinese literature is a matter of interpretation (a rhyme that does not fit may be a bad rhyme, a word may have been pronounced differently, graphic error, obsolete word). Predicting that a certain word will not be written in a certain way in manuscripts that may come to light one day does not seem a viable test or falsification.|577|falsification, Karl Popper, linguistic reconstruction, reconstruction methodology 1446|Schuessler2015|To accommodate what their hypotheses require, including their phonological analysis of graphs, morphological ideas, Proto-Min forms and loans in Hmong-Mien and Vietic, the authors expand Sagart’s 1999 hypotheses about tight and loose, consonantal and sesquisyllabic prefixes. 
Thus a putative configuration \*m+r can have seven distinct manifestations in NOC. * mǎi 買 \*mʕrajʔ * mái 埋 \*m.rʕə * mài 麥 \*m-rʕək * mài 脈 \*C.mʕ[i]k * wǔ 舞 \*k.m(r)aʔ * lái 來 \*mə.rʕə * lǐ 鯉 \*mə-rəʔ |583|Old Chinese, phonetic transcription, 1447|Schuessler2015|Most Middle Chinese (and hence OC) syllable initial consonants that can alternate within phonetic series belong to a homorganic set p ph b, or ts tsh dz and so on. ʔ and x (χ) as well as some ɣ and j do not fit into this neat system. Instead, ʔ and x co-occur occasionally in phonetic series with velar stops k kh g/ɣ, thus violating the so-called xiéshēng principle (Li Fang-Kuei 1971). Just as s does not mix with ts tsh dz (it does, though; cf. shēng 生 vs. qīng 青 ), x should not mix with k kh g/ɣ (Pan 2000: 337, S&B 2009: 223). :comment:`Note that this can be easily tested on networks of xíeshēng series.`|587|xiéshēng, uvular sounds, Old Chinese, 1448|Schuessler2015|On the other hand, rhymes like \*-ap with relatively few words have correspondingly fewer graphemes at their disposal. There is no phonetic series \*ʔap, so the few words of that shape are slipped into other series (yā 押 \*ʔrâp under jiǎ 甲 \*krâp, yā 壓 \*ʔrâp under yàn 猒 \*ʔem, yè 㡋 \*ʔap under yǎn 奄 \*ʔamʔ). Once this was done, occasional mixes of velars and laryngeals were accepted. Thus the mixing of velars with laryngeals was a matter of availability and expediency, not rigid phonology. As we have seen, there are more pragmatic ways of looking at phonetic loans. |589|uvular sounds, Old Chinese, xiéshēng 1449|Cai2013|This paper is part of an ongoing project on the Gong’an dialect (hereafter: GAD). Based on data collected from the field in September-October 2009, it aims to present some of the phonological, lexical and syntactical features of GAD in comparison to the Modern Standard Chinese or Mandarin Chinese (MC) and the neighbouring Changsha dialect (hereafter: CSD). 
Section 2 offers some background information about the concept of dialect, Chinese dialects, previous studies on GAD, the data used in the project, the motivation and justification for this study, and the aims of this paper. Section 3 presents some of the phonological, lexical and syntactical features of GAD from a comparative perspective. Section 4 concludes this paper and provides some possible future routes of enquiry into the GAD.|000|Chángshā dialect, Gong'an dialect, Chinese dialects, unclassified, dialect data 1450|Chen2004|The Hakka dialect of Changting exhibits extraordinarily complex tone sandhi patterns, that present daunting analytical challenges to any theory, including rule-based generative model as well as Optimality Theory. This paper examines the possible extensions and logical moves within both theories, and concludes that both theoretical models in their current form fall short of their descriptive goals. Hakka is a limiting case that severely tests the adequacy of conceptual tools at our disposal.|000|Changting dialect, Hakka, Chinese dialects, tone sandhi 1451|Coblin1996|In Chinese historical phonology it is often possible to identify the traditional tone classes of particular syllables with considerable accuracy. But to determine the actual phonetic values of pre-modern tones is usually a much more difficult undertaking because source materials are either lacking, insufficiently explicit, or too difficult to interpret. It so happens, however, that very clear data on phonetic tone values are available for the standard language of early Qīng times. This language, which was called Guānhuà 官話 ["language of the officials or mandarins"] was described in detail by European Catholic priests, who intended to use it in their missionary activities in the Chinese empire. 
The Guānhuà treated in these missionary sources, and denoted there as *la lengua (~ lingua) mandarina* ["the mandarin language"] , was clearly identified by the missionaries as a Nanking-based koine. It was, in fact, essentially the same Jiānghuái 江淮 -type koine which had been used throughout the preceding Míng dynasty. It is important to note that, contrary to possible misconception, this form of "Mandarin" was neither based on nor directly related to the dialect of the city of Peking. The ascendancy of Pekingese as the basis for the national linguistic standard lay nearly a century and a half in the future.|000|Mandarin, Chinese dialects, tone, tone change, Early Mandarin 1452|Coblin2004|Article provides information on the Huáng-Xiào dialects, a group located close to Jiānghuái Mandarin but not necessarily identical with it.|000|Jiānghuái, Huángxiào Mandarin, Chinese dialects, migration 1453|Coblin2015|The Gàn dialect group is one of the major Sinitic language families of China. It was identified later than most of the other major Chinese dialect complexes and did not play a role in the philological work of earlier scholars such as Bernhard Karlgren and his epigones. Indeed, it was only in the final decades of the last century that enough detailed data on the group became available to support meaningful comparative research. Three book-length works of great importance that appeared in the 1990’s were the Kè-Gàn fāngyán diàochá bàogào of Lǐ Rúlóng and Song-Hing Chang (1992), Les Dialectes Gan (1993) by Laurent Sagart, and the Kè-Gàn fāngyán bǐjiào yánjiù of Liú Lúnxīn (1999). With these major contributions in hand, Sinological linguists at last began to grasp the full nature and extent of the Gàn family. Since the advent of the new century, many more articles and several book-length works have appeared, dealing both with individual Gàn varieties and with entire groups of these dialects. Indeed, our ever-growing corpus of Gàn material is now quite impressive. 
However, to the best of our knowledge, as of this writing no comparative phonological reconstruction covering the entire family has appeared in print. It is accordingly the primary goal of the present work to remedy this deficiency. To wit, we propose to reconstruct a Common Gàn phonological system and then to demonstrate how this reconstruction can be used as a tool for the study of lexical, taxonomic, and historical problems in comparative Gàn. In order to undertake a comparative reconstruction, one must begin by deciding what to compare. In the case of Gàn, this proves to be a complex issue, for no cogent classificatory scheme for the family has so far been proposed. Indeed, the late Professor Jerry Norman once remarked that it is easier to say what Gàn is not than to say what it is. In confronting this conundrum, our initial approach will be to use in our comparative work only dialects that are universally recognized as Gàn, and to set aside for the nonce any whose assignment to the group is problematic or disputed. Then, once our common phonological system has been reconstructed, we shall return to the problem of taxonomy and, by comparing our new common system with others posited for contiguous dialect families, we shall attempt a delineation of the Gàn family as a whole. Finally, when these tasks have been completed, we shall be in a position to propound a set of guidelines for testing the affiliations of those dialects whose taxonomy is currently in question. |000|Gàn, Chinese dialects, genetic classification, description, dialect data 1454|Coblin2015|Apart from providing a very complete view on the Gàn dialects, this book presents interesting discussions of how to reconstruct in Chinese dialectology, and this is very interesting, especially with respect to the lexicon and lexical change via compounding. 
Important in this respect are the "Experiments in Lexical Comparison and Annotation" in Chapter V, where important words, like "face", "eye", "mouth", "nose", etc. are explained and discussed. |000|Gàn, Chinese dialects, linguistic reconstruction, lexical change, genetic classification 1455|Drellishak2006|In the lexicons of many of the world’s languages, there seem to exist subword patterns of sound and meaning that cannot easily be analyzed as morphemes. English, for example, has a number of words that start with the consonant cluster gl- and share a meaning related to light or vision, including glimmer, glisten, glitter, gleam, glow, and glint. [...] The psycholinguist, in other words, is faced with the necessity of somehow selecting phonesthemes before experiments requiring significant time and resources can be conducted to validate those phonesthemes. Furthermore, although there is a long history of proposed phonesthemes in English, other less-studied languages may not share this accumulated resource. In this paper, I propose and evaluate three statistical, language-independent methods for evaluating candidate phonesthemes that require only a dictionary of the target language in an electronic format and a computer running the necessary software.|000|phonestheme, morphology, language structure, automatic approach 1456|Collins2016|Everett et al. (2015) find that complex tonal languages tend to be found in humid environments, a correlation that holds up within different families and parts of the world. Despite the impressive statistical and experimental support for this causal claim, evidence is needed from natural language use, such as Chinese speakers changing their use of tone depending on humidity, before the claim can be considered well supported. There is otherwise a risk that this correlation could be an artifact of history of language families and language contact. 
To illustrate this, I show in a series of simulations that random selection of languages followed by language contact can create a positive global correlation between tone and humidity with as much as a 83 per cent probability, and a 47 per cent probability of holding within at least two different macro-areas. Language contact is additionally responsible for these correlations holding up when controlling for language relatedness, as I show that when using the random independent samples test employed by Everett et al., their result is still expected by chance as much as 60 per cent to 80 per cent of the time. I further show how contact can create correlations within families by a phylogenetic analysis of the evolution of tone in Niger-Congo and Sino-Tibetan.|000|tonogenesis, tone change, complex tone languages, tone language, correlational studies 1457|Coupe2016|In their paper, Everett et al. (2016) stress how a shift could or should take place from autonomous linguistic forms to ecologically adaptive ones. This raises the issue of the meaning of ecology when it comes to languages, and to what the Greek root of this word—oikos, the house or the habitat—actually refers. Several authors have equated the ecology of languages with their social environment, i.e., communities of speakers. When describing the ecology of language evolution, Mufwene (2001) exemplified how situations of contact between several languages in colonial plantations resulted in specific selections and assemblages of linguistic forms. More recently, Lupyan and Dale’s (2010) ecolinguistic niche hypothesis points at how differing social contexts may shape language structures, much as ecological niches shape organisms. They suggest in particular that a high percentage of adult L2 learners in a linguistic community may push toward less morphological complexity. 
Another example of social influences on linguistic forms is the debated positive correlation between the number of speakers of a language and the size of its phonological inventory (Hay and Bauer 2007; Bybee 2011). The notion of ecology can also relate to the natural environment in which speakers live and interact, as it is the case in Everett et al.’s contribution. Different phenomena can be acknowledged, which may take place simultaneously or not in specific situations. |000|ecology, environment, cultural evolution, language evolution, biological parallels 1458|Donohue2016|Does (the presence or complexity of) tone inversely correlate with dryness of climate? The authors (Everett et al.) suggest that the absence of ambient humidity in the air negatively correlates with the presence of (complex?) lexical tone, partly because of the effect that dry air has to increase the difficulty in achieving precise articulatory targets. There are two main problems with the argumentation used. 1. Conflating ‘tone’ with ‘pitch’ or ‘fundamental frequency’, and mistaking ‘complexity’ with a syllable domain for tone assignment; 2. conflating ‘dry climate’ with the absence of humidity. The authors are not guilty in an absolute sense of these problems, acknowledging that there are complications. Their reliance on pitch contrasts as a proxy for tonal category contrasts, and the use of air humidity rather than (easily available) climate information for the ranges of different languages means that the authors are dealing with ephemeral correlations between proxy features. In the next two sections, I will critique the use of tone primarily to refer to distinctions realised by pitch, and the use of humidity as a powerful explanatory for the existence of tone categories.|000|tone language, correlational studies, tone change, climate 1459|Ember2016|The Everett et al. 
(2016) article makes a strong and compelling case that some, if not many, aspects of language may be ecological adaptations to climate. Their argument is not just theoretical, but it is supported by many strands of evidence including careful and systematic worldwide cross-cultural research. The purpose of the present comment is not to detract from the ecological argument, but rather to suggest that linguists would do well to also consider some psychological mechanisms to explain language differences and similarities. The ‘personality integration of culture’ model first put forward by John W. M. Whiting and Irvin L. Child (1953) and elaborated further in subsequent years (Whiting and Whiting 1978: 44–45) has inspired many cross-cultural studies. It proposes that some aspects of culture (referred to as ‘maintenance systems’) respond directly to climate and ecological conditions. Maintenance systems include culture traits such as subsistence patterns, means of production, settlement patterns, family structure, and political and social structures. But there are other aspects of culture referred to as ‘projective or expressive systems’ (such as religion, rituals, folklore, games, and art, and crime rates) that are probably not adaptive and might best be explained by psychological mechanisms such as generalization, projection, defense mechanisms, or identification. The model is referred to as ‘personality integration of culture’ because it is believed that the maintenance systems predict the expressive systems via differences in childrearing and resulting differences in modal personality. Although Whiting and Child did not specifically discuss language as part of a culture’s expressive system, in my earlier work with Melvin Ember on language, two expectations about linguistic variation were derived from psychological theorizing about music. 
Both had empirical support in our worldwide cross-cultural tests.|000|tone language, linguistic complexity, correlational studies 1460|Hammarstroem2016a|Everett, Blasi, and Roberts in their position paper (Everett et al. 2016, henceforth EBRPP) argue that the sound systems of human languages are ecologically adaptive on the grounds that, (1) human and animal be- haviour is ‘generally’ adaptive and (2) that their previous work ‘supports’ the idea that ambient desiccation in the area where a language is spoken leads to the absence of lexical tone (EBRPP: 1). [...] The case, then, boils down to the authors’ second point, i.e. the empirical validity of the idea of ecological adaptivity of human language. The authors cite a number of studies which ‘suggest’ this is the case but without even a modicum of critical scrutiny of these studies (EBRPP: 3–4). Beyond ‘suggestive’ studies, the full weight of the case is rested on the authors’ own recent paper which allegedly (EBRPP: 5) is ‘demonstrating a robust statistical association between ambient desiccation and the absence of lexical tone (Everett et al. 2015)’. It continues ‘through various strategies, from simple intra-linguistic-family and intra-regional regressions to cross-isolate comparisons to global Monte Carlo analyses, we demonstrated that the association was clear and not the result of confounds, such as language or areal relatedness between particular data points’. (EBRPP: 5). An examination below of every claim 1 made in Everett et al. 2015 (henceforth EBRT) reveals that this is outright false.|000|correlational studies, tone language, climate, 1461|Fan2016|This paper focuses on the modals of Mental Ability from a variety of Chinese dialects as well as non-Chinese languages, attempting to build up a mini-map centering on Mental Ability, as a refinement of the current study on modality’s semantic map. 
This topic involves a variety of important theoretical issues, from the identification of new modal functions on the semantic map which have not been noted before to the exploration of the essential distinction between Mental Ability and Physical Ability. Our interest in this topic is intrigued by three interesting issues concerning the Mental Ability modal verb hui 會 in Standard Mandarin and its cognate or corresponding morphemes in Chinese dialects (hui hereafter). The hui-type modals in Chinese dialects have three properties, i.e. their modal functions cover a range from possibility to necessity; the ‘ability’ use is strictly restricted to Mental/Intellectual Ability; and its epistemic use has a strong tendency to have a future-time reference. In addition, its semantic change process has been a controversial issue so far. With the ‘Semantic Map Model’, a new typological tool, we reconstruct hui’s semantic development path by cross-linguistic comparison, the core of which is ‘mental ability → objective necessity → epistemic probability’. It is proved that this reconstruction has its advantages over other proposals in more adequate explanation of the semantic properties of the hui-type modals in Chinese dialects, and it is fully in line with relevant universals in cross-linguistic semantic-connection. It demonstrates that adopting typological approaches to the studies of specific languages can benefit not only the research on linguistic particulars, but also language universals.|000|Chinese dialects, linguistic reconstruction, 1462|Farrar2002|This paper discusses some of the design criteria for a linguistic ontology that can be used to support multilingual and crosslinguistic searches and queries on the Internet. It focuses on integrating linguistic concepts and instances into an upper-level ontology, and shows that the result can be understood and analyzed as a feature (structure) system. 
It considers various types of linguistic structure ranging from segment types to grammatical properties and relations, and linguistic inventories including phoneme tables, inflectional paradigms, lexicons, and grammatical descriptions.|000|ontology, linguistic annotation, morphology, 1463|Hammarstroem2016b|What would your ideas about language evolution be if there was only one language left on earth? Fortunately, our investigation need not be that impoverished. In the present article, we survey the state of knowledge regarding the kinds of language found among humans, the language inventory, population sizes, time depth, grammatical variation, and other relevant issues that a theory of language evolution should minimally take into account.|000|language evolution, linguistic diversity 1464|Mueller2010|This paper provides an overview of the annotation design for morphological structure in CDT. The structure of words and phrases is encoded as a dependency tree which can be specified in two different ways: either as an ordinary dependency tree or by means of an abstract operator specification. The dependency notation encodes the internal structure of phrasal compounds and regular NPs, while the operator notation encodes dependency structure within solid orthography compounds and derivationally constructed words. Finally, the paper discusses the semantic labeling system used in CDT and some specific issues related to the annotation of NPs|000|linguistic annotation, morphology 1465|Nelson2004|Preexisting word knowledge is accessed in many cognitive tasks, and this article offers a means for indexing this knowledge so that it can be manipulated or controlled. We offer free association data for 72,000 word pairs, along with over a million entries of related data, such as forward and backward strength, number of competing associates, and printed frequency. 
A separate file contains the 5,019 normed words, their statistics, and thousands of independently normed rhyme, stem, and fragment cues. Other files provide n × n associative networks for more than 4,000 words and a list of idiosyncratic responses for each normed word. The database will be useful for investigators interested in cuing, priming, recognition, network theory, linguistics, and implicit testing applications. They also will be useful for evaluating the predictive value of free association probabilities as compared with other measures, such as similarity ratings and co-occurrence norms. Of several procedures for measuring preexisting strength between two words, the best remains to be determined. The norms may be downloaded from www.psychonomic.org/archive/.|000|speech norms, word association, word association data 1466|Hill2015|We present SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar (Freud, psychology) have a low rating. We show that, via this focus on similarity, SimLex-999 incentivizes the development of models with a different, and arguably wider, range of applications than those which reflect conceptual association. Second, SimLex-999 contains a range of concrete and abstract adjective, noun, and verb pairs, together with an independent rating of concreteness and (free) association strength for each pair. This diversity enables fine-grained analyses of the performance of models on concepts of different types, and consequently greater insight into how architectures can be improved. 
Further, unlike existing gold standard evaluations, for which automatic approaches have reached or surpassed the inter-annotator agreement ceiling, state-of- the-art models perform well below this ceiling on SimLex-999. There is therefore plenty of scope for SimLex-999 to quantify future improvements to distributional semantic models, guiding the development of the next generation of representation-learning architectures.|000|speech norms, semantic similarity, word association, word association data 1467|Smith2004|This paper explains how regular expressions can be used when searching for morphological structure in the TIGER treebank with the TIGERSearch query tool (Lezius, 2002). It is assumed that the reader has some basic familiarity with the syntax of the TIGER query language (König et al., 2003; König and Lezius, 2003) as well as with the encoding of grammatical structure in the TIGER treebank and can formulate simple queries. For introductory information on the encoding of linguistic information in the treebank, see Smith (2003) 1 . Regular expressions are a powerful tool and will prove to be extremely useful in the formulation of queries. The main advantage that we will see in them is that compact regular expressions can describe an infinite set of possible forms that would be impossible to enumerate. They will prove extremely useful when searching for Lexemes that share an affix or a set of affixes, or in searching for wordforms which share a common inflectional marker or set of markers|000|morpheme detection, morphology, automatic approach 1468|Ostapirat2016|In this paper, I present a new look at the phonological reconstruction of Proto-Miao-Yao (PMY). Particular attention is devoted to the outstanding problems concerning the reconstruction of initial consonants and clusters. A reconstruction of PMY’s velarized feature is proposed as a key to understanding the complex development in modern dialects. 
Based on the new reconstruction system, I discuss the viability of some proposed lexical items shared between Miao-Yao and Chinese. A modest goal is to place long-range comparisons on firmer ground, based on established sound correspondences.|000|Proto-Miao-Yiao, linguistic reconstruction, Chinese 1469|Templer2007|The core postulate from which this paper proceeds is that there is a widening chasm between small privileged islands of middle-class learners of EFL across the developing world, the "EFL haves" -- and the masses of working-class learners and ordinary poor folks, the "EFL have-nots". Money talks English, and generates vast inequity. Equity and fairness demand we strike out on new paths. Basic human rights in the 21st century suggest that ideally, all individuals on this planet should have a right to learn an efficient, compact lingua franca for trans-cultural communication. In most rural and low-income learning environments, few students have the time or means to climb the ladder to intermediate proficiency in 'full' English. Other options need to be experimented with.|000|Basic English, basic vocabulary, education, concept list, mass communication 1470|Winter2016|When doing empirical studies in the field of language evolution, change over time is an inherent dimension. This tutorial introduces readers to mixed models, Growth Curve Analysis (GCA) and Generalized Additive Models (GAMs). These approaches are ideal for analyzing nonlinear change over time where there are nested dependencies, such as time points within dyad (in repeated interaction experiments) or time points within chain (in iterated learning experiments). In addition, the tutorial gives recommendations for choices about model fitting. Annotated scripts in the online Supplementary Data provide the reader with R code to serve as a springboard for the reader’s own analyses.|000|mixed models, language change, stochastic analysis 1471|Post2015|[...] 
Language Contact, another area of Rob’s deep interest, commences with a paper by Monali Longmailai, a Dimasa native speaker, whose work compares phonological, morphological and lexical similarities between unrelated lan- guages Dimasa and Khasi that have been in close contact for many years. A paper from U. V. Joseph and Linda Konnerth carefully considers borrowings from Indic languages into Tiwa, giving evidence for borrowings that may have occurred at different times in the history of the Indic languages and suggesting the importance of understanding the place of Khasi as a contact language as well. Toni Huber makes a thorough study of a word for ‘ancestral deity’, per- haps derived from a form se, using as its basis examples of the word as found in old literary as well as modern spoken sources, and grounding his study in a wealth of ethnographic information. [...] Historical Phonology is yet another of Rob’s specialties, and in this sec- tion Mark W. Post gives a fieldworker’s guide to one of the thorniest problems for the description of languages of this area – tone. The paper focusses on the Tani subgroup of Tibeto-Burman, giving both the new fieldworker and the sea- soned language researcher clear suggestions on both theory and methodology. Martine Mazaudon gives an account of an uncommon feature in Tibeto-Burman languages: vowel length contrasts on open syllables. She shows how those dis- tinctions have been retained, or merged, in a variety of Tamangic languages, relating this to similar processes in the languages of Europe. Sarah Thomason presents a detailed analysis of sound changes involving velars in Montana Sa- lish, a language of North America, giving readers a considered account of a very common occurrence in comparative linguistics: irregularities that can’t easily be explained. [...] Our final section, Language in Culture goes to the heart of yet another key area of Rob’s work. 
Yankee Modi, a native speaker of several languages of Arunachal Pradesh, writes about the importance of social organization to lan- guage distributions and language change in the Tani area. Tanka Subba’s chapter considers a number of major issues affecting the tribes of the Northeast Indian region today in relation to the wording of the Indian Government policy documents that affect them. Gwendolyn Hyslop’s chapter takes a detailed look at terms for grains and dairy and suggests that only some can be reconstructed for Proto East Bodish in Bhutan, specifically the dairy terms and buckwheat and barley. This suggests that before the breakup of the East Bodish, these were the items farmed. Shobhana Chelliah presents a complete narrative in Meitei – a story grounded in agricultural practices – but the focus of the paper is on the rhetorical structures of the story telling and why the study of these is so essen- tial for language documentation. George van Driem’s wide-ranging paper places language in a broader context, as part of human evolution. Citing biologists, anthropologists and other scientists as well as linguists and authors of literary note, the paper treats language as life form, constantly growing and changing. To round off the volume, Roger Blench presents a brief overview of the litera- ture from the emerging field of ophresiology, the study of taste and smell, exemplifying it using the wide range of specialised taste terms in the Kman lan- guage of Arunachal Pradesh (also known as Miju or Miju Mishmi). |000|Sino-Tibetan, Austro-Asiatic, language change, historical linguistics, 1472|McRae2005|Semantic features have provided insight into numerous behavioral phenomena concerning concepts, categorization, and semantic memory in adults, children, and neuropsychological populations. Numerous theories and models in these areas are based on representations and computations involving semantic features. 
Consequently, empirically derived semantic feature production norms have played, and continue to play, a highly useful role in these domains. This article describes a set of feature norms collected from approximately 725 participants for 541 living (dog) and nonliving (chair) basic-level concepts, the largest such set of norms developed to date. This article describes the norms and numerous statistics associated with them. Our aim is to make these norms available to facilitate other research, while obviating the need to repeat the labor-intensive methods involved in collecting and analyzing such norms. The full set of norms may be downloaded from www.psychonomic.org/archive.|000|speech norms, semantic norms, concept list, dataset 1473|Alonso2005|Subjective estimations of age of acquisition (AoA) for a large pool of Spanish words were collected from college students in Spain. The average score for each word (based on 50 individual responses, on a scale from 1 to 11) was taken as an AoA indicator, and normative values for a total of 7,039 single words are provided as supplemental materials. Beyond its intrinsic value as a standalone corpus, the largest of its kind for Spanish, the value of the database is enhanced by the fact that it contains most of the words that are currently included in other normative studies, allowing for a more complete characterization of the lexical stimuli that are usually employed in studies with Spanish-speaking participants. The norms are available for downloading as supplemental materials with this article.|000|speech norms, age of acquisition, dataset 1474|Sze2014|The Chinese language has more native speakers than any other language, but research on the reading of Chinese characters is still not as well-developed as it is for the reading of words in alphabetic languages. 
Two areas notably lacking are the paucity of megastudies in Chinese and the relatively infrequent use of the lexical decision paradigm to investigate single-character recognition. The Chinese Lexicon Project, described in this article, is a database of lexical decision latencies for 2,500 Chinese single characters in simplified script, collected from a sample of native mainland Chinese (Mandarin) speakers (N = 35). This resource will provide a valuable adjunct to influential mega-databases, such as the English, French, and Dutch Lexicon Projects. Using two separate analyses, some advantages associated with megastudies are exemplified. These include the selection of the strongest measure to represent Chinese character frequency (Cai & Brysbaert's (PLoS ONE 5(6): e10729, 2010) subtitle contextual diversity frequency count), and the conducting of virtual studies to replicate and clarify existing findings. The unique morpho-syllabic nature of the Chinese writing system makes it a valuable case study for functional language contrasts. Moreover, this is the first publicly available large-scale repository of behavioral responses pertaining to Chinese language processing (the behavioral dataset is attached to this article, as a supplemental file available for download). For these reasons, the data should be of substantial interest to psychologists, linguists, and other researchers.|000|speech norms, Chinese, lexical database, dataset 1475|Medler2005|Concepts vary markedly in the degree to which they make reference to different domains of sensory-motor experience. For example, the concept thunder is defined primarily in terms of its sound, and since it has no visual representation and cannot be touched or manipulated, makes no reference to color, shape, motion, tactile, or kinesthetic attributes. In contrast, the concept tomato includes salient color and shape attributes, but no sound or motion characteristics. 
This page contains mean perceptual attribute ratings in 4 sensory-motor domains (Sound, Color, Manipulation, Motion) for 1402 words, as well as Emotion ratings reflecting intensity and valence of emotional associations for the same words. These norms were collected from 342 undergraduate students, using an online form. For the sensory-motor domains (Sound, Color, Motion, and Manipulation), each participant was presented with a subset of the words and asked to rate how important a particular attribute domain was to the meaning of each word on a scale from 0 (not at all important) to 6 (very important). Each participant rated the words on only a single sensory-motor domain. For the Emotion ratings, participants were asked to rate how negative or positive were the emotions they associated with each word, using a scale from -6 (very negative feeling) to +6 (very positive feeling), with 0 being a neutral feeling. |000|speech norms, perception, attribute rating, English, 1477|Clark2004|The Paivio, Yuille, and Madigan (1968) norms for 925 nouns were extended in two ways. The first extension involved the collecting of a much more extensive and diverse set of properties from original ratings and other sources. Factor analysis of 32 properties identified 9 orthogonal factors and demonstrated both the redundancy among various measures and the tendency for some attributes (e.g., age of acquisition) to load on multiple factors. The second extension collected basic ratings of imagery, familiarity, and a new age of acquisition measure for a larger pool of 2,311 words, including parts of speech other than nouns. The analysis of these ratings and supplementary statistics computed for the words (e.g., number of syllables, Kucera-Francis frequency) demonstrated again the relative independence of various measures and the importance of obtaining diverse properties for such norms. Implications and directions for future research are considered. 
The full set of new norms may be downloaded from www.psychonomic.org/archive/.|000|speech norms, English, word ratings 1478|Bird2001|Age of acquisition and imageability ratings were collected for 2,645 words, including 892 verbs and 213 function words. Words that were ambiguous as to grammatical category were disambiguated: Verbs were shown in their infinitival form, and nouns (where appropriate) were preceded by the indefinite article (such as to crack and a crack). Subjects were speakers of British English selected from a wide age range, so that differences in the responses across age groups could be compared. Within the subset of early acquired noun/verb homonyms, the verb forms were rated as later acquired than the nouns, and the verb homonyms of high-imageability nouns were rated as significantly less imageable than their noun counterparts. A small number of words received significantly earlier or later age of acquisition ratings when the 20-40 years and 50-80 years age groups were compared. These tend to comprise words that have come to be used more frequently in recent years (either through technological advances or social change), or those that have fallen out of common usage. Regression analyses showed that although word length, familiarity, and concreteness make independent contributions to the age of acquisition measure, frequency and imageability are the most important predictors of rated age of acquisition.|000|speech norms, age of acquisition, imagability, English 1479|Ho2016|This article reviews the newly published *Old Chinese: A New Reconstruction* by Baxter and Sagart from three perspectives: attitude toward languages, methodology and types of evidence. This book contains many defects that cannot be overlooked and some suggestions are given regarding how to prevent these defects. 
This review consists of six sections and examines the following problems: 1) One rhyme group with many vowels, 2) Retention of old features in the Xiayi dialect, 3) Using phonetic transliteration to prove final coda, 4) 午 and 五 not homophonous, 5) Calling a horse a deer and 6) "A mountain is a river bank." After careful investigation, we regret to say that many of the errors from this new book could have been avoided. These errors reflect the outdated concepts of the authors and the insufficiency of their basic training. In summary this monograph is rather disappointing. |000|Old Chinese, review, linguistic reconstruction 1480|Clark2004|The PYM norms included imagery ratings, concreteness ratings, and meaningfulness values for 925 words (all nouns), with familiarity ratings also available at a later date. :comment:`Description of the PYM norms which were apparently very influential, by` @Paivio1968 |371|speech norms, PYM norms, word imagery, familiarity, concreteness, word ratings 1481|Paivio1968|Groups of ss, 17-46 yr. old college students, were used to scale 925 nouns on abstractness-concreteness (c) imagery (i), and meaningfulness (m). Concreteness was defined in terms of directness of reference to sense experience, and i in terms of word's capacity to arouse nonverbal images; c and i were rated on 7-point scales. Meaningfulness was defined in terms of the mean number of written associations in 30 sec. The mean scale values for these variables are presented for each of the 925 nouns. Also reported are the intercorrelations of the variables, together with an examination of the words for which c, i, and m values are most clearly differentiated; and reliability data, including comparisons with scale values for the variables from other studies.|000|concreteness, word imagery, imageability, word ratings, speech norms 1482|Clark2004|The set of properties covered in the original PYM norms included word concreteness, imageability, and meaningfulness. 
Norms were soon available for ratings of word familiarity as well. Over the years, a number of researchers have reported additional attributes for the core PYM items, several of which are included in the following analyses.|372|PYM norms 1483|Gilhooly1980|Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words of varying length and frequency of occurrence are presented. The words can all be used as nouns. Intergroup reliabilities are satisfactory on all attributes. Correlations with previous word lists are significant, and the intercorrelations between measures match previous findings.|000|age of acquisition, concreteness, imagability, familiarity, speech norms, ambiguity, dataset 1484|Keuleers2011|Polysemy The final topic we will address is the effect of polysemy—that is, the existence of different meanings for the same word. As can be seen from Table 14, word recognition research has suggested a small facilitation effect of number of meanings. Although there is a similar tendency in the BLP, none of the effects reached significance in the virtual experiments. Rodd, Gaskell, and Marslen-Wilson (2000) criticized the existing research on polysemy in word recognition because it failed to make a distinction between words with multiple related senses (e.g., the adjective uniform [similar in form] and the noun uniform [clothing worn by a particular group]) and words with multiple unrelated meanings (e.g., bank [financial institution] and bank [land alongside a river]). Rodd et al.’s results suggested that the effect of polysemy was limited to words with true polysemy (multiple related senses). As Table 15 shows, the same tendency is found in the BLP data, but the effects are again smaller than in the original experiment. 
In addition, the BLP data suggest that both number of meanings and number of senses impact word recognition.|296|polysemy, speech norms, lexical decision task, word processing 1485|Rodd2000|There have been several reports in the literature of faster visual lexical decisions to words that are semantically ambiguous. All current models of this ambiguity advantage assume that it is the presence of multiple unrelated meanings that produce this benefit. A set of three lexical decision experiments reported here challenge this assumption. We contrast the ambiguity seen in words like bark, which have multiple unrelated meanings, with words that have multiple related word senses (e.g., twist). In all three experiments we find that while multiple word senses do produce faster responses, ambiguity between multiple meanings delays recognition. These results suggest that, while competition between the multiple meanings of ambiguous words delays their recognition, the rich semantic representations associated with words with many senses facilitate their recognition.|000|semantic similarity, ambiguity, polysemy, homophony 1486|Rodd2000|As mentioned earlier, lexicographers routinely distinguish between word meanings and word senses when they structure dictionary entries. These dictionary entries provide a simple, yet reliable way to classify words as being ambiguous between multiple meanings or between multiple senses. We have used the entries in The Online Wordsmyth English Dictionary–Thesaurus (Parks, Ray, & Bland, 1998). 1 As we report later, the classifications made in this dictionary correspond closely to participants’ judgements about the relatedness of the meanings of ambiguous words. Millis and Button (1989) use tell as an example of a word that has many meanings. Participants produced up to four meanings for this word. These were to inform, to explain, to understand, and to relate in detail. 
Although there are clearly important differences between these four definitions, these differences are relatively subtle; all four definitions relate to a single core meaning of the word, to do with providing information. All these definitions are included as senses within a single entry in the Wordsmyth dictionary. This is just one of several examples of high-ambiguity items that are ambiguous between multiple word senses rather than between unrelated word meanings.|248|polysemy, homophony, meaning, sense, 1487|Rodd2000|Many words are semantically ambiguous, and can refer to more than one concept. For example, bark can refer either to a part of a tree or to the sound made by a dog. To understand such words, we must select one of these different interpretations, normally on the basis of the context in which the word occurs. Words can be ambiguous in different ways; a word like bark has two semantically unrelated meanings, which seem to share the same written and spoken form purely by chance. More common than this type of accidental ambiguity is the systematic ambiguity between related word senses. For example, the word *twist* has a range of dictionary definitions including *to make into a coil or spiral, to operate by turning, to alter the shape of, to misconstrue the meaning of, to wrench or sprain, and to squirm or writhe*. The meaning of this word varies systematically according to the context in which the word is used; for example, there are important differences between what it means to twist an ankle compared with to twist the truth. 
However, although the meaning of the word is ambiguous between these different interpretations, the interpretations are closely related to each other both etymologically and semantically; this is quite unlike the ambiguity for a word like bark.|245|polysemy, colexification, homophony 1488|Rodd2000|Homonyms, such as the two meanings of bark, are considered to be different words that, by chance, share the same orthographic and phonological form. Specifically, homographs are different words that share the same orthographic form, while homophones share the same phonological form. On the other hand, [pb] a polysemous word like twist is considered to be a single word that has more than one sense. Despite this linguistic distinction between homonymy and polysemy, psychologists have often used the two terms interchangeably (see Klein and Murphy (2001) for a discussion of this issue).|245f|polysemy, homophony, definition 1489|Rodd2000|The ambiguity advantage is the finding that visual lexical decisions are faster for words that are semantically ambiguous. Early reports of an ambiguity advantage came from Rubenstein, Garfield, Millikan (1970) and Jastrzembski (1981), who found faster visual lexical decisions for ambiguous words than for unambiguous words matched for overall frequency. However, Gernsbacher (1984) discussed a possible confound between ambiguity and familiarity in these experiments; words with more than one meaning are typically more familiar. She found no effect of ambiguity over and above familiarity.|246|familiarity, ambiguity, correlational studies 1490|Rodd2000|In summary, all current accounts of the ambiguity advantage assume that it is ambiguity between unrelated meanings that produces the ambiguity advantage. None of these models explicitly predict what the effect of multiple word senses should be. 
For those models in which the benefit arises because of the presence of multiple localist lexical entries for ambiguous words, the presence of a benefit for words with multiple senses will depend on whether multiple senses are represented as separate entries within the network. Those models that involve distributed semantic representations predict that words with multiple senses may show a processing advantage, but that this should be reduced compared with words with multiple meanings.|248|ambiguity, ambiguity advantage, lexical decision task, polysemy 1491|Grimes1959|This paper suggests a method of quantifying judgments of relative 'closeness' or 'distance' between related languages, and gives some results of its application. There is no speech community in which all speakers' speech behavior is identical. The linguist defines a homogeneous speech community as one in which the members' linguistic patterns are alike except for haphazard variations - haphazard as to type and magnitude and also as to the individuals producing them. It is questionable whether even this sort of speech community actually exists, but it is a useful fiction.|000|sound change, phonetic distance, phonetic similarity, quantitative analysis 1492|Grimes1959|We begin with the comparative method. One conclusion commonly reached from a comparison of related languages can be phrased: Languages A and B are closely related to each other; A and C are not so closely related. It would be useful to know just HOW MUCH more closely related A and B are than A and C.|599|comparative method, quantitative analysis, phonetic distance 1493|Grimes1959|The particulars used here to measure divergence among the Romance languages are 169 sets of phonological correspondences, which include almost all the important sound correspondences in Romance. For each set, we first determined how the sounds are articulated in each language in the positions for which the set is valid; that is, we made an allophonic transcription. 
We then processed each set of correspondences by comparing the phonetic value for each language in the set with the phonetic value for every other language. For the 7 languages, we made 21 pairwise comparisons for each set of correspondences. (Out of curiosity, we further included a conjectured pronunciation for Classical Latin in each set of correspondences, making 28 pairwise comparisons for each set.) In comparing two sounds, we first decided whether they were perceptually the same or perceptually different. Where they differed, the next step was to determine how much they differed.|-|Romance, data preparation, dataset, phonetic transcription, sound correspondences, phonetic similarity 1494|Grimes1959|After testing certain phonetic parameters, starting with Pike's concept of rank of stricture, we saw that the articulation of any sound in the data can be described in terms of six independent variables: (1) point of articulation, (2) constriction of the air stream in the median line of the mouth, (3) effective timing [pb] of the central constriction, (4) secondary shaping of the air stream in the mouth, (5) velic action, and (6) laryngeal action. 
A sound can be thought of as a point in a six-dimensional space, its location in that space identified by its place on the scales that measure the variables.|601f|distinctive features, Romance, phonetic distance, Hamming distance 1495|Chang2014|书稿是上古音研究的重要基础性著作,对可信的有韵金文进行了全面汇辑和研究。著作在回顾金文语音 研究的成果、经验和不足基础上,从金文用韵实际出发, 对金文用韵的形式与体例进行了总结归纳,其 中 " 主从韵 " 、 " 连锁韵 " 为著者所新发掘,为传世文献韵文所不见,前贤时修所未及见。此外,著作还对 612 个金 文入韵字、 1226 条金文韵段、 488 篇可信的有韵金文进行了描写。美国著名华裔学者杨福绵曾设想单 纯利用古文字材料研究上古音,以避开传世文献因传写错 讹、改动带来的负面影响,数十年来,未竟其功。 盖从事上古音研究者,多不研究古文字;研究古文字者,多不研究音韵学。而古文字材料整理的相关成果, 罕有符 合音韵学者用来研究上古音研究要求的,故利用古文字材料研究上古音存在一些困难。著者研治古 文字与上古音多年,遂以谫陋之资,为此至关重要之业。 |000|Bronze inscriptions, rhymes, Old Chinese, rhyme patterns, rhyme analysis 1496|Chang2014|This work contains numerous rhymes of bronze inscriptions which may be of value for additional rhyme analyses.|000|Bronze inscriptions, rhyme analysis, rhyme patterns, Old Chinese 1497|Campbell2013|+--------------------------------+----------------------------------------------------------------+ | Biological evolution | Linguistic change | +================================+================================================================+ | Discrete character | Lexical item (sometimes phonological or morphosyntactic trait) | +--------------------------------+----------------------------------------------------------------+ | Homology | Cognate | +--------------------------------+----------------------------------------------------------------+ | Mutation | Innovation, change | +--------------------------------+----------------------------------------------------------------+ | Natural selection | Social evaluation of forms, causes of change generally | +--------------------------------+----------------------------------------------------------------+ | Cladogenesis | Diversification (subgrouping) | +--------------------------------+----------------------------------------------------------------+ | Horizontal gene transfer | Borrowing, language contact | 
+--------------------------------+----------------------------------------------------------------+ | Hybrid (plants) | Mixed language (very rare) | +--------------------------------+----------------------------------------------------------------+ | Geographic cline, ring species | Dialects, dialect chain/continuum | +--------------------------------+----------------------------------------------------------------+ | Fossil | Relic, archaism | +--------------------------------+----------------------------------------------------------------+ | Extinction | Language death, extinction | +--------------------------------+----------------------------------------------------------------+ |470|biological parallels, linguistics, biology, language evolution, biological evolution 1498|Park2016|Mei’s `*`s- and Baxter-Sagart’s `*`N- have opposite effects, both phonetically and morphosyntactically: the `*`s- devoices the following root initial, and it is valency-increasing (associated with such syntactic and semantic features as transitive, causative, denominative, active, directive, outer-directed, intensive); `*`N- voices the following initial and it is valency-decreasing (intransitive, anticausative, stative, inactive, inner-directed). With `*`s-, the voiced variant is the simplex, from which the voiceless one derives; with `*`N-, the direction of derivation is the opposite.|000|voicing, devoicing, Old Chinese, prefix 1499|Park2016|There are two major problems with this. First, apart from the fact that the KwVn rhymes with the TwVn < *Ton, it is hard to find evidence for one KwVn word not rhyming with another KwVn. Second, repeated words or different words with shared phonetic graphic components in the KwVn type in the Shijing rhyme with an ‘unambiguous *-on’ in one case and with an ‘unambiguous *-an’ in another. 
It is also notable that no trace of vowel contrast that corresponds to the assumed *-wa-/*-o- distinction after velar or uvular initials is known to exist in Sino-Tibeto-Burman cognate words or early Chinese loan words in Tai, Vietnamese, Japanese and Korean. Note the following recurring phonophoric series. The unambiguous *-on/*-ot/*-or and *-an/*-at/*-ar by the RVH are marked by square brackets.|67|rounded vowel hypothesis, Sergey Yakhontov, Old Chinese, 1500|Park2016|:comment:`Park gives further evidence for rhyming behaviour and reconstruction that shows the problems of the rounded vowel hypothesis. These should be checked with the network.`|68-70|rounded vowel hypothesis, Old Chinese, examples, 1501|Rodd2000|The findings show a fundamental difference between polysemous words and homophonous words. Polysemous words are apparently easier for speakers to understand, so they have a processing advantage and apparently serve a fundamental function in guaranteeing effective communication. Homophones, that is, truly ambiguous words, on the other hand, do not help communication. Note that these findings support findings on cross-linguistic polysemies, namely, that meanings with a higher degree of polysemy are learned earlier by children, and that they are more frequent in communication. This again shows that a tendency towards polysemy seems to reflect a potentially important communicative function. Alternatively, however, it could just reflect the fact that words which are acquired earlier are more likely to become polysemous, since the longer words are in a language, or in the language of a speaker, the more chance they have to acquire more meanings. Think of Starostin's 1989 idea of "aging words" in this context. 
No matter how we turn it: the whole complex of polysemy, age of acquisition, frequency, and stability is highly interesting in the context of semantic change and cognition, and this paper discusses many important points from a psycholinguistic perspective which should be dealt with more clearly in historical linguistics.|000|polysemy, age of acquisition, frequency, homophony, colexification, ambiguity 1502|Narasimhan2015|Most state-of-the-art systems today produce morphological analysis based only on orthographic patterns. In contrast, we propose a model for unsupervised morphological analysis that integrates orthographic and semantic views of words. We model word formation in terms of morphological chains, from base words to the observed words, breaking the chains into parent-child relations. We use log-linear models with morpheme and word-level features to predict possible parents, including their modifications, for each word. The limited set of candidate parents for each word render contrastive estimation feasible. Our model consistently matches or outperforms five state-of-the-art systems on Arabic, English and Turkish.|000|morpheme detection, semantics, unsupervised method, hierarchies 1503|Narasimhan2015|They use some semantic information (it is not yet clear what exactly) as well as a hierarchy, and this is exactly what was becoming clear when looking into certain word examples: * we need the semantic component for compounding / morphology, since otherwise we do not know whether it makes sense to judge that two words truly consist of similar parts (think of polysemy/homophony here) * we need the hierarchic component to capture certain interesting cases and make the algorithms more efficient. 
Linear evaluation, as seems clear by now, does not really work well, and the evaluation should probably be hierarchic, following a network from the source to the targets.|000|morpheme detection, unsupervised method, semantic similarity, 1504|Steele2010|Darwin saw similarities between the evolution of species and the evolution of languages, and it is now widely accepted that similarities between related languages can often be interpreted in terms of a bifurcating descent history (‘phylogenesis’). Such interpretations are supported when the distributions of shared and unshared traits (for example, in terms of etymological roots for elements of basic vocabulary) are analysed using tree-building techniques and found to be well-explained by a phylogenetic model. In this article, we question the demographic assumption which is sometimes made when a tree-building approach has been taken to a set of cultures or languages, namely that the resulting tree is also representative of a bifurcating population history. Using historical census data relating to Gaelic- and English-speaking inhabitants of Sutherland (Highland Scotland), we have explored the dynamics of language death due to language shift, representing the extreme case of lack of congruence between the genetic and the culture–historical processes. Such cases highlight the important role of selective cultural migration (or shifting between branches) in determining the extinction rates of different languages on such trees.|000|language tree, gene tree, biological parallels, co-evolution 1505|Steele2010|What is important about this paper is the fact that the authors try to show that the evolution of a population does not necessarily coincide with the evolution of a language.|000|language evolution, population, gene tree, 1506|DaMilano2016|This paper addresses certain parallels in linguistic and biological evolution. 
Strangely, it does not quote specific earlier literature, which makes it a less successful review, but the paper is worth studying because the authors draw attention to some other aspects, like Cavalli-Sforza, even mentioning a recent work from 2013.|000|language evolution, biological parallels, biological evolution, review 1507|DaMilano2016|As Grandi (2011) underlined, biological evolution, cultural evolution and linguistic evolution are intertwined in a very complex way. On the one hand, language played a fundamental role in the cultural evolution of modern humans. On the other hand, cultural evolution, while unravelling according to different dynamics in comparison to biological evolution, can orient it. Moreover, biological sciences and language sciences, since early periods, have applied similar methods in order to reconstruct phylogenies. After the discovery of DNA and, especially, with the development of population genetics, analogies between languages and genes have been developed and concepts such as "vertical transmission", "drift", "clock-like mutation", and the "founder effect" were applied to both fields. Through the analysis of literature, we have shown, on the one hand, that such analogies have been fruitfully used in order to reconstruct similar processes. On the other hand, we stressed the importance of proceeding with all due caution in establishing analogies, and, especially in identifying unities of comparison.|352|biological parallels, analogy, biological evolution, language evolution 1508|DaMilano2016|Cognates are the traditional characters used to reconstruct phylogenies. They are recognized by means of the traditional comparative method as "individual morphemes, consisting of a form and a function or meaning and demonstrably inherited from a unique ancestor" (@Nichols2008 764). They are typically non-homoplastic, since they are supposed not to arise by chance and recur independently. 
Phyletic characters are similar to cognates, since they are changes which, in principle, can recur independently, but which are sufficiently specific to be recognized as unique in a particular family. An example of a phylogeny based on cognates are the works of @Ringe2002 and @Gray2003 on the Indo-European family.|350|cognates, homoplasy, 1509|Petroni2010|Phylogenetic trees can be reconstructed from the matrix which contains the distances between all pairs of languages in a family. Recently, we proposed a new method which uses normalized Levenshtein distances among words with the same meaning and averages over all the items of a given list. Decisions about the number of items in the input lists for language comparison have been debated since the beginning of glottochronology. The point is that words associated with some of the meanings have a rapid lexical evolution. Therefore, a large vocabulary comparison is only apparently more accurate than a smaller one, since many of the words do not carry any useful information. In principle, one should find the optimal length of the input lists, studying the stability of the different items. In this paper we tackle the problem with an automated methodology based only on our normalized Levenshtein distance. With this approach, the program of an automated reconstruction of language relationships is completed.|000|Levenshtein distance, Indo-European, rate of change, lexical change 1510|Evans2011|Returning to semantic typology, a fundamental and still unanswered question is: are some semantic domains more variable cross-linguistically, and if so which? It is a good bet that event descriptions are one of the most variably lexicalized domains. Yet we won’t be able to test this hunch properly until we have extensive cross-linguistic data for the verb lexicon that is comparable in detail to that for the nominal lexicon. 
My experience across a range of fieldwork projects is that this is going to depend on a whole range of interdisciplinary collaborations—as varied as [pb] the realms of knowledge that any language can represent — which greatly broaden the situations in which language in use can be encountered and noted.|207f|semantic universals, universality, fieldwork, semantic similarity, 1511|Evans2011|Paper discusses problems of retrieving semantic specifications in fieldwork, since our linguistic bias often prevents us from looking at the specifics of languages and biases our approach.|000|semantic similarity, universality, semantic universals 1512|Gronroos2014|Morfessor is a family of methods for learning morphological segmentations of words based on unannotated data. We introduce a new variant of Morfessor, FlatCat, that applies a hidden Markov model structure. It builds on previous work on Morfessor, sharing model components with the popular Morfessor Baseline and Categories-MAP variants. Our experiments show that while unsupervised FlatCat does not reach the accuracy of Categories-MAP, with semi-supervised learning it provides state-of-the-art results in the Morpho Challenge 2010 tasks for English, Finnish, and Turkish.|000|morpheme detection, Morfessor, hierarchies, unsupervised method 1513|Gronroos2014|This method was tested quickly on some 1000 words, but it turned out to be very disappointing, either due to wrong application, or to the limits of the method itself. This is, unfortunately, not entirely clear. Yet the hierarchical aspect they mention at the beginning of the paper may be worth following up, since one of my most recent convictions is that hierarchies and semantic information are indispensable for fruitful morpheme detection.|000|Morfessor, morpheme detection, hierarchies 1514|Creutz2005|This work presents an algorithm for the unsupervised learning, or induction, of a simple morphology of a natural language. 
A probabilistic maximum a posteriori model is utilized, which builds hierarchical representations for a set of morphs, which are morpheme-like units discovered from unannotated text corpora. The induced morph lexicon stores parameters related to both the “meaning” and “form” of the morphs it contains. These parameters affect the role of the morphs in words. The model is implemented in a task of unsupervised morpheme segmentation of Finnish and English words. Very good results are obtained for Finnish and almost as good results are obtained in the English task.|000|unsupervised method, morpheme detection, minimum description length, Bayesian approaches, 1515|Creutz2005|A central question regarding morpheme segmentation is the compositionality of meaning and form. If the meaning of a word is transparent in the sense that it is the “sum [pb] of the meaning of the parts”, then the word can be split into the parts, which are the morphemes, e.g., English ‘foot+print’, ‘joy+ful+ness’, ‘play+er+s’. However, it is not uncommon that the form does consist of several morphemes, which are the smallest elements of syntax, but the meaning is not entirely compositional, e.g., English ‘foot+man’ (male servant wearing a uniform), ‘joy+stick’ (control device), ‘sky+scrap+er’ (very tall building).|1f|compositionality, meaning, semantics, word derivation, motivation, 1516|Creutz2005|De Marcken (1996) proposes a model for unsupervised language acquisition, which involves two central concepts: composition and perturbation. Composition means that an entry in the lexicon is composed of other entries, e.g., ‘joystick’ is composed of ‘joy’ and ‘stick’. Perturbation means that changes are introduced that give the whole a unique identity, e.g., the meaning of ‘joystick’ is not exactly the result of the composition of the parts. 
This framework is similar to the class hierarchy of many programming languages, where classes can modify default behaviors that are inherited from superclasses.|2|morphology, word derivation, compounding, perturbation, composition, compositionality 1517|Creutz2005|In the Morfessor Baseline, a lexicon of morphs is constructed, so that it is possible to form any word in the corpus by the concatenation of some morphs. Each word in the corpus is then rewritten as a sequence of morph pointers, which point to entries in the lexicon. The aim is to find the optimal lexicon and segmentation, i.e., a set of morphs that is concise, and moreover gives a concise representation for the corpus.|2|Morfessor, morpheme detection, unsupervised method 1518|Creutz2005|A consequence of this kind of approach is that frequent word forms remain unsplit, whereas rare word forms are excessively split. This follows from the fact that the most concise representation is obtained when any frequent word is stored as a whole in the lexicon (e.g., English ‘having’, ‘soldiers’, ‘states’, ‘seemed’), whereas rarely occurring words are better coded in parts (e.g., ‘or+p+han’, ‘s+ed+it+ious’, ‘vol+can+o’). There is no proper notion of compositionality in the model, because frequent strings are usually kept together whereas rare strings are split. In contrast with the model proposed by de Marcken, the lexicon is flat instead of hierarchical, which means that any possible inner structure of the morphs is lost.|2|morpheme detection, Morfessor, unsupervised method, hierarchies 1519|Brenner2015|In a series of empirical studies using lexical decision and phoneme monitoring this line of research addressed the nature and locus of a possible concept type congruence effect, as predicted by the CTD. A facilitating effect of congruent determiner‐noun combinations could be shown in German and English auditory lexical decision, but not in the visual versions. 
The absence of the congruence effect in phoneme monitoring showed a post‐lexical locus, not interfering with lexical activation or selection processes. Due to its facilitating nature, it can be assumed to reflect earlier post‐lexical build‐up of noun phrases, rather than later post‐lexical checking mechanisms. |211|concept types, lexical decision task, conceptualization, denotation 1520|Smit2014|Morfessor is a family of probabilistic machine learning methods for finding the morphological segmentation from raw text data. Recent developments include the development of semi-supervised methods for utilizing annotated data. Morfessor 2.0 is a rewrite of the original, widely-used Morfessor 1.0 software, with well documented command-line tools and library interface. It includes new features such as semi-supervised learning, online training, and integrated evaluation code.|000|Morfessor, morpheme detection, introduction, unsupervised method 1521|Evans2014|In this paper, I lay out the workings of the rather unusual system of positional verbs found in Nen, a language of the Morehead-Maro family in Morehead district, Western Province, Papua New Guinea. Nen is unusual in its lexicalization patterns: it has very few verbs that are intransitive, with most verbs that tend to be intransitive cross-linguistically realized as morphologically middle verbs, including ‘talk’, ‘work’, ‘descend’, and so on. Within the fifty attested morphologically intransitive verbs, forty-five comprise an interesting class of “positional verbs,” the subject of this paper; the others are ‘be’, its derivatives ‘come’ and ‘go’ (lit. ‘be hither’ and ‘be thither’), and ‘walk’. Positional verbs denote spatial positions and postures like ‘be sitting’, ‘be up high’, ‘be erected (of a building)’, ‘be open’, ‘be in a tree-fork’, ‘be at the end of something’. 
Positional verbs differ from regular verbs in lacking infinitives, in possessing a special “stative” aspect inflection and an unusual system for building a four-way number system (building large plurals by combining singular and dual markers), and in participating in a productive three-way alternation between positional statives (like ‘be high’), placement transitives (like ‘put up high’), and get-into-position middles (like ‘get into a high position’). The latter two types are more like normal verbs (for example, they possess infinitives and participate in the normal TAM series), but they are formally derived from the positionals. The paper concludes by situating the Nen system regionally and typologically. Similar systems are found in related languages, but with the exception of the Eastern Torres Strait language Meriam Mer, no comparable system has been reported anywhere in New Guinea—the “classificatory verbs” known from languages like Ku Waru are quite different, serving primarily to classify objects rather than to give spatial dispositions. On the other hand, rather similar systems are found in some parts of Meso-America and the Amazon.|000|positional verbs, semantics, denotation 1522|Evans2007|This paper explores the vocabulary of mental states, knowing, thinking and remembering in Dalabon, an Australian Aboriginal language. Though Dalabon has a rich vocabulary for the overall semantic domain of attention, thought, memory and forgetting, there are no expressions specifically dedicated to remembering. Rather, the ontology of cognitive states and processes is categorized into short-term vs long-term mental states and events. 
Aspectual choices are used to express transitions into mental states and events (‘remembering’ is ‘coming to have in mind’, and ‘forgetting’ is ‘coming to not have in mind’), without the entailments found in English, which distinguishes previously experienced mental states (‘remember’, ‘remind’) or mental states experienced for the first time (‘get the idea that’, ‘realize’).|000|denotation, mental state, lexical structure, 1523|Virpioja2013|Morfessor is a family of probabilistic machine learning methods that find morphological segmentations for words of a natural language, based solely on raw text data. After the release of the public implementations of the Morfessor Baseline and Categories-MAP methods in 2005, they have become popular as automatic tools for processing morphologically complex languages for applications such as speech recognition and machine translation. This report describes a new implementation of the Morfessor Baseline method. The new version not only fixes the main restrictions of the previous software, but also includes recent methodological extensions such as semi-supervised learning, which can make use of small amounts of manually segmented words. Experimental results for the various features of the implementation are reported for English and Finnish segmentation tasks.|000|Morfessor, introduction, morpheme detection, unsupervised method, report 1524|Roark2012|In this paper, we present a new collection of open-source software libraries that provides command line binary utilities and library classes and functions for compiling regular expression and context-sensitive rewrite rules into finite-state transducers, and for n-gram language modeling. 
The OpenGrm libraries use the OpenFst library to provide an efficient encoding of grammars and general algorithms for building, modifying and applying models.|000|finite state transducer, finite state grammar, weighted finite state transducer, Thrax 1525|Roark2012|This paper describes some basic aspects of finite state grammars, and also how these can be used to be learned, and how n-grams can be included into the finite state modeling process.|000|finite state transducer, n-gram model, software 1526|Harris1955|In this article which has been quoted often since its publication, especially in the context of morpheme detection, Harris describes a simple method to detect morpheme boundaries in written texts by counting the characters preceding and following each given character and looking for local peaks. @Benden2005 has used this to further elaborate measures for morpheme detection, and @Bordag2008 uses this measure to further elaborate on it, proposing smoothing using trigrams, bigrams, etc. An even more recent approach using Harris' initial idea is by @Griffiths2015, where information content is used to assess peaks in text, and different thresholds are used to test which provides the best segmentation. Note that this measure is still only linear, and it probably could be further advanced not only by modifying the statistics but also and especially by adding a hierarchical component.|000|morpheme detection, unsupervised method, information content, 1527|Pearce2010|Grouping and boundary perception are central to many aspects of sensory processing in cognition. We present a comparative study of recently published computational models of boundary perception in music. In doing so, we make three contributions. First, we hypothesise a relationship between expectation and grouping in auditory perception, and introduce a novel information-theoretic model of perceptual segmentation to test the hypothesis. 
Although we apply the model to musical melody, it is applicable in principle to sequential grouping in other areas of cognition. Second, we address a methodological consideration in the analysis of ambiguous stimuli that produce different percepts between individuals. We propose and demonstrate a solution to this problem, based on clustering of participants prior to analysis. Third, we conduct the first comparative analysis of probabilistic-learning and rule-based models of perceptual grouping in music. In spite of having only unsupervised exposure to music, the model performs comparably to rule-based models based on expert musical knowledge, supporting a role for probabilistic learning in perceptual segmentation of music.|000|boundary detection, music, unsupervised method 1528|Pearce2010|This article is the basis of the method by @Griffiths2015 for boundary detection (morpheme and phoneme boundaries) in written text.|000|boundary detection, morpheme detection, unsupervised method 1529|Hammastroem2006|We present a novel approach to the unsupervised detection of affixes, that is, to extract a set of salient prefixes and suffixes from an unlabeled corpus of a language. The underlying theory makes no assumptions on whether the language uses a lot of morphology or not, whether it is prefixing or suffixing, or whether affixes are long or short. It does however make the assumption that 1. salient affixes have to be frequent, i.e. occur much more often than random segments of the same length, and that 2. words essentially are variable length sequences of random characters, e.g. a character should not occur in far too many words than random without a reason, such as being part of a very frequent affix. The affix extraction algorithm uses only information from fluctuation of frequencies, runs in linear time, and is free from thresholds and untransparent iterations. 
We demonstrate the usefulness of the approach with example case studies on typologically distant languages.|000|morpheme detection, unsupervised method, 1530|Hammastroem2006|This paper is in the line of @Harris1955 and others, like @Griffiths2015, and since it seems much easier to implement it might be worth giving it a try. An important point is made on page 82, where the author describes a correction for expected frequencies using entropy, which might turn out to be useful in other contexts as well.|000|morpheme detection, expected frequency, expected distribution 1531|Berdicevskis2016|We test whether the functionality (non-redundancy) of morphological features can serve as a predictor of the survivability of those features in the course of language change. We apply a recently proposed method of measuring functionality of a feature by estimating its importance for the performance of an automatic parser to the Slavic language group. We find that the functionality of a Common Slavic grammeme, together with the functionality of its category, is a significant predictor of its survivability in modern Slavic languages. The least functional grammemes within the most functional categories are most likely to die out.|000|redundancy, language evolution, fitness, linguistic complexity, Slavic languages, computational approaches 1532|MacKay2003|This book is aimed at senior undergraduates and graduate students in Engineering, Science, Mathematics, and Computing. It expects familiarity with calculus, probability theory, and linear algebra as taught in a first- or second-year undergraduate course on mathematics for scientists and engineers. Conventional courses on information theory cover not only the beautiful theoretical ideas of Shannon, but also practical solutions to communication problems. This book goes further, bringing in Bayesian data modelling, Monte Carlo methods, variational methods, clustering algorithms, and neural networks. 
Why unify information theory and machine learning? Because they are two sides of the same coin. In the 1960s, a single field, cybernetics, was populated by information theorists, computer scientists, and neuroscientists, all studying common problems. Information theory and machine learning still belong together. Brains are the ultimate compression and communication systems. And the state-of-the-art algorithms for both data compression and error-correcting codes use the same tools as machine learning.|000|information theory, information content, statistics, probability, Shannon, entropy, handbook 1533|MacKay2003|This book is essential for many aspects of information theory and is quoted by many other authors, also in the context of morpheme detection (see @Griffiths2015, @Pearce2010, etc.)|000|information content, information theory, entropy, handbook 1534|Miele2012|**Motivation** Proteins can be naturally classified into families of homologous sequences that derive from a common ancestor. The comparison of homologous sequences and the analysis of their phylogenetic relationships provide useful information regarding the function and evolution of genes. One important difficulty of clustering methods is to distinguish highly divergent homologous sequences from sequences that only share partial homology due to evolution by protein domain rearrangements. Existing clustering methods require parameters that have to be set a priori. Given the variability in the evolution pattern among proteins, these parameters cannot be optimal for all gene families **Results** We propose a strategy that aims at clustering sequences homologous over their entire length, and that takes into account the pattern of substitution specific to each gene family. Sequences are first all compared with each other and clustered into pre-families, based on pairwise similarity criteria, with permissive parameters to optimize sensitivity. 
Pre-families are then divided into homogeneous clusters, based on the topology of the similarity network. Finally, clusters are progressively merged into families, for which we compute multiple alignments, and we use a model selection technique to find the optimal tradeoff between the number of families and multiple alignment likelihood. To evaluate this method, called HiFiX, we analyzed simulated sequences and manually curated datasets. These tests showed that HiFiX is the only method robust to both sequence divergence and domain rearrangements. HiFiX is fast enough to be used on very large datasets.|000|clustering, network clustering, partitioning, homolog detection, Python 1535|Miele2012|Cluster algorithm using networks and alignments that might be useful for further improving automatic cognate detection. The workflow in particular might provide some useful inspiration, although a more recent study reported that HiFiX is outperformed by other methods, especially transitivity clustering (TransClust).|000|homolog detection, network partitioning, partitioning, Python, multiple sequence alignment 1536|Roettger2013|**Motivation:** Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles. 
**Results:** Our main contribution is a method for identifying a suitable and robust density parameter for protein homology detection without a given gold standard. Therefore, we study the core genome of 89 actinobacteria. This allows us to incorporate background knowledge, i.e. the assumption that a set of evolutionarily closely related species should share a comparably high number of evolutionarily conserved proteins (emerging from phylum-specific housekeeping genes). We apply our strategy to find genes/proteins that are specific for certain actinobacterial lifestyles, i.e. different types of pathogenicity. The whole study was performed with transitivity clustering, as it only requires a single intuitive density parameter and has been shown to be well applicable for the task of protein sequence clustering. Note, however, that the presented strategy generally does not depend on our clustering method but can easily be adapted to other clustering approaches.|000|automatic threshold selection, density parameter, transitivity clustering, network partitioning, partitioning 1537|Roettger2013|This approach is rather simple, although not necessarily easy to adapt. The authors basically test all thresholds, but let them converge on two divergent criteria: the size of clusters in general, which they want to be a bit lower than the number of all taxa, and the size of the maximal cluster, which they want to be lower than the number of all taxa, or even more. Then they use the harmonic mean to optimize this function. 
The use of the harmonic mean in particular is an interesting idea for problems where one has two criteria to maximize that are not necessarily dependent on each other (think of using it in morpheme boundary detection, for example).|000|homolog detection, automatic threshold selection, harmonic mean, network partitioning, transitivity clustering 1538|Wittkop2010|Transitivity Clustering: This method (TransClust), available as a Cytoscape plugin, has been shown to perform best in an intensive study of the performance of clustering algorithms by @Bernardes2015. It is apparently not trivial to apply, but the basic idea is to remove and add edges from a network and penalize each step of removal, until a satisfactory result is reached, that is, until transitivity of all clusters is reached. The idea might be followed up on a small scale, since it does not seem impossible to re-implement something similar in a brute-force search. However, the authors still require a threshold, which means that the problem of threshold detection cannot be completely avoided for the moment.|000|transitivity clustering, network partitioning, Java, Cytoscape, homolog detection 1539|Bernardes2015|**Background** An important problem in computational biology is the automatic detection of protein families (groups of homologous sequences). Clustering sequences into families is at the heart of most comparative studies dealing with protein evolution, structure, and function. Many methods have been developed for this task, and they perform reasonably well (over 0.88 of F-measure) when grouping proteins with high sequence identity. However, for highly diverged proteins the performance of these methods can be much lower, mainly because a common evolutionary origin is not deduced directly from sequence similarity. To the best of our knowledge, a systematic evaluation of clustering methods over distant homologous proteins is still lacking. 
**Results** We performed a comparative assessment of four clustering algorithms: Markov Clustering (MCL), Transitive Clustering (TransClust), Spectral Clustering of Protein Sequences (SCPS), and High-Fidelity clustering of protein sequences (HiFix), considering several datasets with different levels of sequence similarity. Two types of similarity measures, required by the clustering sequence methods, were used to evaluate the performance of the algorithms: the standard measure obtained from sequence–sequence comparisons, and a novel measure based on profile-profile comparisons, used here for the first time. **Conclusions** The results reveal low clustering performance for the highly divergent datasets when the standard measure was used. However, the novel measure based on profile-profile comparisons substantially improved the performance of the four methods, especially when very low sequence identity datasets were evaluated. We also performed a parameter optimization step to determine the best configuration for each clustering method. We found that TransClust clearly outperformed the other methods for most datasets. This work also provides guidelines for the practical application of clustering sequence methods aimed at detecting accurately groups of related protein sequences.|000|transitivity clustering, homolog detection, network partitioning, evaluation 1540|Bernardes2015|This article introduces transitivity clustering and other interesting methods for automatic homolog detection. The evaluation relies more on simulations than on real data, and it seems that transitivity clustering (TransClust, @Wittkop2010) outperforms other methods, like HiFiX (@Miele2012) or Markov Clustering (@Dongen2000).|000|network partitioning, transitivity clustering, partitioning, evaluation, homolog detection 1541|Dellert2016|We propose a new method for empirically determining lists of basic concepts for the purpose of compiling extensive lexicostatistical databases. 
The idea is to approximate a notion of “swadeshness” formally and reproducibly without expert knowledge or bias, and being able to rank any number of concepts given enough data. Unlike previous approaches, our procedure indirectly measures both stability of concepts against lexical replacement, and their proneness to phenomena such as onomatopoesia and extensive borrowing. The method provides a fully automated way to generate customized Swadesh lists of any desired length, possibly adapted to a given geographical region. We apply the method to a large lexical database of Northern Eurasia, deriving a swadeshness ranking for more than 5,000 concepts expressed by German lemmas. We evaluate this ranking against existing shorter lists of basic concepts to validate the method, and give an English version of the 300 top concepts according to this ranking.|000|Swadesh list, concept list, stability, automatic approach 1542|Wahle2016|It is a well known phenomenon in historical linguistics, that the meaning of a proto form is different to the meaning of its descendants. This phenomenon of meaning change is often ignored in studies which use tools from statistical phylogenetic analysis to determine language relationships. It has been shown, that the databases currently used in linguistic phylogeny exhibit a considerable amount of the described phenomenon. The current study proposes a method to detect such instances of cross-concept relationships of words. Although the evaluation can not be done by standard means, the results indicate that semantic similarity is a good indicator for cross-concept relationships and that tools from computational biology offer a good framework for this kind of approach.|000|cross-semantic cognates, cognate detection, hidden markov models 1543|Tanaka1997|Word co-occurrences form a graph, regarding words as nodes and co-occurrence relations as branches. Thus, a co-occurrence graph can be constructed by co-occurrence relations in a corpus. 
This paper discusses a clustering method of the co-occurrence graph, the decomposition of the graph, from a graph-theoretical viewpoint. Since one of the applications for the clustering results is the ambiguity resolution, each output cluster is expected to have no ambiguity and be specialized in a single topic. We observed that a graph has no ambiguity if its branches representing co-occurrence relations are transitive. An algorithm to extract such graphs are proposed and its uniqueness of the output is discussed. The effectiveness of our method is examined by an experiment using co-occurrence graph obtained from a 30M bytes corpus|000|clustering, transitivity, graph theory, partitioning, co-occurrence, 1544|Dawyndt2006|A new algorithm is proposed for generating min-transitive approximations of a given similarity matrix (i.e. a symmetric matrix with elements in the unit interval and diagonal elements equal to one). Different approximations are generated depending on the choice of an aggregation operator that plays a central role in the algorithm. If the maximum operator is chosen, then the approximation coincides with the min-transitive closure of the given similarity matrix. In case of the arithmetic mean, a transitive approximation is generated which is, on the average, as close to the given similarity matrix as the approximation generated by the UPGMA hierarchical clustering algorithm. The new algorithm also allows to generate approximations in a purely ordinal setting. As this new approach is weight-driven, the partition tree associated to the corresponding min-transitive approximation can be built layer by layer. 
Numerical tests carried out on synthetic data are used for comparing different approximations generated by the new algorithm with certain approximations obtained by classical methods.|000|clustering, UPGMA, transitivity, network partitioning 1545|Dawyndt2006|Algorithm TCGA essentially consists of an outer repeat loop running over all edges of the weighted graph, and for each of these edges an inner for loop running over all nodes of the graph. All triangular subgraphs visited in this way are made locally transitive by raising, if necessary, the smallest weight in the triangular subgraph. The algorithm is called weight-driven because the order in which in the outer loop edges are visited is not arbitrary, but proceeds in descending order of the edge weights. Proceeding along the same line of thoughts, however not persisting on restoring locally the transitivity by raising the smallest edge weight—a fortiori leading to a matrix that includes the input matrix—the transitivity of triangular subgraphs is imposed by lowering the middle and increasing the lowest weight so that they both become equal. More [pb] precisely, the two weights are aggregated into a single weight by means of a symmetric aggregation operator denoted further on as f. Here, a symmetric aggregation operator is defined as an operator that can take an arbitrary (finite) number of arguments in arbitrary order, is increasing in all of its arguments, takes value 0 when all arguments are 0 and takes value 1 when all arguments are 1. Also, upon a single argument, f acts as the identity, i.e. f(x) = x for all x ∈ [0, 1].|177f|transitive closure, UPGMA, clustering, partitioning, graph theory 1546|Dawyndt2006|So basically, all that is done is to start from all edges, sorted by edge weight, and then to search for the third node in the graph, changing its edge weight where necessary. As a result, one gets a transitive closure of the whole graph. 
This may be worth testing in detail.|000|UPGMA, transitive closure, clustering, partitioning, graph theory 1547|Dawyndt2006|The previously established weight-driven method in which just the maximum operator is systematically replaced by an arbitrary smaller aggregation operator f, turns out to be a valid transitive approximation generating algorithm, provided that whenever selecting an edge within the outer loop, no other edge exists that carries the same weight as and has a common node with the selected edge. In order to cover the latter situation, which can definitely not be excluded, this simple modified algorithm had to be appropriately refined. A description of the new algorithm, is presented in the pseudo-code procedure named TAGA—an acronym for Transitive Approximation Generating Algorithm.|178|UPGMA, transitive closure, graph theory, partitioning 1548|Smith1970|SALISBURY1 has argued that there is an apparent contradiction between two fundamental concepts of biology—the belief that the gene is a unique sequence of nucleotides whose function it is to determine the sequence of amino-acids in a protein, and the theory of evolution by natural selection. In brief, he calculated that the number of possible amino-acid sequences is greater by many orders of magnitude than the number of proteins which could have existed on Earth since the origin of life, and hence that functionally effective proteins have a vanishingly small chance of arising by mutation. Natural selection is therefore ineffective because it lacks the essential raw material—favourable mutations.|563|biological parallels, random mutation, natural selection, protein structure, protein space, protein evolution, compositionality 1549|Smith1970|I shall assume that mutations, while not random in a chemical sense, are random as far as their chances of improving the function of the corresponding proteins are concerned. 
I shall also assume that evolution has occurred either by the natural selection of favourable mutations or by the chance fixation by genetic drift of selectively neutral mutations. The justification for making these assumptions is that no sensible alternatives have been suggested and that no evidence exists at the moment to invalidate them. [...] The model of protein evolution I want to discuss is best understood by analogy with a popular word game. The object of the game is to pass from one word to another of the same length by changing one letter at a time, with the requirement that all the intermediate words are meaningful in the same language. Thus WORD can be converted into GENE in the minimum number of steps, as follows: WORD WORE GORE GONE GENE This is an analogue of evolution, in which the words represent proteins; the letters represent amino-acids; the alteration of a single letter corresponds to the simplest evolutionary step, the substitution of one amino-acid for another; and the requirement of meaning corresponds to the requirement that each unit step in evolution should be from one functional protein to another. The reason for the last requirement is as follows: suppose that a protein A B C D . . . exists, and that a protein a b C D . . . would be favoured by selection if it arose. Suppose further that the intermediates a B C D . . . and A b C D . . . are non-functional. These forms would arise by mutation, but would usually be eliminated by selection before a second mutation could occur. The double step from A B C D . . . to a b C D . . . would thus be very unlikely to occur. 
Such double steps with unfavourable intermediates may occasionally occur, but are probably too rare to be important in evolution.|564|protein evolution, word association, meaning, function, biological parallels, 1550|Smith1970|It follows that if evolution by natural selection is to occur, functional proteins must form a continuous network which can be traversed by unit mutational steps without passing through nonfunctional intermediates. In this respect, functional proteins resemble four-letter words in the English language, rather than eight-letter words, for the latter form a series of small isolated islands in a sea of nonsense sequences.|564|protein evolution, biological parallels, protein derivation, protein space 1551|Smith1970|Some questions about molecular evolution can be formulated more clearly in terms of a protein space. For example: (i) Are all existing proteins part of the same continuous network, and if so, have they all been reached from a single starting point? Possible alternatives are that there are two or more distinct networks, or that there is one network with multiple starting points. (ii) How often, if ever, has evolution passed through a non-functional sequence? If so, has this been achieved by the random walk of genes rendered redundant by duplication, or by the chance concurrence of two or more mutations? (iii) What fraction of the functional network has already been explored in evolution? (iv) What fraction of potentially useful proteins are inaccessible?|564|protein space, protein evolution, protein derivation, questions, 1552|Kurland2010|Even the notion that random mutation could have generated the enormous diversity of genome sequences evident in our biosphere is questionable. Rough calculations show that neither the mass nor the age of our galaxy is large enough to have generated all the unique proteins in the biosphere by testing independent random mutations (Salisbury, 1969). 
This calculation encouraged John Maynard Smith (@1970) to abandon the assumption that random mutations drive the independent evolution of individual proteins. His alternative formulation is that proteins form a single network or space of interrelated functional sequences in which mutation has generated every point in this space from a prior functional protein (Maynard Smith, 1970).|363|protein evolution, protein space, functional proteins 1553|Kurland2010|This view of a functional protein network identifies a space of sequences that is responsive to purifying selection. Any sequences not stabilized by purifying selection are transients that are obliterated eventually by random mutation unless the selective factors change (Kimura, 1983; Berg and Kurland, 2002). Accordingly, it can be inferred that the phylogenetic network of protein sequences is nested in a network of functional parameters. Indeed, protein phylogeny seems robust precisely because it reflects the interplay of conservative structural and functional networks in which the intrusion of novelty is constrained by intense network interactions, that is to say by purifying selection.|364|protein evolution, functional protein network, protein derivation 1554|Cheng2005|The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naïve Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. 
Using the GPCR dataset and evaluation protocol from the previous study, the Naïve Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively.|000|protein classification, text classification, n-gram model, biological parallels 1555|Xing2010|Sequence classification has a broad range of applications such as genomic analysis, information retrieval, health informatics, finance, and abnormal detection. Different from the classification task on feature vectors, sequences do not have explicit features. Even with sophisticated feature selection techniques, the dimensionality of potential features may still be very high and the sequential nature of features is difficult to capture. This makes sequence classification a more challenging task than classification on feature vectors. In this paper, we present a brief review of the existing work on sequence classification. We summarize the sequence classification in terms of methodologies and application domains. 
We also provide a review on several extensions of the sequence classification problem, such as early classification on sequences and semi-supervised learning on sequences.|000|sequence classification, n-gram model, biology, linguistics, computer science 1556|Xing2010|* The first category is feature based classification, which transforms a sequence into a feature vector and then apply conventional classification methods. Feature selection plays an important role in this kind of methods. * The second category is sequence distance based classification. The distance function which measures the similarity between sequences determines the quality of the classification significantly. * The third category is model based classification, such as using hidden markov model (HMM) and other statistical models to classify sequences.|41|sequence classification, sequence comparison, types, 1557|Moore2010|Proteins are composed of subunits termed domains that are recurrent units with distinct structure, function, and evolutionary history. At the sequence level, a domain can be described as a conserved stretch of amino acids found in various proteins. Domain signatures can be stored as profiles created from alignments of descriptive family members from which hidden Markov models (HMMs) are generated or by the use of position-specific scoring matrices (PSSM). A number of large databases such as Pfam (Finn et al., 2008), SMART (Letunic et al., 2006), or ProDom (Bru et al., 2005) harbor domain signatures, and the use of domains has become an essential tool of modern proteomics and genomics. 
However, in order to properly utilize the strength of domain-based analyses, it is of fundamental importance to understand the mechanics and selection forces that govern the evolution of multidomain architectures (MDAs).|213|domains, definition, protein domain, multi-domain architectures 1558|Moore2010|Protein evolution is thought to have started with a small library of domains, and structural and functional complexity increased by the combination and rearrangements thereof (Chothia, 1992). Complex multidomain proteins can be produced by gene fusion events where genes that code for simple proteins get fused facilitating the formation of more complex architectures [pb] (Kummerfeld and Teichmann, 2005; Pasek et al., 2006; Weiner et al., 2006). Multidomain proteins are decomposed by fission events giving rise to less complex architectures (Kummerfeld and Teichmann, 2005; Pasek et al., 2006; Wang et al., 2004; Weiner et al., 2006). Studies have estimated that fusion events are a major force in the evolution of proteins, more so than fission events (Ekman et al., 2007; Fong et al., 2007; Kummerfeld and Teichmann, 2005; Pasek et al., 2006). Moreover, novel MDAs frequently arise by domain insertion or deletion events at protein termini (Vibranovski et al., 2005; Weiner et al., 2006). At the sequence level, domain insertions or deletions may be governed by gene fusion events followed by simple point mutations.|213f|compositionality, domain combination, domain composition, gene fusion, gene fission, protein domain, protein structure, protein evolution 1559|Boutonnet2014|Linguistic relativity theory has received empirical support in domains such as color perception and object categorization. It is unknown, however, whether relations between words idiosyncratic to language impact non-verbal representations and conceptualizations. For instance, would one consider the concepts of horse and sea as related were it not for the existence of the compound seahorse? 
Here, we investigated such arbitrary conceptual relationships using a non-linguistic picture relatedness task in participants undergoing event-related brain potential recordings. Picture pairs arbitrarily related because of a compound and presented in the compound order elicited N400 amplitudes similar to unrelated pairs. Surprisingly, however, pictures presented in the reverse order (as in the sequence horse–sea) reduced N400 amplitudes significantly, demonstrating the existence of a link in memory between these two concepts otherwise unrelated. These results break new ground in the domain of linguistic relativity by revealing predicted semantic associations driven by lexical relations intrinsic to language.|000|compound words, compounding, arbitrariness, semantic associations, conceptualization, Sapir-Whorf hypothesis 1560|Booij2005|The grammar of words. Introduction to linguistic morphology. The book treats morphology in its various aspects, including questions of: 1. morphology in general 2. word formation 3. inflection 4. interfaces (phonology-morphology, etc.) 5. morphology and mind It is definitely a valuable starting point, especially when having a look at the subchapters, like: 1. Morphology: basic notions 2. Morphological analysis 3. Derivation 51 4. Compounding 75 5. Inflection 6. Inflectional systems 7. The interface between morphology and phonology 153 8. Morphology and syntax: demarcation and interaction 185 9. Morphology and semantics 207 10. Morphology and psycholinguistics 231 11. Morphology and language change 255|000|morphology, introduction, handbook, historical linguistics, compounding, word derivation 1561|Hudson2010|Word Grammar is a theory of language structure based on the assumption that language, and indeed the whole of knowledge, is a network, and that virtually all of knowledge is learned. It combines the psychological insights of cognitive linguistics with the rigour of more formal theories. 
This textbook spans a broad range of topics from prototypes, activation and default inheritance to the details of syntactic, morphological and semantic structure. It introduces elementary ideas from cognitive science and uses them to explain the structure of language including a survey of English grammar.|000|word grammar, network, cognition, grammar, formal grammar, 1562|Waelchli2005|The aim of this study is both descriptive and theoretical. Descriptively, it is a contribution to the cross-linguistic investigation of co-compounds and related phenomena within the functional domain of natural coordination. But its scope goes beyond that of cross-linguistic description. This was unavoidable because some time-honored traditional concepts in linguistics—notably the strict division between word and phrase, the listeme model of the lexicon, the view of coordination as a syntactic phrase, lexical semantics as a static rather than a context-dependent dynamic field—are not descriptively adequate for the phenomena under consideration. Thus, as concerns linguistic theory, the study does not present a coherent model of language structure or competence, nor does it follow any given theoretical framework. Rather, it challenges widely accepted approaches and attempts to provide some solutions that are descriptively more adequate. :comment:`Mentions interesting things in the introduction, also regarding the semantic aspects of linguistics, and that hierarchy is not always the key to semantics.`|000|co-compounds, compounding, compound words, cross-linguistic study, semantics, 1563|Heide2010|The aim of our study was to investigate in how far the processing of German verbs prefixed with ver- is influenced by their internal morphological structure. We showed that despite their homogeneous surface structure of [ver+root+(e)n], ver-verbs differ considerably regarding the root type, the lexicality of the root, and the lexicality of the root+(e)n combination. 
Using a lexical decision task in combination with masked morphological priming, we could show that ver-prefixed verbs undergo morphological decomposition during visual word recognition (cf. Taft and Forster 1975 and subsequent studies). Additionally, our data show that morphological decomposition is sensitive to the lexicality of the root+(e)n substring. While ver-verbs containing a lexical root+(e)n combination are decomposed into a right-branching [ver+[root+(e)n]] structure, ver-verbs with non-lexical root+(e)n combinations are decomposed flatly into [ver+root+(e)n]. Thus, for some ver-verbs both prefix and suffix are processed independently from the root. For other verbs, only the prefix is stripped from the root and the suffix remains attached to it. This observation is not in line with Taft and Forster's (1975) proposal that morphological decomposition is mandatory [pb] stripped from the root during word recognition. Instead, our results may confirm the existence of dual routes for the processing of polymorphemic words (e.g., Caramazza et al. 1988, Pinker and Prince 1994, Baayen and Schreuder 1999, Clahsen et al. 2003).|389f|German, complex words, derivation, word formation, lexical decision task 1564|Roettger2010|Clustering is a popular computational approach for partitioning data sets into groups of objects that share common traits. Due to recent advances in wet-lab technology, the amount of available biological data grows exponentially and increasingly poses problems in terms of computational complexity for current clustering approaches. In this thesis, we introduce two novel approaches, TransClustMV and ActiveTransClust, that enable the handling of large scale datasets by reducing the amount of required information drastically by means of exploiting missing values. Furthermore, there exists a plethora of different clustering tools and standards making it very difficult for researchers to choose the correct methods for a given problem. 
In order to clarify this multifarious field, we developed ClustEval which streamlines the clustering process and enables practitioners conducting large-scale cluster analyses in a standardized and bias-free manner. We conclude the thesis by demonstrating the power of clustering tools and the need for the previously developed methods by conducting real-world analyses. We transferred the regulatory network of E. coli K-12 to pathogenic EHEC organisms based on evolutionary conservation therefore avoiding tedious and potentially dangerous wet-lab experiments. In another example, we identify pathogenicity specific core genomes of actinobacteria in order to identify potential drug targets.|000|clustering, transitivity clustering, partitioning, homolog detection 1565|Hudson2007b|This book is a collection of ideas about language—about how language is structured at every level, about the overall architecture of the whole system, and about how it fits into a larger framework of ideas about human cognition. The broad cognitive context is just as important as the detail about language structure precisely because my argument is that all the detail derives from this context. Language is not sui generis, a unique system which can, and should, be studied without reference to any other system; this may have been a healthy methodological antidote to the psychology of the early twentieth century, but the intellectual world has changed. Our intellectual neighbours have grown up into the healthy sciences of cognitive psychology and psycholinguistics, but intellectual isolationism is still strong on both sides. However well informed we may be about the neighbours’ comings and goings, neither side really allows these developments to influence theoretical work on their side. (Just to give a small example, phonological theories ignore the popular psychological theory that working memory includes a ‘phonological loop’ (e.g. 
Baddeley and Logie 1999), which in turn evolved without any significant input from phonological theory.)|000|word grammar, network, cognition, 1566|Booij2009|Labov (@1981, @1994) proposed to distinguish two types of phonological change: change that is phonetically gradual and affects all relevant words and change that is phonetically abrupt, replaces a phoneme with another one, and is lexically gradual, that is, exhibits lexical diffusion. Kiparsky (@1988) argued that the distinction between phonetically gradual and phonetically abrupt changes coincides with the distinction between postlexical and lexical phonological rules. The rules of /d/-weakening and /d/-deletion can indeed be considered lexical rules since they are neutralizing. As expected, they have exceptions and thus exhibit lexical diffusion. Lexical diffusion always creates surface opacity for rules since the speaker will find forms that have not undergone the rule. Therefore, as stated above, opacity will lead to lexical storage in the sense that for each phonetic form of such words a distinct lexical entry has to be created. This in its turn explains why semantic distinctions may correlate with phonological differences, as in the pair oude hoer/ouwe hoer discussed above.|501|lexical diffusion, gradual sound change, Neogrammarian sound change 1567|Booij2009|In this chapter, we saw that the standard view in Generative Phonology of the balance between storage and computation has to be reconsidered. There is a wealth of evidence for the position that predictable information is stored in the lexicon. First, recent theoretical developments in phonology imply that predictable information about morphemes must nevertheless be stored in the lexicon. Second, data concerning phonological change show that computable information concerning the phonetic realization of morphemes nevertheless has to be stored lexically. 
I also proposed that [pb] we should take a radical step with respect to the relation between underlying form and phonetic form: it is not the phonetic form that is computed by the speaker but rather the underlying form. Like storage in general, storage of phonetic forms of words will speed up processing; it is only when we coin a new word that computation of the underlying form of the base word is necessary. These conclusions do not refute the position that the human language faculty has a dual structure: a lexicon with stored representations and rules. The native speaker does need rules for the perception and production of novel forms. What, however, these conclusions do refute is the position that computation and storage of information with respect to the same process or regularity are mutually exclusive.|504f|robustness, redundancy, lexical storage, cognition, 1569|Laland2015|Scientific activities take place within the structured sets of ideas and assumptions that define a field and its practices. The conceptual framework of evolutionary biology emerged with the Modern Synthesis in the early twentieth century and has since expanded into a highly successful research program to explore the processes of diversification and adaptation. Nonetheless, the ability of that framework satisfactorily to accommodate the rapid advances in developmental biology, genomics and ecology has been questioned. We review some of these arguments, focusing on literatures (evo-devo, developmental plasticity, inclusive inheritance and niche construction) whose implications for evolution can be interpreted in two ways—one that preserves the internal structure of contemporary evolutionary theory and one that points towards an alternative conceptual framework. 
The latter, which we label the ‘extended evolutionary synthesis' (EES), retains the fundaments of evolutionary theory, but differs in its emphasis on the role of constructive processes in development and evolution, and reciprocal portrayals of causation. In the EES, developmental processes, operating through developmental bias, inclusive inheritance and niche construction, share responsibility for the direction and rate of evolution, the origin of character variation and organism–environment complementarity. We spell out the structure, core assumptions and novel predictions of the EES, and show how it can be deployed to stimulate and advance research in those fields that study or use evolutionary biology.|000|extended evolutionary synthesis, new evolutionary synthesis, evolutionary theory 1570|Laland2014|Does evolutionary theory need a rethink? Yes, urgently Without an extended evolutionary framework, the theory neglects key processes, say Kevin Laland and colleagues. Charles Darwin conceived of evolution by natural selection without knowing that genes exist. Now mainstream evolutionary theory has come to focus almost exclusively on genetic inheritance and processes that change gene frequencies.|000|extended evolutionary synthesis, evolutionary theory, new evolutionary synthesis 1571|Wray2014|Does evolutionary theory need a rethink? No, all is well Theory accommodates evidence through relentless synthesis, say Gregory A. Wray, Hopi E. Hoekstra and colleagues. In October 1881, just six months before he died, Charles Darwin published his final book. The Formation of Vegetable Mould, Through the Actions of Worms11 sold briskly: Darwin’s earlier publications had secured his reputation. 
He devoted an entire book to these humble creatures in part because they exemplify an interesting feedback process: earthworms are adapted to thrive in an environment that they modify through their own activities.|000|extended evolutionary synthesis, new evolutionary synthesis, evolutionary theory 1572|Silver2016|This article is about the success of Google's software in the Go game. The interesting point is the use of neural networks. It has twofold implication: * it may be useful to investigate the applicability of neural networks to linguistic problems * it points to the general problem of science, namely, that black box approaches do not give us answers to our questions, but just solutions to our problems|000|neural network, Go game, artificial intelligence, 1573|Silver2016|The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. 
This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.|000|neural network, artificial intelligence, deep learning, machine learning, 1574|Du2010|Clustering is a fundamental data analysis method. It is widely used for pattern recognition, feature extraction, vector quantization (VQ), image segmentation, function approximation, and data mining. As an unsupervised classification technique, clustering identifies some inherent structures present in a set of objects based on a similarity measure. Clustering methods can be based on statistical model identification (McLachlan & Basford, 1988) or competitive learning. In this paper, we give a comprehensive overview of competitive learning based clustering methods. Importance is attached to a number of competitive learning based clustering neural networks such as the self-organizing map (SOM), the learning vector quantization (LVQ), the neural gas, and the ART model, and clustering algorithms such as the C-means, mountain/subtractive clustering, and fuzzy C-means (FCM) algorithms. Associated topics such as the under-utilization problem, fuzzy clustering, robust clustering, clustering based on non-Euclidean distance measures, supervised clustering, hierarchical clustering as well as cluster validity are also described. Two examples are given to demonstrate the use of the clustering methods.|000|neural network, clustering, 1575|Bengio1990|In order to detect the presence and location of immunoglobulin (Ig) domains from amino acid sequences we built a system based on a neural network with one hidden layer trained with back propagation. The program was designed to efficiently identify proteins exhibiting such domains, characterized by a few localized conserved regions and a low overall homology. 
When the National Biomedical Research Foundation (NBRF) NEW protein sequence database was scanned to evaluate the program's performance, we obtained very low rates of false negatives coupled with a moderate rate of false positives.|000|homolog detection, proteins, neural network, automatic approach 1576|Yanowich2016|The linguistic notion of “Sapir’s drift” refers to the phenomenon when genetically related languages, long after their separation, undergo the same or very similar linguistic changes. Such “drift” may seem almost magical: given that language change is generally a random process, why would separate linguistic varieties exhibit the same change? There exist possible explanations demystifying Sapir’s drift, including (Joseph, 2012) who argues that if the sister languages all possessed the same variation in a given construction, that variation can serve as the basis for parallel changes long after the languages separate. Here, I propose another, complementary, explanation for Sapir’s drift for changes involving grammatical or semantic reanalysis. The new account is based on evolutionary modeling in the finite-population setting. All finite populations show the effect called genetic drift (unrelated to Sapir’s drift) that delays the effect of forces pushing the language in a particular direction. For reanalysis-based changes, this means that even when reanalysis of individual utterances could already occur in the proto-language, the full takeover by the new form may under the right conditions happen many centuries later, in the proto-language’s descendants. Given the introduced model, it would have been surprising if Sapir’s drift never arose, and not the fact that it does.|000|drift, Edward Sapir, genetic drift, language evolution, Sapir's drift 1577|Zeige2015|It signified primarily the descriptive study of words as it referred to the changes a word would suffer from processes of inflection, derivation, and compounding. 
In that sense, an inflected or derived word was ‘related’ to its root. Today of course, the term ‘etymology’ both signifies the scientific study of the history of words as well as the history of a word itself, its origins, and how its form and meaning have changed over time. Today’s meaning was present in the earlier one, too. The ‘creation’ of a lexeme, [pb] the question of how a word came to name an object or action, the manipulation of its form (with subsequent shift in meaning) and the developmental dimension of words as entities of an evolving language conflated in pre-historicized language description.|46f|history of science, etymology, root cognates, cognacy, partial cognacy 1578|Zeige2015|The structural study of languages and dialects in the early modern period was sufficiently detailed to enable initial classificatory work. However, these works exhibited the very same dualism in that they were as much classificatory systems as they were hypotheses on descent. Earlier and to a greater extent than in biology, structural similarities between languages were interpreted as indications of descent and subsequently fostered comparative work since at least the 16th century. 
But it needed the theoretical and methodological developments put forth by biology in the 19th century to turn from etymological speculation to systematic analysis.|45|genetic classification, diachrony and synchrony, classification criteria, history of science 1579|Schottel1663|Die Celtische oder alte Teutsche Sprache hat vielerley Mundarten/ so haubtsächlich geteihlet werden in Abstimmige, darin zwar die Teutschen Geschlechtwörter / Hülfwörter / Stammwörter und also die Teutsche Eigenschaft befindlich / dennoch aber wegen der Ausrede / Verstümlung und unkentlich Machung der Teutschen und Einmenung der frömden Wörter / fast abstimmig von jetziger Teutschen Sprache scheinen / wiewol doch Ankunft / Grund und Wesen Teutsch annoch ist und bleibes / as da sind die Zustimmige / so an sich teutsch und durch die Ausrede und Mundartverweis nur unterschieden seyn / bestehen entweder im Hochdeutschen / dahin des Ausspruchs unn der Mundart Eigenschaft halber gerechnet werden die [...] |154|genetic classification, Germanic, German, history of science, family tree, 1580|Haspelmath1999b|Grammaticalization, the change by which lexical categories become functional categories, is overwhelmingly irreversible. Prototypical functional categories never become prototypical lexical categories, and less radical changes against the general directionality of grammaticalization are extremely rare. Although the pervasiveness of grammaticalization has long been known, the question of why this change is irreversible has not been asked until fairly recently. However, no satisfactory explanation has been proposed so far. Irreversibility cannot be attributed to the lack of predictability, to the interplay of the motivating factors of economy and clarity, or to a preference for simple structures in language acquisition. 
I propose an explanation that follows the general structure of Keller’s (1994) invisible-hand theory: language change is shown to result from the cumulation of countless individual actions of speakers, which are not intended to change language, but whose side effect is change in a particular direction. Grammaticalization is a side effect of the maxim of extravagance, that is, speakers’ use of unusually explicit formulations in order to attract attention. As these are adopted more widely in the speech community, they become more frequent and are reduced phonologically. I propose that degrammaticalization is by and large impossible because there is no counteracting maxim of ‘‘anti-extravagance,’’ and because speakers have no conscious access to grammaticalized expressions and thus cannot use them in place of less grammaticalized ones. This is thus a usage-based explanation, in which the notion of imperfect language acquisition as the locus of change plays no role.|000|invisible hand, grammaticalization, directionality, 1581|Chaudhuri1998|This paper presents a distributed algorithm for finding the articulation points in an n node communication network represented by a connected undirected graph. For a given graph if the deletion of a node splits the graph into two or more components then that node is called an articulation point. The output of the algorithm is available in a distributed manner, i.e., when the algorithm terminates each node knows whether it is an articulation point or not. It is shown that the algorithm requires O(n) messages and O(n) units of time and is optimal in communication complexity to within a constant factor.|000|networks, articulation point, betweenness,algorithms 1582|ChousouPolydouri2016|Recent phylogenetic studies in historical linguistics have focused on lexical data. 
However, the way that such data are coded into characters for phylogenetic analysis has been approached in different ways, without investigating how coding methods may affect the results. In this paper, we compare three different coding methods for lexical data (multistate meaning-based characters, binary root-meaning characters, and binary cognate characters) in a Bayesian framework, using data from the Tupí-Guaraní and Chapacuran language families as case studies. We show that, contrary to prior expectations, different coding methods can have a significant impact on the topology of the resulting trees.|000|binary state models, root cognates, cross-semantic cognates, phylogenetic reconstruction 1583|Saavedra2016|Forms such as -topia in privatopia or -ercise in dancercise are known as blend splinters: they might not be morphemes, but they are clearly involved in word formation. This article offers an automated method that can highlight blend splinters which have the potential to become morphemes in their own right. For instance, the word alcoholic has given rise to a large number of blends such as workaholic or rageaholic, so that the splinter -holic is now recognized as a morpheme in the Oxford English Dictionary Online. Because of the sheer number of newly coined blends, it is difficult to identify splinters that are turning into morphemes on the sole basis of human observation. It would therefore be desirable to have an automated method that could process large amounts of data and identify such elements. This article develops such a method, relying on unsupervised morphological segmentation (Harris, 1955). A custom blend database was established for this purpose. The method is able to detect splinters mentioned in previous research, such as -tainment, -ercise, and cyber-, but in addition, it also detects elements that have not been discussed so far, including -tastic, -sumer, and -verse. 
|000|blend splinters, pseudo-morpheme, word family, morphology, morphological change, word formation 1584|Cook2010|Newly coined words pose problems for natural language processing systems because they are not in a system's lexicon, and therefore no lexical information is available for such words. A common way to form new words is lexical blending, as in cosmeceutical, a blend of cosmetic and pharmaceutical. We propose a statistical model for inferring a blend's source words drawing on observed linguistic properties of blends; these properties are largely based on the recognizability of the source words in a blend. We annotate a set of 1,186 recently coined expressions which includes 515 blends, and evaluate our methods on a 324-item subset. In this first study of novel blends we achieve an accuracy of 40% on the task of inferring a blend's source words, which corresponds to a reduction in error rate of 39% over an informed baseline. We also give preliminary results showing that our features for source word identification can be used to distinguish blends from other kinds of novel words.|000|blend splinters, lexical blends, morphological change, word formation 1585|Menzerath1928|This book offers a first formulation of the "linguistic law" that says that the longer a structure, the shorter its components. This law is particularly interesting in the context of analogies between biology and linguistics and the question of protein assembly, since we could ask, whether the law governs biological structures as well. Note also, that, articles like the one by @Baixeries2012 show that this idea has already been tested and potentially even confirmed for biology. 
Furthermore, @Benesova2015 show that it is essentially different from being random.|000|Menzerath-Altmann law, quantitative linguistics, random models, biological parallels 1586|Baixeries2012|Recently, a random breakage model has been proposed to explain the negative correlation between mean chromosome length and chromosome number that is found in many groups of species and is consistent with Menzerath–Altmann law, a statistical law that defines the dependency between the mean size of the whole and the number of parts in quantitative linguistics. Here, the central assumption of the model, namely that genome size is independent from chromosome number is reviewed. This assumption is shown to be unrealistic from the perspective of chromosome structure and the statistical analysis of real genomes. A general class of random models, including that random breakage model, is analyzed. For any model within this class, a power law with an exponent of −1 is predicted for the expectation of the mean chromosome size as a function of chromosome length, a functional dependency that is not supported by real genomes. The random breakage and variants keeping genome size and chromosome number independent raise no serious objection to the relevance of correlations consistent with Menzerath–Altmann law across taxonomic groups and the possibility of a connection between human language and genomes through that law.|000|Menzerath-Altmann law, biological parallels, 1587|Benesova2015|The authors study the Menzerath-Altmann law (@Menzerath1928) and show that it does not follow a simple random model, but is instead something which is "not" random. What this means in full is not clear, since I don't have the article freely available. 
But it would be interesting to have a look, since they also mention @Baixeries2012 as a recent application and discussion of Menzerath's law in biology.|000|Menzerath-Altmann law, biological parallels, quantitative analysis, 1588|Grzybek2014|This chapter concentrates on word length, emphasizing relevant quantitative and synergetic approaches. Alternative units for measuring word length are discussed with regard to their usability, as well as the influence that different kinds of material may have on studying word length. In addition to presenting some basic descriptive statistical characteristics, this contribution shows that word length is a substantial and central phenomenon for a comprehensive theory of language. It is shown, first, that the way in which words of a given length occur in linguistic material is not chaotic, but follows clearly defined, law-like regularities; and second, that word length is not an isolated category within the linguistic system, but is closely interrelated to other properties of the word, as well as of other linguistic units, levels, and structures. Theoretical models are discussed, concerning not only these interrelations, but sequential text analysis and frequency distributions.|000|Menzerath-Altmann law, word length, word frequency, introduction 1589|Grzybek2014|This article mentions also explicitly the Menzerath-Altmann-law (@Menzerath1928) and seems to be a good overview on the state of the art in quantitative linguistics.|000|Menzerath-Altmann law, word length, word frequency 1590|SparckJones1972|The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. 
The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing in particular that frequently‐occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.|000|information retrieval, inverse document frequency, document frequency 1591|SparckJones1972|This article is important, since it gives the first definition of inverse document frequency. Inverse document frequency itself is used to calculate term-frequency-inverse-document-frequency statistics, that is the often used tf-idf-scores. Term frequency was first proposed by @Luhn1957, inverse document frequency is then discussed in this paper. The major idea in this paper is (according to Wikipedia): The specificity of a term can be quantified as an inverse function of the number of documents in which it occurs. |000|tf-idf, inverse document frequency, term frequency 1592|Luhn1957|Written communication of ideas is carried out on the basis of statistical probability in that a writer chooses that level of subject specificity and that combination of words which he feels will convey the most meaning. Since this process varies among individuals and since similar ideas are therefore relayed at different levels of specificity and by means of different words, the problem of literature searching by machines still presents major difficulties. A statistical approach to this problem will be outlined and the various steps of a system based on this approach will be described. 
Steps include the statistical analysis of a collection of documents in a field of interest, the establishment of a set of "notions" and the vocabulary by which they are expressed, the compilation of a thesaurus-type dictionary and index, the automatic encoding of documents by machine with the aid of such a dictionary, the encoding of topological notations (such as branched structures), the recording of the coded information, the establishment of a searching pattern for finding pertinent information, and the programming of appropriate machines to carry out a search.|000|term frequency, inverse document frequency, tf-idf, information content 1593|Luhn1957|This paper is important in the context of tf-idf and offers the first building block, namely term frequency, which is then followed up with the proposal of inverse document frequency by @SparckJones1972. tf-idf is calculated as :math:`tfidf(t,d,D)=tf(t,d) \cdot idf(t,D)`. For details on tf and idf, see the Wikipedia page at http://en.wikipedia.org/wiki/Tf-idf |000|tf-idf, term frequency, inverse document frequency 1594|Dellert2016a|This paper proposes a novel application of causal inference in the area of semantic language evolution, which attempts to infer unidirectional trends of lexical change exclusively from massively cross-linguistic dictionary data. First, we show how colexification between concepts can be modeled mathematically as mutual information between concept variables. Core notions of causal inference (most prominently, the unshielded collider criterion) are then applied to predict the dominant directionality in pathways of semantic change. 
The paper concludes by revisiting a few well-known examples of synchronic polysemies, and illustrating how the method succeeds in building hypotheses about their historical development.|000|concept networks, polysemy, causal inference, semantic network 1595|Thompson2016|A central debate in cognitive science concerns the nativist hypothesis, the proposal that universal features of behavior reflect a biologically determined cognitive substrate: For example, linguistic nativism proposes a domain-specific faculty of language that strongly constrains which languages can be learned. An evolutionary stance appears to provide support for linguistic nativism, because coordinated constraints on variation may facilitate communication and therefore be adaptive. However, language, like many other human behaviors, is underpinned by social learning and cultural transmission alongside biological evolution. We set out two models of these interactions, which show how culture can facilitate rapid biological adaptation yet rule out strong nativization. The amplifying effects of culture can allow weak cognitive biases to have significant population-level consequences, radically increasing the evolvability of weak, defeasible inductive biases; however, the emergence of a strong cultural universal does not imply, nor lead to, nor require, strong innate constraints. From this we must conclude, on evolutionary grounds, that the strong nativist hypothesis for language is false. More generally, because such reciprocal interactions between cultural and biological evolution are not limited to language, nativist explanations for many behaviors should be reconsidered: Evolutionary reasoning shows how we can have cognitively driven behavioral universals and yet extreme plasticity at the level of the individual—if, and only if, we account for the human capacity to transmit knowledge culturally. 
Wherever culture is involved, weak cognitive biases rather than strong innate constraints should be the default assumption.|000|nativism, cultural evolution, nativist hypothesis, biological evolution 1596|Thompson2016|Culture mediates between the biases of individual learners and population-level tendencies or universals. This radically changes the predictions we should make about the language faculty, or any other system of constrained cultural learning: Specifically, the evolution of strong domain-specific constraints on learning is ruled out. Rather, the behavioral universals that these constraints are invoked to explain can instead be produced by weak biases, amplified by cultural transmission. Although we have framed our model in terms of language and linguistic nativism, the same account may be applicable to any behavior that is the product of interactions between culture and biology: Wherever cognition has been shaped to acquire culturally transmitted behaviors, our arguments should apply. We anticipate that cultural transmission may be amplifying the effects of learning biases in many domains of human behavior, mimicking the effects of strong innate constraints and inviting nativist overinterpretation; identifying these domains is a key priority. The default explanation of shared, universal aspects of language or other cultural behaviors should be in terms of weak innate constraints.|5.6|cultural evolution, nativist hypothesis, nativism, biological evolution 1597|Bezerianos2010|GeneaQuilts is a new visualization technique for representing large genealogies of up to several thousand individuals. The visualization takes the form of a diagonally-filled matrix, where rows are individuals and columns are nuclear families. After identifying the major tasks performed in genealogical research and the limits of current software, we present an interactive genealogy exploration system based on GeneaQuilts. 
The system includes an overview, a timeline, search and filtering components, and a new interaction technique called Bring & Slide that allows fluid navigation in very large genealogies. We report on preliminary feedback from domain experts and show how our system supports a number of their tasks.|000|visualization, genealogy, interactive visualization, phylogeny, matrix 1598|Hammarstroem2016b|This article is good to be quoted when talking about language diversity in general.|000|language diversity, Glottolog, Ethnologue, statistics, 1599|Backurs2015|The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit distance between two strings is a classical computational task, with a well-known algorithm based on dynamic programming. Unfortunately, all known algorithms for this problem run in nearly quadratic time. In this paper we provide evidence that the near-quadratic running time bounds known for the problem of computing edit distance might be tight. Specifically, we show that, if the edit distance can be computed in time :math:`O(n^{2-\delta})` for some constant :math:`\delta > 0`, then the satisfiability of conjunctive normal form formulas with :math:`N` variables and :math:`M` clauses can be solved in time :math:`M^{O(1)} 2^{(1-\varepsilon)N}` for a constant :math:`\varepsilon > 0`. The latter result would violate the Strong Exponential Time Hypothesis, which postulates that such algorithms do not exist. |000|edit distance, linear time, algorithms, Wagner-Fischer algorithm 1600|Backurs2015|There's an interesting post about this paper in MIT news on their website: http://news.mit.edu/2015/algorithm-genome-best-possible-0610 |000|edit distance, Wagner-Fischer algorithm, linear time 1601|Basu2008|Numerous eukaryotic proteins contain multiple domains. 
Certain domains show a tendency to occur in diverse domain architectures and can be considered “promiscuous.” These promiscuous domains are, typically, involved in protein–protein interactions and play crucial roles in interaction networks, particularly those that contribute to signal transduction. A systematic comparative-genomic analysis of promiscuous domains in eukaryotes is described. Two quantitative measures of domain promiscuity are introduced and applied to the analysis of 28 genomes of diverse eukaryotes. Altogether, 215 domains are identified as strongly promiscuous. The fraction of promiscuous domains in animals is shown to be significantly greater than that in fungi or plants. Evolutionary reconstructions indicate that domain promiscuity is a volatile, relatively fast-changing feature of eukaryotic proteins, with few domains remaining promiscuous throughout the evolution of eukaryotes. Some domains appear to have attained promiscuity independently in different lineages, for example, animals and plants. It is proposed that promiscuous domains persist within a relatively small pool of evolutionarily stable domain combinations from which numerous rare architectures emerge during evolution. Domain promiscuity positively correlates with the number of experimentally detected domain interactions and with the strength of purifying selection affecting a domain. Thus, evolution of promiscuous domains seems to be constrained by the diversity of their interaction partners. The set of promiscuous domains is enriched for domains mediating protein–protein interactions that are involved in various forms of signal transduction, especially in the ubiquitin system and in chromatin. 
Thus, a limited repertoire of promiscuous domains makes a major contribution to the diversity and evolvability of eukaryotic proteomes and signaling networks.|000|promiscuous domain, domain promiscuity, domain evolution, protein domain, bigram analysis 1602|Miyazawa1994|Probabilities of all possible correspondences of residues in aligning two proteins are evaluated by assuming that the statistical weight of each alignment is proportional to the exponent of its total similarity score. Based on such probabilities, a probability alignment that includes the most probable correspondences is proposed. In the case of highly similar sequence pairs, the probability alignments agree with the maximum similarity alignments that correspond to the alignments with the maximum similarity score. Significant correspondences in the probability alignments are those whose probabilities are > 0.5. The probability alignment method is applied to a few protein pairs, and results indicate that such highly probable correspondences in the probability alignments are probably correct correspondences that agree with the structural alignments and that incorrect correspondences in the maximum similarity alignments are usually insignificant correspondences in the probability alignments. The root mean square deviations in superimposition of corresponding residues tend to be smaller for significant correspondences in the probability alignments than for all correspondences in the maximum similarity alignments, indicating that incorrect correspondences in the maximum similarity alignments tend to be insignificant correspondences in probability alignments. This fact is also confirmed in 109 protein pairs that are similar to each other with sequence identities between 90 and 35%. In addition, the probability alignment method may better predict correct correspondences than the maximum similarity alignment method. 
Probability alignments do, of course, depend on a scoring scheme but are less sensitive to the value of parameters such as gap penalties. The present probability alignment method is useful for constructing reliable alignments based on the probabilities of correspondences and can be used with any scoring scheme.|000|sequence alignment, sound correspondences, residue correspondences, probability models 1603|Frank2015|The chapter opens with a series of theoretical considerations that will be employed in the analysis of a single polysemous lexeme in Basque, namely, hatz. The section begins with an introduction to one of the principal instruments of analysis, an approach that allows language to be viewed as a complex adaptive system (CAS). Next the scope of the CAS approach is enlarged so that it incorporates the notion of cultural schemas and their heterogeneously distributed nature. Then, the role of serial metonymy in semantic innovation and change is examined. These conceptual tools are applied to the analysis of the Basque data and to the exploration of the factors that contributed to the development and structuring of the resulting semantic network, particularly, to new senses such as ‘fingers’ and ‘claws’. Finally, it is argued that this approach to modeling language and semantic change represents a powerful conceptual tool for researchers working in usage-based frameworks, and more specifically, for those investigating topics in the field of cognitive diachronic lexical semantics.|000|complex adaptive system, metaphor, metonymy, Basque language, finger, claw 1604|Casard1987|This book treats the topic of dialect intelligibility testing.
It also provides a good historical background, so it seems to be the right book to turn to when trying to summarize the topic.|000|mutual intelligibility, dialectology, intelligibility testing, 1605|Casard1987|Questions of validity arise because the method consists of estimating a particular criterion through the assessment of a particular trait; the maximum geographical area within which a mode of speech can effectively serve for communication is estimated from intelligibility test scores. Any such measurement process incorporates a certain amount of random and systematic error. This error must be taken into account since its magnitude directly influences one's ability to make inferences from the results. The concepts of reliability and validity are used to estimate the kind and degree of this measurement error (Blalock 1968a 14).|67|reliability, validity, construct validity, mutual intelligibility, intelligibility testing 1606|Casard1987|:comment:`Chapter treats various aspects of validity and reliability and is definitely worth reading.`|67-88|mutual intelligibility, validity, reliability, intelligibility testing 1607|Mellinger2014|To meet the ever-increasing volume of digital content requiring translation, LSPs and software developers have implemented various technologies into their work environments. Translators have a wide assortment of tools and systems at their disposal: [pb] word processors; the Internet; computer-assisted translation tools; terminology management systems; content management systems; and cloud-based computing to name a few of the many technological offerings now available. Moreover, the previously-mentioned outsourcing model partners companies and language service providers that are located throughout the world, which in turn allows and requires language professionals to be adept at working off-site or remotely. As a result, translators often collaborate virtually with colleagues located elsewhere.
|2f|computer-assisted translation, 1608|Mellinger2014|Each of these advances has changed how work is distributed and performed among virtual team members, the types of content that are translated, and the skill sets required to complete work. The expanded repertoire of resources changes the nature of translation in and of itself, and subsequently influences the nature and progress of the task both behaviorally and cognitively. Likewise, the translation task impacts the tools required of professional translators and language service providers, giving rise to new tools and processes to aid translation. These changes are reinforced by the fact that many companies require the use of translation tools to complete assignments, which changes the way translators approach their jobs. The development of these technologies, be they translation memories, machine translation, concordancers, alignment tools, or corpus-building tools, has largely responded to the needs of translators in an effort to support their work, to achieve productivity gains, and to address the ever-increasing volume of work and time pressures (Hutchins 1998; Austermühl 2001; Bowker 2002; Kenny 2011; Dunne 2013).|3|computer-assisted translation, 1609|Mellinger2014|The question arises as to the impact that translation memory tools has on the translator, and his or her ability to complete work. These tools are often touted as a silver bullet to increased throughput and to overall ease of translation, but these claims have not been sufficiently scrutinized; the mutual effect of technology and translation process needs to be better understood. In particular, the notion that translation memory makes the translation process easier requires investigation, since the translation task itself has been [pb] changed. Instead of translating an entire text from scratch, the translator now completes a hybrid of translation, editing, and cross-language verification.
While it is clear that the translation task has in fact changed and requires a different type of intervention by the translator, little empirical evidence is available to suggest that the task is easier or less effortful.|4f|translation memory tools, computer-assisted translation 1610|Mellinger2014|This dissertation is divided into six chapters. The present chapter introduces the purpose of the study and justifies its significance. Chapter 2 lays the conceptual framework for the study, bringing together the literature on translation technology, post-editing, working memory, and effort in order to design the experiment that is described in Chapter 3, and interpret the results that are presented in Chapter 4. As mentioned, the third chapter addresses methodological considerations taken into account when designing the study, and introduces a novel way to collect translation process data. The chapter also sketches a profile of the participants included in the study. Chapters 4 and 5 are closely related, in that the former reports the results of the experiment, while the latter aims to contextualize the findings. The amount of data generated by process-oriented research can be expansive, and consequently, interpreting the data and drawing valid, salient conclusions can prove challenging. Nevertheless, Chapter 5 aims to elucidate how the data obtained from the experiment may lend support to the hypotheses, and raises further questions to be studied at a later time. Chapter 6 synthesizes the findings, and contextualizes the results within the larger translation process. Moreover, we suggest implications of the study, particularly within the context of translation pedagogy, translation tool design, and the economics of translation.
Special mention is also made of the research design, and how novel data collection methods can increase the pool of participants in empirical, process-oriented experiments, while still employing established metrics to measure cognitive effort.|000|computer-assisted translation 1611|Mellinger2014|As noted previously, the design and value proposition of TM tools are predicated on the assumption that exact matches require no revision; nevertheless, participant behavior clearly indicates substantial editing occurring in both fuzzy and exact matches. [pb] This is problematic, since compensation models in the language industry provide discounts for fuzzy and exact matches. For example, many translation providers charge for fuzzy matches using a sliding rate scale. The underlying assumption for this compensation model is that the rate of pay is proportional to the work effort required of the translator. These prices may be differentiated depending on the match level; a ninety percent match may be charged at a lesser rate than perhaps a seventy percent match. Exact matches are often the match type priced at the lowest rate, since these presumably do not require the translator to do any more than verify the translation.|117f|economy, computer-assisted translation, exact match, fuzzy match 1612|Lancey2008|Potentially good and useful overview on the word formation and morphology in the context of grammaticalization.|000|grammaticalization, morphology, word formation 1613|Kessler2007|Phylogenetic analyses of languages need to explicitly address whether the languages under consideration are related to each other at all. Recently developed permutation tests allow this question to be explored by testing whether words in one set of languages are significantly more similar to those in another set of languages when paired up by semantics than when paired up at random.
Seven different phonetic similarity metrics are implemented and evaluated on their effectiveness within such multilateral comparison systems when deployed to detect genetic relations among the Indo-European and Uralic language families.|000|consonant classes, CCM method, cognate detection, multi-lateral comparison 1614|Roth2005|Agents producing and exchanging knowledge are forming as a whole a socio-semantic complex system. Studying such knowledge communities offers theoretical challenges, with the perspective of naturalizing further social sciences, as well as practical challenges, with potential applications enabling agents to know the dynamics of the system they are participating in. The present thesis lies within the framework of this research program. Alongside and more broadly, we address the question of reconstruction in social science. Reconstruction is a reverse problem consisting of two issues: (i) deduce a given high-level observation for a considered system from low-level phenomena; and (ii) reconstruct the evolution of high-level observations from the dynamics of lower-level objects. In this respect, we argue that several significant aspects of the structure of a knowledge community are primarily produced by the co-evolution between agents and concepts, i.e. the evolution of an epistemic network. In particular, we address the first reconstruction issue by using Galois lattices to rebuild taxonomies of knowledge communities from low-level observation of relationships between agents and concepts; achieving ultimately an historical description (inter alia field progress, decline, specialization, interaction – merging or splitting). We then micro-found various stylized facts regarding this particular structure, by exhibiting processes at the level of agents accounting for the emergence of epistemic community structure.
After assessing the empirical interaction and growth processes, and assuming that agents and concepts are co-evolving, we successfully propose a morphogenesis model rebuilding relevant high-level stylized facts. We finally defend a general epistemological point related to the methodology of complex system reconstruction, eventually supporting our choice of a co-evolutionary framework.|000|social networks, social interaction, bipartite network, artificial agents, multi-agent system, sociology, cultural evolution 1615|Roth2005|This thesis may be interesting in the context of modeling concept evolution, yet it is not yet completely clear how it could be used, and whether the ideas presented are in any way valid for linguistic applications.|000|sociology, multi-agent system, social networks, evolutionary model, bipartite network 1616|Velde2016|Exaptation is a concept that originated in evolutionary biology. It refers to the co-optation of a trait for a new function that is not immediately related to its former function (Gould & Vrba 1982). It was introduced into linguistics by Roger Lass, first in a paper in Journal of Linguistics in 1990, and later in his 1997 book as part of a more general evolutionarily inspired theory of language change. Since then, it has met with wavering enthusiasm: some scholars have embraced the term and applied it to a wide range of case studies, but others have been more sceptical, for various reasons. 
Some have argued that the transfer of concepts from evolutionary biology to linguistics is in general ill-advised as such transfers ignore the deep differences between biological evolution and cultural evolution; others have pointed out that the notion of exaptation is flawed in biology as well or have taken issue with the specifics of Lass’s understanding of the notion (such as the junk status of the exapted material, see Section 4.3), and still others have objected to the term on the grounds that the processes it captures are already covered by other mechanisms of change.|000|exaptation, grammaticalization, definition 1617|Vanhatalo2014|The Natural Semantic Metalanguage (NSM) is a method of semantic analysis, used for various tasks mainly in the field of linguistic research. A crucial part of the theory is the set of primes, minimal lexical units that are used to explicate words, cultural scripts and other concepts. Identifying the primes in a new language is an opportunity to reinforce and/or revisit the theory. The remarks presented in this paper resulted from the identification process of the Finnish-based NSM primes. The goal of this paper is to direct attention to some fundamental aspects in the Natural Semantic Metalanguage theory, especially to the relation between the universal language-independent NSM concepts and the English-based NSM. A number of remarks are made on the general system of the primes, as the paper points out issues related to e.g. the number, selection and mutual hierarchy of the primes. The economy and logic of certain prime constructions and the argumentation behind allolexy are discussed as well.|000|natural semantic metalanguage, semantic analysis, semantic universals, concept list 1618|DeLancey1987|Solid overview of the Sino-Tibetan language family.|000|Sino-Tibetan, introduction, overview 1619|Futrell2015|Explaining the variation between human languages and the constraints on that variation is a core goal of linguistics.
In the last 20 y, it has been claimed that many striking universals of cross-linguistic variation follow from a hypothetical principle that dependency length—the distance between syntactically related words in a sentence—is minimized. Various models of human sentence production and comprehension predict that long dependencies are difficult or inefficient to process; minimizing dependency length thus enables effective communication without incurring processing difficulty. However, despite widespread application of this idea in theoretical, empirical, and practical work, there is not yet large-scale evidence that dependency length is actually minimized in real utterances across many languages; previous work has focused either on a small number of languages or on limited kinds of data about each language. Here, using parsed corpora of 37 diverse languages, we show that overall dependency lengths for all languages are shorter than conservative random baselines. The results strongly suggest that dependency length minimization is a universal quantitative property of human languages and support explanations of linguistic variation in terms of general properties of human information processing.|000|dependency length minimalization, syntax, constraints, language dynamics, 1620|Futrell2015|Interesting article that shows that speakers prefer to minimize dependencies, or at least that they tend to try to avoid dependencies which get too long.
The text also has nice illustrations, so that one understands what they are talking about.|000|dependency length minimalization, syntax, language dynamics, constraints 1621|Goddard2010|This article gives a brief introduction to the natural semantic metalanguage approach.|000|natural semantic metalanguage, introduction, overview 1622|Goddard2010|This paper discusses the prefixes *s-, *r-, and *N- in Old Chinese on the basis of etymological relationships among words in Chinese and the comparison of cognate words between Chinese and Tibeto-Burman. The study of the prefix *s- has a long history and abundant literature. This paper reviews past studies and presents the writer’s view from the perspective of the comparative study of Sino-Tibetan languages. The prefix *r- is proposed based on the reconstructions of *tr- and *tsr- in Old Chinese by Fang-kuei Li. The correspondences between Chinese and Tibeto-Burman reveal that *r- occurred before initial consonants and sometimes played the role of a prefix. The hypothesis of the prefix *N- in Old Chinese is put forward on the basis of Sino Tibetan comparison, in order to account for regular as well as irregular sound correspondences between Chinese and Tibetan, etymological relationships among Chinese words, Xie-sheng contacts in Chinese characters, and sound changes in Chinese as well as in Tibetan.|000|prefix, Old Chinese, Sino-Tibetan, linguistic reconstruction 1623|Shen2005|This article gives an overview over Li Fang-kui's reconstruction of Old Chinese. In this sense, it is very useful. Unfortunately, the data is not digitizable, so one is required to search manually.|000|Old Chinese, linguistic reconstruction, Li Fang-Kuei 1624|DeLancey2015|The verb agreement systems of Jinghpaw, Meyor, Northern Naga, and Northeast, Northwest and Southern Kuki-Chin contain material which is demonstrably inherited from Proto-Trans-Himalayan.
Here we discuss morphological evidence that these systems share a common ancestor more recent than PTH. There is strong evidence connecting Jinghpaw with both Northern Naga and Kuki-Chin, and weaker evidence directly linking Northern Naga and Kuki-Chin, and both of these with Meyor. This is evidence that all of these languages belong to a single branch of the family, an idea which has been suggested in the past but never argued for.|000|Sino-Tibetan, subgrouping, morphological evidence, shared innovation 1625|Mendez2016|Sequencing the genomes of extinct hominids has reshaped our understanding of modern human origins. Here, we analyze ~120 kb of exome-captured Y-chromosome DNA from a Neandertal individual from El Sidrón, Spain. We investigate its divergence from orthologous chimpanzee and modern human sequences and find strong support for a model that places the Neandertal lineage as an outgroup to modern human Y chromosomes—including A00, the highly divergent basal haplogroup. We estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ~588 thousand years ago (kya) (95% confidence interval [CI]: 447–806 kya). This is ~2.1 (95% CI: 1.7–2.9) times longer than the TMRCA of A00 and other extant modern human Y-chromosome lineages. This estimate suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes. The fact that the Neandertal Y we describe has never been observed in modern humans suggests that the lineage is most likely extinct. We identify protein-coding differences between Neandertal and modern human Y chromosomes, including potentially damaging changes to PCDH11Y, TMSB4Y, USP9Y, and KDM5D. Three of these changes are missense mutations in genes that produce male-specific minor histocompatibility (H-Y) antigens.
Antigens derived from KDM5D, for example, are thought to elicit a maternal immune response during gestation. It is possible that incompatibilities at one or more of these genes played a role in the reproductive isolation of the two groups.|000|Neandertal, human prehistory, Y-chromosome, 1626|Jacques2016e|This is a short review of a grammar of Pumi, a Sino-Tibetan language.|000|review, Sino-Tibetan, Pumi language, 1627|Coloma2015|This paper attempts to contrast two alternative formulae for the Menzerath-Altmann law, using data from two linguistic measures (words per clause and phonemes per word) for the same text translated into 50 different languages. The alternative formulae are the traditional power function, and a recently proposed hyperbolic function. The estimations are modified to control for genetic and geographic factors, and for the presence of possible endogeneity between the related variables. None of these significantly alter the basic results, which show a slight preference for the power function over the hyperbolic one. *|000|Menzerath-Altmann law, cross-linguistic study 1628|DeLancey2012|This article re-presents the case, first presented in DeLancey (1997), for the mirative as a crosslinguistic category, and responds to critiques of that work by Gilbert Lazard and Nathan Hill. The nature of the mirative, a category which marks a statement as representing information which is new or unexpected, is exemplified with data from Kham (Tibeto-Burman) and Hare (Athabaskan). The mirative category is shown to be distinct from the well-known mediative or indirective evidential category. Finally, the role of mirativity in the complex verbal systems of Tibetan languages is briefly outlined|000|mirativity, evidentiality, grammar, grammatical categories, Sino-Tibetan 1629|DeLancey2012|The author gives initially no real explanation regarding the mirativity, but here it is, on page 532 in the article, which is a quote from Bashir 2007: .. 
pull-quote:: When huLa appears in narration of directly experienced events, the meaning is mirative, i.e. that the speaker has just found out about (i.e. was not aware of before) the content of the assertion. |000|mirativity, definition, grammatical categories 1630|Szeto2000|Intelligibility is often used to classify speech varieties as languages or dialects. In drawing the distinction, the degree of intelligibility between speakers of two different speech varieties can often indicate how close these varieties are. While talking about intelligibility as a criterion for the language-dialect classification or to group dialects of one language family, linguistic aspects like phonological and lexical factors are usually considered. Grammatical factors, on the other hand, are normally not focussed upon when intelligibility is being concerned. Grammatical divergence is an important factor in distinguishing different languages within one language family. The Chinese dialects vary in lexical, phonological as well as grammatical aspects. Their complicated relationships with each other are often comparable to those between different languages within a family (Ramsey 1989). Even within one dialect group speech varieties may show great contrast, a well-known example being the Min supergroup where the different branches bear some grammatical differences, and also the Yue dialect group as suggested by Killingley (1993): “Yue dialects ... reveal significant differences which would and do prevent mutual understanding between speakers.” This argument has added momentum to the long-standing debate of the status of the southern Chinese dialects as languages or dialects.|000|mutual intelligibility, Sinitic, Chinese dialects, intelligibility testing 1631|Szeto2000|Quote this article in the context of mutual intelligibility.
It seems to be quite smartly done, comparing cognacy in words, etc., although it obviously misses the phonetic similarity.|000|mutual intelligibility, Chinese dialects, intelligibility testing 1632|DeLancey2015b|Certain subbranches of Trans-Himalayan (Sino-Tibeto-Burman) stand out as islands of complexity in a Eurasian sea of simplicity (Bickel and Nichols 2013). Others show a radically simpler verbal system more consistent with their South and Southeast Asian neighbors. The complex systems include elaborate systems of argument indexation; most of these reflect a hierarchical indexation paradigm, which can be traced to Proto-Trans-Himalayan. This morphology has been lost in many languages, including the most familiar branches of the family such as Sinitic, Boro-Garo, Tibetic, and Lolo-Burmese, as a result of creolization under intense language contact. The archaic system is preserved fairly intact in rGyalrongic and Kiranti and with various structural reorganization in several other branches. The Kuki-Chin branch has innovated an entirely new indexation paradigm, which in some subbranches has completely replaced the original system, while in others the two paradigms coexist.|000|morphological change, morphological complexity, Sino-Tibetan, subgrouping, overview 1633|DeLancey1986|The modern Tai languages are well-known for their elaborate classifier systems.
While there is no doubt that the syntactic category of classifier existed in Proto-Tai, and probably much earlier [...], the elaborate modern systems show a diversity which demonstrates considerable independent development since PT.|438|Tai languages, classifier system, grammatical categories, measure words 1634|Wilson2011|This chapter is concerned with people's behavior experience of bodily evacuation between entering and leaving the toilet, across the full spectrum of private to public toilets in the Roman world. Our written sources for this aspect of latrine use, while of some help, are relatively meagre (see Thüry, chapter 4.1, pp. 43-47), and much of the reconstruction has to rely on a close reading of the archaeological evidence. Inevitably, this is weighted towards the multi-seater public latrines (foricae), since being larger and more monumentally built they are more likely to preserve incidental details that can provide clues to behavior, but many of the insights thus gained can be transferred to less public settings also.|000|urination, defecation, Romance, latrine usage 1636|Kulis2012|Bayesian models offer great flexibility for clustering applications—Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-means and mixtures of Gaussians, we show that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters.
We generalize this analysis to the case of clustering multiple data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We also discuss further extensions that highlight the benefits of our analysis: i) a spectral relaxation involving thresholded eigenvectors, and ii) a normalized cut graph clustering algorithm that does not fix the number of clusters in the graph.|000|k-means, clustering, partitioning, Bayesian statistics, 1637|Zeige2015|The article highlights a semiotically relevant aspect of Niklas Luhmann’s Theory of Social Systems: its reception of the Saussurean dichotomies signifiant/signifié and langue/parole. Luhmann’s position is weighted against the Cours as well as Saussure’s original writings, sampling their approaches to form, meaning, the sign’s two-sidedness, and the relation of linguistic structure and speech events. Ultimately, the article proposes a social ontology of linguistic abstraction in line with general semiology that explains the motility of language through communication, thereby accounting for variability and optionality. It also indicates as to how the theoretical framework can feed into a model of linguistic description.|000|Niklas Luhmann, social system, Ferdinand de Saussure, linguistic sign 1638|Haspelmath2013|Article tries to convince natural scientists that open access publication should be nonprofit.|000|open access, nonprofit 1639|Ku2015|Endosymbiotic theory in eukaryotic-cell evolution rests upon a foundation of three cornerstone partners—the plastid (a cyanobacterium), the mitochondrion (a proteobacterium), and its host (an archaeon)—and carries a corollary that, over time, the majority of genes once present in the organelle genomes were relinquished to the chromosomes of the host (endosymbiotic gene transfer).
However, notwithstanding eukaryote-specific gene inventions, single-gene phylogenies have never traced eukaryotic genes to three single prokaryotic sources, an issue that hinges crucially upon factors influencing phylogenetic inference. In the age of genomes, single-gene trees, once used to test the predictions of endosymbiotic theory, now spawn new theories that stand to eventually replace endosymbiotic theory with descriptive, gene tree-based variants featuring supernumerary symbionts: prokaryotic partners distinct from the cornerstone trio and whose existence is inferred solely from single-gene trees. We reason that the endosymbiotic ancestors of mitochondria and chloroplasts brought into the eukaryotic—and plant and algal—lineage a genome-sized sample of genes from the proteobacterial and cyanobacterial pangenomes of their respective day and that, even if molecular phylogeny were artifact-free, sampling prokaryotic pangenomes through endosymbiotic gene transfer would lead to inherited chimerism. Recombination in prokaryotes (transduction, conjugation, transformation) differs from recombination in eukaryotes (sex). Prokaryotic recombination leads to pangenomes, and eukaryotic recombination leads to vertical inheritance. Viewed from the perspective of endosymbiotic theory, the critical transition at the eukaryote origin that allowed escape from Muller’s ratchet—the origin of eukaryotic recombination, or sex—might have required surprisingly little evolutionary innovation.|000|endosymbiosis, lateral gene transfer, prokaryotic evolution 1640|Luehr2012|This article is probably a good summary of nominal paradigms in Indo-European languages.|000|nominal inflection, paradigms, analogy, Indo-European 1641|Majid2015|According to widespread opinion, the meaning of body part terms is determined by salient discontinuities in the visual image; such that hands, feet, arms, and legs, are natural parts. 
If so, one would expect these parts to have distinct names which correspond in meaning across languages. To test this proposal, we compared three unrelated languages—Dutch, Japanese, and Indonesian—and found both naming systems and boundaries of even basic body part terms display variation across languages. Bottom-up cues alone cannot explain natural language semantic systems; there simply is not a one-to-one mapping of the body semantic system to the body structural description. Although body parts are flexibly construed across languages, body parts semantics are, nevertheless, constrained by non-linguistic representations in the body structural description, suggesting these are necessary, although not sufficient, in accounting for aspects of the body lexicon.|000|embodiment, semantic change, denotation, body parts 1643|Kuemmel2012|When we reconstruct a proto-language, we produce a hypothesis about a non-attested synchronic stage and about the changes leading from it to the attested languages. This means that we should evaluate both the synchronic stage and the reconstructed changes with the help of general and typological considerations. In this paper some problems of phonological reconstruction in PIE are discussed from a typological perspective. First, one of the most controversial topics of PIE consonantism is addressed: the reconstruction of the stop system (i.e., the “glottalic” question). After an evaluation of different hypotheses from both synchronic and diachronic typological data, it is argued that the best solution might be to reconstruct pre-PIE implosives (i.e., non-explosive non-glottalic stops) and voiced explosives that changed to voiced explosives and breathy voiced stops in PIE or at least in the central IE languages. In the second part, the focus lies on the reconstruction of qualitative ablaut in vowels. 
In the light of typological parallels it is argued that both later reflexes of the vowels and some morphonological rules might be easier to understand if PIE *o was the reflex of a pre-PIE long *ā in contrast to PIE *e/a resulting from pre-PIE short *a.|000|consonants, vowels, Indo-European, overview, introduction 1644|BenHamed2009|UNIDIA is a database on sound change designed to compile the sound change hypotheses formulated to account for the phonological evolution of the languages of the world. It aims at deriving a data-based typology of sound change, universals, tendencies and sound change distributions. It now contains about 3750 sound changes for some 190 languages, essentially sampled from the Bantu, Sinitic and Daic language groups. This data can be explored based on various phonetic criteria, and sound change distributions can be represented on a geographical map or on a language family tree. At completion, UNIDIA should be representative of how phonological evolution is construed by the community of historical linguists and could become a useful tool for testing competing models of language evolution.|000|database, sound change, 1645|Miyazawa1994|This paper is interesting in that it describes how the best-scoring pairs in alignments can be selected by an alignment algorithm. This may be useful for the later alignment of cognates for which scoring matrices have been inferred from the data.|000|sound correspondences, residue correspondences, pairwise alignment, sequence alignment, probability models 1646|Almeida2011|One negative characteristic of conductance that can be pointed out is that it might have a tendency of giving better scores to clusterings with fewer clusters, as more clusters will probably have more cut-edges. 
Also, the lack of internal edge density information used in this kind of conductance may cause problems, as can be seen in Figure 1, where both clusterings presented would have the same conductance score, even though the one in Figure 1b is obviously better.|50|conductance, graph clustering quality, evaluation 1647|Almeida2011|Graph clustering, the process of discovering groups of similar vertices in a graph, is a very interesting area of study, with applications in many different scenarios. One of the most important aspects of graph clustering is the evaluation of cluster quality, which is important not only to measure the effectiveness of clustering algorithms, but also to give insights on the dynamics of relationships in a given network. Many quality evaluation metrics for graph clustering have been proposed in the literature, but there is no consensus on how do they compare to each other and how well they perform on different kinds of graphs. In this work we study five major graph clustering quality metrics in terms of their formal biases and their behavior when applied to clusters found by four implementations of classic graph clustering algorithms on five large, real world graphs. Our results show that those popular quality metrics have strong biases toward incorrectly awarding good scores to some kinds of clusters, especially seen in larger networks. They also indicate that currently used clustering algorithms and quality metrics do not behave as expected when cluster structures are different from the more traditional, clique-like ones.|000|conductance, graph clustering quality, evaluation 1648|Ao1991|This paper is a report of the preliminary results of an attempt to reconstruct Proto-Chinese based on linguistic data from modern Chinese dialects. Ever since Karlgren (1915), studies in Chinese historical linguistics have been following the same approach: take one of the ancient rhyme tables as the starting point and give a phonetic representation to each of the sound categories in the table, drawing evidence from sources such as modern dialects, Sinoxenic pronunciations, etc. This approach has been sharply criticized by Norman (1988), who suggested an alternative approach: ignore the ancient rhyme dictionaries and rhyme tables and work backward from modern dialects on. 
This paper supports and argues for this position. Based on systematic cognate correspondence sets in 17 Chinese dialects collected from Hanyu Fangyan Zihui [A Collection of Chinese Characters with Dialectal Pronunciations], a Proto-Chinese sound system is reconstructed with 29 onsets and 74 rhymes, which is simpler and more natural than the 37 onset and 139 rhyme system Karlgren reconstructed. Various rules of sound change are proposed to account for the modern dialectal reflexes of the proto-forms. Irregularities are discussed in light of recent theories of language change, in particular the theory of lexical diffusion. Based on the proposed rules of sound change, a family tree model is built to illustrate the subgrouping relationship among the dialects under investigation.|000|Chinese, Old Chinese, linguistic reconstruction, Chinese dialectology 1649|Ao1991|The ultimate value of this paper is that it is based on a reconstruction of the modern dialects, not of the rhyme dictionaries. The author proposes a simple system of 29 onsets and 74 rhymes, derived from the Zihui (A collection of Chinese characters with dialectal pronunciations).|000|Chinese dialects, linguistic reconstruction, phonology 1650|Ao1991|.. image:: static/img/ao-1991-tree.png :name: chinese_dialects :width: 500px :comment:`Reference tree of Chinese dialects.`|375|Chinese, Chinese dialects, phonology, linguistic reconstruction, family tree 1651|Ao1991|.. 
image:: static/img/ao-1991-proto-chinese.png :name: chinese_dialects :width: 500px :comment:`System of Proto-Chinese (proposed reconstruction).`|370|Zìhuì, Chinese dialects, linguistic reconstruction, phonology 1652|BollAvetisyan2016|Many languages restrict their lexicons by OCP-PLACE, a phonotactic constraint against co-occurrences of consonants with shared [place] (e.g., McCarthy, 1986). While many previous studies have suggested that listeners have knowledge of OCP-PLACE and use this for speech processing, it is less clear whether they make reference to an abstract representation of this constraint. In Dutch, OCP-PLACE gradiently restricts non-adjacent consonant co-occurrences in the lexicon. Focusing on labial-vowel-labial co-occurrences, we found that there are, however, exceptions from the general effect of OCP-LABIAL: (A) co-occurrences of identical labials are systematically less restricted than co-occurrences of homorganic labials, and (B) some specific pairs (e.g., /pVp/, /bVv/) occur more often than expected. Setting out to study whether exceptions such as (A) and (B) had an effect on processing, the current study presents an artificial language learning experiment and a reanalysis of Boll-Avetisyan and Kager’s (2014) speech segmentation data. Results indicate that Dutch listeners can use both knowledge of phonotactic detail and an abstract constraint OCP-LABIAL as a cue for speech segmentation. We suggest that whether detailed or abstract representations are drawn on depends on the complexity of processing demands.|000|speech processing, phonotactics, obligatory contour principle 1653|Boltz1993|This is an early review of @Baxter1992 by Boltz and potentially interesting in certain contexts.|000|Old Chinese, linguistic reconstruction, 1654|Chen2010|This study investigates the new developments of sound variation in Taiwan Southern Min, including dialects spoken in Taipei, Changhua and Tainan. 
Four research questions are raised: First, what are the different vowel systems in these dialects? Second, what are the differences regarding lower register entering tone in these dialects? Third, what are the mechanisms motivating the sound variation? Fourth, what are the phonological structural changes of these dialects caused by the said sound changes? This study confirms that the vowel system (i, e, a, ə, u, ) of Tainan dialect tends to be the most popular system in Taiwan. The lower register tone of syllables with -p, -t, or -k ending tends to become [32], which is identical to the higher register entering tone. That which with -/?/ ending tends to lose the glottal stop and to be lengthened, becoming mid-level, high-falling or high-level, depending on the original contour.|000|case study, Chinese dialects, Southern Mǐn, phonetic variation, 1655|Clark2016|This paper considers how ideas developed within relevance theory can be applied in accounting for language change. It briefly surveys previous relevance-theoretic work on language change and suggests that studies of procedural meaning, lexical pragmatics and metarepresentation can each play an important role in accounting for semantic change. It identifies a number of areas for further research which could help to develop understanding of both relevance theory and language change and suggests that one important line of further research would be to explore connections between work in relevance theory and approaches which adopt terms and ideas from the theory without adopting the relevance-theoretic framework overall.|000|relevance theory, language change, explanation of language change 1656|Dimitrov2011|The objective of our work is to develop a general method for structurally related, but diverged sequences for simultaneous optimization of alignment and self-folding - the so-called Sankoff's program for simultaneous prediction of secondary structure and alignment between nucleotide sequences. 
A simple reason behind the simultaneous optimization of alignment and self-folding is that strong structural consensus among related, but diverged sequences are a good indicator for preserved functional role. Up to now there is no general solution for this long standing problem. Here we discuss an approach which is just a first step to the full realization of Sankoff's program. Currently available models and software packages, such as foldalign, dynalign and others, implement only restricted versions (variations around first align and then fold or oppositely) of Sankoff's program and do not use the full loop-based RNA/DNA energy model. We divided Sankoff's program in two steps based on the analogy between the classical alignment algorithm and hybridization without self-folding. The next step is to include in the alignment an algorithm for the self-folding. In our approach, the alignment problem requires the implementation of the full loop-based RNA/DNA energy model for hybridization of two sequences. For this, we divided the alignment between two sequences into loops and associated a score to each loop in such way that the total score of the alignment is a sum over the scores for each alignment loop. The loop scoring model for alignment consists of following loop types: stacking with matched and mismatched pairs, bulges, internal loops and dangling ends. Calculation of thermodynamic partition function over all possible double-stranded conformations is interpreted in terms of all possible canonical pairwise alignments. The partition function is computed by means of a dynamic programming algorithm and used to determine the probability of an alignment as well as the probability of each possible match between two sequence positions. For calculation of match probabilities detailed recursion relations for partition functions of alignments are based on their recursion analogs for hybridization of subsequences. 
The partition function is used for backtracking and reconstructing a properly weighted ensemble of optimal and suboptimal alignments.|000|sequence alignment, probability models, Sankoff parsimony, 1657|Dimitrov2011|This algorithm might be interesting in the context of linguistic reconstruction, using either parsimony or probability frameworks.|000|linguistic reconstruction, sequence alignment, phonetic alignment, ancestral state reconstruction 1658|Ehara2016|It is true that any human-being can use any language when he/she is exposed in the language environment at his/her language acquisition age. This fact, however, does not mean that there are no relation between a language and a population of the speakers of the language. For example, Cavalli-Sforza et al. (1992) shows that there are relations between language groups and populations. The purpose of our research is to clarify the relation between linguistic (structural and lexical) distances and genetic distance of speakers of languages. By the author’s knowledge, previous studies for the relation between languages and populations do not use language names but family names or group names. We calculate linguistic distances using language names and compare genetic distance of the speakers of these languages.|000|distance-based methods, language history, lexical distance 1659|Ehara2016|Strange paper that is very naive and shows no account of actual development.|000|language structure, lexical distance, 1660|Evans2008|Linguists often refer to tonal languages as belonging to ‘types’ ─ African, East Asian, etc. This paper documents the tonal system of the Mianchi dialect of Southern Qiang, a Tibeto-Burman language located squarely in the East Asian tone environment. Although tone has developed in Southern Qiang under heavy influence from Chinese, the tone system found in Mianchi fits an ‘African’ typology much better than it does a ‘Chinese’ or ‘East Asian’ type. 
The degrees of African-ness and Chinese-ness are evaluated, and African-style features are shown to be lurking throughout the Qiangic family. Similarities of word structure and word length between African and Qiangic languages are held responsible for the similarities.|000|case study, African languages, Sino-Tibetan, Qiang, Southern Qiang, tone, tonogenesis 1661|Fon2011|This paper reports a dialectal split in syllable-final nasal mergers between northern and southern Taiwan Mandarin: both /in/ → [iŋ] and /əŋ/ → [ən] are found in northerners’ speech, while an additional /iŋ/ → [in] is reported among southerners. The former two mergers are treated as innovations while the latter is due to negative Min transfer. Rule connotation seems to be a combined result of origin, analogy, and speaker confidence. /iŋ/ → [in] is stigmatized due to Min transfer, /əŋ/ → [ən] has acquired a slight negative tang by analogy, and /in/ → [iŋ] is deemed as fairly positive. Regardless of rules, northerners, being speakers of the standard dialect, are generally more receptive to merged forms than southerners. A positive correlation is found between rule connotation and development. /in/ → [iŋ] is the closest to complete phonologization, followed by /əŋ/ → [ən], and /iŋ/ → [in], which is the least developed. Rule interaction is found in speakers that have both rules involving /i/. Those who have only one rule show higher merging rates than those who have both rules. Conflict of social connotation and increased cognitive loading are posited to be the cause.|000|case study, Taiwan Mandarin, nasal merger, dialect split 1662|Gao2014|To study commonalities and differences among different languages, we select 100 reports from the documents of the United Nations, each of which was written in Arabic, Chinese, English, French, Russian and Spanish languages, separately. Based on these corpora, we construct 6 weighted and directed word co-occurrence networks. 
Besides all the networks exhibit scale-free and small-world features, we find several new non-trivial results, including connections among English words are denser, and the expression of English language is more flexible and powerful; the connection way among Spanish words is more stringent and this indicates that the Spanish grammar is more rigorous; values of many statistical parameters of the French and Spanish networks are very approximate and this shows that these two languages share many commonalities; Arabic and Russian words have many varieties, which result in rich types of words and a sparse connection among words; connections among Chinese words obey a more uniform distribution, and one inclines to use the least number of Chinese words to express the same complex information as those in other five languages. This shows that the expression of Chinese language is quite concise. In addition, several topics worth further investigating by the complex network approach have been observed in this study.|000|directed network, weighted network, co-occurrence, word co-occurrence network, cross-linguistic study 1663|Gong2003|There have been different ways of sub-grouping Sino-Tibetan languages. Benedict (1972) sets up two sub-groups, which are Chinese and Tibeto-Karen, and subdivides Tibeto-Karen into Tibeto-Burman and Karen. Bradley (1979) and Matisoff (1986) consider it unnecessary to posit an intermediate “Tibeto-Karen” level. They regard Karenic as a subfamily of Tibeto-Burman, on the same taxonomic level as Lolo-Burmese of Kuki-Chin-Naga. On the basis of this sub-grouping, this paper discusses the phonological changes of finals that had occurred when Proto-Sino-Tibetan split into Old Chinese and Proto-Tibeto-Burman. In this study Old Chinese is based on Li (1971), Proto-Tibeto-Burman on Benedict (1972), and Proto-Sino-Tibetan on Gong (1995). In citing Benedict’s reconstruction I propose some revisions to his system. 
I have shown that with slight revisions Benedict’s system of Proto-Tibeto-Burman will be compatible with Li’s system of Old Chinese and can be aligned with my reconstruction of Proto-Sino-Tibetan. This paper attempts to bring together the study of Old Chinese phonology and the reconstruction of Proto-Tibeto-Burman, which have hitherto been pursued separately, and to trace the phonological development of Sino-Tibetan languages.|000|sound change, Sino-Tibetan, Old Chinese, linguistic reconstruction 1664|Handel2009|The Northern Min dialects have two unusual and interrelated features: the presence (in some dialects) of voiced or lenited initials which do not correspond to the voiced initials of Middle Chinese, and a pattern of tonal splits that cannot be accounted for by conditioning factors of the Middle Chinese phonological system. Various scholars have proposed different hypotheses concerning the origin of these two features. Through analysis of their phonetic aspects, it is argued that the features cannot be the result of recent contact with nearby Wu dialects, but must have been conditioned by a feature found in Proto-Northern Min, most likely a series of voiced aspirate initials. The historical origin of those initials, including the possibility that they entered Proto-Northern Min from another dialect source, is explored.|000|Chinese dialects, Northern Mǐn, voiced initial, aspirated initial 1665|Her2010|Whether classifiers (C) and measure words (M) can be meaningfully distinguished in Chinese has been a controversial issue, reflected also by the drastic discrepancy in the inventories of classifiers previously proposed. The two tests, i.e. de-insertion and adjectival modification, that proponents for the C/M distinction proposed have been shown to be unreliable and thus rejected. We re-examine these two tests closely and propose two sets of refined, reliable, and revealing tests. 
We further employ the Aristotelian distinction between essential and accidental properties as well as the Kantian distinction between analytic and synthetic propositions to characterize the C/M distinction. M is therefore semantically substantive and thus blocks numeral quantification and adjectival modification to the noun; C, in contrast, does not form such a barrier, for it is semantically null in the sense that it merely highlights a semantic aspect inherent to the noun and thus contributes no additional meaning.|000|measure words, classifier system, Mandarin, Chinese 1666|Li1971|This is the important reference for the influential reconstruction of Old Chinese by Li Fang-Kuei.|000|Li Fang-Kuei, Old Chinese, linguistic reconstruction 1667|Liang2016|The eigenvalues and eigenvectors of the adjacency matrix of a network contain essential information about its topology. For each of the Chinese language co-occurrence networks constructed from four literary genres, i.e., essay, popular science article, news report, and novel, it is found that the largest eigenvalue depends on the network size N, the number of edges, the average shortest path length, and the clustering coefficient. Moreover, it is found that their node-degree distributions all follow a power-law. The number of different eigenvalues, N λ , is found numerically to increase in the manner of N λ ∝ log N for novel and N λ ∝ N for the other three literary genres. An ‘‘M’’ shape or a triangle-like distribution appears in their spectral densities. The eigenvector corresponding to the largest eigenvalue is mostly localized to a node with the largest degree. For the above observed phenomena, mathematical analysis is provided with interpretation from a linguistic perspective.|000|spectral partitioning, Chinese, word co-occurrence network 1668|Liang2016|This is one of many papers in bad English but potentially interesting, due to their network models (although only word co-occurrence) by the same team of scholars. 
Other articles include @Liang2014 and @Gao2014.|000|Chinese, word co-occurrence network 1669|Gao2014|This is one of many papers in bad English but potentially interesting, due to their network models (although only word co-occurrence) by the same team of scholars. Other articles include @Liang2014 and @Liang2016. |000|Chinese, cross-linguistic study, word co-occurrence network 1670|Liang2014|The evolution of Chinese language has three main features: the total number of characters is gradually increasing, new words are generated in the existing characters, and some old words are no longer used in daily-life language. Based on the features, we propose an evolving language network model. Finally, we use this model to simulate the character co-occurrence networks (nodes are characters, and two characters are connected by an edge if they are adjacent to each other) constructed from essays in 11 different periods of China, and find that characters that appear with high frequency in old words are likely to be reused when new words are formed.|000|word co-occurrence network, Chinese, language model, language evolution 1671|Liang2014|See also @Gao2014 and @Liang2016 for similar approaches by the same team. |000|word co-occurrence network, language evolution, language model 1672|Lin2011|This paper investigates Dongshi Hakka tone sandhi within the output-oriented framework of Optimality Theory (OT, Prince & Smolensky 1993[2004], McCarthy & Prince 1993). Two different forces are shown to motivate the tonal alternation in Dongshi Hakka. The first force is assimilatory in nature and forces intersyllabic tone features to agree. Completely contradictory to this force is a dissimilatory effect that requires elements at the tonal level and the contour level to be different. These facts are captured by N O J UMP -t, OCP-T(11), OCP- C ( ), and OCP- C ( ), which regulate the well-formedness of tonal combination. 
In addition to tonal sequential markedness, the markedness status of a tone itself also plays a role. A low register tone occurring in a head position is shown to be marked and indirectly decides whether a tonal combination that violates a certain sequential markedness constraint will undergo tone sandhi. This can be explicitly captured by the conjunction of tonal and sequential markedness constraints.|000|case study, Hakka, Dongshi Hakka, tone sandhi 1673|Lu1987|Until now most of the studies on Chinese dialect subgrouping have been carried out by listing and comparing a handful of the linguistic features within the framework of traditional historical linguistics. Following the methodology for quantifying the degrees of closeness among Chinese dialects proposed in Cheng (1982), this paper presents a quantitative method of dialect subgrouping which uses both correlational analysis and cluster analysis. Using this approach the dialects of 74 locations in Jiangsu Province and the Shanghai Area in China will be subgrouped and discussed.|000|subgrouping, Chinese dialects, computer-based approaches, clustering, phonology, 1674|Lu1987|A very early approach that is based on a proposal by @Chen1982, uses different features of dialects, and clusters these with a simple clustering approach.|000|Chinese dialects, clustering, subgrouping, computer-based approaches 1675|Masek1980|This paper presents a faster algorithm to compute the edit distance. It refers directly to the Wagner-Fischer algorithm, that is, to the algorithm by @Wagner1974.|000|Wagner-Fischer algorithm, edit distance, algorithms, optimization 1676|Moeschler2016|The main goal of this article is to make a proposal about where procedural meaning is located. Procedural meaning is defined as guiding the processing of conceptual information, whereas conceptual meaning includes information about the representation of entities, for instance objects or events. 
Conceptual information for connectives is described at the level of entailment, explicature and implicature, and may indicate possible causal relations among the events described, whereas procedural information for causal connectives is restricted to indicating the direction of the causal relation (forward or backward). Conceptual information for tenses specifies temporal coordinates, while procedural meaning specifies directional and subjective properties of events, using features such as [narrative] or [subjective]. A second goal is to answer a central question for pragmatics: what is the contribution of connectives, that is, what is the difference between discourses with and without connectives? The pragmatic framework adopted, which is based on Relevance Theory, gives the following answer: in a discourse without connectives, the accessibility of the intended interpretation depends solely on the context, whereas the use of connectives allows a simpler route, reducing the number of inferential steps and helping to determine semantic and pragmatic contents such as entailments, explicatures and implicatures.|000|procedural meaning, context, semantics, meaning, meaning potential 1677|Sasamoto2016|This is an introduction to the topic of procedural meaning for a special issue of the Lingua journal.|000|procedural meaning, meaning, meaning potential, context 1678|Sims2016|Dialectology in the Qiang languages is still an underdeveloped field of study. Previous accounts of Qiang varieties have over simplistically described all varieties as belonging to one of two groups, Northern Qiang and Southern Qiang, based on broad typological features. This article demonstrates that previous subgroupings are inadequate and cannot account for the diversity of Qiang varieties, such as the previously undescribed Yonghe variety. The implication of this finding is that an entirely new approach to subgrouping of Qiang varieties is required. 
This paper not only deconstructs the previous subgroupings, but also puts forward a new scheme for subgrouping based on shared innovations and individual-identifying evidence in order to show which groupings have been established and to show where further work is needed.|000|Qiang, dialect data, dialectology, Sino-Tibetan, dataset, subgrouping 1679|Suzuki2011|Gagatang Tibetan is spoken in the southwestern part of Pantiange Village, Weixi Lisu Autonomous County, Diqing Tibetan Autonomous Prefecture in northwestern Yunnan. This dialect, which belongs to the Melung subgroup of the Sems-kyi-nyila dialectal group of Khams Tibetan, possesses an idiosyncratic phonological feature, i.e. the pharyngealized vowel. This paper aims to provide a sketch of the vocalism of Gagatang Tibetan (Zhollam vernacular) and a short history of the pharyngealized vowel from the viewpoint of Tibetan dialectology. The analysis shows that the pharyngealized vowel originated mostly from examples of Written Tibetan r as a main initial and a glide, and the abridged form of a historically disyllabic word. The pharyngealized vowel can correspond to the rhotacized (retroflex) one in other vernaculars of the Melung subgroup, and such a sound development is also similar to the case in Naxi.|000|case study, pharyngealization, pharyngealized vowel, Gagatang Tibetan, sound change 1680|Tai2005|Conceptual structure has been the focus of research in recent years not only in cognitive grammars but also in autonomous syntactic theories concerned with mapping form to meaning. In this paper, we give a sketch of the universal basis of conceptual structure and propose a relativist view of conceptual structures underlying different languages. Spatial expressions in Chinese and English are used to explore this view. Spatial expressions in sign language are also considered to deepen our understanding of conceptual structure. 
We take issue with the theory of conceptual semantics advocated by Jackendoff for the past two decades. We present a view that “creativity” and “generativity” resides largely in conceptual component, and only derivatively in syntactic component. Thus, the process of “syntacticization” is essentially on a par with lexicalization. We argue that syntactic patterns reflect conceptualizations in different languages and cultures and genuine cases of syntax-semantics mismatch are greatly reduced and hence simpler syntax. We also show how pragmatic inferences can be used to simplify syntactic structure, using word order, argument selection, and contextual expressions in Mandarin Chinese as case studies. We thus propose a sketch to work out a non-autonomous theory of syntax with minimal requirement of tentative innate linguistic structure.|000|conceptual semantics, sign language, English, Chinese, conceptualization, cognitive linguistics 1681|Wang2015|The tonal system in Jianchuan Bai has attracted much attention for its complex combinations of pitch and phonation type. In this paper, based on EGG signals, three parameters, namely F0, Open Quotient (OQ) and Speed Quotient (SQ), are extracted to examine the tonal quality. It is found that there are two non-modal phonation types, Harsh and Pressed, and roughly four groups of pitch pattern (31/31/41; 33/433; 55/54; 35) in the eight tonal categories. One pair of tones can only be distinguished from each other by phonation type since their pitches are the same. As for other pairs, both pitch and phonation type may contribute to the distinction between them. Notably, non-modal phonation types vary across different Bai speakers. For a particular non-modal tone, one speaker may employ harsh voice, while another may use pressed voice. Sometimes, the non-modal phonation type even changed within a syllable. 
It is then suggested that different strategies may be used to produce non-modal tones in contrast with their modal counterpart. Moreover, based on the Bai data, how to define different phonation types based on the three basic parameters, F0, OQ and SQ, is discussed. Harsh voice is a better term for the type with the laryngeal features [Middle falling F0, -OQ, -SQ] rather than high-pitched voice.|000|case study, Bai, Jianchuan Bai, lax and tense vowels, phonology 1682|Wang2010|A liaison consonant is a consonant resulting from the spreading of the final consonant of a syllable to the initial position of an onsetless particle. This study investigated whether such derived consonants are recognized by native speakers of Taiwan Min and Taiwan Hakka. Results of a concept formation and a syllable inversion experiment with Taiwan Min subjects are taken from Wang & Kao (2004). A syllable inversion experiment was done with Taiwan Hakka subjects. The results showed variant treatments by users of different languages. Taiwan Min speakers tended not to regard the liaison consonants as existing in the particle, while Taiwan Hakka speakers tended to accept their existence. And within Hakka, speakers using Sixian variety tended to accept the consonants more than Hailu speakers. We argue that such gradient performances show different degrees of morphologization of the consonants in the particles.|000|liaison consonant, sandhi, morphological change, morphologization, Taiwanese, Hakka, case study, sound change, morphological change 1683|Wee2010|This paper presents data from Tianjin tritonal sandhi patterns that challenge both traditional derivational approaches and standard Optimality Theoretic (OT) approaches to phonological alternation. If construed derivationally, Tianjin tritonal sandhi requires derivational reversals; but if construed within OT, involves combinations of opacity and transparency. 
The account proposed here appeals to a percolative model where phonological information from terminal nodes finds correspondences in higher nodes, such that the correspondences may be imperfect when triggered by markedness requirements. While this requires a total reconceptualization of phonological representations, this paper argues that it is well worth it because it predicts that: (1) directionality is a derivate from branching; (2) the depth of derivational opacity is confined by structural depth; (3) constituency, not adjacency, provides the environment for triggering alternation so alternation rules can therefore be blocked when marked collocations belong to different constituencies; and (4) underlying entities can have split surface correspondences.|000|Tianjin Chinese, Chinese dialects, case study, tone sandhi 1684|Zhang2010|This article discusses ongoing issues regarding the International Phonetic Alphabet. First of all, the author is in agreement with Mai’s 2005 criticism of classifying places of articulation according to active articulators, but this criticism does not go far enough. Problems with a description based on passive articulators need to be addressed as well and a new approach needs to be applied. Second, the active articulator model advocated by Sagey (1986), Ladefoged & Halle (1988), Halle (1992, 2003), and Duanmu (2009) is introduced to define distinctive features. After applying this model to two languages possessing seven places of articulation, it is shown to be powerful enough to distinguish the greatest number of place contrasts thus far attested. 
Finally, in comparison with the IPA, the active articulator model, in dealing with either simple or complex sounds, gives less ambiguous and more reasonable results, while allowing a new perspective on related issues.|000|IPA, phonetic alphabet, phonetic transcription, data problems, 1685|Horvath2011|This book gives an overview on weighted network analysis and might provide valuable information on basic aspects of networks (adjacency matrix, clustering coefficient, topological features of networks, connectivity, etc.|000|network, network analysis, introduction, 1686|Newman2003|We study assortative mixing in networks, the tendency for vertices in networks to be connected to other vertices that are like (or unlike) them in some way. We consider mixing according to discrete characteristics such as language or race in social networks and scalar characteristics such as age. As a special example of the latter we consider mixing according to vertex degree, i.e., according to the number of connections vertices have to other vertices: do gregarious people tend to associate with other gregarious people? We propose a number of measures of assortative mixing appropriate to the various mixing types, and apply them to a variety of real-world networks, showing that assortative mixing is a pervasive phenomenon found in many networks. We also propose several models of assortatively mixed networks, both analytic ones based on generating function methods, and numerical ones based on Monte Carlo graph generation techniques. We use these models to probe the properties of networks as their level of assortativity is varied. 
In the particular case of mixing by degree, we find strong variation with assortativity in the connectivity of the network and in the resilience of the network to the removal of vertices.|000|assortative mixing, networks, assortativity, network partitioning, cluster evaluation, evaluation 1687|Newman2003|In the study of social networks, the patterns of connections between people in a society, it has long been known that edges do not connect vertices regardless of their property or type. Patterns of friendship between individuals for example are strongly affected by the language, race, and age of the individuals in question, among other things. If people prefer to associate with others who are like them, we say that the network shows assortative mixing or assortative matching. If they prefer to associate with those who are different it shows disassortative mixing. Friendship is usually found to be assortative by most characteristics.|1|assortativity, cluster evaluation, evaluation, assortative mixing 1688|Teo2014|This book offers a comprehensive description of the phonetic and phonological features of Sumi, a Tibeto-Burman language of Nagaland, North-east India. It represents the first in-depth investigation of the acoustic phonetics and phonology of tone in Sumi, and is one of the first extensive acoustic descriptions of a language of Nagaland. The book describes the segmental phonology, syllable structure and tone system of Sumi. It looks at the phonetic realisation of these tones and the effects of segmental perturbations on tone realisation. It also examines morphologically conditioned tone variation in Sumi. Finally, this book offers a cross-linguistic comparison of both the segmental phonology and tonal system of Sumi with that of other Tibeto-Burman languages of Nagaland.|000|Sumi, Sino-Tibetan, dialect data, dataset 1689|Teo2012|Sumi (also known by its exonym ‘Sema’) is a Tibeto-Burman language spoken in Nagaland, North-east India. 
It is one of the major languages of the state, with an estimated 242,000 speakers living primarily in Zunheboto district, as well as in the major cities of Kohima and Dimapur. Bradley (1997) places Sumi (referred to as Sema), among the ‘Southern Naga’ languages, which include Angami (also known as Tenyidie) and Ao, in contrast to the ‘Northern Naga’ languages such as Konyak and Nocte. Burling (2003) offers a more conservative classification, placing Sumi (referred to as Simi) in an ‘Angami-Pochuri’ group containing Angami, Chakhesang (Chokri and Kheza) and Mao. Four main dialects of Sumi have been identified: the Western dialect, the Eastern dialect, the Chizolimi dialect, and the Central dialect. The Central dialect is the standard dialect used in published works of Sumi (Sreedhar 1976: 4–5).|000|IPA, Sumi, Sino-Tibetan, phonology, phonetics, sketch 1690|VanDriem2003|The latter phenomenon would be akin to the parallel and independent morphosyntactic developments observed in Indo-European by Schlegel (1808), for which Sapir (1921) later coined the term 'drift'. Tibeto-Burman linguists too will have to distinguish the parallel and independent rise of similar morphosyntactic patterns in different branches of the family from cognate morphology, just as Indo-European scholars did nearly two centuries ago.|465|history of science, drift, linguistic drift, Indo-European, Sino-Tibetan 1691|Severo2016|We aim at critically discussing the colonial process of language discursivization in America. Such discursivization integrated the Iberian colonial mechanism, centered in Spain and Portugal, from the sixteenth century on. The paper presents and discusses the way languages and people were put into discourses from a power framework centered on the logic of modernity/coloniality. 
Examples of this discursivization include the production of grammars, dictionaries, word lists, catechisms and the translation of religious and administrative European discursive genres to a non-European context. It is argued that the colonial discursivization of peoples and languages was framed by an Eurocentric interpretation which left its effects until today. The article relies on the theoretical framework of colonial Linguistics and Latin American postcolonial criticism, both focused on a historical and discursive perspective. Finally, we consider that the colonial experience is complex, which means that the colonial encounter produced the emergence of resistance and cultural hybridizations|000|sociology, colonialism, South American languages 1692|Ruhlen1998|Linguistic evidence indicates that the Yeniseian family of languages, spoken in central Siberia, is most closely related to the Na-Dene family of languages spoken, for the most part, in northwestern North America. This hypothesis locates the source of one of the three migrations responsible for the peopling of the Americas.|000|concept list, Na-Dene, Yeniseian, long-range comparison, linguistic reconstruction 1693|Nicholaev2014|The paper presents the author’s current version of the reconstruction of the phonological system of Proto-Na-Dene (PND = Proto-Athabaskan-Eyak-Tlingit in J. Leer’s terms), based on comparison of three groups of languages: 1) Tlingit dialects, 2) Eyak and 3) Athabaskan languages (Proto-Athabaskan). Eyak and the Athabaskan languages are quite close to each other and are traced back to an intermediate Proto-Eyak-Athabaskan language. Regular phonetic correspondences between Eyak and PA have received an original interpretation by Michael E. Krauss and Jeffrey Leer, including very complicated correspondences of sonorants. In his works, J. Leer proposed a PND reconstruction that explained most of the regular sound correspondences between the Na-Dene languages. 
Although Leer’s reconstruction is quite seductive with its apparent simplicity, in some aspects this simplification is unwarranted, as the real situation turns out to be a lot more complicated. This is possibly a consequence of the number of the roots involved: Leer’s reconstruction is based on a relatively short list of cognate sets (ca. 300), whereas the author of the current paper has tried to take into account the entire comparative corpus (ca. 800 sets). Due to volume restrictions, the paper consists of only a brief summary of the reconstruction and an illustrative subset of the comparative material, dealing with certain complicated sound correspondences between front and lateral affricates/fricatives, previously analysed in a different light by J. Leer.|000|Na-Dene, linguistic reconstruction, proto-form, database 1694|Neto2008|A major goal in the study of complex traits is to decipher the causal interrelationships among correlated phenotypes. Current methods mostly yield undirected networks that connect phenotypes without causal orientation. Some of these connections may be spurious due to partial correlation that is not causal. We show how to build causal direction into an undirected network of phenotypes by including causal QTL for each phenotype. We evaluate causal direction for each edge connecting two phenotypes, using a LOD score. This new approach can be applied to many different population structures, including inbred and outbred crosses as well as natural populations, and can accommodate feedback loops. We assess its performance in simulation studies and show that our method recovers network edges and infers causal direction correctly at a high rate. 
Finally, we illustrate our method with an example involving gene expression and metabolite traits from experimental crosses.|000|causal inference, directionality, phenotype network, networks 1695|Wang2011c|Since their arrival in the Tibetan Plateau during the Neolithic Age, Tibetans have been well-adapted to extreme environmental conditions and possess genetic variation that reflect their living environment and migratory history. To investigate the origin of Tibetans and the genetic basis of adaptation in a rigorous environment, we genotyped 30 Tibetan individuals with more than one million SNP markers. Our findings suggested that Tibetans, together with the Yi people, were descendants of Tibeto-Burmans who diverged from ancient settlers of East Asia. The valleys of the Hengduan Mountain range may be a major migration route. We also identified a set of positively-selected genes that belong to functional classes of the embryonic, female gonad, and blood vessel developments, as well as response to hypoxia. Most of these genes were highly correlated with population-specific and beneficial phenotypes, such as high infant survival rate and the absence of chronic mountain sickness.|000|genetic analysis, population genetics, Tibetan 1696|Zhang2014a|Together with the sign (positive or negative) and strength (strong or weak), the directionality is also an important property of social ties, though usually ignored in undirected social networks for its invisibility. However, we believe most social ties are natively directed, and the awareness of directionality can improve our understanding about the network structures and further benefit social network analysis and mining tasks. Thus it’s appealing to study whether there exist interesting patterns about directionality in social networks and whether we can learn the directions for undirected networks based on these patterns. 
In this study, we engage in the investigation of directionality patterns on real-world directed social networks and summarize our findings using four consistency hypotheses. Based on these hypotheses, we propose ReDirect, an optimization framework which makes it possible to infer the hidden directions of undirected social ties based on the network topology only. This general framework can incorporate various predictive models under specific scenarios. Furthermore, we show how to improve ReDirect by introducing semi/self-supervision in the framework and how to construct the self-labeled training data using simple but effective heuristics. Experimental results show that even without external information, our approach can recover the directions of networks effectively. Moreover, we’re quite surprising to find that ReDirect can benefit predictive tasks remarkably, with a case study of link prediction. In experiments the redirected networks inferred using ReDirect are proven much more informative than original undirected ones and can improve the prediction performance significantly. It convinces us that ReDirect can be a beneficial general data preprocess tool for various network analysis and mining tasks by uncovering the hidden directions of undirected social networks.|000|directionality, social networks, networks, causal inference 1697|Calude2016|Numerals have fascinated and mystified linguists, mathematicians and lay persons alike for centuries. The productive use of numerals (in languages where this happens) exploits recursivity to give rise to what we call the ‘the number line’. While the smaller numerals 1–10 have enjoyed intense scrutiny, the typological study of the formation of the higher numerals has received comparatively less attention. This article contains a comprehensive typological account of how languages in the Indo-European language family code numerals beyond 10 (10–99, 100s, 1,000s), the morphemes involved, and how these are ordered. 
We use this dataset from eighty-one Indo-European languages with phylogenetic comparative methods to propose diachronic reconstructions of these patterns in the Proto-Indo-European language. Our findings indicate that small numerals (11–19) show the widest cross-linguistic variation, and that higher numerals exhibit more consistency in both component parts and their ordering. Additionally, we show statistical evidence of correlations between the ordering of base and atom morphemes and other word order patterns (noun-postposition, noun-genitive, and verb-object order).|000|numerals, Indo-European, ancestral state reconstruction, 1698|Das2016|The Yiddish language is over 1,000 years old and incorporates German, Slavic, and Hebrew elements. The prevalent view claims Yiddish has a German origin, whereas the opposing view posits a Slavic origin with strong Iranian and weak Turkic substrata. One of the major difficulties in deciding between these hypotheses is the unknown geographical origin of Yiddish speaking Ashkenazic Jews (AJs). An analysis of 393 Ashkenazic, Iranian, and mountain Jews and over 600 non-Jewish genomes demonstrated that Greeks, Romans, Iranians, and Turks exhibit the highest genetic similarity with AJs. The Geographic Population Structure analysis localized most AJs along major primeval trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from “Ashkenaz.” Iranian and mountain Jews were localized along trade routes on the Turkey’s eastern border. Loss of maternal haplogroups was evident in non-Yiddish speaking AJs. Our results suggest that AJs originated from a Slavo-Iranian confederation, which the Jews call “Ashkenazic” (i.e., “Scythian”), though these Jews probably spoke Persian and/or Ossete. This is compatible with linguistic evidence suggesting that Yiddish is a Slavic language created by Irano-Turko-Slavic Jewish merchants along the Silk Roads as a cryptic trade language, spoken only by its originators to gain an advantage in trade. 
Later, in the 9th century, Yiddish underwent relexification by adopting a new vocabulary that consists of a minority of German and Hebrew and a majority of newly coined Germanoid and Hebroid elements that replaced most of the original Eastern Slavic and Sorbian vocabularies, while keeping the original grammars intact.|000|population genetics, Yiddish, archaeogenetics 1699|Francois2016|Linguistic diffusion is commonly equated with contact, and contrasted with genealogy. This article takes a new perspective, by showing how diffusion lies in fact at the heart of language genealogy itself. Indeed, the Comparative method has taught us to identify genetic subgroups based on sets of shared innovations; but each of these innovations necessarily had to diffuse from speaker to speaker across a network of then mutually intelligible idiolects. Such a diffusionist approach to language genealogy allows us to model language change as it really took place in the social and geographical space of past societies. Crucially, the entangled isoglosses typical of dialect continuums and linkages (Ross 1988) cannot be handled by the Tree model, which is solely based on divergence; but they are easily captured by a diffusionist approach such as the Wave model, where the key process is convergence. After comparing the theoretical underpinnings of these two models, I introduce Historical Glottometry, a new quantitative approach aiming to free the Comparative Method from any cladistic assumption, and to reconcile it with a wave-based analysis. Finally, data from a group of Oceanic languages from Vanuatu illustrate the powerful potential of Glottometry as a new method for linguistic subgrouping.|000|networks, wave theory, linguistic diffusion, linkages, genetic relationship, linguistic area 1700|Huth2016|The meaning of language is represented in regions of the cerebral cortex collectively known as the ‘semantic system’. 
However, little of the semantic system has been mapped comprehensively, and the semantic selectivity of most regions is unknown. Here we systematically map semantic selectivity across the cortex using voxel-wise modelling of functional MRI (fMRI) data collected while subjects listened to hours of narrative stories. We show that the semantic system is organized into intricate patterns that seem to be consistent across individuals. We then use a novel generative model to create a detailed semantic atlas. Our results suggest that most areas within the semantic system represent information about specific semantic domains, or groups of related concepts, and our atlas shows which domains are represented in each area. This study demonstrates that data-driven methods—commonplace in studies of human neuroanatomy and functional connectivity—provide a powerful and efficient means for mapping functional representations in the brain.|000|neurology, semantic map, semantic network, 1701|Lamarre2014|How many words – and which ones – are sufficient to define all other words? When dictionaries are analyzed as directed graphs with links from defining words to defined words, they reveal a latent structure. Recursively removing all words that are reachable by definition but that do not define any further words reduces the dictionary to a Kernel of about 10%. This is still not the smallest number of words that can define all the rest. About 75% of the Kernel turns out to be its Core, a “Strongly Connected Subset” of words with a definitional path to and from any pair of its words and no word’s definition depending on a word outside the set. But the Core cannot define all the rest of the dictionary. The 25% of the Kernel surrounding the Core consists of small strongly connected subsets of words: the Satellites. 
The size of the smallest set of words that can define all the rest – the graph’s “minimum feedback vertex set” or MinSet – is about 1% of the dictionary, 15% of the Kernel, and half-Core/half-Satellite. But every dictionary has a huge number of MinSets. The Core words are learned earlier, more frequent, and less concrete than the Satellites, which in turn are learned earlier and more frequent but more concrete than the rest of the Dictionary. In principle, only one MinSet’s words would need to be grounded through the sensorimotor capacity to recognize and categorize their referents. In a dual-code sensorimotor/symbolic model of the mental lexicon, the symbolic code could do all the rest via re-combinatory definition.|000|dictionary, semantic network, language model, semantic similarity 1702|Leslie2004|**Motivation:** Classification of proteins sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. **Results:** We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. 
We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies. **Availability:** SVM software is publicly available at http://microarray.cpmc.columbia.edu/gist. Mismatch kernel software is available upon request.|000|protein families, homolog detection, mismatch kernel 1703|Liu2015|The monograph entitled Methodologies and Implementations of Sino-Tibetan Comparisons by Feng Wang is a great contribution to comparative studies of Sino-Tibetan languages. Proto-Yi has been reconstructed on the basis of rigorous sound correspondence in this book. The relationship of Chinese, Bai and Yi has been analyzed by the distillation method. The most important contribution of this book may be the rigorous framework for historical comparison, which can be effectively applied to languages with a long history of contact. In the empirical side, this research ended the debate of the complex relationship between Chinese, Bai and Yi. This book under review is an important reference for comparative studies of Sino-Tibetan languages.|000|review, Sino-Tibetan, genetic classification, Bai, Sinitic 1704|Wang2014a|In the context of genetics and breeding research on multiple phenotypic traits, reconstructing the directional or causal structure between phenotypic traits is a prerequisite for quantifying the effects of genetic interventions on the traits. Current approaches mainly exploit the genetic effects at quantitative trait loci (QTLs) to learn about causal relationships among phenotypic traits. A requirement for using these approaches is that at least one unique QTL has been identified for each trait studied. 
However, in practice, especially for molecular phenotypes such as metabolites, this prerequisite is often not met due to limited sample sizes, high noise levels and small QTL effects. Here, we present a novel heuristic search algorithm called the QTL+phenotype supervised orientation (QPSO) algorithm to infer causal directions for edges in undirected phenotype networks. The two main advantages of this algorithm are: first, it does not require QTLs for each and every trait; second, it takes into account associated phenotypic interactions in addition to detected QTLs when orienting undirected edges between traits. We evaluate and compare the performance of QPSO with another state-of-the-art approach, the QTL-directed dependency graph (QDG) algorithm. Simulation results show that our method has broader applicability and leads to more accurate overall orientations. We also illustrate our method with a real-life example involving 24 metabolites and a few major QTLs measured on an association panel of 93 tomato cultivars. Matlab source code implementing the proposed algorithm is freely available upon request.|000|causal inference, phenotype network, directionality 1705|Mandera2016|Recent developments in distributional semantics (Mikolov et al., 2013) include a new class of prediction-based models that are trained on a text corpus and that measure semantic similarity between words. We discuss the relevance of these models for psycholinguistic theories and compare them to more traditional distributional semantic models. We compare the models' performances on a large dataset of semantic priming (Hutchison et al., 2013) and on a number of other tasks involving semantic processing and conclude that the prediction-based models usually offer a better fit to behavioral data. Theoretically, we argue that these models bridge the gap between traditional approaches to distributional semantics and psychologically plausible learning principles. 
As an aid to researchers, we release semantic vectors for English and Dutch for a range of models together with a convenient interface that can be used to extract a great number of semantic similarity measures.|000|concept list, meta data, semantic similarity, dataset, semantic network 1706|VanDriem2014b|This Trans-Himalayan tale unites two narratives, an historical account of scholarly thinking regarding linguistic phylogeny in eastern Eurasia alongside a reconstruction of the ethnolinguistic prehistory of eastern Eurasia based on linguistic and human population genetic phylogeography. The first story traces the tale of transformation in thought regarding language relationships in eastern Eurasia from Tibeto-Burman to Trans-Himalayan. The path is strewn with defunct family trees such as Indo-Chinese, Sino-Tibetan, Sino-Himalayan and Sino-Kiranti. In the heyday of racism in scholarship, Social Darwinism coloured both language typology and the phylogenetic models of language relationship in eastern Eurasia. Its influential role in the perpetuation of the Indo-Chinese model is generally left untold. The second narrative presents a conjectural reconstruction of the ethnolinguistic prehistory of eastern Eurasia based on possible correlations between genes and language communities. In so doing, biological ancestry and linguistic affinity are meticulously distinguished, a distinction which the language typologists of yore sought to blur, although the independence of language and race was stressed time and again by prominent historical linguists.|000|Sino-Tibetan, Trans-Himalayan, genetic classification, subgrouping 1707|VanDriem2014b|This paper gives an interesting historical overview on the classification of Sino-Tibetan.|000|Trans-Himalayan, Sino-Tibetan, genetic classification, subgrouping 1708|Brysbaert201X|This article deals with the number of words a speaker knows and is currently under review. 
It is very interesting to assess this also in the light of historical linguistics, since it shows to which degree change can influence our words.|000|word frequency, word formation, mental lexicon, lexical storage 1709|Regier2016|The claim that Eskimo languages have words for different types of snow is well-known among the public, but has been greatly exaggerated through popularization and is therefore viewed with skepticism by many scholars of language. Despite the prominence of this claim, to our knowledge the line of reasoning behind it has not been tested broadly across languages. Here, we note that this reasoning is a special case of the more general view that language is shaped by the need for efficient communication, and we empirically test a variant of it against multiple sources of data, including library reference works, Twitter, and large digital collections of linguistic and meteorological data. Consistent with the hypothesis of efficient communication, we find that languages that use the same linguistic form for snow and ice tend to be spoken in warmer climates, and that this association appears to be mediated by lower communicative need to talk about snow and ice. Our results confirm that variation in semantic categories across languages may be traceable in part to local communicative needs. They suggest moreover that despite its awkward history, the topic of “words for snow” may play a useful role as an accessible instance of the principle that language supports efficient communication.|000|denotation, semantic change, semantic similarity, semantics, polysemy, CLICS 1710|Shanon1978|The breaking of the genetic code has propagated analogies between genetic and linguistic phenomena. Biologists and linguists alike have claimed that there are similarities between the genetic code and human language. 
In the present paper I argue that the similarity between the two systems is only misleading and that the two are fundamentally different.|000|biological parallels, analogy, genetic code, linguistics, 1711|Hymes1983|The terms 'lexicostatistics' and 'glottochronology' are both less than a generation old, and one commonly thinks of the subjects as being equally new. It is therefore surprising to discover that a form of glottochronology was devised more than a hundred years ago by the great French anthropologist, Paul Broca, and that Broca built upon a form of lexicostatistics devised by the noted French explorer, Dumont d'Urville. The latter had capitalized upon an idea of Constantin Rafinesque, poet, supposed lost Dauphin, finder or forger of the Delaware Indian *Walam Olum*, a man of both genuine and spurious claims to scientific repute. Broca invented his glottochronology for an address to the Société d'Anthropologie de Paris; behind it lay two Pacific expeditions, vocabularies from French exploration under Dumont d'Urville and genealogies by Horatio Hale from a U.S. Exploring Expedition under Wilkes. Behind Dumont d'Urville's lexicostatistics lay a prize essay sent across the Atlantic to Paris from Rafinesque's residence, Philadelphia.|59|Paul Broca, lexicostatistics, history of science, Dumont d'Urville 1712|Metoz2006|The aim of this article is to show, by a study of the works of the American scholar Constantine Samuel Rafinesque (1783-1840) and of the French scholars Jules Dumont d'Urville (1790-1842) and Paul Broca (1824-1880), that Morris Swadesh’s glottochronology which is at the base of the «New Synthesis» can in no way be regarded as innovative either in a methodological or in a historical sense. An exhaustive analysis of their work will be undertaken here in order to understand the intrinsic development of lexico-statistics and glottochronology in the 19th century and to emphasise the major role of the often undervalued work of C. S. 
Rafinesque in the emergence of these disciplines.|000|glottochronology, lexicostatistics, history of science, Paul Broca, Dumont d'Urville 1713|Hymes1983|Anthropology and linguistics, as historically developing disciplines, have had partly separate roots and traditions. In particular settings and in general, the two disciplines have partly shared, partly differed in the nature of their materials, their favorite types of problem the personalities of their dominant figures, their relations with other disciplines and intellectual current. The two disciplines have also varied in their interrelation with each other and the society about them. Institutional arrangements have reflected the varying degrees of kinship, kithship, and separation. Such relationships themselves form a topic that is central to a history of linguistic anthropology yet marginal to a self-contained history of linguistics or anthropology as either would be conceived by most authors. There exists not only a subject matter for a history of linguistic anthropology, but also a definite need. |000|history of science, anthropology, historical linguistics, introduction, glottochronology 1714|Hymes1983|Table of contents: 0. Introduction ix 1. Notes towards a History of Linguistic Anthropology 1 2. Lexicostatistics and Glottochronology in the 19th Century 59 3. The Americanist Tradition in Linguistics 115 4. Linguistic Method in Ethnography 135 5. Linguistic Anthropologist Alfred Louis Kroeber 245 6. From the First Yale School to World Prehistory Morris Swadesh 273 7. The Pre-war Prague School and Post-war American Anthropological Linguistics 331 8. Tradition and Paradigms 345 9. Index of Authors 385 10. 
Index of Subjects |000|history of science, anthropology, lexicostatistics, historical linguistics 1715|Quack2004|Text provides evidence that Mycenean Greek and Hittite place names are reflected in Egypt transcriptions.|000|Hittite, laryngeal theory, Mycenean Greek, Greek, Indo-European 1716|Reinoehl2016|It has been widely assumed that the primary adpositions of modern Indo-European languages constitute a historically identical category, descending from the Proto-Indo-European ‘local particles’. I argue that this assumption needs to be revised, because a major branch of the language family, Indo-Aryan, possesses adpositions of unrelated origin. This is not only a question of different etyma, but the New Indo-Aryan adpositions descend from structurally different sources. The ancient local particles, as attested in early Indo-Aryan varieties, combine with local case forms and show a preference for the prenominal position. By contrast, the New Indo-Aryan adpositions descend from nominal and verbal forms heading genitives, and show a propensity for the postnominal slot. Thus, we are dealing with elements unrelated not only etymologically, but also with regard to their morphosyntactic distribution.|000|Indo-European, adposition, local particle, genetic classification, ancestral state reconstruction 1717|Hume2016|In this paper we propose that insight into the unmarked nature of these patterns can be gained when we take seriously the view of language as a system of information transmission. In particular, we suggest that perceptually weak and strong unmarked patterns are those that effectively balance two competing properties of effective communication: (a) the contribution of the phonological unit in context to accurate message transmission, and (b) the resource cost of the phonological unit (see Hall, Hume, Jaeger & Wedel, forthcoming). In order to demonstrate this, we begin in §2 by describing key properties of language as a system of information transmission. 
In the following section, we turn to the predictions that follow from such a system for perceptually weak and strong sound patterns and show why it makes sense for them both to be described as unmarked.|000|sound patterns, markedness, phonotactics, information theory, language model 1718|Hume2016|**Message Modification Principle** a. Enhance redundancy on phonological-units-in-context that will contribute the most to successful message transmission, while having the least negative impact on resource cost. b. Reduce redundancy in phonological-units-in-context that will provide the least benefit to successful message transmission, thereby reducing resource cost.|3|sound patterns, phonotactics, redundancy, message modification 1719|Hume2016|With this as a basis, we return to the issue of why perceptually weak and strong sound patterns can both pattern as unmarked. The solution to this problem is that in both patterns, there is an effective trade-off between a PUC's resource cost and its potential contribution to accurate transmission of the message. In an effective system of information transfer, stable sound patterns with weak perceptual cues are predicted to be associated with PUCs that contribute little to message transmission accuracy; that is, the PUC has high relative predictability within the message, by occurring for example, toward the end of the word, or in a high frequency affix or function word. Investing resources in a PUC that contributes little to message identity is not cost-effective. Conversely, it is worthwhile to invest in PUCs that make a strong contribution to message transmission accuracy. Stable patterns with perceptually strong sounds are thus predicted to occur in contexts that are informative since strong perceptual cues support the transmission of unpredictable information, e.g. beginning of words, content words (as opposed to affixes/function words). 
Stable PUCs with strong cues occur toward the upper right quadrant of the figure in (5), while those with weak cues are located in the lower left quadrant.|5|stability, sound patterns, context, phonotactics, redundancy 1720|Hume2016|This paper offers an interesting idea regarding strength and weakness in sound patterns, and how they could be determined based on information-theoretic principles. Strength and weakness depend on different aspects which can be modelled by looking at the speech signal from an information-theoretic perspective. It seems worthwhile to look deeper into this and to see how it could be modeled for the purpose of language comparison.|000|sound patterns, redundancy, language model, stability, phonological strength 1721|Dickens1990|Tocharian is an extinct Indo-European language which stands by itself as one of the eleven major groups in the IE language family. It was not discovered until the turn of this century, as a result of archaeological expeditions to Chinese Turkestan. What are some of the characteristics of Tocharian and what impact has it made on our knowledge of IE languages in general? This paper seeks to answer these questions.|000|Tocharian, introduction, overview, Indo-European 1722|Broca1862|This discussion article of some 60 pages is interesting in the context of early historical linguistics, as the author takes up ideas by Dumont d'Urville to quantify differences between languages. The paper of @Metoz2006 is very interesting in this context, as it introduces the main thoughts of Paul Broca in more detail. According to @Hymes1983, Broca even mentions a concept list of 115 items, but apparently the list is only mentioned and not printed anywhere in this article. Further reading of this article is required.|000|Paul Broca, glottochronology, history of science, lexicostatistics, concept list, 1723|Broca1862|Maintenant, laissons de côté l'origine du langage; considérons une collection d'individus qui parlent une langue spéciale. 
Cette langue, née ou non parmi eux, va se transmettre de génération en génération. En la comparant aux autres langues connues, nous constatons qu'elle ne ressemble à aucune d'elles, qu'elle constitue à elle seule un type tout particulier. Dès lors nous pouvons formuler les caractères essentiels qui lui appartiennent, et nous les rangeons avec juste raison au nombre des traits distinctifs du peuple ou de la race que nous examinons. :translation:`Now, setting aside the origin of language, let us consider a group of individuals who speak a particular language. This language [...] will be transmitted from generation to generation. Comparing it with the other known languages, we observe that it does not resemble any of them, that it constitutes by itself a type of its own. From this we can formulate the essential characteristics which belong to it, and we rank them with good reason among the distinctive traits of the people or the race which we examine.`|26|language comparison, distinctive traits, human prehistory 1724|Broca1862|Par conséquent, lorsque deux peuples se mélangent, il n'y a aucun parallélisme entre les conditions qui font prévaloir le type physique et celles qui font prévaloir le type linguistique de l'une ou de l'autre race. Au bout d'un certain nombre de générations, quand le mélange est effectué, la race croisée tend à se rapprocher de plus en plus du type physique de la race la plus nombreuse, tandis que c'est quelquefois la langue de la race la moins nombreuse qui supplante et remplace celle de la majorité. |32|language mixture, population, mixed languages, reticulate evolution 1725|Broca1862|Les linguistes ont sur nous un grand avantage : c'est qu'ils peuvent se passer de nous, tandis que nous ne pouvons nous passer d'eux. 
Pour disséquer les langues, pour les grouper en familles, pour découvrir l'histoire de leur formation, de leur évolution intérieure, de leur dissémination parmi les peuples, il n'est pas nécessaire de se préoccuper des questions de race ; il est même bon d'aborder ces difficiles problèmes sans idée préconçue, et sans s'inquiéter d'autre chose que de découvrir des faits qui ne relèvent que d'eux-mêmes|34|historical linguistics, population, language evolution, human prehistory, anthropological evolution 1726|Broca1862|Les altérations spontanées du langage sont soumises sans doute à certaines lois ; mais l'évolution d'une langue est-elle comparable à ces courbes algébriques qu'on peut construire dans toute leur étendue lorsqu'on en connaît quelques points ? La linguistique est-elle parvenue, ou parviendra-t-elle jamais à ce degré de précision ? Je n'ai pas qualité pour le dire, mais je crains bien qu'il soit aussi difficile de deviner ce que fut une langue il y a vingt-deux mille années, que de prédire ce qu'elle sera au bout d'un pareil laps de temps.|40|language change, quantitative analysis, linguistics, language evolution 1727|Broca1862|Pour apprécier la lenteur avec laquelle les langues se modifient par la seule action du temps, lorsqu'elles sont à l'abri des influences étrangères, il suffit de considérer et de comparer les dialectes des principaux archipels de la Polynésie. Tout le monde sait que le capitaine Cook prit à son bord un Taïtien nommé Tupaïa qui put lui servir d'interprète lorsqu'il toucha à la Nouvelle-Zélande. Dumont-d'Urville, dans la Partie philologique du Voyage de l'Astrolabe (t. I, p. 273. Paris, 1834, grand in-8°), a publié des vocabulaires très-étendus de la plupart des dialectes polynésiens et comparé ces vocabulaires deux à deux, en comptant les mots identiques, les mots simplement semblables, et enfin les mots qui lui paraissaient tout à fait différents. 
Ne connaissant pas encore les lois qui ont présidé dans chaque archipel à l'altération des consonnes, lois qui ont été découvertes par ses successeurs, Dumont-d'Urville a dû nécessairement ranger dans la catégorie des [pb] mots tout à fait différents un grand nombre de mots qui ne différaient que par la prononciation. Malgré cela, il a trouvé que le nombre des mots très-semblables ou identiques variait, suivant les termes de comparaison, de 41 à 74 pour cent. Par exemple, sur 431 mots considérés dans le taïtien et dans le mawi (Nouvelle-Zélande), il a trouvé 100 mots identiques, 257 mots simplement semblables, en tout, 357 mots plus ou moins semblables, ce qui ferait plus de 80 pour cent ; mais en ne prenant, pour établir la comparaison, que les mots identiques et les mots très-semblables, il a exprimé le rapport d'identité par le chiffre de 53 pour cent.|41f|quantitative analysis, lexicostatistics, historical linguistics, history of science, Dumont d'Urville, Polynesian 1728|Broca1862|1. Les caractères anthropologiques de premier ordre sont les caractères physiques, parce que ce sont les plus permanents ; 2. Les caractères fournis par la linguistique sont toujours utiles, quelquefois indispensables ; mais à eux seuls ils ne peuvent résoudre définitivement les questions d'anthropologie. Lorsque les conclusions qui paraissent en découler sont en opposition avec celles qui reposent sur les caractères physiques, lorsqu'il y a lieu de se demander si une race a changé de type ou si elle a changé de langue, l'hésitation doit disparaître devant cette considération que le type est infiniment plus permanent que le langage. 3. Lorsque deux races vivent sur le même sol et se mélangent, le type physique s'altère d'abord en proportion de l'intensité du mélange, puis la race croisée tend à revenir, par la suite des générations, au type de la race-mère la plus nombreuse. 
Le type physique qui survit au croisement avec plus ou moins de pureté, est donc celui de la race qui prédomine numériquement. 4. Dans les mêmes conditions de mélange, les langues des deux races respectives ne se fusionnent pas. L'une d'elles tôt ou tard supplante l'autre, au prix de quelques altérations qui ne la modifient pas dans ses caractères essentiels. Mais la langue qui survit n'est pas toujours celle de la race la plus nombreuse; c'est souvent celle de la minorité;|54f|anthropology, linguistics, anthropological evolution, language evolution, analogy, biological parallels 1729|Broca1862|La linguistique, par conséquent, fournit à l'anthropologie des renseignements et non des arrêts, et elle doit intervenir dans nos débats, non à titre de juge, mais à titre de témoin. :translation:`Linguistics therefore provides anthropology with information, not verdicts, and it should take part in our debates not as a judge but as a witness.` |55|biological parallels, anthropology, analogy, language evolution, anthropological evolution 1730|Wallis2013a|Many statistical methods rely on an underlying mathematical model of probability based on a simple approximation, one that is simultaneously well-known and yet frequently misunderstood. The Normal approximation to the Binomial distribution underpins a range of statistical tests and methods, including the calculation of accurate confidence intervals, performing goodness of fit and contingency tests, line- and model-fitting, and computational methods based upon these. A common mistake is in assuming that, since the probable distribution of error about the “true value” in the population is approximately Normally distributed, the same can be said for the error about an observation. This paper is divided into two parts: fundamentals and evaluation. 
First, we examine the estimation of confidence intervals using three initial approaches: the “Wald” (Normal) interval, the Wilson score interval and the “exact” Clopper-Pearson Binomial interval. Whereas the first two can be calculated directly from formulae, the Binomial interval must be approximated towards by computational search, and is computationally expensive. However this interval provides the most precise significance test, and therefore will form the baseline for our later evaluations. We also consider two further refinements: employing log-likelihood in intervals (also requiring search) and the effect of adding a continuity correction. Second, we evaluate each approach in three test paradigms. These are the single proportion interval or 2 × 1 goodness of fit test, and two variations on the common 2 × 2 contingency test. We evaluate the performance of each approach by a “practitioner strategy”. Since standard advice is to fall back to “exact” Binomial tests in conditions when approximations are expected to fail, we report the proportion of instances where one test obtains a significant result when the equivalent exact test does not, and vice versa, across an exhaustive set of possible values. We demonstrate that optimal methods are based on continuity-corrected versions of the Wilson interval or Yates’ test, and that commonly-held beliefs about weaknesses of tests are misleading. Log-likelihood, often proposed as an improvement on χ2, performs disappointingly. Finally we note that at this level of precision we may distinguish two types of 2 × 2 test according to whether the independent variable partitions data into independent populations, and we make practical recommendations for their use.|000|binomial confidence intervals, contingency tests, evaluation 1731|Wallis2013|A set of statistical tests termed contingency tests, of which χ2 is the most well-known example, are commonly employed in linguistics research. 
Contingency tests compare discrete distributions, that is, data divided into two or more alternative categories, such as alternative linguistic choices of a speaker or different experimental conditions. These tests are highly ubiquitous, and are part of every linguistics researcher’s arsenal. However, the mathematical underpinnings of these tests are rarely discussed in the literature in an approachable way, with the result that many researchers may apply tests inappropriately, fail to see the possibility of testing particular questions, or draw unsound conclusions. Contingency tests are also closely related to the construction of confidence intervals, which are highly useful and revealing methods for plotting the certainty of experimental observations. This paper is organized in the following way. The foundations of the simplest type of χ2 test, the 2 × 1 goodness of fit test, is introduced and related to the z test for a single observed proportion p and the Wilson score confidence interval about p. We then show how the 2 × 2 test for independence (homogeneity) is derived from two observations p1 and p2 and explain when each test should be used. We also briefly introduce the Newcombe-Wilson test, which ideally should be used in preference to the χ2 test for observations drawn from two independent populations (such as two sub-corpora). We then turn to tests for larger tables, generally termed r × c tests, which have multiple degrees of freedom and therefore may encompass multiple trends, and discuss strategies for their analysis. Finally, we turn briefly to the question of differentiating test results. 
We introduce the concept of effect size (also termed “measures of association”) and finally explain how we may perform statistical separability tests to distinguish between two sets of results.|000|xi-square test, contingency tests, evaluation 1732|Pericliev2015|The paper presents a typological database of colexifications among basic vocabulary, derived from the Automated Similarity Judgment Program (ASJP) database. Some uses of the inventory of colexifications are proposed. Some heuristics are introduced regarding the discrimination of polysemy from homonymy in a typological database, as well as such pertaining to the determination of common membership of two languages in the same language family. In particular, it was found that shared colexifications corroborate the postulation of the Austric stock and the attribution of Sumerian to the Tibeto-Burman language family.|000|colexification, genetic relationship, proof of relationship, database, ASJP 1733|Pericliev2015|The paper illustrates how theoretically we could use colexification to search for deep genetic relations, by using very rare colexification patterns. They illustrate this for Sino-Tibetan and Sumerian, which share the pattern blood=new.|000|ASJP, genetic relationship, colexification, database, Sino-Tibetan, Sumerian, proof of relationship 1734|Wohlgemuth2009|The title of this work, “A typology of verbal borrowings”, basically already outlines its rationale — the cross-linguistic description and typologization of the techniques and mechanisms involved in accommodating borrowed verbs into their recipient languages. For the purposes of this introduction into the topic, the main mechanisms of loan verb accommodation are briefly sketched in sec. 1.1.5. As befits a true typology, this study is based on a large sample of borrowed verbs from over 350 languages worldwide. The research for the present work has been carried out in close association with the Loanword Typology Project (cf. sec. 
1.2.3) and shares its goal to add to the understanding of the structure, semantics and general properties of loanwords in the languages of the world, with a particular focus on the class of verbs. In so doing, this dissertation also evaluates previous claims and findings on verb borrowability and loan verb accommodation. The Loanword Typology Project and the broader research context, as well as the objective of the present study, will be further elaborated in the following sections of the introduction.|000|verbal borrowing, lexical borrowing, verb, lateral transfer, historical linguistics, language contact 1735|Swadesh1953|Each pair of languages is examined for agreements of form and meaning, that is for cases where the words for a given item of meaning are phonetically such that they are evidently derived from the same original form in the common period of the two languages. The criterion of cognate sameness is that the two forms must correspond phonetically in accordance with the transformations of sounds which are known to have taken place in each line of development from the original common language. In addition to these phonetic "laws," one also takes into account assimilatory-dissimilatory, and analogical changes and the use of symbolic mutations ("ablaut") and of affixes.|350|cognate detection, cognate annotation, cognate sets, Swadesh cognacy 1736|Swadesh1953|In these lists, resemblances due to accidental convergence and to recent borrowing are disregarded. Cognate agreements are included even where the resemblance is obscured by phonetic, analogic and morphologic changes.|350|cognacy, cognate annotation, Swadesh cognacy, lexical borrowing 1737|Swadesh1953|Article gives explicit cognate judgments, corrected by André Martinet, for Russian, English, and French. 
These cognate judgments, however, show problematic characteristics, due to large amounts of morphological blurring of evidence, as in the case of Russian "v", being marked as cognate with English "in", and with French "dans". It might be worthwhile to digitize the data and check those cases of really remote similarity due to morphological blurring.|000|cognacy, cognate annotation, Swadesh cognacy, homology 1738|Swadesh1953|The relationship between lexical divergence and geographic separation due to migration may vary depending on the special circumstances. First, if the two groups speak essentially the same language at the outset and if the separation is complete, the vocabulary divergence should coincide with the period of time since the migration. Second, the two peoples may have spoken dialects which were already divergent at the time of separation; in this case, the lexical divergence will of course be greater than the time since migration. Or third, the two groups may continue to maintain something of their original speech community for some time after the migration, tending to slow down the rate of divergence from each other. Our figures suggest that the third situation prevailed, and the archeology supports this in its evidences of extensive trade relations.|352|divergence, language contact, lexicostatistics, 1739|Wan2002|This paper aims to assess the cognitive validity of the underspecification of ([+anterior]) coronals and focus on whether there is any asymmetrical behavior among dentals, retroflexes, and velars with respect to the palatalization process in Mandarin. A corpus of 3500 slips of the tongue data was analyzed and evidence presented. 
The analysis shows that actual speakers of Mandarin use underspecified representations on line during language production, and that coronals take part in phonological patterns which are different from those of other places of articulation.|000|slips of the tongue, dataset, coronals, phonology, underspecification, 1740|Wan2002|The data is very interesting, but unfortunately not complete. Even more interesting than the interpretation of the author, however, is the fact, that it is apparently naively assumed that slips of the tongue would immediately point to deeper phonological representations, which is surely not necessarily the case. On the contrary, it seems that most of the slips of the tongue are better explained by looking at surrounding sounds, from which they derive their target representation.|000|slips of the tongue, dataset, phonological representation 1741|Hsiao2016|This paper reports the findings from a semantic judgment priming task aimed at examining how senses of three semantically complex Mandarin verbs chī ‘eat’, dǎ ‘hit’, and xǐ ‘wash’ are stored in the mental lexicon. In the experiment, the prime stimuli belonged to the basic senses of the three critical polysemous verbs. Three target conditions, that is, verb phrases with the same basic senses (e.g. chī niúpái ‘to eat steak’), closely related senses (e.g. chī wěiyá ‘to attend a year-end party’), and distantly related senses (e.g. chī lǎoběn ‘to live on one’s own fat’), were prepared to test whether processing patterns varied for different types of extended senses within a polysemous verb. The results indicate linearly decreasing priming effects for senses moving from the core senses, through closely related senses, to distantly related senses. 
These effects conflict with the separate-entry view and provide evidence for a shared core representation among polysemous senses.|000|polysemy, homophony, mental representation, lexical decision task, lexical storage, psycholinguistics, 1742|Hsiao2016|This paper presents interesting evidence that there is no strict boundary between homophony and polysemy, but rather a different degree of closeness of meanings of concepts. It also gives a recent overview of the psycholinguistic debate regarding homophony and polysemy.|000|homophony, polysemy, mental lexicon, Mandarin, psycholinguistics 1743|Biq2004|This is a corpus-based study of general nouns, ren ‘people’, shi(qing) ‘matter’, and dongxi ‘thing, object’, as they are used in spoken Mandarin in contemporary Taiwan. Our examination focuses on the co-occurrences and collocations of these general nouns with other linguistic elements. Our investigation shows that the three lexical items display different tendencies in designating referential specificity. They also manifest different extents to which each can form fixed expressions and stabilized constructions with other linguistic elements. Finally, the behavior of these general nouns also demonstrates how language use in interaction motivates and reinforces the conventionalization of meaning construction.|000|corpus studies, Mandarin, Taiwanese 1744|Evans2013|The study of linguistic diversity, and the factors driving change between language states, in different sociocultural contexts, arguably provides the best arena of human culture for the application of evolutionary approaches, as Darwin realized. After a long period in which this potential has been neglected, the scene is now set for a new reconnection of evolutionary approaches to the astonishingly diverse range of languages around the world, many on the verge of extinction without trace. 
This chapter outlines the various ways coevolutionary models can be applied to language change, and surveys the many ways diversity manifests itself both in language structure and in the organization of diversity beyond the language unit. Problems of establishing comparability and characterizing the full dimensions of the design space are discussed, including the distribution of characters across it, the correlations between them, and the challenge of establishing diachronic typologies (i.e., establishing the likelihood of different types of transition, including the insights that could be reached through properly focused studies of micro-variation). It concludes by surveying the main types of selection that mold the emergence of linguistic diversity—psychological/physiological, system/semiotic, and genetic/epidemiological—and spells out seven major challenges that confront further studies of linguistic diversity within an evolutionary framework. |000|language diversity, cultural evolution, evidence, language change, 1745|Evans2013|:comment:`gives a nice table that shows the systematic aspects and the difficulty in comparing phonemes across languages`|251|systemic processes, language comparison, 1746|Eatough1997|Liangshan Yi (also known as Nosu, spoken in Sichuan Province, China) has many phonetically-interesting syllables. In this paper an articulatory description of the full range of distinctive syllables of this language is given and it is shown that the standard phonemicization of these is reasonable.|000|Liangshan Yi, Loloish, Sino-Tibetan, documentation, phoneme inventory 1747|Bradley1997|The Tibeto-Burman (TB) languages are the principal languages of the Himalayan region, spoken from Kashmir in the west, across the Himalayan and sub-Himalayan regions of India, Nepal, Bhutan, Bangladesh, Tibet and China, and into Southeast Asia across Burma, Thailand, Laos and Vietnam. 
There are several hundred languages known, and doubtless some others yet to be identified.|000|Tibeto-Burman, Sino-Tibetan, overview, genetic classification, language tree 1748|Bradley1997|Article shows a very explicit language tree of a large number of Tibeto-Burman languages.|000|language tree, phylogenetic reconstruction, genetic classification, Sino-Tibetan, Tibeto-Burman 1749|Kozincev2016|Допустив заимствование индоевропейской речи автохтонами степей у мигрантов с юга и смешение между ними, мы сблизим археологические данные с генетическими и лингвистическими. Тогда время дивергенции анатолийских и прочих индоевропейских языков совпадет со временем первой миграции с Ближнего Востока на Северный Кавказ. :translation:`By assuming that the autochthonous population of the steppes borrowed Indo-European speech from migrants from the south and mixed with them, we bring the archaeological evidence closer to the genetic and linguistic evidence. The divergence time of Anatolian and the other Indo-European languages then coincides with the time of the first migration from the Near East to the North Caucasus.`|2|language mixture, lexical borrowing, Indo-European, language history, Urheimat 1750|Gruntov2015|The paper proposes a new classification of the Mongolic languages, based on lexical isoglosses within the 110-wordlist of the basic lexicon. This classification takes into account the data of Middle Mongolian monuments, Mongolic languages of China and Afghanistan, and the data collected by the authors during their fieldwork in Inner Mongolia (China), Mongolia and Russia. Many late contacts, intra-Mongolic reborrowings and interferences make it very hard to establish the precise form of the Mongolic genetic tree on the basis of lexicon alone. 
However, lexicostatistical analysis helps to reveal some important features, including an early separation of Dagur, the intermediate position of Eastern Yughur and its secondary convergence with Khoshut, the specific affinity of Olet (which is phonetically and morphologically an Oirat dialect) to Khalkha, the peculiar position of Mangghuer in the Baoanic subgroup, the refinement of the position of Kangjia within Gansu-Qinghai languages.|000|Mongolian, phylogenetic reconstruction, genetic classification, concept list, dataset, Mongolic 1751|Morrison2009|In phylogenetic analyses, multiple sequence alignment often seems to be the poor cousin to tree building. It has long been recognized that both primary homology assessment (alignment) and secondary homology assessment (tree building) can have important effects on phylogenetic analyses (Morrison and Ellis 1997), and this has been repeatedly demonstrated for both empirical and simulated data (see references cited by Morrison 2008; also Wong et al. 2008). However, most theoretical contributions to phylogenetic analysis continue to involve tree building alone. Indeed, even most review articles continue to treat alignment as being about automated bioinformatics procedures (Wallace et al. 
2005; Edgar and Batzoglou 2006; Kumar and Filipski 2007; Notredame 2007) rather than about phylogenetics (Morrison 2006; Phillips 2006).|000|sequence alignment, evolutionary biology, homology 1752|Morrison2009|Study surveys the literature on how often alignments are adjusted manually in biology and finds the practice widespread, partly because automatic alignment is often flawed and scholars do not trust it, especially since major evolutionary events are not modeled by the algorithms that produce first-stage automatic alignments.|000|manual annotation, sequence alignment, evolutionary biology, annotation, manual correction 1753|Morrison2009|I contend that this is what criterion-based manual alignment is all about. In my literature review, the most highly cited paper reflecting this attitude was that of Kelchner (2000), although several updated papers now exist (see references cited by Morrison 2008; also Benavides et al. 2007). These papers provide a set of guidelines for manually aligning sequences based on explicitly identifying the molecular events that have led to the observed sequence variation. Gaps can be created by many different types of events, and these need to be distinguished, as the alignment might be different for different events (see the examples in Figs 2 and 3). Moreover, the molecular mechanisms are likely to occur with different frequencies in different sequence types; for example, tandem repeats are very common in both protein- and intron-coding sequences, whereas inversions are most commonly reported in noncoding [pb] sequences (Kim and Lee 2005). 
This means that it is likely to be inappropriate to apply a single similarity-based algorithm to sequences that code for proteins, structural RNAs, and also noncoding sequences.|156f|sequence alignment, gaps, modeling, evolutionary events 1754|Michaud2016|The aim of this book is to provide a description and analysis of the tone system of Yongning Na (also known as Mosuo), a Sino-Tibetan language spoken in Southwest China.|000|Yongning Na, Sino-Tibetan, tone, description, dataset, 1755|Hill2007|Traditionally three independent types of analogical change in inflectional paradigms are distinguished: proportional analogy, paradigmatic leveling and analogical extension. However, the investigation of the data reveals that out of these types only that of proportional analogy can be empirically verified, being supported by clear evidence from languages with well documented history. Moreover, as shown by data from Russian, Old High German dialects, Old Saxon, Old English, and Latin, even in the most secure cases of paradigmatic leveling or analogical extension found in the literature the assumption of proportional analogy is either probable or cannot be excluded. Consequently the three traditional types of analogical change seem to differ with respect to their ontological status. On the one hand, paradigmatic leveling, i.e. the elimination of allomorphy in inflectional paradigms, is to be viewed merely as a motivation for change whose operating principle really is proportional analogy. On the other hand, analogical extension, i.e., the extension of already existing inflection forms through affixes with comparable function, seems to be just a possible way to describe the results of changes which, again, may in fact be instances of proportional analogy. These findings have the following implications for linguistic theory and practice. 
In practical work on inflectional morphology paradigmatic leveling and analogical extension without the use of proportional analogy can no longer be used in explanations on reconstructed stages of language development. All proposed explanations of this kind are to be supported by establishing an underlying proportional analogy or reconsidered if this is impossible.|000|analogy, analogical change, language change, morphological change 1756|Gray2011|Historical inference is at its most powerful when independent lines of evidence can be integrated into a coherent account. Dating linguistic and cultural lineages can potentially play a vital role in the integration of evidence from linguistics, anthropology, archaeology and genetics. Unfortunately, although the comparative method in historical linguistics can provide a relative chronology, it cannot provide absolute date estimates and an alternative approach, called glottochronology, is fundamentally flawed. In this paper we outline how computational phylogenetic methods can reliably estimate language divergence dates and thus help resolve long-standing debates about human prehistory ranging from the origin of the Indo-European language family to the peopling of the Pacific.|000|language evolution, dating, glottochronology, 1757|Wilson2011|Previous research has suggested a relationship between frequency of use (FoU) and language change (Pagel, Atkinson, & Meade, 2007), but its nature remains unclear. Two research questions were raised in this thesis: 1) whether FoU remains stable over time, 2) whether amount of language change over time can be predicted using FoU. A 1147-word subset of the IDS wordlist (Key & Comrie, 2007) was used to test these questions. The FoU of both Latin and Spanish, and amount of change for each word was measured. There was a lower correlation across time than cross-linguistically, but the effect of genre could not be removed. 
A weak, highly significant negative relationship between FoU and amount of change was identified, supporting the claim that high frequency words change less than low frequency words. There is an intriguing correlation between FoU and lexical change, but the causal mechanism is not yet understood.|000|language change, frequency, Latin, Spanish, concept list, morphological change, dataset, 1758|Michaud2012|Comparative data from several language families show that nasality can be transferred between a syllable-initial consonant cluster and the following vowel. The cases reported to date are summarized, and a new analysis is proposed for a set of Sino-Tibetan data. The evolution appears to go in both directions: from the consonantal onset to the following vowel in Tai-Kadai, Austroasiatic, Sino-Tibetan, Niger-Congo (Kwa) and Indo-European (Celtic), and from the vowel to the preceding consonant in Siouan. However, an examination of the conditions on these changes brings out an asymmetry. In most cases, transfers of nasality take place from a consonantal onset to a following vowel; the instances we found of a regular change in the opposite direction all come from languages where there is one of the following restrictions on nasal sounds: (i) nasal consonants are nonphonemic (contextually predictable), or (ii) the opposition between nasal and oral vowels is neutralized after nasal consonants (in favor of nasal vowels). |000|nasalization, Sino-Tibetan, 1759|Michaud2012|This is an important paper describing how nasality can spread from onset to other parts in a syllable.|000|nasalization, Sino-Tibetan 1760|Schuit2007|When one is describing a language, one of the typological aspects one may want to take into consideration is the determination of morphological language type. To what extent does the language under consideration allow affixation? Can several morphemes occur in a word? 
Answering these questions determines the synthetic ‘value’ of the language: where on the continuum of synthesis should it be placed, more to the isolating end, or more to the agglutinative end? If the language is placed more to the agglutinative end, one can determine the amount of fusion in the language. Again the language can be placed on a continuum: more fusing or more agglutinative. It will also be interesting to see whether the language allows for incorporation, and if so, to what extent.|000|sign language, morphology, description, typology 1761|Diessel2016|Until recently, theoretical linguists have paid little attention to the frequency of linguistic elements in grammar and grammatical development. It is a standard assumption of (most) grammatical theories that the study of grammar (or competence) must be separated from the study of language use (or performance). However, this view of language has been called into question by various strands of research that have emphasized the importance of frequency for the analysis of linguistic structure. In this research, linguistic structure is often characterized as an emergent phenomenon shaped by general cognitive processes such as analogy, categorization, and automatization, which are crucially influenced by frequency of occurrence. There are many different ways in which frequency affects the processing and development of linguistic structure. Historical linguists have shown that frequent strings of linguistic elements are prone to undergo phonetic reduction and coalescence and that frequent expressions and constructions are more resistant to structure mapping and analogical leveling than infrequent ones. Cognitive linguists have argued that the organization of constituent structure and embedding is based on the language users’ experience with linguistic sequences and that the productivity of grammatical schemas or rules is determined by the combined effect of frequency and similarity. 
Child language researchers have demonstrated that frequency of occurrence plays an important role in the segmentation of the speech stream and the acquisition of syntactic categories and that the statistical properties of the ambient language are much more regular than commonly assumed. And finally, psycholinguists have shown that structural ambiguities in sentence processing can often be resolved by lexical and structural frequencies and that speakers’ choices between alternative constructions in language production are related to their experience with particular linguistic forms and meanings. Taken together, this research suggests that our knowledge of grammar is grounded in experience.|000|frequency, grammar, typology 1762|Sapir1921|What, then, are the absolutely essential concepts in speech, the concepts that must be expressed if language is to be a satisfactory means of communication? Clearly we must have, first of all, a large stock of basic or radical concepts, the concrete wherewithal of speech. We must have objects, actions, qualities to talk about, and these must have their corresponding symbols in independent words or in radical elements. No proposition, however abstract its intent, is humanly possible without a tying on at one or more points to the concrete world of sense. In every intelligible proposition at least two of these radical ideas must be expressed, though in exceptional cases one or even both may be understood from the context. And, secondly, such relational concepts must be expressed as moor the concrete concepts to each other and construct a definite, fundamental form of proposition. In this fundamental form there must be no doubt as to the nature of the relations that obtain between the concrete concepts. We must know what concrete concept is directly or indirectly related to what other, and how. 
If we wish to talk of a thing and an action, we must know if they are coordinately related to each other (e.g., “He is fond of wine and gambling”); or if the thing is conceived of as the starting point, the “doer” of the action, or, as it is customary to say, the “subject” of which the action is predicated; or if, on the contrary, it is the end point, the “object” of the action. If I wish to communicate an intelligible idea about a farmer, a duckling, and the act of killing, it is not enough to state the linguistic symbols for these concrete ideas in any order, higgledy-piggledy, trusting that the hearer may construct some kind of a relational pattern out of the general probabilities of the case. The fundamental syntactic relations must be unambiguously expressed. I can afford to be silent on the subject of time and place and number and of a host of other possible types of concepts, but I can find no way of dodging the issue as to who is doing the killing. There is no known language that can or does dodge it, any more than it succeeds in saying something without the use of symbols for the concrete concepts.|V.14|basic vocabulary, conceptualization, basic level concepts 1763|Sapir1921|We may now conveniently revise our first classification of concepts as expressed in language and suggest the following scheme: *Basic (Concrete) Concepts* (such as objects, actions, qualities) : normally expressed by independent words or radical elements; involve no relation as such *Derivational Concepts* (less concrete, as a rule, than I, more so than III): normally expressed by affixing non-radical elements to radical elements or by inner modification of these; differ from type I in defining ideas that are irrelevant to the proposition as a whole but that give a radical element a particular increment of significance and that are thus inherently related in a specific way to concepts of type I *Concrete Relational Concepts* (still more abstract, yet not entirely devoid of a measure of 
concreteness): normally expressed by affixing non-radical elements to radical elements, but generally at a greater remove from these than is the case with elements of type II, or by inner modification of radical elements; differ fundamentally from type II in indicating or implying relations that transcend the particular word to which they are immediately attached, thus leading over to *Pure Relational Concepts* (purely abstract): normally expressed by affixing non-radical elements to radical elements (in which case these concepts are frequently intertwined with those of type III) or by their inner modification, by independent words, or by position; serve to relate the concrete elements of the proposition to each other, thus giving it definite syntactic form.|V.19|basic vocabulary, concepts, conceptualization 1764|Babych2016|This paper presents a methodology for calculating a modified Levenshtein edit distance between character strings, and applies it to the task of automated cognate identification from non-parallel (comparable) corpora. This task is an important stage in developing MT systems and bilingual dictionaries beyond the coverage of traditionally used aligned parallel corpora, which can be used for finding translation equivalents for the ‘long tail’ in Zipfian distribution: low-frequency and usually unambiguous lexical items in closely-related languages (many of those often under-resourced). Graphonological Levenshtein edit distance relies on editing hierarchical representations of phonological features for graphemes (graphonological representations) and improves on phonological edit distance proposed for measuring dialectological variation. Graphonological edit distance works directly with character strings and does not require an intermediate stage of phonological transcription, exploiting the advantages of historical and morphological principles of orthography, which are obscured if only phonetic principle is applied. 
Difficulties associated with plain feature representations (unstructured feature sets or vectors) are addressed by using linguistically-motivated feature hierarchy that restricts matching of lower-level graphonological features when higher-level features are not matched. The paper presents an evaluation of the graphonological edit distance in comparison with the traditional Levenshtein edit distance from the perspective of its usefulness for the task of automated cognate identification. It discusses the advantages of the proposed method, which can be used for morphology induction, for robust transliteration across different alphabets (Latin, Cyrillic, Arabic, etc.) and robust identification of words with non-standard or distorted spelling, e.g., in user-generated content on the web such as posts on social media, blogs and comments. Software for calculating the modified feature-based Levenshtein distance, and the corresponding graphonological feature representations (vectors and the hierarchies of graphemes’ features) are released on the author’s webpage: http://corpus.leeds.ac.uk/bogdan/phonologylevenshtein/. Features are currently available for Latin and Cyrillic alphabets and will be extended to other alphabets and languages.|000|cognate detection, edit distance, distinctive features, feature hierarchy 1765|Wedel2011|This article mentions similar directions in patterns of change resulting from self-organizational principles in phonology, something called "conspiracy in phonology" by some researchers.|000|systemic processes, phonology, directionality 1766|Wedel2016|All languages make systematic use of individually meaningless, contrastive categories in combination to create distinct words. Despite the central role that phonemic category contrasts play in the transmission of information, they can be lost through language change. 
The nearly century-old functional load hypothesis proposes that phoneme contrast loss is less likely the greater the ‘work’ that a contrast does in distinguishing words. In Wedel, Kaplan, & Jackson, under review, we showed that simple measures of functional load do significantly predict patterns of phoneme merger, and that this effect is in the hypothesized direction: the greater the number of minimal word pairs that a phoneme contrast distinguishes, the less likely is merger. Here, we extend that analysis to two additional lexical properties predicted to influence the uncertainty associated with minimal pairs. We present evidence that within our dataset, the number of minimal lemma pairs sharing syntactic category better predicts merger than those with divergent syntactic categories, and that the number of minimal lemma pairs with similar frequencies better predicts merger than those with divergent frequencies. These findings support the general variationist/usage-based/evolutionary research program, which assumes a causal chain linking properties of individual utterances to long-term change in the abstract, sublexical category system of a speech community.|000|functional load, phonotactics, phonology, frequency 1767|Lipka1986|Under the influence of psychological research on the nature of human categorization, especially that of Rosch (1977) and Rosch — Mervis (1975), an alternative theory of semantics has developed in recent years, as witnessed by Coleman — Kay (1981) and Geeraerts (1984), and criticized by Osherson — Smith (1981). Its linguistic roots can be found in Fillmore (1978) and the empirical work on denotational structure in Labov (1978). The new prototype theory has been labelled 'non-Aristotelian', as opposed to the classical feature approach to semantics which Fillmore termed the 'checklist theory of semantics'. In the following I will discuss whether this really represents an alternative (cf. 
Fillmore 1975) or simply a division of labour.|000|semantic features, prototype theory, lexical semantics, introduction 1768|Lipka1986|Paper contrasts prototype theory with classical semantic feature theory.|000|prototype theory, semantic features, introduction 1769|Crocker2016|We introduce IDeaL (Information Density and Linguistic Encoding), a collaborative research center that investigates the hypothesis that language use may be driven by the optimal use of the communication channel. From the point of view of linguistics, our approach promises to shed light on selected aspects of language variation that are hitherto not sufficiently explained. Applications of our research can be envisaged in various areas of natural language processing and AI, including machine translation, text generation, speech synthesis and multimodal interfaces.|000|language variation, information theory, information density, surprisal 1770|Radick2016|A familiar story about mid-twentieth-century American psychology tells of the abandonment of behaviorism for cognitive science. Between these two, however, lay a scientific borderland, muddy and much traveled. This essay relocates the origins of the Chomskyan program in linguistics there. Following his introduction of transformational generative grammar, Noam Chomsky (b. 1928) mounted a highly publicized attack on behaviorist psychology. Yet when he first developed that approach to grammar, he was a defender of behaviorism. His antibehaviorism emerged only in the course of what became a systematic repudiation of the work of the Cornell linguist C. F. Hockett (1916–2000). In the name of the positivist Unity of Science movement, Hockett had synthesized an approach to grammar based on statistical communication theory; a behaviorist view of language acquisition in children as a process of association and analogy; and an interest in uncovering the Darwinian origins of language. 
In criticizing Hockett on grammar, Chomsky came to engage gradually and critically with the whole Hockettian synthesis. Situating Chomsky thus within his own disciplinary matrix suggests lessons for students of disciplinary politics generally and—famously with Chomsky—the place of political discipline within a scientific life.|000|behaviorism, Noam Chomsky, Charles Hocket, history of science, Chomsky syntax 1771|Yadava2006|This paper presents a report on the indigenous languages of Nepal.|000|Nepal, Sino-Tibetan, overview, report, 1772|Costa2009|Text treats the problem of "language history" as a fiction that is invented by people and used accordingly. This is exemplified by pointing to the Scots language and the revitalization movement.|000|language history, revitalization, Scots, philosophy of science 1773|Grinevald2011|La thématique des langues en danger a été orchestrée par des linguistes professionnels il y a une vingtaine d’années. Elle a rejoint les préoccupations de certains sociolinguistes qui, dans les années 1970 (par exemple Fishman 1972 ; Lafont 1984), avaient élaboré une réflexion concernant la maintenance des langues minoritaires, ainsi que certains courants anthropologiques s’intéressant à la documentation de langues indigènes d’Amérique du Nord, à la suite des travaux fondateurs de Boas aux Etats-Unis. Enfin, cette thématique s’inscrit plus largement dans les champs des contacts de langue (voir Thomason 2001) et du multilinguisme. 
Cet article est lui-même le fruit d’une co-écriture entre une linguiste et un sociolinguiste. :translation:`The theme of endangered languages was orchestrated by professional linguists some twenty years ago. It joined the concerns of certain sociolinguists who, in the 1970s (for example Fishman 1972; Lafont 1984), had developed a line of thought on the maintenance of minority languages, as well as certain anthropological currents interested in the documentation of indigenous languages of North America, following the foundational work of Boas in the United States. Finally, this theme belongs more broadly to the fields of language contact (see Thomason 2001) and multilingualism. This article is itself the product of co-writing between a linguist and a sociolinguist.`|000|language death, language endangerment, philosophy of science, language change, history of science, overview 1774|Grinevald2011|Article introduces historical aspects of language endangerment as well as apparently some philosophical reflections on the topic.|000|language endangerment, language death, philosophy of science, overview, history of science 1775|Waelchli2012|This paper discusses a multidimensional probabilistic semantic map of lexical motion verb stems based on data collected from parallel texts (viz. translations of the Gospel according to Mark) for 100 languages from all continents. The crosslinguistic diversity of lexical semantics in motion verbs is illustrated in detail for the domain of ‘go’, ‘come’, and ‘arrive’ type contexts. It is argued that the theoretical bases underlying probabilistic semantic maps from exemplar data are the isomorphism hypothesis (given any two meanings and their corresponding forms in any particular language, more similar meanings are more likely to be expressed by the same form in any language), similarity semantics (similarity is more basic than identity), and exemplar semantics (exemplar meaning is more fundamental than abstract concepts).|000|lexical typology, semantic map, colexification 1776|Nakassis2015|The thesis of this essay is that linguistic anthropology is not the study of language. Rather, “language” functions as a permanently problematic, if indispensable, object for linguistic anthropological analysis and thought. This is because, as I suggest, the critical intervention of linguistic anthropology over the last 40 years has been its ethnographic focus on indexicality, in particular, the ways that indexical processes undermine language as an autonomous object, entangling it with other semiotic modalities and thereby displacing it beyond its putative borders. 
Reviewing linguistic anthropological scholarship from 2015, I argue that it is the study of this displacement and its more general semiotic implications—and the entangled and mutually informing analytics that have been developed to theorize them, for example, language ideology, entextualization, interdiscursivity, chronotope—that centers the field. Focusing on a set of such analytics, I illustrate how recent linguistic anthropological scholarship has elaborated the reflexive, dialectical nature of social life, theorizing what I call total semiotic facts. I explore these dialectics by discussing three thematic clusters that occupied the attention of the field in 2015: diversity and authenticity, political economy, and mass mediation.|000|anthropology, linguistics, overview, history of science, 1777|Bowern2015|The twenty-first Century has been billed the era of “big data”, and linguists are participating in this trend. We are seeing an increased reliance on statistical and quantitative arguments in most fields of linguistics, including the oldest parts of the field, such as the study of language change. The increased use of statistical methods changes the types of questions we can ask of our data, as well as how we evaluate the answers. But this all has the prerequisite of certain types of data, coded in certain ways. We cannot make powerful statistical arguments from the qualitative data that historical linguists are used to working with. In this paper I survey a few types of work based on a lexical database of Pama-Nyungan languages, the largest family in Aboriginal Australia. I highlight the flexibility with which large-scale databases can be deployed, especially when combined with traditional methods. 
“Big” data may require new methods, but the combination of statistical approaches and traditional methods is necessary for us to gain new insight into old problems.|000|big data, Australian Lexical Database, dataset, Australian languages 1778|Zwicky1973|The paper proposes an analogy between chemical molecules and concepts in linguistics, which are constituted from atoms (semantic primes) which are then composed to form higher meanings.|000|biological parallels, molecules, semantics, analogy, semantic primes, natural semantic metalanguage 1779|Casasanto2015|What does it mean for metaphors to be “embodied”? Here we describe an influential theory of embodied cognition according to which thoughts are implemented in perceptuo-motor simulations, in the brain’s modality-specific systems. This theory is invoked in nearly every paper on “embodied metaphor,” across linguistics, philosophy, psychology, and cognitive neuroscience. There appears to be overwhelming support for the conclusion that representations of metaphorical “source domains” are embodied in perceptuo-motor simulations. Here we show, however, that when the data are evaluated appropriately there is very little evidence that metaphors are embodied in this sense. The kind of data that offer compelling support for the embodiment of concrete, literal ideas like “grasping the ball” are nearly absent for abstract, metaphorical ideas like “grasping the explanation.” There is now abundant evidence that metaphors structure our thoughts, feelings, and choices in a variety of conceptual domains. But evidence for metaphorical mental representation is not necessarily evidence for embodiment. 
If any metaphorical source domains are embodied in modality-specific simulations, they may be the exception rather than the rule.|000|embodiment, metaphor, 1780|Casasanto2015|Paper apparently brings some evidence against the theory of embodiment.|000|embodiment, metaphor, semantic change, semantic similarity 1781|Jacques2016c|The diachronic analysis of person indexation systems in Sino-Tibetan (Trans-Himalayan) languages is currently a topical issue. Factual errors have occasionally crept in, detracting somewhat from the quality of the linguistic discussion about these systems. Evidence from Tangut, Gyalrongic and Kiranti is so central to the debates that it appeared useful to provide a few clarifications about their person indexation systems, adducing evidence from a body of texts that has been considerably enriched in the past decade. The main points made in this paper can be summarized as follows. First, the view that personal affixes derive diachronically from pronouns is by no means as self-evident as it may seem. Second, person indexation in Tangut, the oldest Trans-Himalayan language with person indexation, is not optional, as has sometimes been stated in the literature. Third, person indexation in Gyalrongic and Kiranti is sensitive to grammatical relations, a finding which calls into question its analysis as marking speech act participant involvement.|000|Tangut, Kiranti, Gyalrongic, person indexation, Sino-Tibetan 1782|Trask1993|**SAP** A tripartite classification of the principal NP arguments in transitive and intransitive clauses. An intransitive clause is viewed as having a single argument, the subject (S). A transitive clause is seen as having two arguments, the agent (A), and the patient (P) (or in some versions the object O), either one of which may be identified with S for grammatical purposes, producing either an **accusative** pattern (A=S) or an **ergative** pattern (P=S). 
(Dixon 1972)|000|SAP, definition, 1783|Moore2015|This article presents an overview of the goals and methods of semantic typology, the study of the distribution of semantic categories across languages. Results from this field inform theoretical accounts of syntax-semantics interface phenomena, as well as the nature of the relationship between language and cognition. This article discusses a variety of quantitative methods that represent recent efforts in semantic typology to (i) discover patterns in the distribution of independent variables and (ii) predict the distribution of dependent variables in relation to identified independent variables. Such methods include Multi-Dimensional Scaling, Hierarchical Cluster Analysis, and Generalized Linear Mixed Effects regression analyses. We identify and discuss notable published examples of these methods used in semantic typology.|000|semantic typology, lexical typology, semantics, overview 1784|Grossman2015|Language change is a central concern for any linguistic theory. For one thing, it is often assumed that language change is explanatory, in that it provides a reasonable answer to what Haspelmath dubbed ‘Greenberg’s Problem’ in 2014: why are languages the way they are? A short version of the Greenbergian answer is: ‘Because they became that way through processes of language change.’ However, this sort of answer throws into focus the fact that language change is not only a potential explanation for language structures. Rather, it is a set of problems that itself calls for explanation. In fact, this could be called ‘Greenberg’s Second Question’: why do languages change the way they do? In this article, we explore some ways in which the field of experimental pragmatics might shed light on the second question, by providing a set of methods that could investigate existing hypotheses about language change by developing falsifiable predictions to be evaluated in experimental settings. 
Moreover, these hypotheses can provide new research questions and data for experimentalists to work on, beyond the rather restricted set of questions that experimental pragmatics has confronted to date.|000|language change, pragmatics, experimental phonetics, sound change, 1785|Ussishkin2003|Languages are known to make an important distinction with respect to the ways in which they treat words borrowed from other languages, or loanwords. Differential repair of loanwords allows us to segregate phonotactic restrictions in a given language into two classes: (1) A phonotactic restriction is violated in loanwords. (2) A phonotactic restriction is upheld in loanwords. This paper addresses the following questions: Is there anything systematic about the distinction between restrictions falling into classes (1) and (2), and if so, how can we explain it?|000|phonotactics, loanword integration, lexical borrowing 1786|Bomhard2016|There have been numerous attempts to find relatives of Proto-Indo-European, not the least of which is the Indo-Uralic Hypothesis. According to this hypothesis, Proto-Indo-European and Proto-Uralic are alleged to descend from a common ancestor. However, attempts to prove this hypothesis have run into numerous difficulties. One difficulty concerns the inability to reconstruct the ancestral morphological system in detail, and another concerns the rather small shared vocabulary. This latter problem is further complicated by the fact that many scholars think in terms of borrowing rather than inheritance. Moreover, the lack of agreement in vocabulary affects the ability to establish viable sound correspondences and rules of combinability. This paper will attempt to show that these and other difficulties are caused, at least in large part, by the question of the origins of the Indo-European parent language.
Evidence will be presented to demonstrate that Proto-Indo-European is the result of the imposition of a Eurasiatic language (to use Greenberg’s term) on a population speaking one or more primordial Northwest Caucasian languages.|000|language contact, Indo-European, Uralic languages, Caucasian languages, phylogenetic reconstruction 1787|Pyysalo2016|More than a century after Hermann Møller's original formulation the laryngeal theory no longer has a viable model – and a theory. This permanent "emperor has no clothes" situation is a direct result of the incompatibility of the Proto-Indo-Semitic hypothesis underlying the laryngeal theory with the Old Anatolian data. The inconsistency can only be overcome if the assumptions of the laryngeal theory are simplified as follows: 1. The set of laryngeals LT `*`h₁ h₂ h₃ is reduced to a single ‘laryngeal’ PIE `*`H, phonetically a glottal fricative and coinciding with Hitt. ḫ as a segment. 2. The Proto-Indo-Semitic morphology C₁C₂·C₃ is abandoned. In order to continue within a framework of a meaningful theory in Indo-European linguistics, a transition to monolaryngealism as initiated by Zgusta (1951) and developed by Szemerényi (1996) or preferably its most recent, completely revised and upgraded version, the glottal fricative theory (@Pyysalo2013), is recommended for scholars in the field.|000|laryngeal theory, overview, Indo-European, Anatolian, sound change 1788|Pyysalo2013|Dissertation presents the basic idea behind the PIE Lexicon project: * http://pielexicon.hum.helsinki.fi Here, all etymologies with explicit sound laws of Indo-European are explicitly modeled.|000|laryngeal theory, Indo-European, sound change, sound law, transducer 1789|Hill2013|The aim of the paper is to demonstrate that language contact is a major factor in shaping inflectional morphology of closely related languages such as the Indo-European languages of Europe.
It investigates the explanatory potential of two particularly promising mechanisms of copying inflectional patterns from one language into another. The first mechanism is the contact-induced reanalysis of inflectional forms with subsequent rearrangement of morphological units in the recipient language. The second mechanism is the contact-induced grammaticalisation of syntactic patterns. Contact-induced reanalysis and contact-induced grammaticalisation might be responsible for a substantial part of non-inherited similarities in the inflection of the Indo-European languages of Europe. At the same time, these mechanisms also have the potential of explaining why inherited features in their inflection may be partly replaced or modified in the way we observe.|000|language contact, borrowing, nominal inflection, Indo-European 1790|Wedel2007|Phonologies are characterised by regularity, from the stereotyped phonetic characteristics of allophones to the contextually conditioned alternations between them. Most models of grammar account for regularity by hypothesising that there is only a limited set of symbols for expressing underlying forms, and that an independent grammar algorithm transforms symbol sequences into an output representation. However, this explanation for regularity is called into question by research which suggests that the mental lexicon records rich phonetic detail that directly informs production. Given evidence for biases favouring previously experienced forms at many levels of production and perception, I argue that positive feedback within a richly detailed lexicon can produce regularity over many cycles of production and perception.
Using simulation as a tool, I show that under the influence of positive feedback, gradient biases in usage can convert an initially gradient and variable distribution of lexical behaviours into a more categorical and simpler pattern.|000|systemic processes, phonology, systematic aspect of evolution, 1791|Bowers2011|Participants read aloud swear words, euphemisms of the swear words, and neutral stimuli while their autonomic activity was measured by electrodermal activity. The key finding was that autonomic responses to swear words were larger than to euphemisms and neutral stimuli. It is argued that the heightened response to swear words reflects a form of verbal conditioning in which the phonological form of the word is directly associated with an affective response. Euphemisms are effective because they replace the trigger (the offending word form) by another word form that expresses a similar idea. That is, word forms exert some control on affect and cognition in turn. We relate these findings to the linguistic relativity hypothesis, and suggest a simple mechanistic account of how language may influence thinking in this context.|000|swear words, linguistic relativity, Sapir-Whorf hypothesis 1792|GuevaraErra2016|Neural coding in the auditory system has been shown to obey the principle of efficient neural coding. The statistical properties of speech appear to be particularly well matched to the auditory neural code. However, only English has so far been analyzed from an efficient coding perspective. It thus remains unknown whether such an approach is able to capture differences between the sound patterns of different languages.
Here, we use independent component analysis to derive information theoretically optimal, non-redundant codes (filter populations) for seven typologically distinct languages (Dutch, English, Japanese, Marathi, Polish, Spanish and Turkish) and relate the statistical properties of these filter populations to documented differences in the speech rhythms (Analysis 1) and consonant inventories (Analysis 2) of these languages. We show that consonant class membership plays a particularly important role in shaping the statistical structure of speech in different languages, suggesting that acoustic transience, a property that discriminates consonant classes from one another, is highly relevant for efficient coding.|000|consonants, sound patterns, communicative efficiency, language acquisition 1793|Cocho2015|Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which almost do not change their rank in time, “bodies” are words of general use, while “tails” are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. 
We find that the core size is similar for all languages studied.|000|rank frequencies, word frequency, lexical change, statistics, concept list 1794|Holzer1996|**Ohne die zeitliche Reihenfolge der Isoglossen zu kennen, kann man den Stammbaum einer Sprachenfamilie nicht zeichnen.** Da nun diese zeitliche Reihenfolge selten in bezug auf alle Isoglossen auf einem Gebiet bekannt ist, kann man auch nur selten den Stammbaum einer Sprachenfamilie mit Sicherheit rekonstruieren, obwohl jede Sprachenfamilie aus logisch zwingenden und per definitionem in einem -- gegebenenfalls eben unbekannten -- Stammbaum organisiert sein muß. :translation:`Without knowing the chronological order of the isoglosses, one cannot draw the family tree of a language family. Since this chronological order is seldom known for all isoglosses in an area, one can only rarely reconstruct the family tree of a language family with certainty, although every language family must, by logical necessity and by definition, be organized in a family tree, even if that tree happens to be unknown.` |22|family tree, wave theory, isoglosses 1795|Mulloni2006|Present-day machine translation technologies crucially depend on the size and quality of lexical resources. Much of recent research in the area has been concerned with methods to build bilingual dictionaries automatically. In this paper we propose a methodology for the automatic detection of cognates between two languages based solely on the orthography of words. From a set of known cognates, the method induces rules capturing regularities of orthographic mutations that a word undergoes when migrating from one language into the other. The rules are then applied as a preprocessing step before measuring the orthographic similarity between putative cognates. As a result, the method achieves an improvement in the F-measure of 11.86% in comparison with detecting cognates based only on the edit distance between them.
|000|cognate detection, orthography, machine translation 1798|Collier2014|Shows a nice graphic on differences between syntax and phonology which could be used as a basis to further develop the idea of word formation applied to biology in a study of fruitful analogies in biology and linguistics.|000|phonotactics, phonology, morphology, syntax, language model 1799|Seifter2014|This master thesis discusses August Schleicher and his role as a connecting link between linguistics and natural sciences in the 19th century, thus it discusses the work of a very early interdisciplinary scholar. The work is interesting, but historically not necessarily thoroughly done, and more relying on secondary sources. Anyway, in order to get a quick insight into Schleicher's role in the 19th century, it may be interesting to give some of the chapters a proper read.|000|August Schleicher, interdisciplinary research, history of science 1800|Brendel1986|The concept of "words" in continuous languages devoid of blanks is introduced and an operational definition of words given. With this novel concept nucleotide sequences become object for linguistic analysis. The typical word size of the nucleotide language is found to be 3 to 5 (tri- to pentamers). Different genomes have distinct vocabularies. Comparison of these vocabularies can serve as a basis for revealing functional and evolutionary relatedness of sequences.|000|biological parallels, nucleotide sequences, DNA sequence, morphology, vocabulary, 1801|Mueller2016|Identifying complexity-inducing factors is an essential precondition for the successful management of manufacturing complexity. However, established methods do not fully address the characteristics of semiconductor front-end manufacturing. This paper introduces Multiple Workflow Alignment (MWA) as a new approach to identify differences between products that increase semiconductor manufacturing complexity—so called complexity-inducing variety. 
MWA is an adaptation of the biological sequence alignment algorithm ClustalW and is characterized by high accuracy and reliability. The resulting alignments can be used for logistic-oriented workflow-design and complexity measurement.|000|sequence alignment, multiple sequence alignment, logistics, interdisciplinary research, methodological transfer 1802|Mueller2016|Funny paper which aligns workflows in order to make them more efficient.|000|workflow, multiple sequence alignment, complexity, logistics 1803|DaCosta2016|In this paper we describe two main contributions in the fields of lexicography and Linked Open Data: a human-corrected disambiguation, using the Princeton Wordnet’s sense inventory (PWN, Fellbaum, 1998), of Swadesh lists maintained in the Internet Archive by the Rosetta Project, and the distribution of this data through an expansion of the Open Multilingual Wordnet (OMW, Bond and Foster, 2013). The task of disambiguating word lists isn’t always a straightforward task. The PWN is a vast resource with many fine-grained senses, and word lists often fail to help resolve the inherent ambiguity of words. In this work we describe the corner cases of this disambiguation and, when necessary, motivate our choice over other possible senses. We take the results of this work as a great example of the benefits of sharing linguistic data under open licenses, and will continue linking other openly available data. All the data will be released in future OMW releases, and we will encourage the community to contribute in correcting and adding to the data made available.|000|concept list, Open Multilingual Wordnet, linked data, Swadesh list, 1804|Geisler2013|Among biologists as well as linguists, it is now widely accepted that there are many striking parallels between the evolution of life forms and the history of languages.
Starting from the rise of language studies as a scientific discipline in the early 19th century up to today’s recent “quantitative turn” in historical linguistics, scholars from both disciplines have repeatedly pointed to similarities between the respective research objects in biology and linguistics. Of all these parallels, the use of family trees to model the differentiation of species (genomes and languages) is surely the most striking one. Methodically speaking, genealogical relations between languages and species can both be visualized with help of bifurcating trees which indicate the splitting of ancestral into descendant taxa. Being developed independently in linguistics and biology (Hoenigswald 1963), the tree model suffered different fates in both disciplines: While the reconstruction of phylogenetic trees successively became one of the key objectives in evolutionary biology, the tree model was controversially disputed in linguistics and – although at no time completely abandoned – never became a true part of the consensus.|000|language history, family tree, network, wave theory, history of science 1805|Bonfante1953|Linguistic problems of the past, and the problems of kinship of languages in particular, have attracted very little attention. Still, what our ancestors believed or thought or hoped about their own language, or the language of their ancestors or of their neighbors, about their differences and affinities, was important to them; not only were scholars interested in these questions: the thinking of the leading classes was also influenced by them, and also that of the peoples in general. It is true that the history of past science is largely the history of errors; but these errors -- even if they were errors, and they frequently are not -- still constitute the history of human thinking, and are therefore not alien to us. This paper will be devoted to the question of the kinship of the European languages.
Here the lack of information of our contemporary books and encyclopedias is appalling. The work of centuries has simply been entirely forgotten or neglected. Our predecessors are accused more or less openly of an ignorance which is not theirs, but ours. Concerning the kinship of the Indo-European languages, most handbooks attribute the discovery to the XIXth century, starting with Bopp; some mention Schlegel, Rask, W. Jones, Adelung; a few (*very few*) go back to Cœurdoux and Sassetti. Names like Dante, Scaliger, Leibniz, Vico go unheeded. The material I will present in this paper, although far from being exhaustive, should give a much more complete picture of this question than is found in other works.|000|history of science, family tree, language history, language model, genetic relationship, Indo-European 1806|List2016f|This article investigates the terminology and the processes underlying the fundamental historical relations between words in linguistics (cognacy) and genes in biology (homology). The comparison between linguistics and biology shows that there are major inconsistencies in the analogies drawn between the two research fields and the models applied in phylogenetic reconstruction in linguistics. Cognacy between words is treated as a binary relation which is either present or not. Words, however, can exhibit different degrees of cognacy which go beyond the distinction between orthologous and paralogous genes in biology. The complex nature of cognacy has strong implications for the models used for phylogenetic reconstruction. Instead of modeling lexical evolution as a process of cognate gain and cognate loss, we need to go beyond the cognate relation and develop models which take the degrees of cognacy into account.
This calls for the use of evolutionary models which handle multistate characters and allow us to define potentially asymmetrical transition tendencies among the character states instead of time-reversible binary state models in phylogenetic approaches. The benefit of multistate models with asymmetric transition tendencies is demonstrated by testing how well different models of lexical change perform in semantic reconstruction on a lexicostatistical dataset of 23 Chinese dialects in a parsimony framework. The results show that the improved models largely outperform the popular gain–loss models. This suggests that improved models of lexical change may have strong consequences for phylogenetic approaches in linguistics.|000|language model, cognacy, homology, Chinese dialects, lexical change, partial cognacy 1807|List2015d|Little is known about the history of Chinese dialects. Major dialect groups were identified long ago using various traditional criteria, such as tonal and segmental development from their presumed common ancestor; however, scholarly agreement about their detailed development is largely lacking. At the core of the problem lies the role that language contact played in the history of Chinese. Unlike in the case of other language families, the Chinese dialects never really separated into distinct, independent languages, but kept evolving in close contact to each other. As a result, it is hard to tell whether traits shared among the dialects have been inherited or borrowed. Recent network approaches from a biological perspective could show a way out of this dilemma, since they were specifically designed to handle vertical and horizontal aspects in bacterial evolution, and the first pilot studies in historical linguistics have reported promising results. In this paper, a case study on the application of network approaches in Chinese historical linguistics is presented.
Based on a dataset of 200 basic items translated into 23 Chinese dialect varieties, competing proposals for Chinese dialect classification are compared and tested for general plausibility. The results of the comparison show that network approaches are a useful supplement for quantitative and qualitative approaches in Chinese historical linguistics. In order to reach their full potential, however, the underlying evolutionary models need to be more closely adapted to linguistic needs, and additional evidence, like geographic information, needs to be taken into account.|000|phylogenetic network, minimal lateral network, lexical borrowing, borrowing detection, Chinese dialects 1808|List2013a|It has long been noted that cross-linguistically recurring polysemies can serve as an indicator of conceptual relations, and quite a few approaches to model and analyze such data have been proposed in the recent past. Although – given the nature of the data – it seems natural to model and analyze it with the help of network techniques, there are only a few approaches which make explicit use of them. In this paper, we show how the strict application of weighted network models helps to get more out of cross-linguistic polysemies than would be possible using approaches that are only based on item-to-item comparison. For our study we use a large dataset consisting of 1252 semantic items translated into 195 different languages covering 44 different language families. By analyzing the community structure of the network reconstructed from the data, we find that a majority of the concepts (68%) can be separated into 104 large communities consisting of five or more nodes. These large communities almost exclusively constitute meaningful groupings of concepts into conceptual fields.
They provide a valid starting point for deeper analyses of various topics in historical semantics, such as cognate detection, etymological analysis, and semantic reconstruction.|000|network, polysemy, colexification, partitioning, community detection 1809|Chao1924|Suffice it to say that there is more freedom than one would naturally suppose, as the tones are more suggested than actually represented by the melody composed for the words; for instance, two syllables sung a minor third apart give the impression (to the Chinese ear at least) that the first syllable is a falling tone. The arrangement of tones is also simplified by the fact that unaccented and therefore toneless syllables may be set to any note to suit the music.|10|Chinese, Chinese dialects, tone, music 1810|Haspelmath2014|We are planning to start an electronic journal (with open access) that publishes dictionaries of minor languages in database format, for which we need funding for a start-up phase of three years (for one scientist who coordinates the work in this initial phase). Collecting the words of as many languages around the world as possible is an important task of comparative linguistics, but there is currently no good way of publishing high-quality dictionary data of minor languages. Existing online dictionaries are not regular refereed publications and thus neither guarantee high quality nor do they contribute to career-building. They are often made available outside a stable institutional context and usually put their entries on HTML pages that emulate the format of print dictionaries, rather than exploiting the possibilities of electronic publication (database publication, using the principles of Linked Data). The technical prerequisites for a database publication of dictionaries already exist at our institutions (including guarantees for maintenance way into the future).
Due to the enormous achievements of international language documentation programmes, many researchers have compiled dictionaries that they would be interested in publishing but lack serious publication options. Thus, all we need is funding for a start-up phase, to test the database format, set up an editorial board, establish a workflow, publish a number of seed dictionaries and advertise the journal among the community of minor language researchers.|000|dictionary, electronic resources, CLLD, linked data, cross-linguistic study 1811|Chen2009|本文的特色可总结为三点:一是以斯瓦迪士 (Swadesh) 100 核心词为纲,对侗台语 100 核心词进行全面、系统的研究,这样的研究目前还不多见。二是讨论中除了将侗台语与汉语、藏缅语进行比较外,我们还把南岛语系的印尼语也作为一种参与比较的主要语言,尽量挖掘印尼语中的关系词族。三是把词义比较作为一个重要内容,将比较纳入语义类型学的视野,与其他语系语言进行比较,看词义发展有哪些共同的演变方向,体现了怎样的规律。这使我们寻找同源词的过程具有了更强的操作性。 :translation:`This article has three goals: The first is to start from Swadesh's 100 basic words in order to carry out a broad and systematic study of Kam-Tai languages, which has rarely been done up to now. Second is to add Indonesian as an Austronesian language to our comparison with Chinese and Tibeto-Burman to search for cognates. Third is to strengthen the importance of semantic comparison, and to compare it with semantic typology, thereby comparing different languages and determining common directions of development and regularities. This plays an important role when searching for words with a common origin.` |000|Sino-Tibetan, Kam-Tai, Swadesh list, concept list, comparative wordlist 1812|Deng2006|With the etymological statistics and the molecular anthropological method, the paper mathematically classifies the Sino-Tibetan languages and dialects and describes their genetic relationship. The distance relationship is indicated by the branch lengths of the constructed trees, which shows the clustering of the languages and their separation in hierarchies.
In addition, the time depths of the languages in the family are estimated and a hypothesized process of the formation of the language family is given. [...] This paper claims that the 100 words in Swadesh's list can be used as the classification standard of the Sino-Tibetan languages and dialects, and thus contributes to linguistic theories. [...] :comment:`Some incomprehensible stuff follows: basically, there is a tree and a date of Sino-Tibetan in this paper.`|000|Sino-Tibetan, genetic classification, lexicostatistics, glottochronology, dating, Swadesh list, concept list, 1813|Deng2006|More info on this paper: * essentially uses Swadesh 100 list (@Swadesh1955) * uses Neighbor Joining (@Saitou1987) as cluster algorithm applied to pairwise language distances (p 15) * not really clear which languages are part of the investigation, needs to be checked thoroughly * creates a new Swadesh list of 111 concepts * studies Sino-Tibetan, Hmong-Mien, Mǐn dialects, and Kam-Tai * uses distance matrices and tries to infer dates from them|000|Sino-Tibetan, Swadesh list, Neighbor-Joining, genetic classification 1814|Deng2006|:comment:`pointing to problematic concepts in Swadesh-100 for Hmong-Mien:` * 词目第 12,“沙”,苗瑶语反映为明显的汉语借词,替换为 Swadesh 的后 100 词的“盐”。 :translation:`Swadesh 12 "sand" is an obvious borrowing from Chinese, and we replace it with "salt".` * 第 26“毛”和第 27“发”,苗瑶语为同一个同源词 :translation:`Swadesh 26 "feather" and 27 "hair" go back to the same word in Hmong-Mien.` * 第 29“眼”苗瑶语反映形式为明显的古汉语借词,所以,我们用 Swadesh 后 100 词的“大腿”替换。 :translation:`Swadesh 29 "eye" is an obvious loan from Old Chinese in Hmong-Mien, and we use "leg" from the larger Swadesh list instead.` * 第 41“腹”,苗瑶语中,“肚子”、“腹部”、“肠子”,词义交叉 :translation:`Swadesh 41, "belly", has very many different expressions, like "stomach", "belly", "intestines", in Hmong-Mien.` * 第 50 苗瑶语没有指称“女人”的词,我们用“姑娘”替换 :translation:`Hmong-Mien doesn't have a word for Swadesh 50 "woman", so we took "girl".`
* 第 53“听”,苗瑶语“听”跟“听见”是两个不同来源的词,所以分立 :translation:`Hmong-Mien has two distinct words "listen" and "hear" for Swadesh 53 "hear", so we use both of them.` [pb] * 第 54“看”,苗瑶语“看”跟“看见”也是两个不同来源的词,所以分立。 :translation:`Swadesh 54 "to see" has "to look" and "to see" in Hmong-Mien, and we split it.` * 第 63“泅”,苗瑶语反映为明显的晚期汉语借词,替换为“潜水”。 :translation:`Swadesh 63 "to swim" is a clear borrowing from Chinese, and we replace it with "to dive".` * 第 68“睡”和第 100“躺”,苗瑶语为同一个同源词。 :translation:`Swadesh 68 "to sleep" and 100 "to lie down" have the same root in Hmong-Mien.` * 第 80“冷”瑶语支分别有两个不同来源的形式,所以,分立为“冷”和“冷” :translation:`Swadesh 80 "cold" has two words of different origin in Hmong-Mien and we distinguish "cold 1" from "cold 2".` 此外,我们增补雅洪托夫 35 词有,而 Swadesh 100 词没有的“盐、风、年龄”,还有几个较重要的跟苗瑶语区域经济生态环境密切相关的基本词汇“猪、水獭、穿山甲、摘(猪草)”,这些都反映苗瑶语民族原始的采集生活和自然环境的特点。 :translation:`Apart from this, we added from Jachontov's 35 words those which did not occur in Swadesh's 100 words, like "salt", "wind", "year". Furthermore, we also added some important environmental or economic terms in the Hmong-Mien area, like "pig", "otter", "pangolin", "ragweed". These reflect the original Hmong-Mien peoples' culture and living environment.` |21f|Swadesh list, concept list, Hmong-Mien, Sino-Tibetan, 1815|Deng2006|:comment:`Sino-Tibetan family trees in different versions discussed in some detail with figures.`|38f|Sino-Tibetan, family tree, genetic classification 1816|Deng2014|Systematic statistical analysis of the rhymes in the "Book of Songs" can help us summarize the types and rules for rhyme contained therein. [...] One of the major problems in past phonological analysis was that rhyming theory was understood differently by different scholars. Study of the rhyme contained in the Book of Songs is the first step in starting a discussion about the nature of rhyme. [...] 1. Book of Songs rhythm. [...] Wang Li's "Shijing Yundu" were in need of revision.
[...] 2. Book of Songs Tongyun and Heyun. [...] It was found that syllable tail harmony is the essential condition of Tongyun and Heyun. 3. Rhyming essentials. The similarities and differences of "rhyme" in "rhyming" and in "phonology" were analyzed. [...] Poems including ancient, modern, Chinese, and foreign were used to determine that the harmony of rhyme is derived from a speech perception system specific to each society. It was further shown that the nature of rhyme is "harmony" of syllable tail characteristics and that rhyme is the basis of "yun" (rhyme) in the field of phonology. Syllable tail characteristics reflect the changes of a particular phonological system and are the key to the study of ancient Chinese rhyme. 4. Problems related to ancient Chinese rhyme sound construction. [...] The first is whether a rhyme class can have more than one vowel. [...] more than one vowel is fundamental to sound construction and is not antagonistic to single vowel constructions. [...] Second, the composition of sound in various constructions was analyzed. The sound composition used by several scholars was analyzed and the acceptance standard used for ancient Chinese sound composition is discussed. 5. Ancient Chinese Yinsheng construction. A comparison between the Yinsheng and Rusheng rhyming pattern in the Book of Songs and the use of "a" in Tibetan literature. Use of rhyme in modern folk songs and the harmony of rhyme between elements were also taken into consideration. It is argued that syllables ending with a vowel can rhyme with syllables ending with a stop coda. Therefore, there is no need to reconstruct a stop coda for Yinsheng rhyme groups. 6. Ancient Chinese tones. A commentary is made about which of three theories had the greatest influence on the origin and development of ancient Chinese tones: initial sound characteristic transfer, the length of the vowels and their elasticity, consonant coda disappearance.
Statistical analysis of the Book of Songs rhyme material is used to reconstruct the tones of the period of the Book of Songs as well as to deduce the evolution of ancient Chinese tones.|000|rhyme analysis, Old Chinese, Shījīng, rhyme patterns, 1817|Wang2010b|Polysyllabic words in Chinese and Tibetan are the research object of this thesis. Their word-formation is re-classified according to the classification of word building and coinage. The discussion mainly focuses on word formation by compounding, derivation, word reduplication and phonetic-based word formation. [...] Finally, the thesis probes into the feasibility of comparative study on word-formation of Chinese and Tibetan. It points out that word-formation which is very stable can be separated from grammar structure and there are some common features between Chinese and Tibetan in word-formation of polysyllabic words, for example, monosyllabism, common compound word structure and syntactic structure. Besides, morphemes in both Chinese and Tibetan possess high independence and composability. The paper also compares the word-formation in Chinese and Tibetan with compounding, derivation, word reduplication and phonetic-based word formation in languages from other families so as to explore common features of polysyllabic words in Chinese and Tibetan from a wider perspective and provide some clues for the kin relationship of Sino-Tibetan family.|000|Sino-Tibetan, word formation, Tibetan, Chinese, polysyllabic words, morpheme 1818|Tian2007|In this paper, we introduce in brief the basic conditions of the Sino-Tibetan data resources, the STEDT project (the Sino-Tibetan Etymological Dictionary and Thesaurus) at the University of California, Berkeley and the STDP (The Sino-Tibetan Database and Retrieval System Project) at the Chinese Academy of Social Sciences (CASS), including the data structures, data volumes, and retrieval methods.
We also discuss interdisciplinary information on the origin of East Asian civilization, which consists of several disciplines, including linguistics, molecular biology, human genetics, and archaeology.|000|dataset, Sino-Tibetan, database, STEDT, STDP 1819|Tsetkov2016|Linguistic borrowing is the phenomenon of transferring linguistic constructions (lexical, phonological, morphological, and syntactic) from a “donor” language to a “recipient” language as a result of contacts between communities speaking different languages. Borrowed words are found in all languages, and—in contrast to cognate relationships—borrowing relationships may exist across unrelated languages (for example, about 40% of Swahili’s vocabulary is borrowed from the unrelated language Arabic). In this work, we develop a model of morpho-phonological transformations across languages. Its features are based on universal constraints from Optimality Theory (OT), and we show that compared to several standard—but linguistically more naïve—baselines, our OT-inspired model obtains good performance at predicting donor forms from borrowed forms with only a few dozen training examples, making this a cost-effective strategy for sharing lexical information across languages. We demonstrate applications of the lexical borrowing model in machine translation, using resource-rich donor language to obtain translations of out-of-vocabulary loanwords in a lower resource language. Our framework obtains substantial improvements (up to 1.6 BLEU) over standard baselines.|000|lexical borrowing, transducer, adaptation, automatic approach 1820|Tsetkov2016| Interesting approach, especially since they use a four-stage pipeline which might be worth giving a look (although their approach is only bi-lingual, so no possibility to model borrowing in a larger scale of language history/evolution): 1. Pronunciation (convert alphabet to sounds) 2. Syllabification (extract syllable models, very good idea, indeed) 3. 
Phonological adaptation (very nice) 4. Morphological adaptation They then use some methods to learn constraint weights and follow some OT scheme, but the main idea seems to be even more important.|000|optimality theory, loanword integration, lexical borrowing, adaptation, syllable structure 1821|Parkvall2008|This paper presents an algorithm intended to quantify the diachronic stability of linguistic characteristics. It is argued that a linguistic feature whose presence or absence is best predicted by language families is a stable feature. Conversely, a feature that correlates better with geographical areas than with families is one that is sensitive to diffusion. Contrasting the structural heterogeneity within families with that found within geographical areas, it is thus possible to make a statement regarding the varying diachronic stability of specific features. While the main aim of the paper is methodological exploration, and while the method is certainly not devoid of problems, I propose that the current approach can be useful in studies of language contact and long-range historical comparison.|000|stability, automatic approach, WALS, feature, grammatical feature, 1822|Parkvall2008|Paper ranks features in WALS, so contradicting the title, where one could expect that it is a basic treatment of stability, including lexicon and paradigms, so something that might be quoted in the context, but nothing that actually offers a solution.|000|stability, WALS, grammatical feature, typology 1823|Weiss2014|The Comparative Method (CM) is the systematic process of reconstructing the segmental and suprasegmental inventory of an ancestral language from cognate reflexes in the genetically related daughter languages. On the basis of the lexemes in which these reflex segments are embedded one may also reconstruct morphemes and lexemes. Since Schleicher (1852) it has been the key tool for investigating linguistic prehistory. 
Textual records for languages are in most parts of the world very recent, and even the earliest written records go back only a shallow 5,000 years or so. Thus if we were entirely dependent on written records little historical linguistics could be done on many languages and we would have no clear idea of the inter-relationship and prehistory of even relatively shallow language families like Slavic or Germanic. Fortunately, CM allows us to reach much greater time depths and reconstruct ancestral languages that existed thousands of years before the advent of writing.|000|comparative method, methodology, historical linguistics, introduction 1824|Weiss2014|Good introduction to the comparative method. Interesting point in this paper is the passage on presuppositions: * Regularity (5.1) * Directionality (5.2) This is interesting in so far as it contains both the two points which were discussed in @Chacon2015. Further interesting points are the limitations mentioned: * complete loss (7.1) * time depth (7.2) * convergence (7.3) It also contains a passage on evaluation of reconstructed systems (very important!).|000|comparative method, introduction, methodology, historical linguistics 1825|Weiss2014|If the regularity hypothesis is correct, then CM should in theory be universally applicable. Nevertheless, there are many factors that can make the application of CM challenging or virtually impossible. The most obvious limitation is the absence of comparative data. Language isolates are not uncommon and only fairly shallow linguistic prehistories can be discovered for them (see Campbell in press). 
To take the best known isolate, Basque does not have an extensive linguistic prehistory since we are limited to the evidence of internal reconstruction, and the examination of Latin loanwords.|136|regularity hypothesis, comparative method, universality 1826|Weiss2014|The system reconstructed by CM, an entity bearing some relation to a prehistoric grammar, should be subject to whatever constraints I-language is universally subject to. But the evaluation of the typological plausibility of reconstructed systems has often been clouded by confusion between those constraints that are truly features of the computational component of the human genetic language endowment and those prevalent tendencies that result from the diachronic filter. A case in point would be final devoicing of obstruents. It is undeniable that final devoicing is a very widespread phenomenon, but is this because final devoicing is a realisation of some innate cognitive principle, e.g. the emergence of the unmarked in positions of neutralisation, or is it because there are multiple phonetic factors conspiring to prefer word-final devoicing? If the former is the case then the final voicing of obstruents would be excluded and any reconstructed language with such a feature (like Proto-Indo-European apparently where the evidence of Italic, Anatolian and Indic appears to point to the generalisation of voiced stops in word-final position) would require serious revision, for example by reinterpreting the apparent voice feature as something more phonetically plausible. 
On the other hand, if the explanation for the prevalence of final devoicing is phonetic bias, and if there are also possible diachronic pathways to final voicing, then the reconstruction of such an unusual phenomenon cannot be excluded.|139|evaluation, linguistic reconstruction, comparative method, 1827|Weiss2014|The Comparative Method (CM) is the systematic process of reconstructing the segmental and suprasegmental inventory of an ancestral language from cognate refl exes in the genetically related daughter languages.|127|comparative method, definition 1828|Chirkova2006|In the number of its speakers, Tibeto-Burman is one of the largest language families in the world. The language family, however, has received little scholarly attention and its composition and history remain poorly understood. Many languages are still awaiting detailed documentation and description – a task that is becoming urgent as smaller languages fall victim to socio-economic and demographic pressures. Given the dazzling linguistic diversity and sheer number of languages yet to be studied, a thorough understanding of the Tibeto-Burman language family poses great challenges. One complicating factor is that presently available data are scattered, making an overview of the family and adequate historical comparisons unfeasible.|000|Sino-Tibetan, Tibeto-Burman, database, introduction 1829|Castro2015|The Sui are an official minority group of China with a population of around 430,000. The Sui language belongs to the Kam-Sui branch of the Tai-Kadai language family. This research presents the results of a large-scale survey of the Sui language. The goals of the survey were to document the Sui dialect situation and to elucidate the genetic and synchronic relationships among the various dialects. A 600-item wordlist was collected from sixteen data points. Comprehension of four Sui dialects was also tested across the region using Recorded Text Tests (RTTs). 
Based on the primary data and on Thurgood’s (1988) Proto-Kam-Sui, and drawing on related languages as well as various other reconstructions, this work first charts the historical development of the Sui dialects. Secondly, it maps out dialect groups based on synchronic lexical similarity. Thirdly, it applies Levenshtein distance analysis to the data and describes the resulting dialect groups. Finally, it describes and discusses RTT methodology and results. The report also includes wordlist data, phonology sketches for each data point, and interlinearised RTT texts. The results show that Sui has four main dialects and that one of these is genetically more closely affiliated to Kam than to Sui. Additionally, three dialects have low mutual intelligibility. Mutual intelligibility of dialects is not completely predicted by genetic relationships. Some items of particular interest in this work include: 1) evidence for sesquisyllabic words in Proto-Kam-Sui (supporting Pittayaporn 2009); 2) evidence for the areal diffusion of tonal flip-flops (Yue-Hashimoto 1986); and 3) an innovative L2 retelling method for testing inter-dialect intelligibility. This report was originally published in print under the title of Shuiyu Diaocha Yanjiu by Guizhou People’s Press, 2014. The English version of this book is published with permission of Guizhou People’s Press. The maps in this book were produced by the author. Maps 6.1, 6.2, and 7.1 were generated using Creative Commons Gabmap software. (John Nerbonne, Rinke Colen, Charlotte Gooskens, Peter Kleiweg, and Therese Leinonen (2011). Gabmap — A Web Application for Dialectology. Dialectologia Special Issue II.)|000|Sui languages, Tai-Kadai, Kam-Sui, dataset, dialect data, concept list, 1830|Chacon2015|There has been much debate regarding the internal history of the Tukanoan languages during the last four decades, with different classification proposals being based on lexical and phonological data. 
Here, we present a new classification of the Tukanoan language family based on an improved computational approach which infers phylogenetic trees from proposed sound change patterns. In contrast to traditional methods based on the manual identification of shared innovations by experts, our method identifies valid innovations within a parsimony framework. In contrast to existing computational models which are mostly based on binary character states for lexical data, we model sound change patterns as directed weighted transitions between multiple character states. We apply the new approach to a set of 21 extant Tukano languages. Our results confirm the east-west split of the Tukanoan languages which was proposed in the past and suggest a classification which groups Kubeo with Tanimuka on the one hand, and Koreguahe with Maihiki, on the other hand, thus reconciling previous classifications. We use this new classification to propose a consensus phylogeny of Tukanoan in which all automatically inferred shared innovations were manually checked and uncertainties are explicitly displayed.|000|Tukano languages, phylogenetic reconstruction, sound change, directionality, 1831|Chacon2015|The common idea of subgrouping in classical historical linguistics and subgrouping in cladistics is that only shared innovations can be used to identify a valid clade. While this view is sound and just on the first sight, it is essentially circular for a couple of reasons. First, the identification of innovations defining a given subgroup requires knowing what was the ancestral state before the innovation occurred. This makes subgrouping vulnerable as it depends on the criteria used for linguistic reconstruction, which can vary considerably from one linguist to another even when dealing with the same sound correspondences. The most extreme risk is for true innovations getting analyzed as retentions and vice-versa. 
The only way to circumvent the problem of circularity is to know the direction underlying the processes under investigation. Directionality is, however, only a necessary condition for the identification of shared innovations, since it is likewise possible that directional process occur independently or are propagated by horizontal transmission. Second, subgrouping may bear the risk of circularity if linguists employ sub-grouping hypotheses when doing their reconstructions, since, in common practice, many linguists often have a certain tree topology in mind when refining their reconstructions. Third, [pb] many sound innovations do not occur only once in a well defined subset of languages. In fact, the most common types of sound change have a greater probability to occur multiple times in the evolutionary history of a linguistic family (homoplasy), not to mention cases of phonological borrowing (lateral transfer). In many cases it is a priori impossible to say whether a sound change is a “true” shared innovation (a singular evolutionary event) or an instance of independent innovations (multiple evolutionary events). In order to separate the wheat from the chaff, linguists need to distinguish which sound innovations are more reliable for subgrouping than others. This introduces, however, a further problem, since sound changes which are more reliable for subgrouping need to be rare. Rare sound changes, however, are difficult to observe and study, and the risk that they are merely based on wrongly proposed cognate sets or incorrectly interpreted assessments of directionality is very high (@Harrison2003). 
So no matter what we do in traditional subgrouping, as long as we try to use shared innovations, we are confronted with problems of circularity, epistemology, and objectivity.|182f|phylogenetic reconstruction, genetic classification, circularity, subgrouping, sound change, shared innovation 1832|Wagner2010|Identical rhymes (right/write, attire/retire) are considered satisfactory and even artistic in French poetry but are considered unsatisfactory in English. This has been a consistent generalization over the course of centuries, a surprising fact given that other aspects of poetic form in French were happily applied in English. This paper puts forward the hypothesis that this difference is not merely one of poetic tradition, but is grounded in the distinct ways in which information-structure affects prosody in the two languages. A study of rhyme usage in poetry and a perception experiment confirm that native speakers' intuitions about rhyming in the two languages indeed differ, and a further perception experiment supports the hypothesis that this fact is due to a constraint on prosody that is active in English but not in French. The findings suggest that certain forms of artistic expression in poetry are influenced, and even constrained, by more general properties of a language.|000|poetry, perception, rhymes, information structure, English, French 1833|Lee1996|This study investigated the effects of linguistic experience on tone perception. Both Cantonese (in Experiment 1) and Mandarin (in Experiment 2) tones, including both lexical and nonlexical tones, were presented to three groups of subjects: Cantonese, Mandarin, and English native speakers. Subjects were asked to determine whether two auditorily presented tones were the same or different. The interval between the presentation of the two tones, and the level of interference during this interval, were manipulated. 
Native speakers did better at discriminating tones from their own languages than the other two groups of subjects, for both lexical and nonlexical tones. Subjects did worst when they were required to count backward during the interstimulus interval. Cantonese speakers were better than both Mandarin and English speakers at discriminating Cantonese tones, and there was no difference between Mandarin and English speakers, except in one condition. Mandarin speakers did better than both Cantonese and English speakers, and Cantonese speakers did better than English speakers, at discriminating Mandarin tones. Results are discussed in terms of the effects of language background, differences between Cantonese and Mandarin tones, and the nature of encoding in short-term memory.|000|perception, tone, Cantonese, Mandarin, 1834|Kaan2008|**BACKGROUND:** Tone languages such as Thai and Mandarin Chinese use differences in fundamental frequency (F0, pitch) to distinguish lexical meaning. Previous behavioral studies have shown that native speakers of a non-tone language have difficulty discriminating among tone contrasts and are sensitive to different F0 dimensions than speakers of a tone language. The aim of the present ERP study was to investigate the effect of language background and training on the non-attentive processing of lexical tones. EEG was recorded from 12 adult native speakers of Mandarin Chinese, 12 native speakers of American English, and 11 Thai speakers while they were watching a movie and were presented with multiple tokens of low-falling, mid-level and high-rising Thai lexical tones. High-rising or low-falling tokens were presented as deviants among mid-level standard tokens, and vice versa. EEG data and data from a behavioral discrimination task were collected before and after a two-day perceptual categorization training task. **RESULTS:** Behavioral discrimination improved after training in both the Chinese and the English groups. 
Low-falling tone deviants versus standards elicited a mismatch negativity (MMN) in all language groups. Before, but not after training, the English speakers showed a larger MMN compared to the Chinese, even though English speakers performed worst in the behavioral tasks. The MMN was followed by a late negativity, which became smaller with improved discrimination. The High-rising deviants versus standards elicited a late negativity, which was left-lateralized only in the English and Chinese groups. **CONCLUSION:** Results showed that native speakers of English, Chinese and Thai recruited largely similar mechanisms when non-attentively processing Thai lexical tones. However, native Thai speakers differed from the Chinese and English speakers with respect to the processing of late F0 contour differences (high-rising versus mid-level tones). In addition, native speakers of a non-tone language (English) were initially more sensitive to F0 onset differences (low-falling versus mid-level contrast), which was suppressed as a result of training. This result converges with results from previous behavioral studies and supports the view that attentive as well as non-attentive processing of F0 contrasts is affected by language background, but is malleable even in adult learners.|000|tone, perception, Thai, Mandarin, English 1835|Hoth2016|The Freiburg speech intelligibility test according to DIN 45621 was introduced around 60 years ago. For decades, and still today, the Freiburg test has been a standard whose relevance extends far beyond pure audiometry. It is used primarily to determine the speech perception threshold (based on two-digit numbers) and the ability to discriminate speech at suprathreshold presentation levels (based on monosyllabic nouns). Moreover, it is a measure of the degree of disability, the requirement for and success of technical hearing aids (auxiliaries directives), and the compensation for disability and handicap (Königstein recommendation). 
In differential audiological diagnostics, the Freiburg test contributes to the distinction between low- and high-frequency hearing loss, as well as to identification of conductive, sensory, neural, and central disorders. Currently, the phonemic and perceptual balance of the monosyllabic test lists is subject to critical discussions. Obvious deficiencies exist for testing speech recognition in noise. In this respect, alternatives such as sentence or rhyme tests in closed-answer inventories are discussed.|000|speech norms, naming test, audiometry, listening comprehension, 1836|Pilling2004|Native speakers of two languages (English and Ndonga) were compared on three colour cognition tasks (sorting, triads and visual search) in a test of the linguistic relativity hypothesis (Whorf, 1956). The colour lexicons of these two languages differ because Ndonga has no basic terms for ORANGE, PINK and PURPLE, and stimuli were chosen to exploit this difference. On the sorting task (sorting into similarity-groups) for each language, nominally similar colours were grouped together more often than nominally dissimilar colours. On the triads task (choosing the most different of three colours), when the most nominally isolated colour differed for the two language-groups, each group tended to choose their nominal isolate. On the search task (scanning for target colours among distractors), targets were either in a different English category than distractors (cross-category), or some distractors were in the same English category as distractors (within-category). The 'cost' in speed of having within-category distractors was much greater for the English than for the Ndonga. Overall, these data suggest that a core universal component is modulated by a small relativist influence. 
The differences in the visual search task are consistent with language affecting pre-attentive processes (an indirect language effect) as well as exerting on-line influences (a direct effect).|000|linguistic relativity, color terms, English, Sapir-Whorff hypothesis, Ndonga 1837|Kraxenberger2014|The present paper aims to approximate an understanding of poetry’s distinctiveness and its specific modes of operation in regard to the poetic text as well as its perception. Here it follows the writings of Roman Jakobson, and distinguishes between poetic and prosaic language use with respect to the communicative function in poetry, textual defamiliarizing effects, and their functional and perceptual consequences. By referring both to Victor Shklovsky’s psychological concept of ostranenie and Jakobson’s model of language functions, special attention is paid to the location of emotions in poetic texts. Given this background, specific factors of communication as well as distinct emotional modes of operation can be distinguished. This is discussed using ratings of the poem Letzte Wache by Georg Heym that are taken from a survey study on emotional classification and aesthetic evaluation of poetry. Finally, extra-textual consequences of poetic devices are addressed and the fruitfulness of Jakobson’s writings for contemporary, interdisciplinary approaches is stressed to highlight the enduring relevance of his ideas.|000|poetic function, Roman Jakobson, aesthetics, language use, language and communication 1838|Li2006|Researchers studying strata in the phonological history of Chinese dialects should note the following: First, some historical strata emerged because of language contact. Second, beside the evolution of phonological categories themselves, it is also important to trace the actual sound change from one pronunciation to another. Third, primary (i.e. 
covering most characters in a category) and secondary strata should be differentiated, and the latter should not be blown out of proportion. Fourth, researchers should find out how the various strata in a dialect constitute a single synchronic system. The four points in discussion are all illustrated with examples from Min dialects.|000|lexical strata, stratification, Mǐn, Chinese dialects, language contact 1839|Ciaccio2015|The theory postulating that language shapes the way we perceive reality, mostly known as Linguistic Relativity or Sapir-Whorf Hypothesis, has been widely debated in the past years. In particular, an increasing number of studies have dealt with the relationship between color naming and color perception, this being more easily observable than other rather abstract language categories. While some research on this field provided evidence for language effects on color perception, others stressed cross-linguistic universal tendencies in color naming and perception. Although supporters of the two main approaches have struggled to establish their theories rejecting the opposite view for many years, a new reconciling approach between them seems to emerge in more recent literature. On the one hand, this position would acknowledge the existence of universal tendencies in color naming. At the same time, it would posit that there can also be differences in the way languages encode color boundaries, and that these differences may, in turn, affect color perception. As a result, the two views not only seem compatible, but even complementary. According to this interpretation, the present paper aims at providing a review on the recent literature concerning color naming and color perception.|000|linguistic relativity, Sapir-Whorff hypothesis, color terms, relativism, universalism 1840|VanDriem2007|In mediaeval Europe, most scholars came to terms with the world’s linguistic diversity within the framework of a Biblical belief system. 
Even at the end of the eighteenth century, pious scholars such as Sir William Jones believed that the myth of the Tower of Babel explained how ‘the language of Noah’ had been ‘lost irretrievably’ (1793: 489). Another Biblical view attempted to explain the world’s linguistic stocks as deriving from Noah’s three sons after the deluge had abated in the well-known Judæo-Christian myth of the ark. The descendants of Shem populated the earth with Semitic speaking peoples, whereas the descendants of Ham today spoke ‘Scythian’ languages, whilst all other languages derived from the progeny of Noah’s eldest son Japhet. The Semitic languages most notably include Hebrew, the language of the Old Testament. The Semitic language family is known today as Afroasiatic. Scythian or ‘Scythisch’ is a language family first identified in Leiden by Marcus van Boxhorn (1647), although van Boxhorn did not invoke Biblical mythology in any of his own writings. His theory of language relationship was renamed Indo-Germanic or Indo-European in the 19th century. In 1647, ‘Scythisch’ specifically included Sanskrit, known to van Boxhorn through the vocabulary recorded by Ctesias of Cnidos in the fifth century BC, 1 and all the then known branches of Indo-European, viz. Latin, Greek, Celtic, Germanic, Indo-Iranian, Baltic and Slavonic. To subsume all other languages of the world from Alaska to Papua New Guinea and from Tierra del Fuego to Japan within a grand Japhetic family provided scholars working within a Biblical framework with a comfortably tidy and undifferentiated view of global linguistic diversity. Ironically, even in the twentieth century under the ruthless iconoclast dictator Joseph Stalin, the dominant paradigm in linguistics, archaeology and ethnography in the Soviet Union for decades was Marrism. 
This school of thought entertained a latter-day version of the Japhetic Theory conceived by the Georgian scholar Nikolaj Jakovlevič Marr, who curried favour with Stalin and in 1921 founded the Japhetic Institute of the Soviet Academy of Sciences.|000|Sino-Tibetan, genetic classification, Tibeto-Burman, Chinese, Old Chinese 1841|VanDriem2007|A third monophyletic view of Asian linguistics stocks is the most interesting because, though it has been whittled down in the course of time, this model still exists today and includes Chinese. The Indo-Chinese theory was invented by John Caspar Leyden, a Scottish physician and poet who died at the age of 35 in Batavia in the Dutch East Indies. He travelled widely in India and in mainland and insular Southeast Asia. His Indo-Chinese family encompassed all the languages ‘of the regions which lie between India and China, and the greater part of the islanders in the eastern sea’, which although ‘dissimilar’, according to Leyden, ‘exhibit the same mixed origin’ (1806: 1). Leyden did not live long enough to publicly disavow the theory, as Müller had done for Turanian, and Indo-Chinese was to doggedly lead a life of its own, even though it was constantly under assault by more knowledgeable scholars. Indo-Chinese is known today as ‘Sino-Tibetan’.|212|Sino-Tibetan, history of science, genetic classification 1842|Heine2016|Building on recent findings made in the framework of Construction Grammar, on the one hand, and within the framework on grammaticalization, on the other, the present paper is concerned with the development from lexical compounding to derivation. Compounding is presumably the most common source of derivational categories and this applies in particular to modifying (endocentric) compounds, which are the main subject of this paper. By looking at three cases of grammatical change in English, German, and the West African language Ewe it is argued that the two frameworks differ in their goals and in their approaches. 
Both frameworks search for regularities in grammatical change, but whereas Construction Grammar has a focus on constructional change, that is, change in the development of constructions, the central question asked by students of grammaticalization is how and why, e.g., lexical categories give rise to grammatical (or functional) categories.|000|compounding, construction grammar, word derivation, word formation, grammaticalization 1843|Plebe2015|The idea that language molds our thinking has met with varying degrees of favor in the history of philosophy and linguistics. A rather puzzling aspect of this debate is the lack of rigorous demonstration or rebuttal of the language-thought connection. We believe that one obstacle is the intrinsic difficulty in treating “thought” as a measurable dependent variable in empirical, replicable, experiments. We plead the case for a more humble, but more verifiable, aspect of linguistic relativity: by looking at cases where language shapes our perception, instead of “thought” generally. We target the domain of color terms, a field where the effects of language in shaping perception have been more striking, even if the details are highly debated, and also discuss a different kind of perception, one related to emotion terms. We argue that in this case, there is also a continuum in the proprioceptive determinants of emotions, which is structured into discrete categories by language.|000|linguistic relativity, Sapir-Whorff hypothesis, relativism, universality 1844|Kao2011|What makes a poem beautiful? To answer this question, this thesis uses computational methods to examine stylistic and content features employed by professional and amateur poets. I propose and test the hypotheses below: (1) Beautiful poems contain more sophisticated diction and poetic sound devices (2) Beautiful poems are more emotional (3) Beautiful poems contain more concrete details and imagery. 
Building upon existing techniques designed to quantitatively analyze poetic style and sentiment in texts, I identify and analyze factors testing these hypotheses. Results showed no significant difference between the difficulty levels of professional and amateur poets’ vocabulary. Contrary to prediction, professional poets use significantly less sound devices such as rhyme and alliteration than amateur poets. Professional poems were also found to contain significantly fewer emotional words than amateur poems. The most important mark of good poetry was instead concrete and specific imagery. Professional poets refer to specific objects more often and employ a greater variety of words. This suggests that professional poets are able to evoke sentiment using concrete and specific imagery instead of abstract or explicitly emotional words. These results challenge and confirm several established doctrines in creative writing and poetic theory, suggesting that methods from computational linguistics may help support the analysis of beauty in verbal art.|000|poetry, poetic function, computer-based approaches, rhyme patterns, English 1845|Lew2016|This study describes the acoustic properties associated with tone and register in Louma Oeshi, a previously unstudied Akoid language of Laos. Louma Oeshi uses three tones (High, Mid, and Low) which overlap with a tense/lax register distinction to yield a six-way suprasegmental contrast. 
In this paper, we (1) offer a first account of the pitch and voice quality characteristics associated with each Tone-Register pair, (2) examine further the variability in glottalization strategies signaling the constricted register, and (3) explore the influence of contrastive voice quality on pitch and vice versa, particularly as a predictor of the variation in glottalization.|000|speech acoustics, tone, tone language, Louma Oeshi, Sino-Tibetan, Tibeto-Burman, Loloish 1846|Foley2016|Evolutionary problems are often considered in terms of ‘origins’, and research in human evolution seen as a search for human origins. However, evolution, including human evolution, is a process of transitions from one state to another, and so questions are best put in terms of understanding the nature of those transitions. This paper discusses how the contributions to the themed issue ‘Major transitions in human evolution’ throw light on the pattern of change in hominin evolution. Four questions are addressed: (1) Is there a major divide between early (australopithecine) and later (Homo) evolution? (2) Does the pattern of change fit a model of short transformations, or gradual evolution? (3) Why is the role of Africa so prominent? (4) How are different aspects of adaptation—genes, phenotypes and behaviour—integrated across the transitions? The importance of developing technologies and approaches and the enduring role of fieldwork are emphasized.|000|evolutionary transitions, human prehistory, human evolution, 1847|Chen2011|世界古典意音文字中普遍存在音补现象。音补是一些提示发音的补充性符号,可分提示表意符号发音的音补和提示表音符号发音的音补两类。音补是文字为准确记录语言而使用的一种手段。本文对古汉字和古埃及圣书字这两种古典意音文字的音补现象进行了描述。 :translation:`The old semantic-phonetic writing systems in the world often show the phenomenon of phonetic complements. Phonetic complements are symbols that are used to emphasize the pronunciation. They can be divided into primarily semantic phonetic complements and primarily phonetic phonetic complements.
Phonetic complements serve the enhanced documentation of language. This article contrasts the old Chinese and the old Egyptian semantic-phonetic writing systems regarding their phonetic complements.` |000|phonetic complement, phonophoric character, Chinese writing system, Egypt writing system 1848|Agha2015|It may not be an exaggeration to say that our understanding of linguistic and other forms of human communication has advanced more during the 20th century than in any preceding period. Yet these changes did not occur all at once. Instead, different levels of organization within communicative conduct became focal objects of scholarly attention at different times. Earlier in the 20th century, research paradigms in many disciplines were dominated by approaches that favored abstract models of homogenous sign systems underlying the complexities of situated communication. A transformative shift began after the middle of the century, when scholarship began to turn from abstractable models to contextual and perspectival variation, from an exclusive focus on langue, defined as the object of linguistics by Ferdinand de Saussure (1857–1913), to the organization of parole into forms of situated language use within social practices. Attention to situated practices soon revealed that many features of parole rely on the tendency of language users to adapt the resources of langue in heterogeneous ways within specific varieties of communicative conduct. “Register” originated as a term to designate these varieties. In recent decades, approaches to register phenomena have become central to many disciplines in Europe and North America. The present volume brings together work by anthropologists, folklorists, linguists, and philologists. The sixteen articles collected here represent approaches that have developed on both sides of the Atlantic.
Many authors discuss the development of register studies in their own fields and employ analytic techniques developed within distinct disciplinary traditions. They focus on the register organization of a range of semiotic devices – whether grammatical units or prosody, whether lexical items or melodic contours, whether verbal signs or kinesic behaviors, whether spoken as utterances or circulated through script-artifacts. They describe models of communicative conduct in a variety of social practices and historical locales, and the range of phenomena they describe is far wider than those studied in early approaches to registers.|000|register, synchronic variation, language variation, 1849|Solovyev2016|Earlier an unexpected phenomenon – acceleration of grammar change for Indo-European languages – was found, but its reason is not clear. Probable reason is an evolution change of cognitive mechanisms 6-11 thousand years ago. It means that evolution of the human mind has not been completed in Pleistocene. We discover a correlation between grammar change acceleration and population distribution of gene ASPM, regulating some processes in human brain development. It is proposed that acceleration of some cognitive processes may be a competitive advantage of ASPM.|000|grammaticalization, grammatical change, cognitive mechanism, evolutionary transitions, cognition, language evolution 1850|Solovyev2016a|In the paper three approaches to reconstruction of languages evolution trees are compared on the material of North Caucasian languages: the expert one (comparative-historical method), lexicostatistics, application of phylogenetic algorithms to databases. It is shown that degree of coherence of different computer solutions is approximately the same as degree of coherence of expert solutions. 
A new classification of North Caucasian languages is proposed, as a result of applying the consensus method to different known classifications.|000|North Caucasian, lexicostatistics, ASJP, genetic classification, phylogenetic reconstruction 1851|Arbib2015|The article offers an assessment of David Kemmerer’s article ‘The Cross-Linguistic Prevalence of SOV and SVO Word Orders Reflects the Sequential and Hierarchical Representation of Action in Broca’s Area’ in the context of the question, ‘How did biological evolution yield a language-ready brain?’ We argue that the path from praxic actions to grammatical structure is indirect and that comparative neurobiology of human and monkey and computational modeling of neural circuitry may both play a role complementary to that of human neurology and neurolinguistics in tracing the intricacies of that path. We conclude that (a) Kemmerer’s focus on BA44 ignores the involvement of a larger system and that the key factor of subject salience is determined by regions outside BA44 supporting the perceptual salience of people within a scene; (b) the ability to process ‘non-canonical’ structures (i.e., structures with a non-default mapping between syntax and semantics) may have been crucial in the evolution of language, and (c) that constituent hierarchies (i.e., the ability to express increasingly more complex information by increasing constituent complexity) also play a particularly important role in explaining the communicative power of language.|000|language origin, language evolution, subject-verb-order, Broca's Area 1852|Hladka2015|We present a gentle introduction to machine learning in natural language processing. Our goal is to navigate readers through basic machine learning concepts and experimental techniques. As an illustrative example we practically address the task of word sense disambiguation using the R software system.
We focus especially on students and junior researchers who are not trained in experimenting with machine learning yet and who want to start. To some extent, machine learning process is independent on both addressed task and software system used. Therefore readers who deal with tasks from different research areas or who prefer different software systems will gain useful knowledge as well.|000|machine learning, introduction, natural language processing, automatic translation, machine translation 1853|Brunelle2016|Southeast Asia is often considered a quintessential Sprachbund where languages from five different language phyla have been converging typologically for millennia. One of the common features shared by many languages of the area is tone: several major national languages of the region have large tone inventories and complex tone contours. In this paper, we suggest a more fine-grained view. We show that in addition to a large number of atonal languages, the tone languages of the region are actually far more diverse than usually assumed, and employ phonation type contrasts at least as often as pitch. Along the same lines, we argue that concepts such as tone and register, while descriptively useful, can obscure important underlying similarities and impede our understanding of the behavior of phonetic properties, typological regularities, and diachrony. We finally draw the reader’s attention to some issues of current interest in the study of tone and phonation in Southeast Asia and describe some technical developments that are likely to allow researchers to address new lines of research in years to come.|000|tone, tone language, South-East Asian languages, Sprachbund, introduction, language union, convergent evolution, 1854|Cutler2015|Adolescence is a time when many young people begin experimenting with and ‘trying on’ different identities.
This process manifests itself in the linguistic choices that they make as they try to signal alignments with particular ethnic groups, subcultures, or lifestyles. The phenomenon of European American or ‘White’ youth who style their speech using features of African American English (AAE) and Hip Hop Nation Language (HHNL) to project their orientation to Hip Hop culture illustrates how this process may involve crossing ethnolinguistic boundaries. This article reviews the dynamics of this phenomenon, how it intersects with sociolinguistic theories of style, and the implications of incorporating discussions about language variation, AAE, and Hip Hop in urban, ethnic minority schools.|000|language variation, hip hop, rhyme patterns, African American English, Hip Hop Nation Language, introduction 1855|Penny1993|Tree diagrams have been used to represent relationships among human populations and among languages but only recently have the trees been compared (Fig. 1) and the claim made "of considerable parallelism between genetic and linguistic evolution" (Cavalli-Sforza et al, 1988:6002). In rejecting this claim, O'Grady et al. (1989) pointed out that the original study did not have an objective measure to support the conclusion of high similarity. However, the lack of an objective measure invalidates the denial (O'Grady et al., 1989) just as much as it invalidates the claim that the trees are similar (Cavalli-Sforza et al., 1988, 1989). The original work has recently been supported by computer simulation to estimate the probability of finding such agreement between the gene and language trees. Tree comparison metrics are perhaps a more natural way of comparing trees quantitatively, and we have recently extended the knowledge of such metrics (Steel and Penny, 1993). An objective tree comparison measure can resolve the issue of the similarity of the gene and language trees from Cavalli-Sforza et al.
(1988), showing that they are indeed far more similar than expected by chance.|000|biological parallels, family tree, language tree, species tree, language as species 1856|Ravindranath2015|In a world where most individuals speak more than one language and most languages are in contact with other languages, the study of language in its social context must take into account language contact. Two central questions in the study of language contact and change are (1) whether social or linguistic factors are primary in predicting the outcomes of contact; and (2) whether change in the context of contact is a foregone conclusion. Quantitative studies of sociolinguistic variation provide an effective means of observing change in progress, but the majority of such studies have focused on monolingual speakers/communities. This article gives a brief summary of the study of language contact in variationist sociolinguistics before turning to a discussion of the types of data and approaches best suited to answering the question of how and when contact causes change in multilingual communities. Recognizing that the description of a multilingual community involves more social parameters and more inter-individual variation than a monolingual one, this article focuses on a core list of social factors to consider in studies of variation and change in multilingual communities, organized so that possible intersections may also be considered.|000|sociolinguistic variation, linguistic varieties, language variation, language contact, introduction 1857|Holtz2011|Der Wortschatz einer Sprache ist ein besonders instabiles Element, welches sich ständig im Wandel befindet. Seine fortlaufende Aktualisierung reflektiert die sich ändernden Bedürfnisse einer Sprachgemeinschaft, die mithilfe von Wörtern ihre Gedankenwelt und die Wahrnehmung ihrer Umgebung artikuliert (Kastovsky 2006: 201).
Besonders in Zeiten des wirtschaftlichen, gesellschaftlichen oder politischen Wandels sucht eine Sprachgemeinschaft nach neuen Ausdrucksmöglichkeiten, die linguistisch gesehen mit Neologismen befriedigt werden. In diesem Zusammenhang lassen sich die geschätzten 10.000 bis 40.000 neuen Wörter und Wendungen erklären, die in den vergangenen dreißig Jahren in der VR China entstanden sind. Jährlich kommen nach Schätzungen von Yang Xiaoping (2008: 52) 800 neue Ausdrücke hinzu. Sie stellen das linguistische Inventar für den technischen Fortschritt, soziale und wirtschaftliche Umwälzungen sowie die Konsequenzen der politischen Reformen dar. :translation:`The vocabulary of a language is a particularly unstable element that is constantly changing. Its continual updating reflects the changing needs of a speech community, which uses words to articulate its world of thought and its perception of its surroundings (Kastovsky 2006: 201). Especially in times of economic, social, or political change, a speech community looks for new means of expression, a need that, linguistically speaking, is met by neologisms. This explains the estimated 10,000 to 40,000 new words and phrases that have emerged in the PR China over the past thirty years. According to estimates by Yang Xiaoping (2008: 52), 800 new expressions are added every year. They constitute the linguistic inventory for technological progress, social and economic upheavals, and the consequences of the political reforms.` |000|Chinese, Mandarin, lexical borrowing, loanword integration 1858|Holtz2011|Article shows some basic aspects on loanword integration, not very deeply linguistic, but potentially interesting regarding numbers.|000|Chinese, loanword integration, lexical borrowing, Mandarin 1859|Hulst2016|In this article, I discuss and review the proposal to replace traditional binary features (such as [+round]) by unary, single-valued, or monovalent units (such as |round|). I will focus on proposals within the context of dependency phonology, government phonology, and radical CV phonology. In all three approaches, in addition to unary primes, use is made of head–dependency relations. The central motivation for switching from a binary to a unary understanding of phonological primitives comes from the empirical finding that binary approaches wrongly predict that both values of each feature define natural classes of segments or can be involved in a process. The central idea behind monovalency is that in all cases, only one pole of a phonetic opposition can play these roles.
A monovalent approach is thus inherently more restrictive and therefore should form the null hypothesis.|000|phonology, monovalent features, distinctive features, phonetics, introduction 1860|Mendes2016|Substitution rates are known to be variable among genes, chromosomes, species, and lineages due to multifarious biological processes. Here, we consider another source of substitution rate variation due to a technical bias associated with gene tree discordance. Discordance has been found to be rampant in genome-wide data sets, often due to incomplete lineage sorting (ILS). This apparent substitution rate variation is caused when substitutions that occur on discordant gene trees are analyzed in the context of a single, fixed species tree. Such substitutions have to be resolved by proposing multiple substitutions on the species tree, and we therefore refer to this phenomenon as Substitutions Produced by ILS (SPILS). We use simulations to demonstrate that SPILS has a larger effect with increasing levels of ILS, and on trees with larger numbers of taxa. Specific branches of the species trees are consistently, but erroneously, inferred to be longer or shorter, and we show that these branches can be predicted based on discordant tree topologies. Moreover, we observe that fixing a species tree topology when performing tests of positive selection increases the false positive rate, particularly for genes whose discordant topologies are most affected by SPILS. Finally, we use data from multiple Drosophila species to show that SPILS can be detected in nature. Although the effects of SPILS are modest per gene, it has the potential to affect substitution rate variation whenever high levels of ILS are present, particularly in rapid radiations. The problems outlined here have implications for character mapping of any type of trait, and for any biological process that causes discordance. 
We discuss possible solutions to these problems, and areas in which they are likely to have caused faulty inferences of convergence and accelerated evolution. |000|rate of change, rate variation, substituion rate, phylogenetic reconstruction, Bayesian approaches, incomplete lineage sorting 1861|Tserdanelis2006|Historically problematic as to their notational description * Different traditions (Americanist/IPA/Slavic/Historical etc.) * For example: [ č, tš, :sampa:`tS`, :sampa:`t_S`, :sampa:`t)S` ] for postalveolar or palatal and [ ts, :sampa:`t_s`, :sampa:`t^s`, :sampa:`t)s` ] for alveolar or dental * The current International Phonetics Association alphabet (Handbook of the IPA, 1999), formalizes the use of a tie-bar convention [ ) ] above or below the sequence of symbols denoting the stop onset and fricative release of affricates. * Superscription of the fricative symbol [:sampa:`t^s`] is also a frequent but adhoc practice. Less frequent is the superscribing of the stop portion as in the Japanese allophone of word-initial /z/ transcribed by Okada (1999) as [:sampa:`^ds`] (IPA: [:sampa:`d)z`]). 
|B1|affricates, phonetic transcription, IPA, 1862|Tserdanelis2006|Affricates are internally complex segments comprised of at least (3) distinct phases that can vary in their relative timing: (1) Stop Closure, (2) Release Burst, (3) Frication Noise. Temporal characteristics of stop and fricative phases are ignored by the traditional notation practices|B2a|affricates, characteristics, definition 1863|Tserdanelis2006|We propose standardizing the superscription of the symbol associated with the durationally shorter and thus less dominant phase of an affricate in a practice similar to the IPA notation for prenasalization [ ^m b ] or secondary articulation [ d ^w ]|B2b|affricates, phonetic transcription, superscript letters 1864|Tserdanelis2006|Authors propose the use of superscript letters in affricate phonetic transcription in order to distinguish different durations of the stop-part or the release part. They illustrate cross-linguistic differences for some four Indo-European languages (Albanian, Greek, Bulgarian, Romanian).|000|phonetic transcription, affricates, 1865|Leskovec2008|A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously.
Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.|000|conductance, network, cluster purity 1866|Gries2015|The techniques discussed so far are all usually employed to subject hypotheses to significance testing and decide on which hypothesis to adopt based on a p-value. A different set of techniques is concerned with generating hypotheses in the first place. That is, these techniques serve exploratory functions and are often applied to data sets whose size and complexity defies an analyst’s eyeballing and pattern-matching skills. Again, this section will begin with a discussion of what are arguably the currently most frequently used methods before turning to some refinements and desiderata that would benefit the field; as before, given that the number of techniques is vast, the overview has to be quite selective.|730|exploratory data analysis, definition, hypothesis-generating methods, 1867|Gries2015|Statistical tools that belong to the domain of null-hypothesis significance testing are the most widespread in linguistics.
Within this set of tools, one needs to distinguish between goodness-of-fit tests and tests for independence/differences. The former are concerned with testing whether a characteristics of a particular data set – a mean, a standard deviation, the overall distribution – is different from that of some other data set (e.g., one from a previous study) or a known distribution (e.g., the bell-shaped normal distribution). The latter can be divided up into monofactorial and multifactorial tests: both involve one dependent variable (or response or effect), but the monofactorial designs contain only one independent variable (or predictor or cause) whereas multifactorial designs contain more than one independent variable. While not a standard statistical terminology, it has occasionally been didactically useful to distinguish two kinds of multifactoriality. Adopting this distinction, multifactorial 1 refers to designs in which multiple independent variables are involved but without interactions, whereas multifactorial 2 then refers to designs in which multiple independent variables are involved such that they may interact with each other.|725|hypothesis-testing methods, methodology, significance, definition, 1868|Gries2015|This article surveys a selected variety of statistical methods that are currently used in experimental and observational studies in linguistics. It covers goodness-of-fit tests, monofactorial and multifactorial hypothesis testing methods, and hypothesis-generating techniques.
In addition, for the two major sections of significance testing and exploratory methods, the article also discusses a wide range of statistical desiderata, i.e., perspectives and methods whose more widespread recognition or adoption would benefit linguistics as a discipline.|000|methodology, linguistics, quantitative linguistics, quantitative analysis, significance 1869|Pawelec2009|I propose to show that, in their Conceptual Metaphor Theory (CMT), Lakoff and his collaborators do not offer a new account of metaphor but rather a wide-ranging representation of analogies, reconstructed on the basis of selected linguistic material (primarily collocations and idioms). Consequently, CMT is valuable not as an explanation of metaphorical language in use, nor a hypothesis about the genesis and development of concepts in individual minds, but primarily as a way to represent the results of unexplored social processes of lexicalization involving metaphor. If one adopts a more 'ecological', situated perspective, this global, post hoc approach may perhaps provide useful material to speculate on the forces that drive meaning extension in history.|000|metaphor, cognitive semantics, conceptual metaphor theory, Lakoff, analogy, post-hoc theory 1871|Georg2009|Very detailed and apparently informed review of @Robbeets2005, a large volume on the Altaic theory.|000|Altaic, Japanese, genetic classification, genetic relationship, review, proof of relationship 1872|Georg2009|Why cite Whitman at length with a passage which would require a thorough reading of this author’s book to fully understand it and then offer no discussion at all of his views, either consenting or dissenting?|260|transparancy, linguistic reconstruction, Altaic, Japanese 1873|Georg2009|There is some usefulness in presenting all these opinions offered by various proponents of the Altaic hypothesis over the decades, if only to see how they [pb] differ from each other, how wildly disparate etymological sets – not rarely in
the writings of one and the same author – serve the purpose of linking Japanese linguistically to the Asian continent, or to see how the ideas of some of these scholars evolved over time. Some usefulness, since we are again left alone as to what we are to make of this.|261f|Altaic, transparancy, linguistic reconstruction, etymology 1874|Robbeets2005|The question of cognation [sic] between Japanese and Altaic is usually addressed in a partisan fashion. The polarization into a pro-Altaic and anti-Altaic front is an impediment to progress in Japanese comparative studies. My intention is to take a detached view of the matter, not joined to either side. With an impartial mind I do not wish to imply that I am uninterested or indifferent to the problem. Rather, the question whether Japanese is Altaic or not is truly intriguing and it represents a serious challenge, but I do not have anything to gain or lose by the exact outcome of the question. Hence, I will attempt to approach this question with proper scholarly detachment, sifting and evaluating facts rather than defending one position or the other. :comment:`Quoted after` @Georg2009|29|etymology, proof of relationship, Japanese, Altaic, objectivity, validity, reliability 1875|Robbeets2005|Macro-level comparison is more synoptic, in a way that provides us with overview. It is like inspecting two similar objects with the naked eye or scanning them through a microscope. Focusing on some microscopic similarities is pointless without prior inspection with the naked eye since your eyes can help you to understand the totality of the objects considered. :comment:`quoted after` @Georg2009|24|language comparison, methodology, Altaic, mass comparison 1876|Ho2016|It is a firm linguistic fact that rhyming should be based on identity of vowels. Interpretation should, of course, be based upon facts. 
Facts precede and matter more than interpretation.|183|vowel purity, rhyme patterns, rhyme analysis, Old Chinese, linguistic reconstruction 1877|Jacques2016d|Japhug is a language with ergative alignment on NP arguments and direct-inverse verbal indexation. However, this paper, through a detailed description of relativizing constructions in Japhug, shows the existence of accusative pivots and proposes an unambiguous definition of ‘subjects’ and ‘objects’ in this language.|000|Japhug, relativization, subject, object, grammar, 1878|Jacques2016d|Japhug presents strict verb-final word order. The only elements that can occur post-verbally are sentence-final particles, some ideophones and adverbs [...], and right-dislocated constituents.|2|Japhug, subject-verb-order 1879|Jacques2015b|This paper presents a critical overview of previously proposed etymologies involving the initial cluster *sr– between Chinese and other Sino-Tibetan languages. It puts forth one new etymology, which confirms the simplification of the cluster *sr– to s– in Kiranti and the preservation of this cluster in Rgyalrong languages.|000|Sino-Tibetan, consonant cluster, Rgyalrong, Kiranti, linguistic reconstruction 1880|Jang2015|The purpose of this article is to propose and prove that the well-known theory of markedness in language acquisition is also working in lexical borrowing. Principles of first language acquisition have widely been attested to operate in second language and/or foreign language acquisition. However, not much attention has been paid to the comparison between first language acquisition and lexical borrowing, although lexical borrowing also clearly involves similar processes and/or principles of foreign language acquisition in various forms.
Specifically, we will show that fricatives of source language are changed to stops in target language, in parallel with the well-known phonological process that fricatives are realized as stops and that they are acquired later than stops in first language acquisition. Supporting evidence is provided from the comparison between general language acquisition data and strengthening of fricatives found in the lexical borrowing from Chinese by Vietnamese. In so doing, we will compare the alveolar fricatives in Chinese and their borrowed forms in Sino-Vietnamese and Sino-Korean.|000|lexical borrowing, fricatives, Vietnamese, Sino-Tibetan, Chinese 1881|Jakobson1968|[T]he acquisition of fricatives presupposes the acquisition of stops in child language; and in the linguistic systems of the world the former cannot exist unless the latter exists as well. :comment:`quoted after` @Jang2015|51|fricatives, language acquisition, universals 1882|Jang2015|Thus, stops are acquired earlier than fricatives. @Jakobson<1968> (1968) reported for numerous languages that children tend to substitute stop consonants for fricatives when they are in initial position. This theory has been called “implicational universals” in the literature and has been tested in various ways in various languages.|151|implicational universals, Roman Jakobson 1883|Jang2015|A question may arise with regard to this data: Is the sound change of /s/→/t/ found in the adoption of Chinese into Sino-Vietnamese accidental or regular? We have already seen that strengthening of fricatives is a regular process in first language acquisition in the previous section.
If this is proven to be a rule-based regular phenomenon in lexical borrowing from a foreign language, then it is surely quite an interesting phenomenon for the study of comparative-historical linguistics as well as for the study of lexical borrowing.|159|fricatives, strengthening, universals, lexical borrowing 1884|Jang2015|Not only is the markedness theory useful for explaining language acquisition and language universals, it is also useful in explaining lexical borrowing. :comment:`What is that supposed to mean in the end? There is no general rule, and markedness is also not generalized, so in the end, all the article does is repeating evidence and bringing up Jakobson here.`|164|markedness, lexical borrowing, 1885|Sampson2015|Recent cross-language research has yielded strong statistical evidence in support of the idea, advocated by André Martinet and widely accepted by linguists, that languages avoid adopting sound-changes which would create many homophones. Yet we know that the history of Chinese phonology has been marked by repeated phoneme mergers and losses which led to a very high incidence of homophony, forcing the monomorphemic vocabulary of the classical language to be replaced by a largely bimorphemic modern vocabulary. This paper examines various ways in which this apparent contradiction might be resolved. None seems fully satisfactory, yet some resolution must exist.|000|splits and mergers, phoneme merger, Chinese, language history, syllable structure, monosyllabicity, Old Chinese, 1886|Sampson2015|One of the most striking properties of Chinese, to people more familiar with European languages, is its very high incidence of homophony. All languages contain some homophones, for instance English /rait/ can represent any of the unrelated words right, write, or rite. 
But in English and other European languages, words which coincide in pronunciation with other words are a minority, and even in such cases it is rare for more than two or three etymologically-distinct words to share a spoken form. In Chinese, and particularly in the standard, Mandarin dialect, if for the moment we use the term “word” to refer to 字 rather than to 詞, there are very few words which are not homophonous with other words, and a set of homophones may contain ten or twenty members.|680|homophony, Mandarin, Chinese, Chinese dialects 1887|Sampson2015|It is difficult to be precise about which of all the words that have been used in the long recorded history of Chinese should be counted as elements of present-day spoken Mandarin, but one linguistically- sophisticated attempt to do so is Chao and Yang (1962); on average a Mandarin syllable is ambiguous between about four of the 字 listed there, with a maximum of 25-way homophony for the syllable yù.|680|homophony, Chinese, Mandarin 1888|Sampson2015|Studies initiated in Qing-dynasty China have shown that this situation results from a series of sound-changes over a long period which merged phonemes that previously contrasted, or eliminated phonemes altogether (merged them with zero).|680|Chinese, homophony, language history 1889|Sampson2015|In general linguistics, though, there is a longstanding belief that the historical sound-changes which occur in all languages are subject to a constraint by which they avoid creating a high degree of homophony. Contrasting phonemes merge, it is claimed, only if there are not too many pairs of words distinguished by that particular contrast.|681|phoneme merger, sound change, language change, sound change, 1890|Sampson2015|Particularly strong evidence is discussed by Wedel, Kaplan, and Jackson (2013), who examine 41 phoneme mergers in six European languages together with Korean and Cantonese, all of which occurred recently enough to allow vocabularies to be studied statistically. 
Comparing phoneme mergers which have actually occurred with hypothetical mergers that are equally phonetically plausible but have not occurred, Wedel et al. find support at a very high level of significance (p < .001) for the hypothesis that the likelihood of a merger correlates inversely with the number of homophones it creates.|682|functional load, sound change, phoneme merger, quantitative analysis 1891|Sampson2015|As things stand, then, we have arguments which seem quite cogent that language-change avoids creating excessive homophony; if I did not know about Chinese I would certainly find those arguments convincing. Yet at the same time we know that sound-changes in the history of Chinese have created a massive level of homophony. This is a real paradox. Both statements appear to be true, but they contradict each other. The aim of the rest of this paper is to explore various ways in which one might hope to resolve the paradox. None of the alternatives seems to me satisfactory. But some resolution there must be.|682|functional load, phoneme merger, Chinese, paradox 1892|Li1987|If [the word 金] hadn’t become the disyllabic form [金子], the ... words for “gold” and “tael” would have been homophonous. The threat of too many homophonous words has forced the language to increase dramatically the proportion of polysyllabic words. :comment:`quoted after` @Sampson2015|817f|polysyllabic words, Chinese, language change, homophony, phoneme merger 1893|Sampson2015|As it happens, @Li<1987> and Thompson were ill-advised in their choice of example. Instances of the particular disyllable-creating process they cited, namely suffixing 子 to a noun without any diminutive connotation, are known to have occurred early (Jerry @Norman<1988> 1988: 114 quoted examples from the Tang dynasty), while on the other hand -m and -n still contrasted for the 14th-century 唐. 
And many disyllable-creating innovations may well have occurred in speech before they showed up in the written record.|684|sound change, phoneme merger, Chinese, 1894|Sampson2015|A universal tendency for languages to avoid becoming inefficient through excessive ambiguity is very natural and understandable, if it is indeed a reality, whereas a universal tendency to avoid generating homophones via one type of process while allowing any amount of homophony to be produced in other ways would be inexplicable and implausible.|686|homophony, functional load, explanation of sound change 1895|Behr2015|One point made by Professor Sampson, which cannot be emphasized too much given the rampant back-projection of Standard Mandarin syllable structure onto Old Chinese realities in the literature, is that “homophony in the Old Chinese of three thousand years ago may not have been strikingly greater than in modern European languages.” (p.2.) [@Sampson2015 ] :comment:`Gives the shi-shi-shi story by Chao Yuenren as an example for homophony increase in Chinese. Gives the text in Old Chinese (Baxter-Sagart-reconstruction) to illustrate that it has no homophones in Old Chinese.`|719|Chinese, Old Chinese, homophony, sound change, phoneme merger 1896|Behr2015|Taking tonality into account, Middle Chinese still had more than 3000 distinct syllables (Duanmu 1999), i.e. about as much as the 2.756 distinct CVC syllables regularly used in Modern English (Barker 2008). 
In short, the necessity of distinguishing lost distinctions of the spoken language in writing must have been low well down to the medieval period and it is therefore inherently unlikely that disyllabification is exclusively driven by functional considerations of homophony avoidance.|721|Middle Chinese, homophony, statistics, Guǎngyùn, rhyme books, phoneme inventory 1897|Behr2015|The process of disyllabification is to a large degree concomitant to the rise of tonal distinctions in Old and Early Medieval Chinese, only completed shortly before the Suí reunification in the peripheral dialects (Pulleyblank 1973). The compensatory function of replacing lost final and laryngeal distinctions in the segmental inventory by phonemic tones is curiously absent from Professor Sampson's consideration of solutions for the apparent “enigma”. This is somewhat surprising in view of the fact that Shannon entropy inspired theories of “functional (FL) load as information loss” (@Hockett<1966> 1966, Wang 1967) clearly show that the FL of tonal distinctions is much higher than that of stress in non-tonal languages and about as high as the FL of vowels in a tone-language like Mandarin (Surendran & Niyogi 2003: 16). In other words, capacities for lexical distinction in perception and communication arising from such FL patterns, rely heavily on tonal distinctions.|721|functional load, tonogenesis, tone language, statistics 1898|Behr2015|It has recently been shown on the basis of a quantitative analysis of the development of Written Tibetan – a language phonotactically very close to pre-tonal reconstructed OC – into its various modern tonal and non-tonal dialect descendants, that there is a clearly identifiably threshold when the rate of segmental homophony invariably gives rise to disambiguating tonal contrasts. 
Although the employed method is somewhat crude, calculating the degree of homophonicity as the number of single syllables divided by the number of syllables with distinct initials [pb] and finals – 1, it clearly shows a tendency, whereby a homonym rate between 2.5 and 3.0 correlates with the incidence of phonemic tone distinctions in a successor dialect (Kǒng Jiāngpíng 2012). Any solution of the “enigma” will therefore have to carefully take tonal distinctions across the lexicon into account.|721f|homophony, Written Tibetan, Tibetan, phoneme merger, tonogenesis 1899|Behr2015|:comment:`Answers to the idea that disyllabic loans contributed to polysyllabic words in Chinese, dismissed by` @Sampson2015 Again, I would caution here against two assumptions which could seem to be implied, namely (a) that all such cases of internally unanalyzable compounds are loanwords, and (b) that the number of disyllabic words in Old Chinese is truly neglectable, [...]. :comment:`Gives further Jiāgǔwén examples for unanalyzable compounds in Old Chinese.` More importantly, it has become increasingly clear during recent years that the process of disyllabification of the vocabulary must have started already in the oracle bone period. Although much depends on the notoriously difficult definitions of “wordhood” in this area, some scholars estimate the percentage of compounds as high as one quarter of the vocabulary. 
:comment:`Very nice quotes on further literature and statistics follow this passage, according to which 20% of the vocabulary in the Old Chinese oracle bone inscriptions is already disyllabic!`|722|polysyllabic words, Old Chinese, monosyllabicity, sound change, homophony 1900|Behr2015|A recent metastudy of disyllabicity in 27 early and medieval corpora of excavated texts and in the edited literature (Zhèng Zhènfēng & Lǐ Dōnggē 2010) clearly shows the following trends, directly relevant to the discussion of homophony avoidance as a compensatory mechanism: (a) disyllabification was incipient long before the phonological changes which eliminated most OC initial consonant clusters and the process of tonogenesis. In pre-Qín paleographic materials, rates of disyllables start out with ca. 20% in OBI, reach a first peak in Chūnqiū 春秋 bronze inscriptions at 27.8%, and a second one in late Warring States and Qín bamboo strip inscriptions (Bǎoshān 包山: 43.8%, Shuìhǔdì 睡虎地: 43.5%). The development is not strictly linear but apparently strongly dependent on the sociolinguistic layer and textual genre. The great “explosion” of disyllabicity, if seen from the perspective of excavated materials, happens in the Eastern Hàn period, when all corpora start to exceed rates of 50% of disyllabic compounds, reaching as high as 78.2% for a corpus of stone and clay inscriptions from non-literary backgrounds. [...] 
(b) disyllabification rates are roughly comparable between excavated texts and the edited literature, and, if anything, higher in the paleographic materials which tend to reflect the underlying colloquial better due to a lack of editorial “streamlining”.|723|polysyllabic words, sound change, Chinese, language history, Old Chinese, statistics 1901|Behr2015|Since cycles of monosyllabicization via segmental "depletion" and subsequent recreation of polysyllables are an East and Southeast Asian areal phenomenon (for a recent comprehensive overview see Michaud 2012; for the varying degrees of homophonicity in Chinese dialects see Ke, Wang and Coupé 2002), it may well be that the rise of disyllables was also consolidated by areal pressures.|723|linguistic area, South-East Asian languages, polysyllabic words, phoneme merger, sound change, language contact 1902|Behr2015|(c) The idea that the rise of disyllabic words in documents is an artifact contingent upon the availability of paper as a cheap writing support, available roughly since the Eastern Hàn period, has been effectively disproven. The rate of compounding is largely independent of the type of the writing support. Theories, according to which disyllabicity arose early on but was only reflected in texts much later due material constraints are therefore unconvincing.|724|Chinese, polysyllabic words, language change, 1903|Chen2015|1. The massive homophony in modern Chinese was the result of multiple sound changes, including nasal ending mergers, final consonant loss, devoicing, palatalization (hence the neutralization between k,kh,x and ts, tsh, s), and so forth — processes that need not happen all at once. It is entirely conceivable that, for instance, nasal merger took place at time x, creating pressure for disambiguation by means of compounding at time y, which in turn facilitates devoicing at time z, and so forth.|693|Chinese, phoneme merger, polysyllabic words, language change 1904|Chen2015|2. 
Even within one single sound change (e.g. nasal merger), there is the logical possibility of gradual lexical diffusion instead of an instantaneous sweep across the vocabulary as conceived by the Neogrammarians.|694|lexical diffusion, language change, polysyllabic words, Chinese, phoneme merger 1905|Chen2015|Hence all sound change involves some degree of neutralization, and therefore diminishes the functional load of a phonemic contrast. At which point does neutralization count as a violation of Gilliéron-Martinet’s hypothesis remains unclear.|694|Chinese, polysyllabic words, phoneme merger, sound change 1906|Chen2015|It seems that pleonastic compounds constitute a tiny fraction of word formation types, even though their token frequency is surprisingly high as noted above. Conceivably, polysyllabic compounds arose in Chinese for entirely mundane reasons, much like city hall or speed limit (both written as a single word in German: Rathaus, Geschwindigkeitsbegrenzung). These unremarkable polysyllabic compounds may have established a certain kind of rhythmic pattern for a new wave of otherwise redundant compounds like 打擊 and 毆打.|-|polysyllabic words, Chinese, pleonastic compounds, rhythmic pattern, redundant compounds 1907|Kwok2016|There is a general belief that Yue constitutes a highly uniform dialect group, with its members sharing a good number of structures and lexica with the regional prestige dialect, Cantonese. Based on our first hand data collected from the field, this paper describes the lesser known grammatical diversity across the Yue dialects, which can be illustrated by the different uses of the following features: (a) ideophonic suffixes; (b) diminutive suffixes and tone sandhi; (c) perfective aspect markers and their position in the VP, and (d) neutral question forms. The survey includes nine dialects from different subgroups, most of which are spoken far away from the Pearl River Delta where Cantonese dominates. 
Our study reveals that while Cantonese has obvious influence over other members of Yue, the grammatical diversity across Yue cannot simply be overlooked.|000|grammaticalization, grammar, linguistic diversity, Yuè, Chinese dialects, typology 1908|Kwok2016|There is a remarkable morphological process in Cantonese called ideophonic suffixation. This process appears in the form of ABB, where A is an adjective, a noun, or (less commonly) a verb and BB is a replicated ideophonic syllable without concrete meaning and conventionally written characters, which is therefore represented by an empty square [...].|114|ideophonic suffixation, definition, 1909|Kwok2016|Our survey reveals that ideophonic suffixation not only exists in Cantonese. It is also a common morphological process in most, if not all, Yue dialects. :comment:`gives interesting examples for the case, with one language regularly showing BBA instead of AAB, and langauges often showing cognate words for identical compounds.`|115|Yuè, Chinese dialects, ideophonic suffixation 1910|Kwok2016|Interestingly, in the Yue dialects spoken in Guangxi, especially in the Nanning and Baise varieties, one root can choose from six different types of ideophonic suffixes. Teh intensity of the action is expressed by the height of the vowel of the ideophone, which can be regarded as a sound symbolism phenomenon.|116|ideophonic suffixation, Yuè, Chinese dialects 1911|Bromham2016|Interdisciplinary research is widely considered a hothouse for innovation, and the only plausible approach to complex problems such as climate change1, 2. One barrier to interdisciplinary research is the widespread perception that interdisciplinary projects are less likely to be funded than those with a narrower focus3, 4. However, this commonly held belief has been difficult to evaluate objectively, partly because of lack of a comparable, quantitative measure of degree of interdisciplinarity that can be applied to funding application data1. 
Here we compare the degree to which research proposals span disparate fields by using a biodiversity metric that captures the relative representation of different fields (balance) and their degree of difference (disparity). The Australian Research Council’s Discovery Programme provides an ideal test case, because a single annual nationwide competitive grants scheme covers fundamental research in all disciplines, including arts, humanities and sciences. Using data on all 18,476 proposals submitted to the scheme over 5 consecutive years, including successful and unsuccessful applications, we show that the greater the degree of interdisciplinarity, the lower the probability of being funded. The negative impact of interdisciplinarity is significant even when number of collaborators, primary research field and type of institution are taken into account. This is the first broad-scale quantitative assessment of success rates of interdisciplinary research proposals. The interdisciplinary distance metric allows efficient evaluation of trends in research funding, and could be used to identify proposals that require assessment strategies appropriate to interdisciplinary research.|000|interdisciplinary research, chances, funding 1912|Bromham2016|The paper concludes that interdisciplinary research fares poorly in getting funded, which somewhat contradicts what research institutions usually claim or reinforce.|000|interdisciplinary research, funding, 1913|Wang2015b|Most combinations of morphemes in early Chinese are generative. Therefore, the morpheme is the basic grammatical unit. In other words, morphemes and words are not distinguishable in early Chinese. In modern Chinese, however, combinations of morphemes may be generative or non-generative. 
Morphemes in non-generative combinations are not basic units but rather constituents of basic units|714|morphological change, polysyllabic words, Chinese, polysyllabification, language history, phoneme merger 1914|Wang2015b|First, synonyms often have several meanings, yet compounds are monosemous. [...] Second, compounding synonyms is a means of semantic generation. [...]|715|homophony, compounding, functional load, Chinese 1915|Wang2015b|Below, I provide direct evidence to show the likelihood that a shift from monosyllabic to disyllabic words took place [pb] before the loss of phonemic contrasts. The data of multisyllabic words in Chinese across time are taken from Li (2011). :comment:`shows statistics on increase of polysyllabicity in the history of Chinese, based on the source.`|715f|polysyllabic words, Chinese, statistics, 1916|Wang2015b|In conclusion, a focus on the earliest changes in lexicon and phonology reveals that an increase of multisyllabic words in Chinese preceded phonological simplification. That is to say, the multisyllabic Chinese lexicon allows homophony in monosyllabic morphemes.|717|phoneme merger, polysyllabification, Chinese, polysyllabic words, 1917|Dubossarsky2016|Linguists have identified a number of types of recurrent semantic change, and have proposed a number of explanations, usually based on specific lexical items. This paper takes a different approach, by using a distributional semantic model to identify and quantify semantic change across an entire lexicon in a completely bottom-up fashion, and by examining which distributional properties of words are causal factors in semantic change. Several independent contributing factors are identified. First, the degree of prototypicality of a word within its semantic cluster correlated inversely with its likelihood of change (the “Diachronic Prototypicality Effect”). 
Second, the word class assignment of a word correlates with its rate of change: verbs change more than nouns, and nouns change more than adjectives (the “Diachronic Word Class Effect”), which we propose may be the diachronic result of an independently established synchronic psycholinguistic effect (the “Verb Mutability Effect”). Third, we found that mere token frequency does not play a significant role in the likelihood of a word’s meaning to change. A regression analysis shows that these effects complement each other, and together, cover a significant amount of the variance in the data.|000|semantic change, automatic approach, directionality, part of speech, prototypicality 1918|Dubossarsky2016|Contemporary research identifies different kinds of regularity in semantic change as tendencies of change, which are asymmetries with respect to the directions in which change is more likely to occur.|2|semantic change, regularity 1919|Traugott2002|The focus of this work is recent developments in cross-linguistic research on historical semantics and pragmatics, with special reference to the histories of English and Japanese. The framework can be characterized as “integrative functionalist” (Croft 1995) in that we consider linguistic phenomena to be systematic and partly arbitrary, but so closely tied to cognitive and social factors as not to be self-contained; they are therefore in part nonarbitrary. One of the linguist’s tasks is to determine what is arbitrary, what is not, and how to account for the differences. We see semantic change (change in code) as arising out of the pragmatic uses to which speakers or writers and addressees or readers put language, and most especially out of the preferred strategies that speakers/writers use in communicating with addressees. 
The changes discussed in this book are tendencies that are remarkably widely attested, but that can be violated under particular, often social, circumstances ranging from shifts in ideological values to the development of various technologies. “Regularity” is to be understood as typical change, or frequent replication across time and across languages, not as analogous to the Neogrammarian idea of unexceptionless change in phonology. Richard Dasher takes prime responsibility for the Japanese data, Elizabeth Traugott for the remainder, but both have discussed all the material presented here in countless meetings over nearly fifteen years. The ideas presented here have been explored in several venues. It would be impossible to thank and acknowledge the contribution of all those who have helped make this a better book than it would have been otherwise, but Joan Bybee, Maria Cuenca, Bernd Heine, Paul Kiparsky, Roger Lass, Nina Lin, Alain Peyraube, Eve Sweetser, Chaofen Sun, Shiao-Wei Tham, and Yo Matsumoto deserve special mention, and especially Brady Clark, Andrew Garrett, and Nigel Vincent who gave extensive advice on pre-final drafts. Elizabeth Traugott owes a particular debt to her coauthors on various other occasions: Paul Hopper, Ekkehard König, Rachel Nordlinger, Whitney Tabor, and above all to Scott Schwenter without whose inspiration, intellectual congening, and friendly challenges this book would not have come to fruition. Juno Nakamura gave invaluable help with preparing the manuscript and the indices. Citi Potts saved us from many errors at the copy-editing stage, and Andrew Winnard of Cambridge University Press supervised the production. To all our deepest appreciation. 
|000|semantic change, regularity, tendencies, cross-linguistic study 1920|Traugott2002|This book is something one should definitely consult and mention whenever it comes to investigations of semantic change, even if its actual content may be disappointing, since it offers few concrete things one could use further.|000|semantic change, regularity, tendencies, cross-linguistic study 1921|Dubossarsky2016|For example, @Traugott<2002> & Dasher (2002) propose that semantic change regularly follows the pathway: objective meaning > subjective meaning > intersubjective meaning. It has also been suggested that concrete meanings tend to develop into more abstract ones (Bloomfield 1933; @Haspelmath<2004> 2004; Sweetser 1990).|2|pathways, semantic change, regularity, tendencies, cross-linguistic study 1922|Dubossarsky2016|Another often-observed regularity is that semantic change overwhelmingly tends to entail polysemy, in which a word or expression acquire new senses that co-exist with the older conventionalized senses (e.g., a new sense for surf has emerged since the 1990s). These new senses can continue to co-exist stably with the older ones or to supplant earlier senses, thereby “taking over” the meaning of the word.|2|polysemy, semantic change, regularity, tendencies 1923|Dubossarsky2016|The existence of such regularities and asymmetries, or “unidirectional pathways of change”, has been taken as evidence that language change is not random. Moreover, these asymmetries call for explanations that are plausible in terms of what we know about human cognition and communication.|2|directionality of semantic change, directionality, asymmetry, semantic change 1924|Dubossarsky2016|Nonetheless, some work in this direction can be found in earlier structuralist and cognitivist theories of semantic change, which emphasized the role of the structure of the lexicon in explaining semantic change. 
For example, it has often been assumed that changes in words’ meanings are due to a tendency for languages to avoid ambiguous form-meaning pairings, such as homonymy, synonymy, and polysemy (@Anttila<1989> 1989; Menner 1945).|3|systemic processes, homophony, synonyms, semantic change 1925|Dubossarsky2016|On the other hand, when related words are examined together, it has been observed that one word’s change of meaning often “drags along” other words in the same semantic field, leading to parallel change (@Lehrer<1985> 1985). These seemingly contradictory patterns of change lead to the conclusion that if ambiguity avoidance is indeed a reason of semantic change, its role is more complex than initially assumed.|3|parallel development, semantic field, semantic change, ambiguity, ambiguity-avoidance 1926|Dubossarsky2016|Interesting study introducing collocations of words to deduce semantic change across a corpus of English covering many years and being accessible on a year-wise basis. They show that verbs apparently change much quicker than nouns in their meaning, which is an interesting finding, which is apparently also psychologically founded.|000|semantic change, quantitative analysis, n-gram model, corpus studies, tendencies, part of speech 1927|Lehrer1985|Interesting article which apparently claims that the semantic field of a word changing its meaning has influence on general tendencies of semantic change thereafter. Definitely worth a closer read, since this seems to point to systemic processes which are not yet completely understood.|000|semantic field, systemic processes, semantic change, tendencies 1928|Lehrer1985|The connection between semantic change and semantic fields was stressed by @Trier<1931> (1931), one of the earliest linguists to develop semantic field theory. Trier showed that semantic change affects the structure of semantic fields. 
His famous example of the meaning of words dealing with knowledge in medieval German shows that both the inventory and the semantic structure changed between 1200 and 1300. The words that remained in the language had different meanings. * 1200: Wîsheit > Kunst and Wîsheit > List * 1300: Wîsheit, Kunst, List :comment:`Shows how a general term comes to develop a meaning of its own.`|284|semantic change, semantic field, systemic processes 1929|Lehrer1985|Trier concludes that a change in the meaning of one word in a field requires changes in the meaning of other items because of two assumptions he makes: 1 There are no overlaps of meaning in a field. 2 There are no gaps. Both of these assumptions are false, however. The existence of partial synonymy -- a very common phenomenon -- shows that the first assumption is false, and the discussion of lexical gaps in Lehrer (1974, chapter 5) provides many counterexamples to Trier's second assumption.|284|semantic field, semantic change, systemic processes 1930|Lehrer1985|This study will take a different direction. It will show how semantically related words show parallel semantic changes.|284|parallel development, semantic change, semantic field, tendencies 1931|Lehrer1985|I have tried to show that semantic field theory -- the view that our lexicon is organized into semantic fields -- can contribute to our understanding of semantic chagne. The phenomenon I discuss does not replace other explanations of why words change meaning in the first place or why they change from A to B nstead of from A to C. But I hope to have shown that the histories of words are not completely independent. Sometimes semantically related words share a historical development.|293|semantic field, systemic processes, parallel development, semantic change 1932|Newman2006|Many networks of interest in the sciences, including social networks, computer networks, and metabolic and regulatory networks, are found to divide naturally into communities or modules. 
The problem of detecting and characterizing this community structure is one of the outstanding issues in the study of networked systems. One highly effective approach is the optimization of the quality function known as “modularity” over the possible divisions of a network. Here I show that the modularity can be expressed in terms of the eigenvectors of a characteristic matrix for the network, which I call the modularity matrix, and that this expression leads to a spectral algorithm for community detection that returns results of demonstrably higher quality than competing methods in shorter running times. I illustrate the method with applications to several published network data sets. |000|network modularity, community structure, network 1933|Newman2006|The modularity is, up to a multiplicative constant, the number of edges falling within groups minus the expected number in an equivalent network with edges placed at random.|8578|definition, network modularity, modularity 1934|Newman2006|The modularity can be either positive or negative, with positive values indicating the possible presence of community structure. Thus, one can search for community structure precisely by looking for the divisions of a network that have positive, and preferably large, values of the modularity (18)|8578|network modularity, modularity, characteristics 1935|Wang1967|1. General discussion 2. Tone features and segmental features 3. Presentation of phonological features 4. A proposed set of tone features 5. Redundancy conventions 6. Phonetic interpretation 7. Marking conventions 8. Tone circle in Mǐn|000|tone, introduction, distinctive features, phonology, tone language, complex tone languages 1936|Wang1967|Article presents phonological features of tone and is apparently the first article published on the topic, so it is worthwhile quoting this article in the given context. 
Furthermore, it seems useful to take this coding into account when trying to generalize the tones in the data, as it seems to provide full coverage for all tones which effectively occur, which, of course, still needs to be proven.|-|tone, complex tone languages, introduction, distinctive features, 1937|Zhu2015|Based on firsthand acoustic data, this paper aims to determine how many phonologically contrastive falling tones exist in tonal languages, and what kinds of distinctive features are needed to specify them. These goals are achieved by using a tonal model called the Multi-Register and Four-Level Model, which represents tones along four parameters: register, length, height, and contour. Having excluded a quasi-falling tone, this paper identifies seven Falling Tonotypes in the M Register: High, Low, MS-High, MS-Low, Deferred-High, Deferred-Low, and Slight Falling. Four of these also occur in the L Register with special voice qualities. In total, there are eleven Falling Tonotypes, which can be specified according to five distinctive features.|000|tone, complex tone languages, Chinese, Chinese dialects, distinctive features, experimental phonetics 1938|Zhu2015|The present paper attempts to [pb] determine how many phonologically contrastive falling tones exist in tonal languages, and what kinds of phonological parameters and distinctive features are needed to describe, represent, classify, or specify them.|605f|tone, phonetics, complex tone languages, Chinese dialects, typology 1939|Zhu2015|Paper describes an earlier mentioned model for cross-linguistic tone comparison (@Zhu2012) which is based on registers and four levels of height. In this context, it is probably interesting to give it a proper read.|000|phonation type, register, tone, distinctive features, language model, typology, cross-linguistic study 1940|Zhu2015|Before proceeding, we need to define or explain some terms. 
First, the “tonotype,” a new concept in phonology and typology, is characterized by three features: (1) Each tonotype has its own defining acoustic and/or perceptual property/ies; (2) Any tonotype would contrast with another (or more) of the same contour type in at least one language; and (3) A tonotype inventory is a set of types which is adequate and least redundant in describing and defining all contour tones in all tonal languages.|608|tonotype, distinctive features, language model, typology, definition 1941|Zhu2015|This paper started with criticizing two old schemes of tonal representation and ends with endorsing a new one. The new one, the RLM, is adequate for describing and representing 1,400 tonal systems in the author’s audio database and can predict later tonal development through [pb] observance of synchronic variations. The data used in this paper are different from those in previous similar papers’ in four aspects: 1) acoustic data; 2) multi tokens/speakers; 3) typological comparability owing to the same normalization procedure; and 4) verifiability. Having excluded the LoT, which contains various low contours, including the lowest falling pitch, a universal falling inventory has been constructed which contains eleven Falling Tonotypes, seven in the M Register and four in the L Register. These tonotypes are expressed in the RLM and specified with five distinctive features, which incorporates four tonal parameters: register, length, height, and contour. 
It is likely, of course, that more falling types would be discovered in the future and the system might be adjusted; however, the present classification has paved the way for future typological and evolutionary studies.|631f|tonotype, cross-linguistic study, distinctive features, tone, complex tone languages 1942|Zhu2015|Our sparse knowledge of tone typology is a direct result of insufficient acoustic data and unsatisfactory schemes, the Five-Point Scale (FPS) and the H/L notation, used for tone description and representation.|606|abbreviations, five-point scale, tone, tone representation, 1943|Zhu2015|Redundancy in pitch contour. One tonotype, e.g., the High Falling [pb] /52/, corresponds to many FPS forms, such as [51, 52, 53, 41], etc. (cf. §§3.2, 6.1). In fact, the FPS has too much redundancy in pitch representation. In the case of falling tones, the FPS can define ten types of straight falling tones: [51, 52, 53, 41, 42, 31; 54, 43, 32, 21], thirteen delayed falling tones such as [551, 552, 331, 431...], seventeen depressed falling tones such as [451, 342, 231...], and ten prolonged falling tones such as [511, 512, 411...] - totaling 50. With such redundancy and ambiguities in representation, it is hard to achieve anything meaningful in typological studies.|633f|redundancy, tone representation, five-point scale, 1944|Cotter2015|Interesting popular article summarizing changes in Palestine Arabic due to contact (politically induced) with Hebrew and other factors. Describes contact-induced language change and spreading of feature (especially loss of features) and gives additional literature.|000|Arabic, Palestine Arabic, Hebrew, introduction, language change, contact-induced sound change, language contact 1945|Kaplan2015|Several recent papers (Silverman 2010; Kaplan 2011; Wedel et al. 
2013a,b) have presented evidence in favor of the longstanding functional load hypothesis (Martinet 1952; @Hockett<1967> 1967), arguing that phoneme pairs that distinguish many words are unlikely to undergo merger, and that this phenomenon can be detected in sufficiently large datasets. In contrast with some other approaches (e.g., @King<1967> 1967), these recent studies assume that functional load operates as a statistical tendency: homophony avoidance is one factor among many that influences the course of sound change, and does not by itself predict whether a given pair of sounds will merge or not.|710|functional load, Chinese, statistics, tendencies 1946|Kaplan2015|:comment:`having given some concrete numerical examples in the two pages before, quoting data from` @Wedel2013 In this context, it is quite clear that the Chinese case is indeed an outlier. Sampson estimates that the merger of /k kʰ x/ with /ts tsʰ s/, for example, may have resulted in as many as ten thousand homophones. Even if we suppose that Sampson’s estimate is two or three times too large, we are still left with a merger that produced an order of magnitude more homophones than any of the actual mergers in Wedel et al. (2013b). We would predict that mergers of this type ought to be vanishingly rare.|712|Chinese, phoneme merger, functional load, statistics 1947|Kaplan2015|More plausible to me, though still far from certain, is the suggestion that one of the mergers Sampson describes was sufficient to trigger systematic compounding, and those compounds effectively eliminated many minimal pairs and allowed subsequent mergers to proceed. 
Given a scenario along these lines, the ‘trigger’ merger would remain unexplained, but we could understand why so many unusual mergers have occurred in a single language.|712|compounding, phoneme merger, Chinese, functional load 1948|Kaplan2015|But if we systematically dismiss apparent counterexamples to the functional load hypothesis, we run the risk of making the hypothesis unfalsifiable. Indeed, now that the basic case for functional load has been made, a clear and necessary direction for ongoing research is to find and study examples like Chinese, in order to refine our understanding of the generality of functional load and its interaction with other influences on language change.|713|functional load, Chinese, litmus test, hypothesis, 1949|Wedel2013|For nearly a century, linguists have suggested that diachronic merger is less likely between phonemes with a high functional load – that is, phonemes that distinguish many words in the language in question. However, limitations in data and computational power have made assessing this hypothesis difficult. Here we present the first larger-scale study of the functional load hypothesis, using data from sound changes in a diverse set of languages. Our results support the functional load hypothesis: phoneme pairs undergoing merger distinguish significantly fewer minimal pairs in the lexicon than unmerged phoneme pairs. Furthermore, we show that higher phoneme probability is positively correlated with merger, but that this effect is stronger for phonemes that distinguish no minimal pairs. Finally, within our dataset we find that minimal pair count and phoneme probability better predict merger than change in system entropy at the lexical or phoneme level.|000|functional load, quantitative analysis, phoneme merger, statistics 1950|Wedel2013|In this paper, we present the first such analysis of a dataset comprising a large number of phoneme mergers from a diverse set of languages. 
We show for the first time [pb] that simple measures of functional load within a system of phonemes do significantly predict patterns of phoneme merger, and that this effect is in the hypothesized direction: the greater the contribution a pair of phonemes makes to word differentiation, the less likely those phonemes are to merge over the course of language change. Further, we show that in the case that a phoneme pair does not distinguish many words, phoneme probability is a significant predictor of merger.|179f|dataset, phoneme merger, functional load, 1951|Wedel2013|The languages represented in the dataset are English (Received Pronunciation and Standard American), Korean, French, German, Dutch, Slovak, Spanish, and Hong Kong Cantonese. Each language is represented by a phonemically-transcribed word list from a published corpus.|180|dataset, Korean, French, English, Spanish, Dutch, Slovak, German, Cantonese, corpus studies, functional load, phoneme merger 1952|Wedel2013|@Hockett<1967> (1967) and Surendran and Niyogi (2006) described a general framework for assessing functional load of phonemic contrasts in terms of the change in system entropy at any level of analysis upon loss of a phoneme contrast. Based on information theoretic methods introduced by @Shannon<1948> (1948), this approach assumes that a language can be described as an infinite sequence of phonemes, and [pb] that a corpus is a sample of this sequence. Following Surendran and Niyogi (2006), we calculated the change in entropy of the corpus upon phoneme merger at both the phoneme and the word level. |180f|entropy, dataset, functional load, data preparation 1953|Koskenniemi2013|Regular correspondences between historically related languages can be modelled using finite- state transducers (FST). A new method is presented by demonstrating it with a bidirectional experiment between Finnish and Estonian. 
An artificial representation (resembling a proto-language) is established between two related languages. This representation, AFE (Aligned Finnish-Estonian) is based on the letter by letter alignment of the two languages and uses mechanically constructed morphophonemes which represent the corresponding characters. By describing the constraints of this AFE using two-level rules, one may construct useful mappings between the languages. In this way, the badly ambiguous FSTs from Finnish and Estonian to AFE can be composed into a practically unambiguous transducer from Finnish to Estonian. The inverse mapping from Estonian to Finnish is mildly ambiguous. Steps according to the proposed method could be repeated as such with dialectal or older written texts. Choosing a set of model words, aligning them, recording the mechanical correspondences and designing rules for the constraints could be done with a limited effort. For the purposes of indexing and searching, the mild ambiguity may be tolerable as such. The ambiguity can be further reduced by composing the resulting FST with a speller or morphological analyser of the standard language.|000|sound correspondences, sequence alignment, finite state transducer, automatic approach 1954|Koskenniemi2013|Paper uses finite state transducers and manually adjusted alignments in Finnish and Estonian to generate a model for both languages. Basically, it is by no means spectacular, since it only tries to model things explicitly. It might be interesting, however, to see, how they do the implementation of the rules.|000|finite state transducer, Finnish, Estonian, sequence alignment, automatic approach 1955|Jacox2016|A gene tree-species tree reconciliation explains the evolution of a gene tree within the species tree given a model of gene-family evolution. 
We describe ecceTERA, a program that implements a generic parsimony reconciliation algorithm, which accounts for gene duplication, loss and transfer (DTL) as well as speciation, involving sampled and unsampled lineages, within undated, fully dated or partially dated species trees. The ecceTERA reconciliation model and algorithm generalize or improve upon most published DTL parsimony algorithms for binary species trees and binary gene trees. Moreover, ecceTERA can estimate accurate species-tree aware gene trees using amalgamation.|000|gene tree reconciliation, gene duplication, lateral gene transfer, DTL model, maximum parsimony 1956|Doyon2010|Tree reconciliation methods aim at estimating the evolutionary events that cause discrepancy between gene trees and species trees. We provide a discrete computational model that considers duplications, transfers and losses of genes. The model yields a fast and exact algorithm to infer time consistent and most parsimonious reconciliations. Then we study the conditions under which parsimony is able to accurately infer such events. Overall, it performs well even under realistic rates, transfers being in general less accurately recovered than duplications. 
An implementation is freely available at http://www.atgc-montpellier.fr/MPR .|000|DTL model, gene tree reconciliation, dynamic programming, maximum parsimony, algorithms 1957|Jacox2016|Finally, the ecceTERA software is based on a unified dynamic programming algorithm (described in the Supplementary Material) that builds upon the model of the Mowgli algorithm (@Doyon<2010> et al., 2010).|2|DTL model, dynamic programming, algorithms 1958|Doyon2010|This article describes an algorithm for gene tree reconciliation which might be the method of choice when trying to start with it for real.|000|gene tree reconciliation, DTL model, dynamic programming, algorithms 1959|Edwardes2016|This article is probably nice for teaching, as it contains a test to determine what kind of linguist one is. |000|teaching material, psycho test, linguistics, 1960|Watt2016|Paper describes two interesting aspects of language use to threaten people: 1 the problem of determining when a sentence is a threat (intonation, context, etc.) 2 the problem of intonation being individually recognized as threatening, depending on accent, dialect coloring, etc. So this paper clearly mentions experiments regarding the impression that speakers have when hearing a language, which substantiates my personal impression that languages may sound mutually threatening or aggressive to speakers, depending on their main intonation (Russian vs. German, for example). |000|language perception, psycholinguistics, teaching material, accent, 1961|Armstrong2016|In a series of papers, Donald Davidson (Synthese 59(1):3–17, 1984, The philosophical grounds of rationality, 1986, Midwest Stud Philos 16:1–12, 1991) developed a powerful argument against the claim that linguistic conventions provide any explanatory purchase on an account of linguistic meaning and communication. 
This argument, as I shall develop it, turns on cases of what I call lexical innovation: cases in which a speaker uses a sentence containing a novel expression-meaning pair, but nevertheless successfully communicates her intended meaning to her audience. I will argue that cases of lexical innovation motivate a dynamic conception of linguistic conventions according to which background linguistic conventions may be rapidly expanded to incorporate new word meanings or shifted to revise the meanings of words already in circulation. I argue that this dynamic account of conventions both resolves the problem raised by cases of lexical innovation and that it does so in a way that is preferable to those who—like Davidson— deny important explanatory roles for linguistic conventions.|000|lexical innovation, language perception, semantics, denotation, reference potential 1962|Armstrong2016|Author shows some interesting examples with complete neologisms which can be understood from context. Not clear to which extent it is useful, but the ideas center around the basic problem of creative denotation using new forms, and this is clearly related to the problem of denotation in general, and to compounding in specific.|000|language perception, denotation, semantic change, compounding 1963|Arcodia2015|Chinese is often defined as a ‘textbook example’ of an isolating language, with comparatively few affixes that are usually etymologically transparent (@Sagart<2004> 2004). After ‘deconstructing’ the notion of the isolating morphological type, I shall discuss data from a number of Chinese dialects spread over the Shanxi, Shaanxi, Henan, Hebei, and Shandong provinces. 
I will show that there seem to be some areal clusters with productive morphological phenomena not expected to occur in isolating languages, which can be explained both by the cross-linguistically widespread tendency towards the reduction of certain items in speech production and, arguably, by processes of convergence among dialects.|000|isolating language, Chinese, Chinese dialects, typology 1964|Arcodia2015|Article gives examples for derivational or near-productive morphology in Chinese dialects. Examples are, however, spurious and not necessarily well reported. Also the analysis can be done from different view-points, so it is by no means clear why the author claims that he could "decompose" the notion of "isolating" language (especially given that he's talking about dialects, but treats "Chinese" as just one language).|000|Chinese, Chinese dialects, isolating language, typology 1965|Hill2015a|This paper outlines the pitfalls of the current anachronistic practice of transcribing early Chinese documents by identifying each character with a kǎishū 楷書 equivalent. In its place, I suggest a way of transliterating characters directly, by rendering into roman letters the phonetic and semantic information encoded by a character.|000|Chinese, Chinese writing system, transliteration, 1966|Wang2015a|Pervasive sound correspondence requires reflexes in all languages compared. Relaxing this requirement would include more late borrowings into the comparison and could cause a misunderstanding of language relationships. From this perspective, this paper investigates the basis of sound correspondence in the reconstruction of Proto-Miao-Yao. The genetic relationship between Miao-Yao languages can be confirmed by the genetic indicator of more high-rank and less low-rank related morphemes, either through the requirement of pervasiveness or a relaxed requirement; though this relaxation results in some degree of distortion. 
A similar procedure has been applied to related morphemes between Chinese and Proto-Miao-Yao with similar results. A genetic relationship, rather than language contact between Chinese and Proto-Miao-Yao, has been suggested by rank analysis. To double-check this conclusion, the inexplicability principle was used. This principle refers to the inability to describe the representation of related morphemes in the recipient language in terms of the phonological system of the donor language; these inexplicable elements are considered to be inherited from the ancestor language rather than acquired through borrowing.|000|sound correspondences, Proto-Miao-Yiao, pervasiveness principle, methodology 1967|Wang2015a|Wang Feng presents his idea on the problematic notion of "pervasiveness" of sound correspondences to show that Miao-Yao is closely related to Sino-Tibetan. This is problematic for several reasons, especially since the methodological tools, like the Rank analysis, etc., or the inexplicability principle are very ad-hoc.|000|Sino-Tibetan, Proto-Miao-Yiao, ranked concept list, methodology, pervasiveness principle, inexplicability principle, sound correspondences 1968|Zheng2015b|The homophony of the words wēi 微 and wéi 維 (wei wei tong yin 微維同音 , hereafter wwty) recorded in the Zhongyuan Yinyun 中原音韻 (Rhymes of the Central Plain) actually reflects the phenomenon of the labiodentalization of the palatal initial [j-] into the voiced fricative [v-]. This merger of phonological classes can be dated back to the Hexi dialect 河西 in the tenth century c.e. It is thus clear that the merger of the words wéi 惟 , wéi 維 and wēi 微 could have occurred earlier in the history of the Chinese language. 
The northern dialect of the Eastern Jin dynasty 東晉 (317–420) reflected in Chinese transcriptions of Sanskrit and the relevant record in the Jingdian shiwen 經典釋文 (Explanatory Glosses on the Classics) may share superficial similarities with this merger, but their phonological characteristics are in fact different, an issue that needs to be discussed separately. This phonological change can be formalized into `*`j- > v-/__V [+back,-low]. In addition to the materials on the history of the Chinese language, the analysis of materials on Chinese dialects, minority languages in China as well as phonetic experiments further show that this merger constitutes a natural phonological change consisting of a typologically significant articulatory basis.|000|homophony, sound change, Old Chinese, Chinese dialects, examples, 1969|Zheng2015b|Paper gives an exhaustive overview on evidence for the merger of wēi 微 and wéi 維 in their pronunciation in Old Chinese, as recorded in Zhōngyuán Yīnyùn. The article dates the change, which goes from /j/ to /v/ and tries to explain it from different view points.|000|sound change, examples, Old Chinese, Chinese dialects, 1970|Wang2016|In Chinese some personal pronouns and demonstrative pronouns share the same origins because of the conceptual cognitive mappings existing between the personal domain and demonstrative pronoun domain. The traces of such origins can also be found in the Chinese dialects which can be divided into three main categories. There exists a systematic correspondence between first person concept and proximal deixis concept, and between second person concept and distal deixis concept in Mandarin and southeast dialects. Therefore, the second personal pronoun *er* (尓) evolves into *na* (那), and the vacancy of the third personal pronoun can only be filled in with other pronouns which have nothing to do with proximal or distal demonstrative pronouns. 
Because of the mental distance of the referent of the third person concept and the historically later appearance of K series of distal demonstrative pronouns in the southeast dialects, the pronouns of the "other demonstrative" type *ta* (他) and *qi* (其) of specific reference similar in sound to the N series and the K series evolve into the third personal pronouns in the two categories of dialects. The concept of the third personal pronoun in northwest dialects corresponds to that of proximal deixis and distal deixis, and because its referent is mentally far, the distal demonstrative pronoun *wu/na* (兀/那) is used for the third personal pronoun as well. Such mechanism comes from Altaic languages. Probably similar is the evolution mechanism of the proximal demonstrative pronoun *yi* (伊) for the third personal pronoun.|000|Chinese dialects, personal pronoun, deixis, pronoun, semantic change, 1971|Wang2016|This article describes the explicit evolution of personal pronouns across Chinese dialects, which is essential and interesting, as it might give us further hints for subgrouping.|000|personal pronoun, Chinese dialects, semantic change, deixis, subgrouping, genetic classification 1972|Ogura2016|In this comment, I consider why homophones occur even though humans try to manifest one-to-one correspondence between form and meaning based on the CELEX lexical database of English, version 2.5 (1995) and the evolution of diatones in English, and its implication to Chinese [as mentioned in @Sampson2015 ]. |000|functional load, homophony, phoneme merger, English, 1973|Ogura2016|Zipf (@1949) suggests the simultaneous minimization of the two opposing forces from listener and speaker for form and meaning associations. 
One form for all meanings is an ideal code for the speaker, while one-to-one correspondence between form and meaning is an ideal code for the listener.|705|homophony, ambiguity-avoidance, functional load, English, phoneme merger, speaker-listener-model 1974|Ogura2016|Diatones are noun-verb pairs where the stress falls on the first syllable for the noun but the second syllable for the verb, e.g. address, permit, subject, contract, etc.|706|English, diatones, definition 1975|Ogura2016|We assume that when lots of homophones emerged in the latter part of the 16th century, first diatones emerged to avoid creating homophones.|706|homophony, ambiguity-avoidance, diatones, English 1976|Zeng2015|In this paper the present-day pronunciations of the historically voiced obstruents in Xiangxiang Chinese were examined from an acoustic-phonetic perspective. Results strongly supported an ongoing process of devoicing for the voiced stops, fricatives and affricates, which is conditioned by a couple of factors including sex of speaker, place of articulation, manner of articulation and historically tonal type. Specifically, historically voiced obstruents in syllables corresponding to the historical tone category Ru have all become voiceless unaspirated or voiceless aspirated, meaning the process of devoicing has already completed; in contrast, for those that occur in syllables corresponding to the other three tonal categories, i.e., Ping, Shang, and Qu, i) the historical devoicing is still in progress, as evidenced by a considerable amount of intra- and inter-speaker free variation in the pattern of voicing among “fully voiced”, “partially devoiced” and “fully devoiced” obstruents; ii) the degree of devoicing is the highest in voiced fricatives, followed by the voiced affricates, and is the lowest in voiced stops; and iii) the devoicing also tends to be more extensive for obstruents with a more posterior place of articulation, and for male than female speakers. 
Candidate explanations for these patterns of devoicing were also presented.|000|voicing, Chinese dialects, devoicing, sound change, Xiāng 1977|Zeng2015|Interesting article on experimental phonetics, where devoicing patterns in synchrony in a Xiāng dialect are investigated. Apparently, devoicing patterns differ regarding sex but also regarding place of articulation. Furthermore, devoicing was already completed in the rù-tone category of Middle Chinese. Place of articulation apparently follows the oral cavity, with least amount of devoiced stops in [b] via [d] and [g], with similar patterns for fricatives.|000|speech acoustics, experimental phonetics, devoicing, Xiāng, Chinese dialects 1978|Zeng2015|However, the data found in Xiangxiang dialect showed that, although both of the above factors had an effect on the occurrence of voicing/devoicing, only place of articulation exhibited the predicted pattern. On one hand, the occurrence of devoicing for the voiced stops increased as the point of occlusion moved further back in the oral cavity: 32% for [b], [pb] 41% for [d] and 55% for [g]. The voiced fricatives, which distinguish two major places of articulation (velar [:sampa:`G`] vs. alveolar/post-alveolar [z zj]), exhibited the predicted pattern as well, [:sampa:`G`] having a higher percentage number of devoiced tokens than the three anterior consonants. On the other hand, there was a larger proportion of devoiced stops in male speakers in comparison to female speakers: 55% of the voiced stops, 97% of the voiced fricatives and 69% of the voiced affricates were devoiced in male speakers, in comparison with 24% of devoiced stops, 84% of devoiced fricatives and 59% of devoiced affricates in female speakers.|654f|devoicing, experimental phonetics, Chinese dialects, Xiāng 1979|Zeng2015|It was also found that historical tonal category was closely related with devoicing in the Xiangxiang dialect. 
The historically voiced obstruents have all become voiceless unaspirated or voiceless aspirated in syllables corresponding to the historical tone category Ru (entering), while devoicing is still in progress for those in syllables corresponding to the other three historical tonal categories, i.e., Ping (level), Shang (rising), and, Qu (departing), as evidenced by the observation made in the present study that there is a considerable amount of intra- and inter-speaker free [pb] variation in the pattern of voicing among “fully voiced”, “partially devoiced” and “fully devoiced” obstruents. In fact, many other Chinese dialects, the Old Xiang dialects in particular (Chen 陳 2008), also exhibited a similar pattern that devoicing occurred first in historical Ru syllables. To address this, another historical process with regard to the development of the historical Ru tone should also be taken into consideration -- it is widely attested across Chinese dialects that the historical Ru tone has ceased to be an independent tonal category but has been redistributed among other tones.|660f|sound change, rù-tone, Chinese dialects, Xiāng 1980|Zeng2015|the development of the historical Ru tone, which is characterized by its (C)VC syllable structure in which the syllable-final “C” can be [p], [t] or [k], consisted of at least two steps. At the first stage, the syllable-final [p t k] was weakened and became a glottal stop [ :sampa:`?` ], as evidenced by many dialects in current Wu dialects and Jianghuai dialects [...] where historical Ru is still an independent tonal category and preserves its (C)VC structure, but the final stops have all become [ :sampa:`?` ]. Then this glottal stop dropped and the preceding vowel was lengthened, and it is at this point that Ru began to merge into other tones. 
The observation in the Xiangxiang dialect that devoicing occurred first in historically Ru-tone syllables indicated that devoicing may occur prior to the redistribution of this historical tonal category, or, at least, when its independent status was still retained to some extent. If this is not the case, or in other words, if the historical Ru tone already merged into other tones before devoicing took place, the attested sensitivity of devoicing to this tonal category cannot be explained.|661|rù-tone, devoicing, Middle Chinese, sound change, Xiāng 1981|Zeng2015|Another well-known consonant type that has an effect on vowel pitch is voiced obstruents (voiced stops being discussed the most), that is, prevocalic voiced stops would lower the F0 of the following vowel. These two consonant types were inherently incompatible and exerted contradictory effects on the pitch of the intervening vowel, therefore making it difficult for speakers to produce a historically Ru-tone syllable which begins with a voiced obstruent and is checked by a glottal stop (let’s assume that devoicing had not taken place by the time the three stop finals merged into [ :sampa:`?` ]). Moreover, the short vowel duration for the Ru-tone syllables would make these two effects more overlapping and therefore more incompatible with each other, which would further enhance the already-existing pronunciation difficulties.|662|Chinese dialects, devoicing, rù-tone, Xiāng 1982|Nesetril2012|This text is aimed at doctoral students and researchers, who are interested in Combinatorics and Graph Theory or who would just like to learn about some active topics and trends. But the book may be also interesting to researchers in mathematics, physics, chemistry, computer science, etc. who would seek for an introduction to the tools available for analysis of the properties of discrete structures, and sparse structures particularly. 
The dichotomy between sparse and dense objects is one of the main paradigm of the whole mathematics which transcends boundaries of particular disciplines. This is also reflected by our book. The book is organized in three parts, called Presentation, Theory, and Applications. The first part, Presentation, gives a general overview of the covered material and of its relationships with other domains of contemporary mathematics and computer science. In particular, Chap. 2 is devoted to the exposition of some typical examples illustrating the scope of this book. The second part, Theory, is the largest part of the book and it is divided into eleven chapters. Chapter 3 introduces all the relevant notions and results which will be used in the book: basic notions and standard terminology, as well as more involved concepts and constructions (such as homomorphisms, minors, expanders, Ramsey theory, logic, or complexity classes), or more specific considerations on graph parameters, structures, and homomorphism counting. Chapter 4 introduces the specific notions used to study the density properties, shallow minors, shallow topological minors, or shallow immersions of individual graphs, as well as the related fundamental stability results. These results are applied in Chap. 5, and this leads to the nowhere dense/somewhere dense classification and to the notion of classes with bounded expansion (which are sparser than general nowhere dense classes). This classification is very robust and it can be characterized by virtually all main combinatorial invariants. Several first characterizations are included in Chap. 5, and more characterizations are given in Chaps. 7, 8, 12, and 11. Chapter 5 ends with a discussion about the connection to model theory and the various approaches to handle general relational structures. 
Although the study of dense graphs frequently relies on the properties of dense homogeneous core structures (like complete graphs or even random graphs), it will be shown that sparse graph properties are intimately related to the properties of trees, and particularly to the ones of bounded height trees. Fundamental results on bounded height trees and, more generally, on graphs with bounded tree-depth are proved in Chap. 6. They open the way to the main decomposition theorem, which is the subject of Chap. 7. The decomposition scheme introduced there, which we call low tree-depth coloring, is a deep generalization of the concept of proper coloring. The low tree-depth colorings also lead to an alternative characterization of the nowhere dense/somewhere dense dichotomy. Yet another characterization of this dichotomy is proved in Chap. 8, that relies on the notion of independence through the notion of quasi-wideness (which has been introduced in the context of mathematical logic). Chapters 9 and 11 deal with homomorphism dualities. Bounded expansion classes are proved to have the richest spectrum of finite dualities and, in the oriented case, they are actually characterized by this property. Meanwhile, Chap. 10 establishes a connection to model theory and deals particularly with relativizations of the homomorphism preservation theorem of first-order logic. A last characterization of the somewhere dense/nowhere dense dichotomy is proved in Chap. 12 by considering the asymptotic logarithmic density of a fixed pattern in the shallow minors of the graph of a class. In a sense, one can view this last result as a characterization of the dichotomy in probabilistic terms. The Theory part ends with Chap. 13 where the results of the previous chapters are gathered and put to service in the study of the characteristics of nowhere dense classes, of classes with bounded expansion, and of classes with bounded tree-depth (which are derived from trees with bounded height).
It is pleasing to see how these characterizations are nicely related. The third part, Applications, concerns both theoretical and algorithmic applications of the concepts and results introduced in the second part. This part opens with Chap. 14 which gives several examples of classes with bounded expansion, such as classical classes defined in the context of geometric graphs and graph drawing, as well as classes admitting bounded non-repetitive colorings. It is also the occasion for a connection with the Erdős-Rényi model of random graphs. Some applications are considered in Chap. 15, such as the existence of linear matching (and more generally unions of long disjoint paths), connection with the Burr-Erdős conjecture, with game coloring, and with spectral graph theory. In Chap. 16, the use of a density driven criterion for the existence of sublinear vertex separators links our study to the sparse model of property testing, via the concept of hyperfiniteness. We provide in Chap. 17 core algorithms related to our study. In particular, we detail a fast iterative algorithm to compute a low tree-depth decomposition, the number of colors being controlled by a polynomial dependence on the densities of the shallow minors of the graph. The fact that this algorithm is nearly linear for sparse classes is one of the main advantages of our approaches. In Chap. 18 we consider algorithmic applications, which mainly derive from the fast low tree-depth coloring algorithm. These cover various well-known algorithmic problems, such as subgraph isomorphism, decidability of first-order properties, as well as their counting versions. The title of the last chapter—Further Directions—is self-explanatory. |000|sparse data, sparsity, graph theory, algorithms, introduction 1983|Nesetril2012|The book introduces sparsity in graphs, structures, and algorithms. Chapter 4 gives basic definitions to measure sparsity in data.
The book is very specific and potentially only marginally useful, but it should be kept in mind that it exists, since it might offer some interesting insights when considering future problems.|000|sparse data, sparsity, graph theory, introduction, algorithms 1984|Thomason2007|Historical linguists have always known that some linguistic changes result from deliberate, conscious actions by speakers. But the general assumption has been that such changes are relatively trivial, confined mainly to the invention or borrowing of new words, changes in lexical semantics, and the adoption of a few structural features from a prestige dialect. The goal of this paper is to show that adult speakers can and do make deliberate choices that bring about nontrivial lexical and structural linguistic change; I also discuss some of the implications of the most extreme examples for historical linguistic methodology. Although such processes are not confined to contact-induced change, the focus here is on processes motivated at least in part by language contact. The particular categories that I’ll focus on are correspondence rules applied in lexical borrowing, deliberate non-change, and deliberate structural change. Perhaps the most important lesson to be learned from examples of deliberate change and deliberate non-change is that efforts to develop deterministic predictive theories of contact-induced change – either by setting theoretical limits on its extent or by predicting specific outcomes under specific linguistic and/or social conditions – are doomed; but I won’t explore this point in detail in the present paper (see Thomason 2000 for discussion of the general issue).|000|language change, language contact, deliberate change, 1985|Thomason2007|The Comparative Method is by far the most powerful tool in the historical linguist’s toolbox.
It dates from the 1870s in essentially its modern form, though of course we know much more about the mechanics and results of language change now than we did a hundred years ago. Because it enables us to reconstruct sizable portions of proto-language lexicon, phonology, and morphology, and to a lesser extent (morpho)syntax as well, the Comparative Method greatly expands our ability to examine language changes over considerable time depths. If our knowledge of language change were confined to actually attested past changes and the very recent studies of ongoing change, our understanding of processes and probabilities of change would be based on evidence from a tiny handful of languages in a tiny handful of language families – primarily, at present, languages that have been documented for several hundred years or more. And our understanding would be extremely impoverished in comparison with what we actually do know, thanks primarily to the Comparative Method. Most significantly, then, the Comparative Method provides us with a window on prehistory. Historical linguistics is not unique among the historical sciences in being able to recover specific information that is not directly attested, from a period before any direct attestations are available; but it is certainly one of the most successful historical sciences in this regard. Certain other historical sciences, for instance paleoanthropology, have borrowed and adapted our Comparative Method in efforts to achieve comparable results. It is true, of course, that the time depths reached by the Comparative Method – perhaps 10,000 years, possibly a bit more – are very shallow compared to the relevant time depths for historical sciences such as evolutionary biology. Still, we are envied by many other historical scientists for our ability to trace linguistic features back into prehistory.
The body of information about language families and their proto-languages, including the paths and results of a huge number of phonological and morphological changes, enables us to make fairly confident statements about common vs. uncommon changes, if not about possible vs. impossible changes.|42|time depth, comparative method, definition 1986|Thomason2007|If asked what kinds of linguistic changes speakers are most likely to make deliberately, most linguists would think first of lexical innovations. Every generation of teenagers has its own slang vocabulary and every specialized field has its own technical lexicon, to take the most obvious examples. So, for instance, a few generations ago the word crazy took on a slang meaning ‘terrific, wonderful’ – a lexical semantic innovation, added to its earlier meaning ‘insane’ – and around 1960, college students in California replaced crazy in its slang meaning with napa, derived from the location of a state mental institution. Still later, another generation replaced crazy (and, in California, napa) with cool.|44|deliberate change, language change, lexical change 1987|Thomason2007|Finally, although the Comparative Method has worked very well for the great majority of languages around the world to which it has been applied, the method fails, and can in fact be predicted to fail, in cases where speakers’ choices do have a drastic effect on a language’s lexicon and grammar. Unfortunately for historical linguists’ hopes of elucidating language history, such cases can be impossible to unravel retrospectively – except, of course, that we can identify a given language as having had an aberrant history, because the Comparative Method itself alerts us to that fact. 
Fortunately for our hopes, however, the number of areas in the world where speakers have chosen to make such drastic changes is small, and even within those areas the situation is usually not completely hopeless.|62|comparative method, deliberate change, limitations 1988|Mufwene2007|Publications such as Thomason & Kaufman (1988) and Thomason (2001, 2002) have perpetuated, without fundamental support, the distinction between internally- and externally-motivated language changes. They suggest that the two kinds of changes are different in nature and that those induced by language contact have not contributed to language speciation in the same way as those owing (principally) to language-internal mechanisms. Putatively, genetic classifications should be based only on correspondences suggested by internally-motivated change. Thus, they conclude that creoles cannot be classed genetically, because, in their case, contact was so extensive that the comparative method cannot be applied to them in informative ways. This disjoint view of language diversification, which has treated creoles as children out of wedlock (Mufwene 1997b, 2001), is so deeply entrenched in linguistics that it is repeated both in general introductions to linguistics and in historical linguistics textbooks. When the latter kinds of books cover the development of creoles at all as part of their subject matters, they also claim that these vernaculars are genetically “exceptional,” if not unnatural, because they have not emerged in the “usual” and/or “natural” way. See, e.g. Hock & Joseph (1996, critiqued in Mufwene 1998, 2001).|000|creolisation, creole languages, comparative method, methodology 1989|Mufwene2007|Adducing arguments from external history, I argue in this article that the above problem stems from a misinterpretation of the genetic Stammbaum (or cladogram) as an account of language diversification rather than as a representation of the outcome of the speciation process. 
The problem also comes from a basic assumption in genetic linguistics, since the design of the Stammbaum by August Schleicher in the 19th century, that normal language diversification proceeds monoparentally (a process identified by biologists as “asexual transmission”).|64|family tree, problem, critics, diversification, speciation 1990|Mufwene2007|I submit that the whole distinction between internally versus externally-motivated change must have to do with another legacy from the 19th century: the ideology of language purity, which is itself related to that of race purity. According to this, hybrids, products of race or language mixing, are less normal, if not simply, abnormal phenomena (Mufwene, to appear). It is therefore not surprising that creoles and pidgins, as putatively extreme cases of externally-motivated change, have typically been suggested to be unnatural developments, even by creolists (e.g. McWhorter 2000, 2001, 2005).|64|purification, linguistic purism, racism, family tree, 1991|Mufwene2007|If creoles have really developed in their own unusual or abnormal way, their different structures suggest that the ecologies of their emergence are far from being identical from one to the other. On the other hand, if we assume a uniformitarian position and acknowledge that they have emerged by the same restructuring processes that have often resulted in language diversification, then it is normal that differing ecologies will produce new mutually divergent language varieties, even if exactly the same languages are involved in the contacts. Local dynamics of competition and selection will favor different variants even from what may appear to be more or less the same feature pools. Thus, [pb] creoles are not genetically unusual, nor abnormal, nor less natural.
Instead, they are a precious opportunity for linguists to realize the extent to which language contact, subsuming also dialect contact but really based on idiolect contact, has been a critical catalyst in language change and speciation.|85f|creolisation, creole languages, family tree, methodology, 1992|Mufwene2007|To the extent that contact, situated at the idiolectal level, is acknowledged as a critical ecological factor in the actuation of change, the distinction between externally and internally-motivated change becomes simply sociological, and the distinction between changes induced by contact and those independent of contact becomes misguided. Moreover, like evolutionary biology, genetic linguistics (which could also be called evolutionary linguistics) has everything to gain from being interested in issues of language vitality, which can then deal with the demise of languages by structural erosion or by language shift and with the emergence of new varieties. In all these cases of language evolution, the action of competition and selection among competing variants and/or systems is evident.|86|internal and external sound change, sound change, language change, explanation of language change, contact-induced language change 1993|Sullivan2008|This paper offers a model to explain the general observation that lexical items are more often borrowed from a higher status language into a lower status one, than visa versa. Material from Lahore, Pakistan, shows that in casual speech among plurilinguals codeswitching is the norm. In formal contexts, in which there is attention to proper language, educated speakers filter out features which are not part of the standard language. 
Constraints on language and education in the hierarchical social structure withhold from most speakers of the lower status languages the knowledge necessary to evaluate their own speech in this way, thus allowing features of other languages to become established in their language.|000|language contact, code-switching, prestige language, explanation of language change 1994|Sullivan2008|The paper may offer an interesting introduction to lexical borrowing in contact situations insofar as it emphasizes the role that prestige languages play.|000|prestige language, lexical borrowing, mechanism of language change, borrowability 1995|Mithun2007|North America is home to considerable linguistic diversity, with 55-60 distinct language families and isolates in the traditional sense, that is, the largest genetic groups that can be established by the comparative method. Of these, 22 are represented in California. Some California languages are members of large families that extend over wide areas of the continent, such as Uto-Aztecan, Algic (Athabaskan-Wiyot-Yurok), and Athabaskan-Eyak-Tlingit. Others belong to medium or small families spoken mainly or wholly within California, such as Utian (12 languages), Yuman (11), Pomoan (7), Chumashan (6), Shastan (4), Maidun (4), Wintun (4), Yokutsan (3), Palaihnihan (2), and Yuki-Wappo (2). Some are isolates: Karuk, Chimariko, Yana, Washo, Esselen, and Salinan. Most of the currently accepted genetic classification was established over a century ago (Powell 1891).|000|North America, North American languages, linguistic diversity, language contact, California, introduction 1996|Mithun2007|The article gives a basic introduction to the historical linguistics and contact situation of the California languages in North America.
It introduces basic hypotheses regarding higher-order relationships and what is known about the contact situation.|000|North American languages, California, introduction, language history, linguistic area 1997|Bei2015|The features of tones contact have been investigated between Xiang Dialect and Gan Dialect, by using the method of tone pattern in this paper. It is showed that the “transition” feature of the mixed dialect reflects in similar features of tones between mixed dialect and ancestor dialect or between mixed tones and target dialect, which includes as follows: the balance of the tone number, the mixing of tone category, the similarity of tone value, and the correlation of tone contour.|000|Xiāng, Gàn, Chinese dialects, tone change, language contact, 1998|Bei2015|Article may in general be really interesting since it investigates concrete patterns of tone correspondences and potential lexical borrowing and the perception of tones involved therein. Thus, the article tries to directly pursue how tone is integrated in borrowing situations in Chinese dialects.|000|Chinese dialects, Xiāng, Gàn, tone change, tone, lexical borrowing, language contact 1999|Uemura1999|In this paper, we are concerned with identifying a subclass of tree adjoining grammars (TAGs) that is suitable for the application to modeling and predicting RNA secondary structures. The goal of this paper is twofold: For the purpose of applying to the RNA secondary structure prediction problem, we first introduce a special subclass of TAGs and develop a fast parsing algorithm for the subclass, together with some of its language theoretic characterizations.
Then, based on the algorithm, we develop a prediction system and demonstrate the effectiveness of the system by presenting some experimental results obtained from biological data, where free energy evaluation selection for parse trees is incorporated into the algorithm.|000|tree adjoining grammar, RNA folding, RNA secondary structure prediction 2000|Uemura1999|To our knowledge, it was in 1971 that the basic ideas of TAGs were first suggested in the paper by Joshi and Takahashi [9], and it was not until 1975 that the current formalization for TAGs was established and officially appeared in the work by Joshi et al. [8] |277|tree adjoining grammar, background, history 2001|Dobbs2016|Lataddi Narua is the first reported variety of a Naish language having only two tonal levels; all other Naish languages described to date have three. The language uses a system in which level tones are primary, and tones are associated to lexical items rather than to individual syllables. Four tone categories are identified for nouns and adjectives, and six for verbs. Processes of tonal interaction across word boundaries are discussed, from tonal morphology to observations about syntactic and semantic factors influencing the placement of tone group boundaries within a sentence.|000|Naish, Narua, Sino-Tibetan, Tibeto-Burman, Lataddi Narua, tonal morphology, tone language, 2002|Jacques2011b|Naxi, Na and Laze are three languages whose position within Sino-Tibetan is controversial. We propose that they are descended from a common ancestor (‘Proto-Naish’). Unlike conservative languages of the family, such as Rgyalrong and Tibetan, which have consonant clusters and final consonants, Naxi, Na and Laze share a simple syllabic structure (consonant+glide+vowel+tone) due to phonological erosion. This raises the issue of how the regular phonological correspondences between these three languages should be interpreted, and what phonological structure should be reconstructed for Proto-Naish.
The regularities revealed by comparing the three languages are interpreted in light of potential cognates in conservative languages. This brings out numerous cases of phonetic conditioning of vowels by place of articulation of a preceding consonant or consonant cluster. Overall, these findings warrant a relatively optimistic conclusion concerning the feasibility of unraveling the phonological history of highly eroded language subgroups within Sino-Tibetan.|000|Naxi, Narua, Laze, Sino-Tibetan, Tibeto-Burman, linguistic reconstruction, Naish, 2003|Jacques2011b|Naxi, Na and Laze all have a simple syllabic structure: (C)(G)V+T, where C is a consonant, G an on-glide, V a vowel, and T a tone. Brackets indicate optional constituents. There are neither initial clusters nor final consonants in any of the Naish languages and dialects studied so far, and given the wide coverage of the surveys conducted since the early years of the People’s Republic of China, it is a safe guess that none will come to light as more varieties come under academic scrutiny.|470|Naish, Narua, Naxi, Laze, syllable structure, description 2004|Jacques2011b|The Naish languages, with their absence of segmental inflection and their limited syllable inventory, constitute a typological extreme and offer an exceptional challenge to the application of the comparative method, due to the high opacity of [pb] the phonological changes that have taken place in this branch.
The present study constitutes a first step towards unraveling the phonological history of the Naish languages; it exemplifies the well-established fact that conservative languages provide useful indications for interpreting present-day correspondences among the short forms of phonologically eroded languages.|492f|historical linguistics, linguistic reconstruction, methodology, comparative method, phonological erosion 2005|Jacques2011b|Finally, while language classification is not the main focus of this research, the insights gained into the historical phonology of Naxi, Na and Laze put to rest any doubt that they belong within a single subgroup (clade) of Sino-Tibetan.|493|Sino-Tibetan, Naish, Narua, Naxi, Laze, subgrouping, 2006|Dobbs2016|:comment:`Showing family tree of Sino-Tibetan in parts with position of Naish branch following apparently` @Jacques2011b :comment:`but in fact the tree is not shown in that publication.` .. image:: static/img/dobbs-2016-image-1.png :name: dobbs-2016 :width: 500px [Sino-Tibetan family tree (Qiangic subbranch)]|68|family tree, Sino-Tibetan, subgrouping, Naish 2007|Yu2016|Through a case study on the Zijun Samatao community of Kunming City in Yunnan Province of China, this paper discusses the impact of urbanization and mass Han Chinese migration on the endangerment of an ethnic minority language and culture. Samatao of Kunming is a subgroup of the minority Yi nationality of China. Historically, Kunming was the original homeland of many Yi communities including the Samatao. With the fast pace of urbanization and increasing Han Chinese migration, the Samatao community is losing their language and has lost their traditional culture and religion. Urbanization has brought mass migration into Zijun village, accelerating the endangerment of the Samatao language. Of nearly three thousand registered Samatao people, fewer than 100 (including semi-speakers) have any level of active knowledge of their language and culture.
Since 1994, the endangerment of Samatao language has dramatically increased, especially since 2003. Based on data from fieldwork in the community over more than twenty years, this paper explores how the ideals of the Samatao community about preserving their language and culture are challenged by the contextual reality of urbanization and mass Han Chinese migration.|000|urbanization, Han Chinese, migration, Kùnmíng, Samatao, 2008|Yu2016|The study will look at 1) the connections between the Samatao and the other Yi of southwestern China; 2) a brief history and current social status of the Samatao in Kunming; 3) languages surrounding the Samatao and forces changing the situation of Samatao language and culture; 4) the rise of an activist elite among the Samatao, promoting activities to preserve their language and culture since the 1990s; and finally 5) the need for local government involvement in making relevant policy and providing financial support for the preservation of Samatao language in Zijun village.|162|Samatao, Loloish, Han Chinese, urbanization, language endangerment 2009|Yu2016|The Samatao people live in Zijun village (also known as Da’er ‘big ear’ village; indeed one physical characteristic of many Samatao people is large ears) about 12 kilometers southeast of the centre of Kunming. Official Chinese classification groups Samatao into the Yi nationality.|164|description, Samatao, location, 2010|Yu2016|The fact that their language has been classified as ‘just’ a sub-dialect of Yi means that there is no support from the government to maintain the language. Furthermore, since the late 1990s the Samatao are overwhelmingly outnumbered [pb] by Han Chinese migrants living or doing business in the village. 
The existence of the Samatao in Zijun is hardly in evidence culturally or linguistically, though it is called Zijun Samatao village.|167f|language endangerment, Samatao, Loloish, Sino-Tibetan 2011|Yu2016|This report shows how currently languages are endangered, in this case for a Loloish (Yi) language in the Kùnmíng area. It is potentially interesting to study how language endangerment proceeds, but does not provide any details about the language, since apparently, not much research has been done on that so far.|000|Samatao, Sino-Tibetan, language endangerment, Kùnmíng, urbanization 2012|Yiu2015|Talmy (1985, 2000b) classifies languages into verb-framed and satellite-framed languages based on whether path is expressed in the main verb or in the satellite. The former denotes path in the main verb while the latter specifies path in the satellite. This study shows that modern Cantonese and Mandarin tend to express path in the satellite, and specifically in the directional complement. Scholars suggest that Chinese has undergone a typological shift from a verb-framed language to a satellite-framed language and the shift was completed at around the tenth century. However, this study finds that the change has not been completed and it is faster in some dialects and in some types of motion events than the others. Moreover, by studying materials compiled in the nineteenth and early twentieth centuries in Cantonese and Mandarin, this study observes that early Cantonese and Mandarin tend to use directional verbs to denote path when their modern counterparts opt for the use of directional complements. This study further proposes that disyllabicization is an important factor underlying the typological shift. Last, this study undertakes a quantitative approach to examine the expression of path in materials of different nature in modern Cantonese and Mandarin.
The statistical result illustrates that written materials reflect as much the linguistic reality as spoken language.|000|Mandarin, Cantonese, Yuè, Chinese dialects, typology, verb-framed, satellite-framed, disyllabification 2013|Yiu2015|Contains a study on the question of satellite-framed and verb-framed languages regarding the typology of motion events. It compares Cantonese and Mandarin. Basically, the main focus is on sentences like: * 瓶子漂出了洞穴 * The bottle floated *out* of the cave. And here, languages may use either a full verb or a form with a complement, like in Chinese (with German potentially occupying an intermediate position).|000|typology of motion events, satellite-framed, verb-framed, Mandarin, Cantonese, Chinese dialects, typology 2014|Sampson2015b|More than one commentator takes up my point that there would be no paradox, if the move from monomorphemic to bimorphemic vocabulary had preceded the loss of phonological contrasts. I gave reasons to believe that it could hardly have happened that way, but Wang Feng [@Wang2015b] offers some fascinating statistical material which is quite new to me. (I am ashamed to say that I know little of the research literature being published within China.) According to Wang Feng, the sound-changes which have made Mandarin phonological structure so much simpler than that of Old Chinese were concentrated within two periods of just a few centuries each; and his Figure 1 shows that the propensity of Chinese to use compound words, though it has risen more or less continuously since the earliest written documents, rose particularly fast also over two short periods.
These data, Wang Feng says, make it likely “that an increase of multisyllabic words in Chinese preceded phonological simplification”.|740|disyllabification, Old Chinese, phoneme merger, phonological erosion, sound change 2015|Sampson2015b|:comment:`[reply to` @Behr2015 :comment:`]` I do not dispute that there may have been some early compound words which fell outside these categories. However (to repeat) the issue is not that Chinese came to use many compounds, but that even concepts previously expressed by single morphemes came in many or most cases to be expressed by compounds, often compounds of synonyms.|741|Old Chinese, disyllabification, compounding, phoneme merger 2016|Sampson2015b|The type of synonym compound which is so characteristic of Mandarin Chinese really is vanishingly rare in English and other European languages familiar to me, and it is difficult to disagree with the usual assumption that these compounds arose in Chinese as a reaction to excessive homophony among individual morphemes.|743|synonym compound, compounding, Chinese, language history, 2017|Wang2015b|This is a reply to @Sampson2015, and basically, the author argues that phoneme merger was preceded by disyllabification in the history of the Chinese language.|000|phoneme merger, language history, Chinese, disyllabification, 2018|Martinet1964|toutes choses égales d’ailleurs, une opposition phonologique qui sert à maintenir distincts des centaines de mots parmi les plus fréquents et les plus utiles n’opposera-t-elle pas une résistance plus efficace à l’élimination que celle qui ne rend de service que dans un très petit nombre de cas? 
:translation:`other things being equal, will a phonological opposition which serves to keep apart hundreds of the commonest and most useful words not resist elimination more effectively than an opposition which serves that purpose only in a very few cases?` [:comment:`Translated and quoted after` @Sampson2015b ]|54|functional load, definition, sound change 2019|Sampson2015b|If the phoneme mergers had not created real ambiguity problems, then someone well versed in Classical Chinese should be able to listen to an unfamiliar passage of it read aloud and understand what he is hearing, without sight of the written text. I do not believe anyone can do that.|750|Old Chinese, phoneme merger, ambiguity 2020|McDermott2016|Music is present in every culture, but the degree to which it is shaped by biology remains debated. One widely discussed phenomenon is that some combinations of notes are perceived by Westerners as pleasant, or consonant, whereas others are perceived as unpleasant, or dissonant. The contrast between consonance and dissonance is central to Western music, and its origins have fascinated scholars since the ancient Greeks. Aesthetic responses to consonance are commonly assumed by scientists to have biological roots and thus to be universally present in humans. Ethnomusicologists and composers, in contrast, have argued that consonance is a creation of Western musical culture. The issue has remained unresolved, partly because little is known about the extent of cross-cultural variation in consonance preferences. Here we report experiments with the Tsimane’—a native Amazonian society with minimal exposure to Western culture—and comparison populations in Bolivia and the United States that varied in exposure to Western music. Participants rated the pleasantness of sounds.
Despite exhibiting Western-like discrimination abilities and Western-like aesthetic responses to familiar sounds and acoustic roughness, the Tsimane’ rated consonant and dissonant chords and vocal harmonies as equally pleasant. By contrast, Bolivian city- and town-dwellers exhibited significant preferences for consonance, albeit to a lesser degree than US residents. The results indicate that consonance preferences can be absent in cultures sufficiently isolated from Western music, and are thus unlikely to reflect innate biases or exposure to harmonic natural sounds. The observed variation in preferences is presumably determined by exposure to musical harmony, suggesting that culture has a dominant role in shaping aesthetic responses to music.|000|music, cultural evolution, cultural variation, perception 2021|McDermott2016|The roles of culture and biology in consonance and dissonance have remained unresolved in part because cross-cultural experimental data are rare. Studies of culturally isolated populations are increasingly difficult to conduct due to the diffusion of Western culture around the world, but our results underscore their importance for revealing the diversity of human musical behaviour.|3'|cultural evolution, music, perception, 2022|Mugdan2014|I reported that the French term phonème, long believed to have been first used in public in May 1873, is already attested in March 1865, both in an unpublished letter by Antoni Dufriche-Desgenettes (1804–1878) and in a lithographed table of his phonetic alphabet. Although the article accompanying the table was based on a talk given in late 1860 and was published in the proceedings of the Société d’Ethnographie for 1860–1861 (Dufriche-Desgenettes 1861), it was presumably not written before March 1865 and did not appear in print until the spring of 1868 (cf. Mugdan 2011: 90–96). 
Meanwhile, with the progress of large-scale digitization projects, I have discovered an even earlier instance of phonème in a hitherto unknown text by Dufriche.|185|Antoni Dufriche-Desgenettes, biodate, phoneme, history of science, 2023|Mugdan2014|Dufriche’s reply, dated 11 December 1862, contains a passage in which he points out that the French words pays and dépayser have a mid E followed by an I, with a weakened [j]-sound (une sorte de yé atténué) in between, to which he refers as “ce phonème de transition, ou plutôt de liaison euphonique [this phoneme of transition or rather euphonic linkage]” (Dufriche 1862: 271; italics in the original). This is the only occurrence of the term phonème in the text, and it is remarkable that Dufriche, as in all of his later [pb] writings, neither gives an explanation nor says anything about whether he himself adapted the word from Greek φώνημα (phōnēma) “voice” or borrowed it from someone else (cf. Mugdan 2011: 98–101).|185f|phoneme, history of science, Antoni Dufriche-Desgenettes 2024|Mugdan2014|What we still do not know is how Dufriche heard about the word and how its meaning changed from “sound sequence as the expression side of a word” to “sound segment as an element of a sound inventory” (cf. Mugdan 2011: 100–102); perhaps further evidence will come to light as more 19th-century publications are digitized and can be searched online.|186|Antoni Dufriche-Desgenettes, history of science, phoneme, terminology 2025|Zheng2015|Norman (1979) proposes there are three strata dated from the Qin-Han, Southern dynasties and Late Tang period in the modern Min dialects respectively. Ting (1988, 1995) and Mei (1994, 2001) argue that the three strata also exist in the Wu dialects. Most of the previous studies focus on the comparison between the Southern Wu (SW) and Min and neglect the materials of Northern Wu (NW). 
Through comparative studies, strata analyses and referring to Chinese historical phonology, we assume that some phonological traits in NW, SW and Min all come from Qieyun or the Jiangdong dialect in the Southern dynasty. This paper also points out that although Wu dialects in southern Zhèjiāng are more similar to Min than Northern Wu, such places in NW as Dānyáng, Chángshú and “Chóngqīhǎi” near Jinling geographically regarded as the center of the Jiangdong dialect, probably retain some early phonological traits in the Southern dynasties, which is also similar to modern SW and Min.|000|Chinese dialects, Wú, Mǐn, language history, historical phonology, sound change, genetic relationship 2026|Zheng2015|Generally speaking, Wu is divided into a northern (Jiāngsū) type and a southern (Zhèjiāng) type, but such a division is not very clear. Northern Wu (hereafter NW) is referred to those dialects around the Taihu Lake, so NW is also named Taihu Lake subgroup of Wu. As for the Tāizhōu subgroup of Wu dialects, including Tiāntāi, Wēnlǐng, Línhǎi, etc., is usually sub-classified into NW. The dialects belonging to the Southern Wu (hereafter SW) include Wēnzhōu, Jīnhuá, Qúzhōu and Líshuǐ, which is the southern part of Zhèjiāng province (@Norman<1988> 1988: 199).|120|Wú, Chinese dialects, subgrouping, 2027|Zheng2015|As for the classification of the Chinese dialects in the Wei-Jin and the Northern and Southern dynasties, the difference of the dialects of [pb] south and north China is basically reflected in the discrepancy between *Hebei* 河北 and *Jiangdong* 江東. The preface of Jingdian Shiwen 經典釋文·序錄 said: "方言差別,固自不同,河北江南,最為巨异." (The dialectal difference existed at an early stage, and that between *Hebei* and *Jiangdong* is the most obvious.) 
:comment:`[Brings more examples from the literature]` |120f|diatopic variation, Chinese dialects, language variation, language history, Héběi, Jiāngdōng, 2028|Zheng2015|Furthermore, the phenomenon probably reflects the historical relationship between the modern Wu and the Jiangdong dialect. In addition to the phonological perspective, we can discuss that from the perspective of the history of lexical change as well, especially grammatical evolution of the particles (see Zheng 2010). This paper aims [pb] at setting up the genetic relationship among the Northern Wu, Southern Wu and Min dialects based on some phonological features. It is hoped to shed light on a better understanding of the present-day and ancient dialects in the south China.|141f|Mǐn, Wú, Chinese dialects, subgrouping, genetic relationship 2029|Behr2005|This paper explores the diachronic stability of tone classes in rhyming bronze inscriptions from the Western and Eastern Zhou periods, and compares it to the ratio encountered in the Máoshī and bronze inscriptional tōngjiǎ characters. The relationship between the four tone categories stayed by and large similar throughout the Zhōu subperiods, even if the absolute degree of tonal interrhyming decreased dramatically over time along with the consolidation of a tetrasyllabic line standard. Possible implications of these findings for the process of tonogenesis, the syllable typology of Old Chinese, and the nature of pre-Shījīng rhyming are discussed.|000|rhyme patterns, Old Chinese, shījīng, Bronze inscriptions, tonogenesis, syllable structure, 2030|Behr2005|Apart from external comparative data, two types of evidence which might shed some light on the process and dating of tonogenesis in Old Chinese have been comparatively little explored, namely the consistency of tones throughout phonophoric or xiéshēng 諧聲-series and tonal irregularities in pre-Qín inscriptional rhyming. 
In this paper, I will mainly address the second type of materials, focussing on the corpus of rhyming bronze inscriptions collected for my 1997 dissertation. [...] For the Western Zhōu period, the available 111 inscriptions have been diachronically ordered according to the reign periods of the Western Zhōu kings|113|Old Chinese, rhyme patterns, Bronze inscriptions, dataset 2031|Behr2005|One major problem in comparing RBI data to the tonal distributions encountered in the Odes, which have been meticulously analyzed by Cheung (1968), Jeon (1987), Xiàng (1987) and others, is the almost complete lack of a concept of zhāng 章 or “stanza” in rhyming bronze inscriptions. As is abundantly clear from the excavated tomb text literature, the zhāng, rather than the whole poem, was the most important unit of textual organization in the various bamboo strip versions of the Odes, but no similar graphic, metric or conceptual entity is known from the bronzes.|117|stanza, rhyme patterns, poem, Bronze inscriptions, structure, 2032|Behr2005|Moreover, it follows from the lack of stanzaic verse organization, that neither the traditional four-way subdivision of tonally mixed sequences according to the number of tones involved within one rhyming sequence, nor an assignment of tonally mixed sequences to one predominant tone (within —the stanza!) is possible with regard to rhyming bronze inscriptions. Even worse, it is well known that instances of nonadjacent rhyming lines (i.e. ABAB, ABBA, ABACAB, ABACABC etc.) with or without one or several intervening non-rhyming lines (i.e. A[...X...] B[...X...]B A[...X...] etc.) 
are much more common in rhyming bronze inscriptions than in the Shījīng, so that an approach which simply takes an “yī yùn dào dǐ” 一韻到底 or unchanged monorhyme sequence as the domain of reference for tonal patterns is clearly not feasible.|120|Bronze inscriptions, rhyme patterns, Old Chinese 2033|Behr2005|In fact, the ratio between tonally mixed and unmixed rhymes is usually well below 50%. Cross-tonal rhyming is therefore the rule, not an exception in rhyming bronze inscriptions.|124|Old Chinese, rhyme patterns, tone, impure rhymes, tone contact, Bronze inscriptions 2034|Behr2005|Other strictly geographical effects in tone contacts are largely absent. This observation ties in nicely with the overall impression that the areal consistency of rhyming practices in bronze inscriptions is a strong argument in favor of the existence of some sort of koinē-like normative speech or yǎyán 雅言 during the Eastern Zhōu period, if not earlier.|125|Bronze inscriptions, koine, yǎyán, Eastern Zhōu dynasty, Old Chinese, geographic variation 2035|Ratcliffe2012|The high degree of contradiction and incompatibility between two independently produced Afroasiatic comparative lexica (Ehret 1995, Orel & Stolbova 1995) calls into question the reliability of the comparative method at deep time depths. The discrepancy could only have arisen if one or both sources contain a large number of chance or spurious matches. This article first documents the discrepancy between the two comparative lexica, and then attempts to explain it. 
The central proposal is that the evaluation of a proposed reconstruction must go beyond qualitative evaluations of individual proposed cognate sets and incorporate quantitative tools for evaluating the probable degree of chance matches within the reconstruction as a whole.|000|reliability, validity, comparative method, methodology, Afroasiatic languages, long-range comparison 2036|Ratcliffe2012|One way to test the reliability of the comparative method would be to undertake the following experiment. Take a set of languages for which a relationship has been suggested, but for which regular sound correspondences and a reconstructed proto-language sound system have not yet been established. Furnish two libraries in different places with all the available and relevant information on the languages (dictionaries, grammars, texts). Take two teams of researchers trained in the comparative method, put them in the libraries, keep them in isolation from each other and see what they come up with. If it is a reliable procedure then two trained practitioners of it confronted with the same body of data should come up with broadly similar results — repeatability of experiments might reasonably be expected, as in natural science. Of course, finding qualified volunteers for such an experiment might prove quite difficult.|240|comparative method, reliability, validity, methodology, inter-annotator agreement 2037|Ratcliffe2012|Therefore, even when there is no direct contradiction, proposals of cognacy in one source which crucially rely upon assumptions about sound correspondences, root-structure, or semantics which are contradictory with the assumptions about these matters made by the other source must be judged incompatible in principle. By this criterion I have judged that about 555 of Ehret’s (1995) entries, more than half the total, as incompatible with the Afroasiatic reconstruction of Orel & Stolbova (1995). 
In short the two sources have reconstructed mutually unrecognizable proto-languages.|241|comparative method, testing, reliability, methodology, linguistic reconstruction 2038|Kessler2001|[...] traditional historical [linguistic] work does not proceed along lines that invite statistical analysis. In a canonical statistical study, it is important that one select in advance (i.e. before even looking at the languages) the attributes that one is going to study and the metric for evaluating those attributes. :comment:`[quoted after @Ratcliffe2012 ]`|18|historical linguistics, statistics, methodology 2039|Ratcliffe2012|In what follows I will present the results of my comparison of the two lexica, and then show how probabilistic tools can be used to evaluate and explain the discrepancy between them. I will consider how to calculate in principle the relative costs (in terms of raised expectation of chance matches) of different ways of conducting a comparison. I will also attempt to determine how allowing for leeway in different areas may vary the number of chance matches. Finally I will show how it might be possible to estimate the expected number of chance matches likely to be present in a proposed set of reconstructions, based on what the researcher making the proposal reveals about the method of conducting the search.|245|linguistic reconstruction, comparative method, statistics, reliability, methodology 2040|Ratcliffe2012|In the following discussion I will use the term “Agreeing” to describe cases where both sources agree in connecting a particular word in one branch with a particular word in at least one other. [pb] The term “Complementary” describes two cases. The first is where connections are proposed between words not treated by the other source, but where these proposals are not incompatible in principle. 
[pb] The term “Contradictory” defines cases where both sources treat the same word in one branch but connect it with a different word or internally non-cognate words in another branch [...]. |248-250|evaluation, reliability, linguistic reconstruction, comparative method 2041|Ratcliffe2012|There are three different causes of incompatibility between the sources: incompatible sound correspondences, incompatible assumptions about root structure, and incompatible assumptions about semantics. [...] If a cognate set depends upon a sound correspondence contradictory with the sound correspondences proposed in the other source it is IIP. [...] [pb] That still leaves 17 proposed correspondence sets (= proposed reconstructed segments) in Ehret (1995) and 14 in Orel & Stolbova (1995). :comment:`[17 out of about 33 to 42 consonants are incompatible in the two sources.]` [...] [pb] Both Ehret (1995) and Orel & Stolbova (1995) allow 2-to-3 matches, that is cases where a word with two consonants in one language is proposed as cognate with a word with three consonants in another. However, the justification for this is different in each case. Orel & Stolbova (1995) allow for the loss of certain consonants, notably sonorants and gutturals (velar fricatives and post-velar consonants), in all environments, and allow for a small number of consonants to be treated as [pb] fossilized prefixes or suffixes. Ehret takes the more radical view that all third consonants are fossilized suffixes. [...] The category of semantically incompatible includes two types of problems, first for a word with many meanings or many translation equivalents, the sources may choose to take different ones as basic and search other languages accordingly. [...] The second problem, not unconnected with the first, is that the authors have different assumptions about what sort of vocabulary to reconstruct for the proto-language. 
As this example also illustrates, Ehret has a tendency to reconstruct verbs wherever possible, while Orel & Stolbova (1995) prefer concrete nouns. :comment:`[problem of semantic reconstruction evident]` |252-255|incompatibility, linguistic reconstruction, evaluation, comparative method, reliability 2042|Ratcliffe2012|Thus slightly less than seven percent of the cognate sets proposed by Ehret (1995) are also proposed by Orel & Stolbova (1995). Slightly more than another 21% are complementary. From the reverse perspective, since Orel & Stolbova (1995) have roughly two and a half times as many entries as Ehret (1995), only a little more than two percent of the cognate sets proposed by Orel & Stolbova (1995) are also acknowledged by Ehret (1995). Die-hard opponents of long-distance comparison may leap to the conclusion that the method is 93% to 98% inaccurate even at medium depths, but such a conclusion would be premature. Still, the fact remains that two sets of scholars have been able to reconstruct mutually unrecognizable proto-languages, and this demands an explanation.|256|linguistic reconstruction, reliability, testing, evaluation, comparative method 2043|Ratcliffe2012|How much semantic range do comparatists normally allow for, and how much have Ehret (1995) and Orel & Stolbova (1995) allowed for? Researchers rarely spell out how they have gone about searching for cognates, how many comparisons they have made before they have found anything. But often this can be deduced from the results. 
One interesting cause and effect relation is that broad semantic leeway in making the comparison leads to the reconstruction of forms with vague semantics and also not unusually to the reconstruction of large numbers of synonyms.|263|linguistic reconstruction, cross-semantic cognates, semantic range, reliability, problem, methodology 2045|Ratcliffe2012|This can be illustrated by example. Suppose one compares the word for ‘bird’ in one language not only with the equivalent for ‘bird’ in the other language, but with the words for ‘sparrow’, ‘pigeon’, ‘vulture’, etc. If there are 50 “bird-words” in each language, and one compares each of these in one language with the fifty in the other, there is a total of 2500 (50 x 50) comparisons made. Of course if one finds [pb] apparent cognates on the basis of assuming semantic equivalence between pairs like ‘swallow’~‘hawk’, ‘parrot’~‘quail’, ‘kite’~‘ostrich’, ‘butterfly’~‘pelican’, etc. one can only reconstruct a superordinate term like ‘bird’ or ‘kind of bird’. If several such matches are found, the researcher ends up reconstructing several synonyms for ‘bird’. This example may sound exaggerated, but in fact it seems to be exactly what Orel & Stolbova (1995) have done in this particular case. 
The index under ‘bird’ indicates 52 sets reconstructed with the meaning ‘bird’. This includes proposed cognate sets with the following meanings. .. image:: static/img/ratcliffe-2012-1.png :name: ratcliffe-2012 :width: 600px [...] However it is not plausible that Afroasiatic had 52 synonyms for the generic term ‘bird’ and that these later differentiated in the individual languages. Natural languages do not allow such a high degree of synonymy. The reconstruction of synonyms is an artefact of method — specifically of allowing broad semantic leeway in making comparisons.|263f|chance resemblance, linguistic reconstruction, cross-semantic cognates, methodology 2046|Ratcliffe2012|Another way to increase the number of comparisons, in effect another way to broaden semantic leeway, is to ignore derivational and etymological history. If one considers not just the most basic or etymologically oldest sense of a word but secondary semantic shifts, dialect variant meanings, derived words no longer closely semantically connected with the source, etc., one can increase the number of comparisons, and hence the degree of randomness. Conversely, a careful use of etymological and derivational history to restrict the number of comparisons can reduce the degree of randomness.|267|chance resemblance, morphological change, word derivation, methodology, comparative method, reliability 2047|Ratcliffe2012|Comparison across branches in the absence of intermediate node reconstruction [...] allows for a choice of potential cognates among many languages of a given sub-branch, increasing the number of comparisons and hence the degree of randomness. The solution is, of course, more attention to reconstruction at the small family and sub-branch level. Of course, it is sometimes the case that a peripheral language within a sub-group retains a conservative form, while the majority of the languages in the sub-group have innovated together. 
Reconstruction purely on the basis of the internal information of the sub-group would probably lead us to reconstruct the majority (innovative) form and to ignore the actually conservative peripheral form in these cases. This is a consequence which simply has to be accepted in the preliminary stages of comparison, in the same way that the Neogrammarian principle of regular sound change will initially force us to ignore some legitimate cognates, for example place names or personal names, which have not changed according to the regular sound laws. [...] [pb] Panchronic use of sources [...] leads to an increase in the number of comparisons that can be made. [...] [pb] Reconstruction of excessive synonyms and homonyms is generally considered inconsistent with best practice. Presumably this reflects uniformitarian assumptions about what sort of thing is possible in actual languages. For functional reasons a language can only tolerate a certain amount of synonymy and homonymy, since homonymy creates ambiguity, which interferes with communication, and synonymy, which is a form of redundancy, imposes unnecessary burdens on memory. However, exactly how much synonymy and homonymy languages tolerate has never been systematically studied cross-linguistically, and it is difficult to see how this even could be done, although Krupa (1968) presents an interesting attempt to quantify synonymy for one language, Maori. 
|270-272|chance resemblance, linguistic reconstruction, comparative method, methodology, stochastic analysis 2048|Ratcliffe2012|Interesting article that shows how linguistic reconstruction should NOT be done, especially pointing to methodology which is often used by many scholars, and the fallacies resulting from this, namely: * use of data from many languages of a single sub-branch without reconstructing at the sub-branch level * panchronic use of the sources, choosing meanings from various time depths * broad semantic reconstruction The author compares two reconstructions on Afro-Asiatic languages and comes to the conclusion that they largely differ, thus showing that the comparative method does not work at the deep level, due to the methodological looseness, as illustrated by the three points mentioned above, which increase the number of chance resemblances.|000|comparative method, stochastic analysis, chance resemblance, methodology, linguistic reconstruction 2049|Kaiping2016|Theoretical anthropology tries to develop models of human biology and culture. In this thesis, we investigate how different computational models from theoretical biology can be applied to evolutionary anthropology. We study two different types of models, applying them to two different sub-fields of evolutionary anthropology, and highlighting alternative choices in their construction. On the one hand, we observe that the evolutionary simulations are composed of three main components: an updating rule, a game and a population structure. We find that the updating rule can alter the qualitative and quantitative evolutionary outcome of a model. A dominant language is more resilient to learning errors and more frequent when selection primarily weeds out maladapted individuals, instead of promoting well-adapted ones. We study the evolution of cooperation and institutional punishment. 
Group selection can support cooperation, even when implemented through the selection of individual agents migrating between communities at different rates. Institutional punishment on the other hand is highly complex and cannot arise from simpler strategies in either well-mixed or community-structured populations. On the other hand, Bayesian inference models used for linguistic phylogenies can incorporate highly correlated typological information, without a priori knowledge about the underlying linguistic universals. While close in subject, models in theoretical biology and profound anthropological expertise express all but disjoint theories in terms of scope and complexity. This thesis acknowledges this challenge and contributes to bridging the gap|000|anthropology, computational approaches, family tree, language evolution, 2050|Kaiping2016|Thesis does not seem to be completely interesting, as it is more theoretical and much into game theory, but some of the parts on the application of methods and language evolution may be interesting to be quoted in some contexts.|000|computational approaches, anthropology, language evolution 2051|Weiss2016|The concept of a last universal common ancestor of all cells (LUCA, or the progenote) is central to the study of early evolution and life’s origin, yet information about how and where LUCA lived is lacking. We investigated all clusters and phylogenetic trees for 6.1 million protein coding genes from sequenced prokaryotic genomes in order to reconstruct the microbial ecology of LUCA. Among 286,514 protein clusters, we identified 355 protein families (∼0.1%) that trace to LUCA by phylogenetic criteria. Because these proteins are not universally distributed, they can shed light on LUCA’s physiology. Their functions, properties and prosthetic groups depict LUCA as anaerobic, CO2-fixing, H2-dependent with a Wood–Ljungdahl pathway, N2-fixing and thermophilic. 
LUCA’s biochemistry was replete with FeS clusters and radical reaction mechanisms. Its cofactors reveal dependence upon transition metals, flavins, S-adenosyl methionine, coenzyme A, ferredoxin, molybdopterin, corrins and selenium. Its genetic code required nucleoside modifications and S-adenosyl methionine-dependent methylations. The 355 phylogenies identify clostridia and methanogens, whose modern lifestyles resemble that of LUCA, as basal among their respective domains. LUCA inhabited a geochemically active environment rich in H2, CO2 and iron. The data support the theory of an autotrophic origin of life involving the Wood–Ljungdahl pathway in a hydrothermal setting.|000|last universal common ancestor, LUCA, origin of life, 2052|Weiss2016|Paper by Bill Martin's team, uses a rather simplified workflow, defining LUCA as those genes which are present in two representatives of the different domains of life. They use Markov clustering for homolog detection and automatic alignments.|000|LUCA, last universal common ancestor, sequence alignment, origin of life 2053|Pedersen2016|During the Last Glacial Maximum, continental ice sheets isolated Beringia (northeast Siberia and northwest North America) from unglaciated North America. By around 15 to 14 thousand calibrated radiocarbon years before present (cal. kyr bp), glacial retreat opened an approximately 1,500-km-long corridor between the ice sheets. It remains unclear when plants and animals colonized this corridor and it became biologically viable for human migration. We obtained radiocarbon dates, pollen, macrofossils and metagenomic DNA from lake sediment cores in a bottleneck portion of the corridor. We find evidence of steppe vegetation, bison and mammoth by approximately 12.6 cal. kyr bp, followed by open forest, with evidence of moose and elk at about 11.5 cal. kyr bp, and boreal forest approximately 10 cal. kyr bp. 
Our findings reveal that the first Americans, whether Clovis or earlier groups in unglaciated North America before 12.6 cal. kyr bp, are unlikely to have travelled by this route into the Americas. However, later groups may have used this north–south passageway.|000|colonialization, North America, North American languages, archaeogenetics 2054|Pedersen2016|This article argues that colonialization via Siberia through the ice-free corridor was NOT possible until about 12.6 thousand years ago, which suggests that people settled the Americas via different routes, be it from South America or in other ways, since older cultures predating that time have been found and identified.|000|North American languages, North America, colonialization, archaeogenetics 2055|Frermann2016|Word meanings change over time and an automated procedure for extracting this information from text would be useful for historical exploratory studies, information retrieval or question answering. We present a dynamic Bayesian model of diachronic meaning change, which infers temporal word representations as a set of senses and their prevalence. Unlike previous work, we explicitly model language change as a smooth, gradual process. We experimentally show that this modeling decision is beneficial: our model performs competitively on meaning change detection tasks whilst inducing discernible word senses and their development over time. 
Application of our model to the SemEval-2015 temporal classification benchmark datasets further reveals that it performs on par with highly optimized task-specific systems.|000|semantic change, meaning, Bayesian approaches, feature vectors, semantic vectors 2056|Frermann2016|Quite interesting paper on semantic change in English language, based on the co-occurrence semantic vectors which they now generally use in NLP to model semantics.|000|Bayesian approaches, word co-occurrence network, semantic change, semantic similarity 2057|Hamilton2016|Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test. Word embeddings show promise as a diachronic tool, but have not been carefully evaluated. We develop a robust methodology for quantifying semantic change by evaluating word embeddings (PPMI, SVD, word2vec) against known historical changes. We then use this methodology to reveal statistical laws of semantic evolution. Using six historical corpora spanning four languages and two centuries, we propose two quantitative laws of semantic change: (i) the law of conformity—the rate of semantic change scales with an inverse power-law of word frequency; (ii) the law of innovation—independent of frequency, words that are more polysemous have higher rates of semantic change.|000|semantic change, word embeddings, word co-occurrence network, semantic vectors 2058|Hamilton2016|Paper uses word embeddings to handle semantic change. It is potentially of some interest, although very data-driven and not very informed regarding historical linguistics. 
However, they even mention a metric to handle polysemicity of words, which might be interesting to compare against our more powerful metrics in CLICS.|000|polysemy, word embeddings, semantic change, semantic vectors 2059|Devadoss2016|A classic problem in computational biology is constructing a phylogenetic tree given a set of distances between n species. In most cases, a tree structure is too constraining. We consider a circular split network, a generalization of a tree in which multiple parallel edges signify divergence. A geometric space of such networks is introduced, forming a natural extension of the work by Billera, Holmes, and Vogtmann on tree space. We explore properties of this space, and show a natural embedding of the compactification of the real moduli space of curves within it. |000|family tree, phylogenetic network, split network, algorithms, introduction 2060|Devadoss2016|Potentially interesting article, as it introduces some interesting background on split networks, including descriptions of algorithms, like a convex hull algorithm to draw a split system.|000|algorithms, introduction, split network, phylogenetic network, family tree 2061|Chiswick2004|This paper develops a scalar or quantitative measure of the “distance” between English and a myriad of other (non-native American) languages. This measure is based on the difficulty Americans have learning other languages. The linguistic distance measure is then used in an analysis of the determinants of English language proficiency among adult immigrants in the United States and Canada. It is shown that, when other determinants of English language proficiency are the same, the greater the measure of linguistic distance, the poorer is the respondent’s English language proficiency. This measure can be used in research, evaluation and practitioner analyses, and for diagnostic purposes regarding linguistic minorities in English-speaking countries. 
The methodology can also be applied to develop linguistic distance measures for other languages.|000|linguistic distance, language distance, language learning, language comparison, language comprehension 2062|Chiswick2004|Interesting but useless approach to measure distance between languages by using the learning difficulty of L2 speakers. Ignores that learning difficulty is neither transitive nor reciprocal across languages, so that the resulting values are not real distances and would be better expressed as a weighted directed network than in an undirected way.|000|language distance, language learning, mutual intelligibility, 2063|Nouri2016|We present methods for investigating processes of evolution in a language family by modeling relationships among the observed languages. The models aim to find regularities—regular correspondences in lexical data. We present an algorithm which codes the data using phonetic features of sounds, and learns long-range contextual rules that condition recurrent sound correspondences between languages. This gives us a measure of model quality: better models find more regularity in the data. We also present a procedure for imputing unseen data, which provides another method of model comparison. Our experiments demonstrate improvements in performance compared to prior work.|000|sequence alignment, phonetic alignment, language change, language evolution, phylogenetic reconstruction, 2064|Nouri2016|Typical strange paper of the Finnish group, in which they use some kind of alignment and some kind of model to work out phylogenetic trees on the same dataset of Uralic languages they always use, scraped from ToB.|000|phylogenetic reconstruction, phonetic alignment, etymological cognacy 2065|Fink2015|Wenn man den evolutionären Theorien Glauben schenkt, dann steht der Kannibalismus am Anfang der Kulturentwicklung. 
Auf einer primitiven Kulturstufe fressen sich die Menschen gegenseitig auf, da ihnen Landwirtschaft und Kochkunst noch fremd sind. Eine andere Erklärung des Kannibalismus im Rahmen evolutionärer Theorien beruft sich auf allgemein verbreitete magisch-religiöse Überzeugungen, wobei der Kannibalismus hier oft in engem Zusammenhang mit Menschenopfern steht. Nachdem sich bei den Griechen und selbst noch in der Neuzeit Berichte über primitive Völker mit kannibalischen Riten häufen, müsste man im Rahmen solcher Theorien annehmen, dass sich in den Textzeugen der älteren Kulturen in Mesopotamien, Ägypten und Kleinasien noch mehr solcher Berichte finden lassen müssten, da der Zivilisationsprozess damals noch nicht so weit fortgeschritten war, wie zur Zeit der Griechen oder gar nochmal 2000 Jahre später, als in Europa Mozart, Beethoven und Bach ihre Werke komponierten und an den Rändern der Welt – zumindest wenn man den Berichten zahlreicher Forschungsreisender und Ethnologen Glauben schenkt – immer noch Menschen gefressen wurden. Ein kurzer Überblick und die Diskussion der wichtigsten einschlägigen Quellen soll uns zeigen, ob die Texte der ältesten Schriftkulturen Evidenz für diese evolutionäre Annahme eines quasi universell verbreiteten Kannibalismus auf einer frühen Entwicklungsstufe der Menschheit liefern.|000|cannibalism, Greece, antiquity, human flesh, historiography, introduction 2066|Fink2015|Article gives a quick introduction to the history of cannibalism, and what is known about it, with some focus on Ancient Greece and Western antiquity.|000|introduction, cannibalism, human flesh, antiquity 2067|Deri2016|Grapheme-to-phoneme (g2p) models are rarely available in low-resource languages, as the creation of training and evaluation data is expensive and time-consuming. We use Wiktionary to obtain more than 650k word-pronunciation pairs in more than 500 languages. 
We then develop phoneme and language distance metrics based on phonological and linguistic knowledge; applying those, we adapt g2p models for high-resource languages to create models for related low-resource languages. We provide results for models for 229 adapted languages.|000|grapheme-to-phoneme, NLP, natural language processing, cross-linguistic study, Wiktionary, dataset, 2068|Deri2016|Interesting article, partially a bit naive, but even more interesting dataset, including phonetic transcriptions for hundreds of languages. Their smart distinction between LATIN, CYRILLIC, etc. alphabets makes parsing much simpler, and they could be used as a starting point for an elaborate system of phoneme-guessing in under-resourced languages.|000|grapheme-to-phoneme, NLP, phonetic transcription, cross-linguistic study 2069|Cai2016|Most previous approaches to Chinese word segmentation formalize this problem as a character-based sequence labeling task so that only contextual information within fixed sized local windows and simple interactions between adjacent tags can be captured. In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize complete segmentation history. Our model employs a gated combination neural network over characters to produce distributed representations of word candidates, which are then given to a long short-term memory (LSTM) language scoring model. 
Experiments on the benchmark datasets show that without the help of feature engineering as most existing approaches, our models achieve competitive or better performances with previous state-of-the-art methods.|000|word segmentation, neural network, NLP, automatic approach 2070|Cai2016|Article might be interesting in the context of morpheme segmentation, as it gives an interesting introduction to the NLP task of Chinese word segmentation.|000|word segmentation, NLP, automatic approach, Chinese 2071|Zhang2016|Character-based and word-based methods are two main types of statistical models for Chinese word segmentation, the former exploiting sequence labeling models over characters and the latter typically exploiting a transition-based model, with the advantages that word-level features can be easily utilized. Neural models have been exploited for character-based Chinese word segmentation, giving high accuracies by making use of external character embeddings, yet requiring less feature engineering. In this paper, we study a neural model for word-based Chinese word segmentation, by replacing the manually-designed discrete features with neural features in a word-based segmentation framework. 
Experimental results demonstrate that word features lead to comparable performances to the best systems in the literature, and a further combination of discrete and neural features gives top accuracies.|000|word segmentation, neural network, Chinese, NLP, transition, hierarchies 2072|Zhang2016|What is interesting about this approach is that it uses some kind of hierarchy to carry out word segmentation, which is exactly what we would need for morphological segmentation: not a full-fledged segmentation into all elements, but instead a tree-representation that shows for each word a presumed hierarchy by which it was established.|000|word segmentation, hierarchies, Chinese, NLP, neural network 2073|Gutierrez2016|Arbitrariness of the sign—the notion that the forms of words are unrelated to their meanings—is an underlying assumption of many linguistic theories. Two lines of research have recently challenged this assumption, but they produce differing characterizations of non-arbitrariness in language. Behavioral and corpus studies have confirmed the validity of localized form-meaning patterns manifested in limited subsets of the lexicon. Meanwhile, global (lexicon-wide) statistical analyses instead find diffuse form-meaning systematicity across the lexicon as a whole. We bridge the gap with an approach that can detect both local and global form-meaning systematicity in language. In the kernel regression formulation we introduce, form-meaning relationships can be used to predict words’ distributional semantic vectors from their forms. Furthermore, we introduce a novel metric learning algorithm that can learn weighted edit distances that minimize kernel regression error. 
Our results suggest that the English lexicon exhibits far more global form-meaning systematicity than previously discovered, and that much of this systematicity is focused in localized form-meaning patterns.|000|semantic vectors, NLP, semantic change, arbitrariness, form-meaning pairs, automatic approach, English 2074|Gutierrez2016|Interesting paper that introduces the idea of form-meaning systematicity on the sub-morphemic level to be investigated from an NLP viewpoint. They list further literature on the topic and use edit distance to search for phonetic similarities and semantic vector cosine distance (or cosine similarity) to search for correlations between form and meaning. Although their approach remains a bit questionable, it would be worth comparing, and especially contrasting, it with semantic networks or polysemy networks.|000|semantic similarity, phonetic similarity, arbitrariness, semantic vectors, edit distance, English, corpus studies, automatic approach 2075|Eger2016|We consider two graph models of semantic change. The first is a time-series model that relates embedding vectors from one time period to embedding vectors of previous time periods. In the second, we construct one graph for each word: nodes in this graph correspond to time points and edge weights to the similarity of the word’s meaning across two time points. We apply our two models to corpora across three different languages. We find that semantic change is linear in two senses. Firstly, today’s embedding vectors (= meaning) of words can be derived as linear combinations of embedding vectors of their neighbors in previous time periods. Secondly, self-similarity of words decays linearly in time. 
We consider both findings as new laws/hypotheses of semantic change.|000|word embeddings, semantic vectors, vector-space models, semantic change, cosine similarity, 2076|Eger2016|Describes a basic way to model semantic change with word embeddings, based on vector-space models in which word meanings at different times are compared by cosine similarity.|000|word embeddings, semantic change, cosine similarity, NLP 2077|King2016|This work examines CRF-based sequence alignment models for learning natural language morphology. Although these systems have performed well for a limited number of languages, this work, as part of the SIGMORPHON 2016 shared task, specifically sets out to determine whether these models handle non-concatenative morphology as well as previous work might suggest. Results, however, indicate a strong preference for simpler, concatenative morphological systems.|000|sequence alignment, morphology, machine learning, NLP, 2078|King2016|Potentially interesting paper, as the author describes how one can learn things like morphology by using sequence alignment, which brings us close to the idea of how to tackle the broken-plurals problem, with the exception that more complicated representations are being used for source and target forms along with the multi-tiered sequence representations.|000|sequence alignment, morphology, machine learning, NLP 2079|Ma2016|For languages such as German where compounds occur frequently and are written as single tokens, a wide variety of NLP applications benefits from recognizing and splitting compounds. As the traditional word frequency-based approach to compound splitting has several drawbacks, this paper introduces a letter sequence labeling approach, which can utilize rich word form features to build discriminative learning models that are optimized for splitting. Experiments show that the proposed method significantly outperforms state-of-the-art compound splitters. 
|000|compound words, morpheme detection, morpheme segmentation, NLP, labeling 2080|Ma2016|Potentially interesting approach for the task of compound splitting, or morphological segmentation, as it is specifically tested on German, and tries to find where to split compounds into their components.|000|compound words, morpheme segmentation, NLP, automatic approach, labeling 2081|Nicolai2016|Syllabification is sometimes influenced by morphological boundaries. We show that incorporating morphological information can improve the accuracy of orthographic syllabification in English and German. Surprisingly, unsupervised segmenters, such as Morfessor, can be more useful for this purpose than the supervised ones.|000|morpheme segmentation, syllabification, English, German, NLP 2082|Nicolai2016|Not sure whether this is a useful paper, as they claim to be better at syllabification when morpheme boundaries are known, which is obvious and has been known for a long time. Their claim that Morfessor works better for preparing the data is interesting in this context, although they only test on German and English.|000|morpheme detection, syllabification, English, German, NLP 2083|Declerck2016|This paper presents an approach for the formal representation of components in German compounds. We assume that such a formal representation will support the segmentation and analysis of unseen compounds that feature components already seen in other compounds. An extensive language resource that explicitly codes components of compounds is GermaNet, a lexical semantic network for German. We summarize the GermaNet approach to the description of compounds, discussing some of its shortcomings. Our proposed extension of this representation builds on the lemon lexicon model for ontologies, established by the W3C Ontology Lexicon Community Group. 
|000|morpheme, compound words, formats, standards, NLP, representation of sound sequences 2084|Declerck2016|A somewhat boring paper, but it deals with compound representation, which is interesting per se, as we also need to consider how to deal with it, so it is worth having in this paper collection.|000|compound words, representation of sound sequences, 2085|Shoemark2016|Recent work has proposed using network science to analyse the structure of the mental lexicon by viewing words as nodes in a phonological network, with edges connecting words that differ by a single phoneme. Comparing the structure of phonological networks across different languages could provide insights into linguistic typology and the cognitive pressures that shape language acquisition, evolution, and processing. However, previous studies have not considered how statistics gathered from these networks are affected by factors such as lexicon size and the distribution of word lengths. We show that these factors can substantially affect the statistics of a phonological network and propose a new method for making more robust comparisons. We then analyse eight languages, finding many commonalities but also some qualitative differences in their lexicon structure.|000|phonetic network, mental lexicon, edit distance, phoneme system, 2086|Shoemark2016|The idea of a phonetic network sounds oversimplifying, as these networks connect all words which differ by only one phoneme, which is surely not useful, as the question of where a word occurs etc. is even more important for this task. However, since the authors themselves show that this is not useful, it is again an interesting paper. 
More importantly even, they list datasets of pronunciations for eight different languages, which is surely worth considering further (they list CELEX, but also 5 other sources of interest, which are apparently freely accessible).|000|phonetic network, edit distance, German, English, Basque, Dutch, French, Spanish, Portuguese, Polish, dataset 2087|Shoemark2016|Therefore, rather than directly comparing the statistics of real lexicons across languages, we propose to generate separate pseudolexicons for each language that match the word-length distribution and phoneme inventory size of that language. We can then examine the differences between these pseudolexicons and the real lexicons over a range of lexicon sizes, and compare these differences across languages. Using this method we can better evaluate some of the claims made by previous authors and reveal some previously undetected variation in network structure across languages.|114|phonetic network, pseudo word, 2088|Akseneva2016|It is commonly accepted that morphological dependencies are finite-state in nature. We argue that the upper bound on morphological expressivity is much lower. Drawing on technical results from computational phonology, we show that a variety of morphotactic phenomena are tier-based strictly local and do not fall into weaker subclasses such as the strictly local or strictly piecewise languages. Since the tier-based strictly local languages are learnable in the limit from positive texts, this marks a first important step towards general machine learning algorithms for morphology. 
Furthermore, the limitation to tier-based strictly local languages explains typological gaps that are puzzling from a purely linguistic perspective.|000|finite state transducer, tier-based language, finite state grammar, phonology, morphology, NLP 2089|Akseneva2016|This paper shows that morphology often is not as hard as people might think, which is an interesting idea (including the presentation of different types of grammars which are much simpler than finite-state transducers).|000|phonology, morphology, finite state transducer, tier-based language 2090|Akseneva2016|Morphotactics describes the syntax of morphemes, that is to say, their linear order in the word and the conditions that license their presence or enforce their absence. One can distinguish surface morphotactics from underlying morphotactics. The former regulates the shape of the pronounced surface strings, whereas the latter is only concerned with the arrangements of the morphemes in the initial representation rather than how said morphemes are realized in specific environments. We only consider underlying morphotactics in this paper.|121|morphotactics, definition 2091|Akseneva2016|A string language $L$ over alphabet $\Sigma$ is strictly local (SL) iff there is some $k \in \mathbb{N}$ such that $L$ is generated by a strictly $k$-local grammar $G$. Such a grammar consists of a finite set of $k$-grams, each one of which describes an illicit substring. More precisely, given a string $w \in \Sigma^*$, let $\hat{w}^k := \rtimes^k w \ltimes^k$ (where $\rtimes, \ltimes \notin \Sigma$) and $k\text{-grams}(w) := \{s \mid s \text{ is a substring of } \hat{w}^{k-1} \text{ of length } k\}$. Then $G$ generates string $w$ iff $k\text{-grams}(w) \cap G = \emptyset$. That is to say, $G$ generates every string over $\Sigma$ that does not contain an illicit substring.|122|strictly-local language, formal language theory, string language 2092|Akseneva2016|A better model for primary stress assignment is furnished by tier-based strictly local (TSL; Heinz et al. 
(2011)) grammars, which also happen to be powerful enough for circumfixation.|125|formal language theory, formal grammar, tier-based language, 2093|DegaetanoOrtlieb2016|We present a new approach for modeling diachronic linguistic change in grammatical usage. We illustrate the approach on English scientific writing in Late Modern English, focusing on grammatical patterns that are potentially indicative of shifts in register, genre and/or style. Commonly, diachronic change is characterized by the relative frequency of typical linguistic features over time. However, to fully capture changing linguistic usage, feature productivity needs to be taken into account as well. We introduce a data-driven approach for systematically detecting typical features and assessing their productivity over time, using information-theoretic measures of entropy and surprisal.|000|language change, grammatical change, information-based modeling, productivity 2094|DegaetanoOrtlieb2016|In some sense, this text seems to deal with productivity, but it is by no means clear what they understand by this. It might, however, offer some interesting ideas on a proper read.|000|productivity, grammatical change, NLP 2095|Mei2013|Important paper in which Mei Tsulin describes his ideas about the classification of Chinese dialects by showing how the pronouns developed. The paper contains some family trees and other interesting ways to model, so we have a classification of Chinese dialects and explicit attempts to model language history and lexical change.|000|Chinese dialects, genetic classification, family tree, pronoun, lexical change 2096|Dagan2006|Two significant evolutionary processes are fundamentally not tree-like in nature - lateral gene transfer among prokaryotes and endosymbiotic gene transfer (from organelles) among eukaryotes. 
To incorporate such processes into the bigger picture of early evolution, biologists need to depart from the preconceived notion that all genomes are related by a single bifurcating tree.|000|family tree, phylogenetic network, core genome, tree of one percent, 2097|Dagan2006|Evolutionary biologists like to think in terms of trees. Since Darwin, biologists have envisaged phylogeny as a tree-like process of lineage splittings. But Darwin was not concerned with the evolution of microbes, where lateral gene transfer (LGT; a distinctly non-treelike process) is an important mechanism of natural variation, as prokaryotic genome sequences attest [1-4]. Evolutionary biologists are not debating whether LGT exists. But they are debating - and heatedly so - how much LGT actually goes on in evolution. Recent estimates of the proportion of prokaryotic genes that have been affected by LGT differ 30-fold, ranging from 2% [5] to 60% [6]. Biologists are also hotly debating how LGT should influence our approach to understanding genome evolution on the one hand, and our approach to the natural classification of all living things on the other. These debates erupt most acutely over the concept of a tree of life. Here we consider how LGT and endosymbiosis bear on contemporary views of microbial evolution, most of which stem from the days before genome sequences were available.|118:1|lateral gene transfer, evolutionary biology, family tree, 2098|Dagan2006|When it comes to the concept of a tree of life, there are currently two main camps. 
One camp, which we shall call the positivists, says that there is a tree of life, that microbial genomes are, in the main, related by a series of bifurcations, and that when we have sifted out a presumably small amount of annoying chaff (LGT), the wheat (the tree) will be there and will still our hunger for a grand and natural system [7-10].|118.1|lateral gene transfer, family tree, tree of life, 2099|Dagan2006|The other camp, which we will call the microbialists, says that LGT is just as natural among prokaryotes as is point mutation, and that furthermore, it has occurred throughout microbial history. This means that even were we to agree on a grand natural classification, the process of microbial evolution underlying it would be fundamentally undepictable as a single bifurcating tree, because a substantial component of the evolutionary process - LGT - is not tree-like to begin with [1,11,12].|118.1|family tree, lateral gene transfer, tree of life 2100|Dagan2006|A recent paper by @Ciccarelli<2006> et al. [9] brings these two views head-to-head. It purports to weigh in heavily for the positivists, but in doing so it inadvertently provides some of the strongest support for the microbialist camp that has been published so far. A closer look reveals why. Ciccarelli et al. [9] report an automated procedure for identifying protein families that are universally distributed among all genomes, with pipeline alignment and tree building. Their routine looked for possible cases of LGT (detected as unusual tree topologies), excluded such proteins, and reiterated the procedure until the universe of proteins had been examined. This left them with 31 presumably orthologous protein sequences present in 191 genomes each, the alignments of which were concatenated to produce a data matrix with 8,089 sites (of which only 1,212 would have remained had gapped sites been excluded). 
A maximum likelihood tree was inferred from this matrix, motivating a brief discussion of some important events in life’s history as inferred from that tree.|118.1|family tree, tree of life, lateral gene transfer 2101|Dagan2006|Bearing in mind that an average prokaryotic proteome [pb] represents about 3,000 protein-coding genes, the 31-protein tree of life represents only about 1% of an average prokaryotic proteome and only 0.1% of a large eukaryotic proteome. Thus, the positivists can say that there is a tree of life after all: a bit skimpier than expected, but a tree nonetheless. But the microbialists, glaring at the same data, can say that the glass is only 1% full at best, and more than 99% empty! There might be a tree there, but it is not the tree of life, it is the ‘tree of one percent of life’.|118.1f|family tree, nice quote, tree of one percent, lateral gene transfer, tree of life 2102|Francois2014|Trees fail to capture the very common situation in which linguistic diversification results from the fragmentation of a language into a network of dialects which remained in contact with each other for an extended period of time (Bloomfield 1933; Croft 2000; Garrett 2006; Heggarty, Maguire & McMahon 2010), creating what Ross (1988, 1997) calls a “linkage” (see §3.3).|162|dialect network, dialect continuum, family tree, diversification, language change 2103|Francois2014|Non‐cladistic models are needed to represent language relationships, in ways that take into account the common case of linkages and intersecting subgroups. Among existing models, Section 4 will focus on an approach that combines the precision of the Comparative Method with the realism of the Wave Model. This method, labeled Historical Glottometry (Kalyan & François f/c), identifies genealogical subgroups in a linkage situation, and assesses their relative strengths based on the distribution of innovations among modern languages. 
Provided it is applied with the rigour inherent to the Comparative Method, Historical Glottometry should help unravel the genealogical structures of the world’s language families, by acknowledging the role played by linguistic convergence and diffusion in the historical processes of language diversification.|163|comparative method, historical glottometry, cladistics, diversification, definition, introduction 2104|Francois2014|Thus, to say that M, N and O subgroup together as opposed to other languages of their family, amounts to claiming that they all descend from an intermediate protolanguage – call it Proto‐MNO – that was once spoken by a single social community, after the breakup of the earlier language Proto‐KLMNO. According to Figure 2, this language Proto‐MNO must have developed more or less separately from Proto‐KL, the shared ancestor of modern languages K and L. This point is established through the identification of a number of linguistic innovations of various sorts (phonological, grammatical, lexical, etc.) which are jointly reflected by modern languages M, N and O, but not by other languages of the family. |164|subgrouping, historical glottometry, family tree, cladistics, shared innovation 2105|Francois2014|If these three languages share together certain linguistic properties that were not inherited from their ultimate ancestor, it is assumed – provided one can rule out chance similarity or parallel innovation – that they must have acquired these properties at a certain point in time, when their speakers still spoke (mutually intelligible variants of) a single language. 
The idea is that, instead of positing the same change in three languages (M, N, O) independently, it is more parsimonious – following Occam’s razor – to propose that it took place just once in a single language (Proto‐MNO) and then was simply inherited by its descendants.|164|family tree, shared innovation, Occam's razor, parsimony 2106|Francois2014|Following a principle first formulated by Leskien (1876), the Comparative Method establishes the existence of every intermediate node in a family tree based on the principle of exclusively shared innovations, i.e. by identifying those linguistic changes that are shared by all of its modern descendants, and only by them – what phylogeneticists call synapomorphies (Page & Holmes 2009).|164|cladistics, comparative method, August Leskien, synapomorphy, shared innovation 2107|Francois2014|The sort of separation referred to here is typically understood as an actual event of social split such as migration, whereby a previously unified society broke up into two separate communities with loss of contact.|165|language split, diversification, family tree 2108|Francois2014|This focus on divergence is both a strength and a weakness of the Tree Model. A strength, because it means that trees can help reconstruct events of social disruption when they indeed took place, and can represent them using a visually straightforward diagram. But it is also a weakness, because it distorts the reality of language diversification by shoehorning it into a one‐size‐fits‐all, simplistic model which forces us to reconstruct events of social separation even when they never really happened, at the expense of all other possible scenarios. 
:comment:`Again the major failure of AF, that he sees a bifurcating tree everywhere although nobody ever said that we need to reconstruct it in this way!`|165|family tree, epistemology, limitations, diversification, language split 2109|Francois2014|In reality, no population in the world can reasonably have its history reduced to just a series of social splits with loss of contact – the scenario favoured by the Tree Model. While some families did go through such events several times in their history, in the form of successive bouts of migration or similar disruptions, these events of split, correlated with neat patterns of linguistic divergence, are always interspersed with other forms of social interaction whose linguistic impact – as we’ll see below – is not compatible with a tree representation. :comment:`But note that, first, a tree is conceivable in general, even by François's own words. Secondly, the argument is similar to that of` @Schmidt1872 :comment:`insofar as Schmidt also criticizes the diversification process, claiming that this process could not be depicted by a tree. Yet what both Schmidt and François ignore in general is the fact that diversification, separation, and splits take place at some point in language history, since otherwise languages would not become mutually unintelligible. This point itself proves that a tree is still needed, albeit in the form of a rough multifurcating backbone where multifurcation represents fuzziness, and that the possibility of an intermediate proto-language which persisted for a longer time as a unity cannot be excluded.` :comment:`Furthermore, keep in mind that` @Schleicher1873 :comment:`[1863] explicitly points to the transitional quality of any diversity, admitting explicitly that no sharp borders can be drawn between languages and dialects and speech varieties. 
So putting Schleicher in the position of the simplifying tree drawer is not justified, and Schleicher was fully aware of the limits of the family tree while also admitting its strengths.`|165|diversification, language history, language split, family tree 2110|Francois2014|If M is a member of the MNO subgroup, then it cannot also be a member of a KLM subgroup at the same time: subgroups are mutually exclusive, and never intersect. This seems a sensible idea if the splits in the tree are meant to represent physical separation with no return: if the communities of pKL and pMNO were indeed separated with [pb] complete loss of contact, then it is difficult to imagine how some modern descendants of pMNO, but not others, could share anything with pKL. This principle of separate development is central to the whole logic of subgrouping studies under a cladistic approach, and has important consequences.|165f|family tree, phylogenetic network, shared innovation, subgrouping 2111|Francois2014|For example, the shared property may be proposed to be in fact a case of shared retention (also known as symplesiomorphy in phylogenetics) from the Proto‐KLMNO ancestor, a property lost by other languages (K, NO): in this case, the property would not indicate any significant genealogical link between L and M – other than their remote relatedness. Alternatively, one could argue that the property is indeed innovative, yet happened independently in L and M, whether by drift or parallel innovation (homoplasy). Finally, a third hypothesis would be that the property was innovated internally in only one language, say L, and then was borrowed by the other language M via contact between L and M, once they had already been formed as separate languages. 
|166|family tree, patchy data, shared innovation, homoplasy, symplesiomorphy, cladistics, borrowing 2112|Francois2014|Contact‐induced change, which can take place between any two languages regardless of their relatedness, is generally considered to be a separate phenomenon from the sort of “internal change” that underlies genealogical relations. The argument is that, for a property to be borrowed between two separate languages L and M, the two languages need to already exist independently; strictly speaking, the study of their genealogy is interested in how these languages came into existence, not in what happened to them later. Thus, the many words borrowed by English from Scandinavian languages during the Viking invasions, or later from French, are not considered to form part of its genealogical makeup: the English language had by that time already acquired independent existence, as it were, as a member of the Anglo‐Frisian branch of the West Germanic subgroup. :comment:`This is similar to the argument in biology, as brought up in` @Dagan2006 :comment:`This is interesting in this context and should be remembered: we may talk of the tree of one percent, but while biologists criticize the ignorance for borrowing processes, François says the tree is useless per se. So he brings up a stronger point against the tree model, as alternative criticism usually admits the necessity of a tree while mentioning the importance to be aware of its simplifying quality.`|166|language contact, contact-induced similarity, language change, borrowing, family tree 2113|Francois2014|They argue that loanwords, borrowed structures and other facts of cross‐linguistic diffusion form part of the linguistic history of languages as much as the material directly inherited. 
While the latter point is undoubtedly true, proponents of the Tree Model reply to this objection by acknowledging that trees are only intended to capture a portion of the history of languages, namely their genealogy strictly speaking, and nothing more. As for other facts of language development – notably the effects of contact – they are, or at least should be, treated by other models (Campbell & Poser 2008:327). This is a valid point, which bears keeping in mind every time family trees are cited: language genealogy only forms a portion of the historical picture, and trees should not be assigned more explanatory value than they actually have. :comment:`Exactly the same point is made in biology, as shown in` @Dagan2006 :comment:`(also interesting: look at the different use of "convergence" in linguistics and biology!)`|166|explanative force, family tree, borrowing, convergence 2114|Francois2014|My argument will also be based on the problem of horizontal diffusion; yet instead of concerning facts of CROSS‐LINGUISTIC DIFFUSION (contact between already separated languages), my central problem will be processes of LANGUAGE‐INTERNAL DIFFUSION – i.e. the diffusion of innovations across mutually intelligible idiolects in a single language community. 
:comment:`Note that this is essentially the argument that motivates models such as incomplete lineage sorting, and it also reveals the error in François' argument: the diversification process may not begin as a complete split, as also brought up in` @Schuchardt1870 :comment:`but this does not mean that diversification NEVER involves a split.` |167|diversification, language split, Hugo Schuchardt, linguistic diffusion, borrowing, family tree 2115|Francois2014|The whole design of the family tree rests on this fiction that a “language” unproblematically forms an atomic unit, and that innovations just “happen” in them.|167|language history, family tree, linguistic diffusion, 2116|Francois2014|This simplistic view was challenged as early as the end of the 19th century by the work of dialectologists (Gilliéron 1880, Wenker 1881), who showed that a given language typically consists of a network of dialects that can show a great deal of diversity. Language properties were found to be distributed in space following complex patterns, described visually using isoglosses. :comment:`A similar argument is now made in biology, where every gene is assumed to have its own history (mosaic history).`|167|word history, language change, family tree, mosaic history 2117|Francois2014|As for dialects and languages, they form more or less homogeneous systems shared by a network of mutually intelligible idiolects. When historical linguists identify a change that happened “once” in a “language”, they really encapsulate a long process of diffusion that took place across large networks of idiolects, sometimes spanning across several generations. :comment:`This is definitely true, but it is important to keep in mind that dialect networks also have their borders, and that mutually unintelligible varieties may also exchange material. 
The tree is an idealization of the diversity in a single language by reducing it to a single node, but this does not mean that the idealization is not justified, since mutual intelligibility has its definite borders, and therefore the processes of language-internal and language-external diffusion differ from each other.`|168|mutual intelligibility, dialect network, language change, innovation, 2118|Francois2014|Such a process is not fundamentally different from what is involved in language contact: both forms of diffusion involve the progression of a new linguistic behaviour across a social network of individual speakers – a process that is not reducible to a single event. The main distinction is that contact is normally a process of diffusion observed across separate languages, whereas language‐internal diffusion involves mutually intelligible idiolects, which together may be taken to form a single (more or less homogeneous) language community. :comment:`We can argue that it IS fundamentally different, as language-internal diffusion involves first-language acquisition, and every speaker can immediately take up on the innovation, while language-external diffusion involves a small amount of bilinguals who can bring in the words into their first language where it further diffuses. So the limits of diffusability differ greatly, and for this reason we can as well ignore the language-internal diffusion.`|168|borrowing, linguistic diffusion, borrowability, language history 2119|Francois2014|Specifically, the language‐ internal diffusion of innovations does not have to target an entire language community, and commonly settles down to just a cluster of dialects, so that successive innovations target different segments of the network. 
In this case, the intricate patterns resulting from language‐internal diffusion cannot be captured by a tree, and need to be accounted for by a different model.|168|incomplete lineage sorting, isoglosses, language history, family tree, divergence 2120|Francois2014|Schuchardt, for example, linked it with a general disbelief in the Neogrammarians’ views on the regularity of sound change (Schuchardt 1885). Such an extreme stance is however not essential to the Wave Model, and unduly throws the baby (the Comparative Method) out with the bathwater (the Tree Model). A synthesis should be possible, which preserves the principle of regularity and other useful tenets of the Comparative Method, yet replaces the simplistic tree representations with a wave‐inspired approach.|169|Hugo Schuchardt, wave theory, comparative method, irregularity of sound change, Neogrammarian sound change 2121|Francois2014|As these dialects increase their differences and lose mutual intelligibility, the end result is an increase in the number of distinct languages. Yet crucially, whereas the Tree Model assigns linguistic diversification to social splits with loss of contact (§2.2), the Wave Model is compatible with scenarios where communities remain in contact. In fact, it treats linguistic contact – in the form of multiple, entangled events of diffusion across mutually intelligible dialects – as the very key to understanding patterns of language diversification (cf. François 2011a). This is a radical shift in perspective. :comment:`This is the radical fallacy: if we cannot describe the diversification in successive steps, we will use a multi-furcating tree, and not a subgrouping of binary splits. 
We can use methods (or think of methods) that help us to improve the trees to find the real splits, and François admits himself that the languages later on split "completely", becoming mutually unintelligible, thus representing tree-like divergence (as trees are there to display how things split!).`|170|wave theory, family tree, language split, diversification 2122|Francois2014|One could propose that the two models are complementary, in the sense that trees would be well‐designed to represent the genealogical relations between separate LANGUAGES; whereas waves would only be concerned with the complex relations between DIALECTS within the boundaries of each language. The two models would then both be useful, but at different grains of observation. I think this view is wrong, for one important reason: namely, that many language (sub)families – as we will see below – have in fact arisen from the diversification of former dialect continua. :comment:`Again a fallacy here: the tree is, as François mentioned earlier himself, not used to represent diversification of languages, but splits of language varieties into mutually unintelligible languages. And if we have mutually unintelligible languages, this presupposes a split, and the split should be displayed by a tree. How the split unfolded, and whether it involved intermediate splits or unresolvable dialect continua that would result in multifurcations, is for linguists to reconstruct.`|170|family tree, wave theory, phylogenetic network, language history 2123|Francois2014|This important point has been made by Malcolm Ross, around the concept of linkage (Ross 1988, 1996, 1997, 2001). Ross (1988:8) defines a linkage as “a group of communalects which have arisen by dialect differentiation”, where ‘communalect’ is a generic term which may refer to modern dialects or languages. 
When a dialect continuum – typically structured along the lines of Figure 3 above – evolves in such a way that its members lose mutual intelligibility, it becomes a linkage. A linkage thus consists of separate languages which are all related and linked together by intersecting layers of innovations; it is a language family that cannot be represented by any genealogical tree. :comment:`Again wrong: if we cannot identify intermediate distinct proto-languages, we don't reconstruct them and suppose a multifurcation, so a linkage is in this way just a multifurcation, and it is still a tree.`|171|linkages, dialect continuum, diversification, language history, family tree, wave theory 2124|Francois2014|Would such social‐split signals justify preserving the Tree Model? Not necessarily, for two reasons. First, even if the existence of a separate AB cluster could be represented visually by a ‘branch’ linking Proto‐ABCDEFGH to Proto‐AB, the entangled isoglosses among CDEFGH would still be incompatible with a tree, and would need to be represented by waves anyway. All in all, a wave diagram such as Figure 3 is both necessary and sufficient to display the splits in question, and a tree would add nothing more. :comment:`Again wrong, as a tree can display uncertainty and multifurcation, AND it displays the before and after of potential waves of innovation, even potential later borrowings between languages. So a tree is necessary, or a structure that gives a temporal order of innovations, in order to write language HISTORY, as history presupposes an order of events. Wave models as isogloss bundles are only a display of data.`|172|wave theory, isoglosses, language history 2125|Francois2014|The second argument is of a more epistemological nature, and still favours the Wave Model even in situations of neat social split. 
Under the Tree Model, splits are assumed to be the only force underlying the formation of subgroups; this constitutes an aprioristic axiom for the whole model to hold together. By contrast, under a Wave approach, the identification of such splits is an empirical – and falsifiable – result of observation. In terms of historical reconstruction, this is an invaluable advantage of the latter method. In other words, Waves are not only better designed than Trees for tackling entangled situations of dialect continua and linkages; they even do better at detecting cases of neat split, which the cladistic model merely takes for granted. :comment:`This is again wrong, as splits can be detected neither by mapping isoglosses nor by reconstructing simple trees: the crucial evaluative step is identifying which isoglosses people consider valid.`|172|linguistic diffusion, isoglosses, shared innovation 2126|Francois2014|Insofar as the Wave Model is agnostic as to whether genealogical subgroups should be expected to be nested or to intersect, it constitutes a more encompassing and flexible view of language diversification than the Tree Model; the latter approach entails a number of assumptions and simplifications which are not warranted by what we now know of the actual life of languages. In lieu of trees, historical linguists should use the Wave Model – or some approach derived from it – to achieve a more exact and realistic representation of the genealogical structure of the world’s language families.|172|wave theory, family tree, language history, language model 2127|Francois2014|The tools for distinguishing innovations from retentions are also those of the Comparative Method, and will be illustrated in §4.3.2 below; they include the principle of regularity in sound change, hypotheses on the direction of change and on relative chronology, among other principles. 
:comment:`This is again the fallacy of the approach, as it is NOT possible to always neatly identify shared innovations, as there are so many different factors that could likewise yield a given pattern.`|174|comparative method, shared innovation 2128|Francois2014|This procedure sometimes involved reasonings on the relative chronology of changes, whenever this was justified by the data.|176|shared innovation, shared retention, comparative method, historical glottometry 2129|Francois2014|At this point, I deliberately avoided making judgments – which would have been largely arbitrary – regarding whether a given innovation was a “common” or an “uncommon” type of change. While this precaution is made necessary by an all‐or‐nothing approach such as the Tree Model (where an uncommon change can serve as a fatal counterexample to a particular subgrouping hypothesis), it is much less relevant in a model capable of handling innovations in conflicting distributions. In fact, in the event that a subgroup AB were supported by ten ‘rare’ innovations and BC by ten ‘common’ ones, there would be no legitimate reason for considering AB to be more strongly supported than BC, and it would be legitimate to give equal weight to the two subgroups, regardless of the nature (common vs. 
uncommon) of their internal innovations.|174|shared innovation, shared retention, historical glottometry 2130|Francois2014|My hypothesis, which proved successful, was that a large enough number of data points should yield a strong genealogical signal based on well supported subgroups, whereas any noise due to parallel innovations would be reduced, due to the low attestation of associated language clusters.|174|historical glottometry, shared innovation, shared retention 2131|Francois2014|Importantly, all the innovations considered here are unlikely to result from recent borrowing, and can be safely assumed to have been diffused in the earlier times of mutual intelligibility: they are therefore strongly diagnostic of genealogical relations in the sense of the Comparative Method. |178|shared innovation, shared retention, historical glottometry 2132|Francois2014|However, NeighborNet has the disadvantage of being ambiguous as to which of the two sides of a split (bundle of parallel lines) corresponds to a genealogical, innovation‐defined [pb] subgroup. For example, the major split visible between Mota and Mwerlap is indicative of a genealogical subgroup, but doesn’t specify which side is innovative: one needs to look up the historical data separately to realise that the relevant subgroup here is the southern one, running from Mwerlap to Lakon. 
:comment:`This misunderstands the Neighbor-Net algorithm, which is based on distance data, whereas François is interested in the features themselves, which would require character data.`|178f|Neighbor-Net, shared innovation, distance-based methods, character-based methods, 2133|Francois2014|To avoid this pitfall, Historical Glottometry proposes a method for weighing the amount of evidence supporting each subgroup, so as to reconstruct the most significant patterns in the genealogical history of a language family.|180|shared innovation, weights, methodology, historical glottometry 2134|Francois2014|For any given subgroup G, let :math:`p` be the number of supporting innovations (i.e. innovations which include that whole subgroup in their scope, whether exclusively or not), and :math:`q` the number of conflicting innovations (i.e. innovations whose scope crosscuts G, by involving only some members of G together with some non‐members). The total amount of evidence that is relevant for assessing the cohesiveness of G is :math:`p + q`. Now, if we call :math:`k_g` the cohesiveness value of G, we have: :math:`k_g = \frac{p}{p+q}` |180|historical glottometry, weights, definition 2135|Francois2014|The degree of support for a genealogical subgroup can be measured in two ways. In absolute terms, its number of exclusively shared innovations (E) indicates the number of times the subgroup is ‘attested’; in relative terms, its cohesiveness rate (k) indicates how close it is to a perfect subgroup. :comment:`This is wrong: cohesiveness still includes the number of exclusively shared innovations, so this is simply an error.`|181|historical glottometry, cohesiveness, definition 2136|Francois2014|The support for each subgroup is visually represented by having line thickness proportional to subgroupiness (E). 
The brightness of the contour line is proportional to cohesiveness (K), with more cohesive subgroups appearing brighter.|182|historical glottometry, glottometric diagram, visualization 2137|Francois2014|Main paper introducing the theory behind the method of historical glottometry.|000|historical glottometry, introduction, 2138|Crangle2013|This paper presents a new method of analysis by which structural similarities between brain data and linguistic data can be assessed at the semantic level. It shows how to measure the strength of these structural similarities and so determine the relatively better fit of the brain data with one semantic model over another. The first model is derived from WordNet, a lexical database of English compiled by language experts. The second is given by the corpus-based statistical technique of latent semantic analysis (LSA), which detects relations between words that are latent or hidden in text. The brain data are drawn from experiments in which statements about the geography of Europe were presented auditorily to participants who were asked to determine their truth or falsity while electroencephalographic (EEG) recordings were made. The theoretical framework for the analysis of the brain and semantic data derives from axiomatizations of theories such as the theory of differences in utility preference. Using brain-data samples from individual trials time-locked to the presentation of each word, ordinal relations of similarity differences are computed for the brain data and for the linguistic data. In each case those relations that are invariant with respect to the brain and linguistic data, and are correlated with sufficient statistical strength, amount to structural similarities between the brain and linguistic data. Results show that many more statistically significant structural similarities can be found between the brain data and the WordNet-derived data than the LSA-derived data. 
The work reported here is placed within the context of other recent studies of semantics and the brain. The main contribution of this paper is the new method it presents for the study of semantics and the brain and the focus it permits on networks of relations detected in brain data and represented by a semantic model.|000|semantic similarity, cognition, 2139|Crangle2013|Paper investigates a very small corpus of words which are checked for semantic similarity using * wordnet * collocations (or how this is called, that is, word embeddings) * brain analysis (neurological stuff) They do not use concepts, but rather place names in Europe, so their results are not necessarily comparable with our work on concept similarity in CLICS or other tools.|000|neurology, semantic similarity, word embeddings, psycholinguistics, neurolinguistics 2140|Motomura2012|The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or ‘‘words’’. We first confirmed that the English language highly likely follows Zipf’s law, a special case of power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and ‘‘compressed’’ English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. A distribution pattern of SCSs in proteins is similar among species, but species- specific features are also present. 
Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., “key words”) and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.|000|biological parallels, amino acid alphabet, protein structure, word embeddings 2141|Motomura2012|They compare the words and sentences of a language with amino acids, show that English follows Zipf's law, and then apply the same measures to amino-acid subsequences in proteins. They fail to see that the real similarity is not with the words of a language, but with the constituents of words.|000|biological parallels, amino acid alphabet, protein structure, word frequency, Zipf's law 2142|Palmer2009|With the urgent need to document the world’s dying languages, it is important to explore ways to speed up language documentation efforts. One promising avenue is to use techniques from computational linguistics to automate some of the process. Here we consider unsupervised morphological segmentation and active learning for creating interlinear glossed text (IGT) for the Mayan language Uspanteko. The practical goal is to produce a totally annotated corpus that is as accurate as possible given limited time for manual annotation. 
We discuss results from several experiments that suggest there is indeed much promise in these methods but also show that further development is necessary to make them robustly useful for a wide range of conditions and tasks. We also provide a detailed discussion of how two documentary linguists perceived machine support in IGT production and how their annotation performance varied with different levels of machine support.|000|computer-aided approaches, annotation, linguistic annotation, language documentation 2143|Palmer2009|Paper is a good example on how computer-assisted strategies are important for language documentation and annotation efforts, so it can be quoted in a context like this.|000|computer-aided approaches, linguistic annotation, language documentation 2144|Johnson2011|How relevant is linguistics to computational linguistics? How has the statistical revolution that swept computational linguistics in the 1990s affected the relationship between linguistics and computational linguistics, and how might this relationship change in the near future? These are complex questions, and this paper presents my personal perspective on them. I start by explaining what I take computational linguistics to be, and discuss the relationship between its scientific side and its engineering applications. Statistical techniques have revolutionised many scientific fields in the past two decades, including computational linguistics. I describe the evolution of my own research in statistical parsing and how that lead me away from focusing on the details of any specific linguistic theory, and to concentrate instead on discovering which types of information (i.e., features) are important for specific linguistic processes, rather than on the details of exactly how this information should be formalised. 
I end by describing some of the ways that ideas from computational linguistics, statistics and machine learning may have an impact on linguistics in the future.|000|quantitative turn, computational linguistics, discussion, NLP 2145|Johnson2011|The paper is potentially very interesting for comparing what NLP is with what computational linguistics should be. Interestingly, the author also speaks of a "statistical revolution", similar to our quantitative turn in historical linguistics.|000|computational linguistics, NLP, quantitative turn, discussion 2146|Jakobson1960|This is the famous article by Jakobson on the poetic function, titled "Closing statement: linguistics and poetics", in which Jakobson postulates, based on the Organon model by Karl Bühler, his six functions of the linguistic sign, including the poetic function.|000|Roman Jakobson, poetic function, linguistic sign 2147|Kay2011|In every field of scientific enquiry, there is much data and therefore frequent cause to turn to the computer to help process it. This is certainly true of linguists. They use computers to search for examples of grammatical phenomena in large corpora and to collect statistics on their occurrence. They can use them to compile lexica, and to compare them with a view to assessing the relatedness of pairs of languages. Activities like these are collectively referred to as Natural Language Processing (NLP). Generally speaking, however, NLP is an engineering, rather than a scientific enterprise, much of it devoted to developing technologies, like machine translation, information retrieval, and speech recognition. It would be natural to expect these technological developments to be informed by the results of scientific enquiry carried out by linguists. In other words, it would be natural that they should have a foundation in computational linguistics. But this is rarely the case. 
Technological development in NLP is based almost entirely on machine-learning models most of which are wild and fantastical from a linguist’s perspective. This, of course, is an aberration which, fortunately, may be in the course of correction.|000|computational linguistics, nlp, discussion, Zipf's law, linguistic sign 2148|Kay2011|The author discusses NLP's problematic disregard of linguistic theory.|000|discussion, nlp, computational linguistics, linguistic theory, linguistic sign, Zipf's law 2149|Schnoebelen2009|Using MrBayes .. code:: > execute > lset nst=6 rates=gamma > showmodel > mcmcp ngen=1500000 printfreq=10000 samplefreq=150 nruns=1 nchains=4 savebrlens=yes filename=; > mcmc; > set nowarnings=yes; > sumt burnin=2500; > sump burnin=2500;|13f|tutorial, MrBayes, howto, Bayesian phylogenetics 2150|Steel2013|A major problem for inferring species trees from gene trees is that evolutionary processes can sometimes favor gene tree topologies that conflict with an underlying species tree. In the case of incomplete lineage sorting, this phenomenon has recently been well-studied, and some elegant solutions for species tree reconstruction have been proposed. One particularly simple and statistically consistent estimator of the species tree under incomplete lineage sorting is to combine three-taxon analyses, which are phylogenetically robust to incomplete lineage sorting. In this paper, we consider whether such an approach will also work under lateral gene transfer (LGT). By providing an exact analysis of some cases of this model, we show that there is a zone of inconsistency when majority-rule three-taxon gene trees are used to reconstruct species trees under LGT. However, a triplet-based approach will consistently reconstruct a species tree under models of LGT, provided that the expected number of LGT transfers is not too high. Our analysis involves a novel connection between the LGT problem and random walks on cyclic graphs. 
We have implemented a procedure for reconstructing trees subject to LGT or lineage sorting in settings where taxon coverage may be patchy and illustrate its use on two sample data sets.|000|species tree, lateral gene transfer, consensus tree, algorithms, incomplete lineage sorting 2151|Steel2013|The algorithm is implemented as part of DendroScope, where it is called "primordial consensus tree". Interestingly, their algorithm is also robust to incomplete lineage sorting, as shown by experiments, which means that this method could make up for inconsistencies in linguistic analyses.|000|species tree, primordial consensus tree, consensus tree, algorithms, lateral gene transfer, incomplete lineage sorting 2152|Ross1988|The traditional family tree diagram does not distinguish between separation and dialect differentiation (and perhaps implies that diversification is always by separation). The genetic trees in this work show separation as a branching node: [pb] :comment:`image of a tree` i.e. communalects A and B are descended from Proto X by separation. Dialect differentiation is shown as a double horizontal line: :comment:`a turned T illustrating linkage divergence`|9f|linkages, dialect chain, incomplete lineage sorting, family tree 2153|Scally2012|Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human–chimpanzee and human–chimpanzee–gorilla speciation events at approximately 6 and 10 million years ago. 
In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.|000|incomplete lineage sorting, gorilla, gorilla genome, hominids, hominid evolution 2154|Scally2012|They report that in 30% of the genome, gorilla is closer to human or chimpanzee than those two are to each other, which is explained by ILS.|000|incomplete lineage sorting, gorilla, gorilla genome, hominids, hominid evolution 2155|Mennecier2016|We have documented language varieties (either Turkic or Indo-European) spoken in 23 test sites by 88 informants belonging to the major ethnic groups of Kyrgyzstan, Tajikistan and Uzbekistan (Karakalpaks, Kazakhs, Kyrgyz, Tajiks, Uzbeks, Yaghnobis). The recorded linguistic material concerns 176 words of the extended Swadesh list and will be made publically available with the publication of this paper. Phonological diversity is measured by the Levenshtein distance and displayed as a consensus bootstrap tree and as multidimensional scaling plots. Linguistic contact is measured as the number of borrowings, from one linguistic family into the other, according to a precision/recall analysis further validated by expert judgment. 
Concerning Turkic languages, the results of our sample do not support Kazakh and Karakalpak as distinct languages and indicate the existence of several separate Karakalpak varieties. Kyrgyz and Uzbek, on the other hand, appear quite homogeneous. Among the Indo-Iranian languages, the distinction between Tajik and Yaghnobi varieties is very clear-cut. More generally, the degree of borrowing is higher than average where language families are in contact in one of the many sorts of situations characterizing Central Asia: frequent bilingualism, shifting political boundaries, ethnic groups living outside the “mother” country.|000|dataset, Indo-Iranian, Turkic, concept list, edit distance, lexical borrowing 2156|List2016g|Background: For a long time biologists and linguists have been noticing surprising similarities between the evolution of life forms and languages. Most of the proposed analogies have been rejected. Some, however, have persisted, and some even turned out to be fruitful, inspiring the transfer of methods and models between biology and linguistics up to today. Most proposed analogies were based on a comparison of the research objects rather than the processes that shaped their evolution. Focusing on process-based analogies, however, has the advantage of minimizing the risk of overstating similarities, while at the same time reflecting the common strategy to use processes to explain the evolution of complexity in both fields. Results: We compared important evolutionary processes in biology and linguistics and identified processes specific to only one of the two disciplines as well as processes which seem to be analogous, potentially reflecting core evolutionary processes. These new process-based analogies support novel methodological transfer, expanding the application range of biological methods to the field of historical linguistics. 
We illustrate this by showing (i) how methods dealing with incomplete lineage sorting offer an introgression-free framework to analyze highly mosaic word distributions across languages; (ii) how sequence similarity networks can be used to identify composite and borrowed words across different languages; (iii) how research on partial homology can inspire new methods and models in both fields; and (iv) how constructive neutral evolution provides an original framework for analyzing convergent evolution in languages resulting from common descent (Sapir’s drift). Conclusions: Apart from new analogies between evolutionary processes, we also identified processes which are specific to either biology or linguistics. This shows that general evolution cannot be studied from within one discipline alone. In order to get a full picture of evolution, biologists and linguists need to complement their studies, trying to identify cross-disciplinary and discipline-specific evolutionary processes. The fact that we found many process-based analogies favoring transfer from biology to linguistics further shows that certain biological methods and models have a broader scope than previously recognized. This opens fruitful paths for collaboration between the two disciplines.|000|incomplete lineage sorting, process-based analogies, biological parallels, similarity networks, constructive neutral evolution, word formation 2157|List2016g|If polymorphisms created from word formation (see below) or lexical replacement are resolved after rapid divergence of the languages, ILS creates patterns quite similar to those observed with genetic alleles in biology.|6|incomplete lineage sorting, biological parallels, polymorphisms, language evolution 2158|Flegontov2016|In a recent interdisciplinary study, Das et al. have attempted to trace the homeland of Ashkenazi Jews and of their historical language, Yiddish (Das et al. 2016. 
Localizing Ashkenazic Jews to Primeval Villages in the Ancient Iranian Lands of Ashkenaz. Genome Biol Evol. 8:1132–1149). Das et al. applied the geographic population structure (GPS) method to autosomal genotyping data and inferred geographic coordinates of populations supposedly ancestral to Ashkenazi Jews, placing them in Eastern Turkey. They argued that this unexpected genetic result goes against the widely accepted notion of Ashkenazi origin in the Levant, and speculated that Yiddish was originally a Slavic language strongly influenced by Iranian and Turkic languages, and later remodeled completely under Germanic influence. In our view, there are major conceptual problems with both the genetic and linguistic parts of the work. We argue that GPS is a provenancing tool suited to inferring the geographic region where a modern and recently unadmixed genome is most likely to arise, but is hardly suitable for admixed populations and for tracing ancestry up to 1,000 years before present, as its authors have previously claimed. Moreover, all methods of historical linguistics concur that Yiddish is a Germanic language, with no reliable evidence for Slavic, Iranian, or Turkic substrata. 
|000|geographic population structure, human prehistory, Turkey, Yiddish 2159|Flegontov2016|Based on overwhelming empirical evidence, modern linguistics generally defines primary evidence for genetic relationship of languages as 1) a significant number of etymological matches between their basic vocabularies, and 2) a significant number of etymological matches between their main grammatical exponents (such as number, case, person, etc.), see, for example, @Campbell<2008> & Poser (2008).|2263|genetic relationship, proof of relationship 2160|Flegontov2016|The Germanic (or, more precisely, High German) affiliation of Yiddish is thus firmly based on two observations: 1) the Yiddish basic vocabulary is predominantly Germanic, and 2) the majority of grammatical exponents, including the main ones, are Germanic. This may be easily demonstrated by consulting such standardized basic wordlists as the 200-item wordlist of Morris Swadesh (where only a small handful of items are of Hebrew or Slavic origin), or the 700-item T. Kaufman’s basic concept list, only approximately 10% of which is of Slavic, and approximately 5% of Hebrew origin.|2263|Yiddish, proof of relationship, genetic relationship, basic vocabulary, concept list 2161|Faarlund2016|The present article is a summary of the book English: The Language of the Vikings by Joseph E. Emonds and Jan Terje Faarlund. 
The major claim of the book and of this article is that there are lexical and, above all, syntactic arguments in favor of considering Middle and Modern English as descending from the North Germanic language spoken by the Scandinavian population in the East and North of England prior to the Norman Conquest, rather than from the West Germanic Old English.|000|genetic classification, English, North Germanic, proof of relationship 2162|Faarlund2016|We mention just a few typical examples out of hundreds: bag, birth, both, call, crook, die, dirt, dike, egg, fellow, get, give, guess, likely, link, low, nag, odd, root, rotten, sack, same, scrape, sister, skin, skirt, sky, take, though, ugly, want, wing, etc. It is essentially unheard of that a living language on its own territory borrows [pb] huge numbers of daily-life terms from an immigrant population whose language dies out, yet that is what the traditional scenario is forced to claim about Middle English. Burnley (1992), in fact, concludes that about half the common Germanic words of English are not of English origin, and very few of these, relatively speaking, have any source other than Scandinavian.|3f|basic vocabulary, English, language contact, Scandinavian, 2163|Faarlund2016|They claim that English is North Germanic rather than West Germanic. Yet they show only a few lexical items, many of which denote tools and the like, without any comparison to basic vocabulary in the proper sense, and they mostly bring up syntactic similarities, which are not necessarily the solid proof we need to claim a higher affiliation of English. 
We are, in some sense, again close to the question of the "Tree of one percent" (@Dagan2006): one could say that there was mixture to such a degree that the affiliation cannot be settled. Taken as a whole, however, the argument is a bit strange.|000|English, proof of relationship, genetic classification, North Germanic, Scandinavian 2164|Gelderen2016|The split infinitive is one of seven syntactic properties that English is said to share with Old Norse, and I will show that, on the basis of the area and date of its first occurrence, Norse origin is unlikely.|000|split infinitives, English, proof of relationship 2165|Gelderen2016|This is a comment on @Faarlund2016|000|- 2166|Holmberg2016|The conclusion seems inescapable, if the facts in Emonds & Faarlund are more or less right: Middle English would be the outcome of a shift from West Germanic grammar to an eccentric form of North Germanic grammar.|000|English, Old Norse, proof of relationship, genetic classification 2167|Holmberg2016|This is a reply to @Faarlund2016|000|- 2168|Kemenade2016|Emonds & Faarlund’s assessment of the language contact situation between Anglo-Saxons and Vikings is ill-informed historically. 
I furthermore discuss a number of instances where their analysis of Scandinavian linguistic impact on English is based on overly hasty interpretations of the literature.|000|English, affiliation, genetic affiliation, Old Norse, 2169|Kemenade2016|comment on @Faarlund2016|000|- 2170|Kortmann2016|The Viking Hypothesis neglects (i) the significant degree of stability from Old to Middle (and even Modern) English grammar and (ii) parallel, but independent, developments not induced by North Germanic in the grammars of continental West Germanic dialects.|000|English, genetic affiliation 2171|Kortmann2016|Comment on @Faarlund2016.|-|- 2172|Los2016|Syntax is not the right arena to make a case for language family relationships.|000|English, genetic affiliation, syntax, proof of relationship 2173|Los2016|Comment on @Faarlund2016.|-|- 2174|McWorther2016|The grammatical items are not Norse, speakers never called it Norse, and Germanic speakers almost always learned their conquerees’ language rather than imposing it— Old English is not Norse.|000|English, genetic affiliation, Old Norse 2175|McWorther2016|Comment on @Faarlund2016|000|- 2176|FontSantiago2016|Emonds & Faarlund judge subgrouping by problematic criteria and do not actually employ their stated criteria, while those criteria in fact show English to be West Germanic.|000|English, West Germanic, North Germanic, genetic affiliation 2177|FontSantiago2016|Comment on @Faarlund2016|000|- 2178|Thomason2016|The Viking hypothesis is fatally flawed, in part because syntax is readily borrowed in intense contact situations, while inflectional morphology usually is not—and Middle English inflectional morphology is overwhelmingly of West Germanic origin. 
The dismissal of lexical evidence is also misguided: the vast majority of basic vocabulary items come from Old English, not from Norse.|000|English, Old Norse, genetic affiliation 2179|Thomason2016|Comment on @Faarlund2016|000|- 2180|Emonds2016|This response mostly addresses the comments, but its main point is that 11th c. “language contact” syntactic changes suggest that Norse was expanding while West Saxon was contracting.|000|English, language contact, Old Norse, genetic affiliation 2181|Emonds2016|Closing comment on @Faarlund2016 and all the comments which had been made before by @FontSantiago2016, @Gelderen2016, @Holmberg2016, @Kemenade2016, @Kortmann2016, @Los2016, @McWorther2016, @Thomason2016. It seems that the main point of most authors in this discussion is that syntax is not proof for higher affiliation of languages.|000|English, genetic affiliation, syntax, discussion 2182|Davidson2016|Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. 
Conclusion: Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS.|000|incomplete lineage sorting, lateral gene transfer, species tree, phylogenetic reconstruction 2183|Davidson2016|The authors compare different tree estimation methods and find that some of them are actually very robust, even under higher amounts of LGT and ILS.|000|incomplete lineage sorting, lateral gene transfer, species tree, phylogenetic reconstruction 2184|Pagel2016|Human languages evolve by a process of descent with modification in which parent languages give rise to daughter languages over time and in a manner that mimics the evolution of biological species. Descent with modification is just one of many parallels between biological and linguistic evolution that, taken together, offer up a Darwinian perspective on how languages evolve. Combined with statistical methods borrowed from evolutionary biology, this Darwinian perspective has brought new opportunities to the study of the evolution of human languages. 
These include the statistical inference of phylogenetic trees of languages, the study of how linguistic traits evolve over thousands of years of language change, the reconstruction of ancestral or proto-languages, and using language change to date historical events.|000|Darwin, cultural evolution, language evolution, biological parallels, 2185|Pagel2016|A recent example of typically simplifying parallels between biology and linguistics.|000|- 2186|Pagel2016a|Anthropologists, borrowing techniques from evolutionary biology, have demonstrated that some common fairy tales can be traced back 5,000 years, or more, long before the development of written traditions.|000|fairy tales, anthropology, anthropological evolution, biological parallels 2187|Pagel2016a|The paper introduces the basics behind the questionable idea of building trees from fairy tales.|000|- 2188|Straffon2016|This volume contains 8 articles, one introduction by @Straffon2016a on the whole idea behind phylogenetic applications in archaeology. Three studies on concepts and theories, by @Kressing2016, @Rivero2016, and @Lycett2016. Four case studies on application, by @Knappett2016, @Prentiss2016, @Caridi2016, and @Tripp2016.|000|archaeology, phylogenetic studies, phylogenetic reconstruction, phylogeny, cultural evolution 2189|Straffon2016a|Inferring and explaining cultural patterns and the ways in which human groups relate and interact over large spans of time or space is one of the biggest challenges for archaeologists. When dealing with either the remote past or the present, researchers struggle to learn about the conditions and mechanisms by which cultural traits originate, move, change, and disappear. The use of phylogenetic methods, originated in evolutionary biology to measure relatedness between species, can help to make significant advances toward those aims. 
This introduction maps the field of cultural phylogenetics, considers its potential for archaeological research, and summarizes the proposals laid out by the contributors of this book.|000|cultural evolution, phylogenetic studies, biological parallels, analogy, archaeology 2190|Straffon2016a|They mention, probably with justification, that there are not many applications of phylogenetic studies, e.g., to morphological traits, in archaeology. The introduction is, however, quite naive regarding the problematic aspects of reticulation and modeling of phylogenetic data, and the power of the algorithms under question.|000|- 2191|Straffon2016a|The present volume is both timely and needed. The past couple of decades have seen an accelerated increase in the number of works and discussions on the mechanisms and processes of cultural evolution and the methodological approaches to best describe and analyze them.|1|cultural evolution 2192|Straffon2016a|Yet, for many archaeologists who are not familiar with evolutionary science, it may not be completely clear what phylogenetics is and why or how it should be applied in their field of work. The aim of this book is, on the one hand, to address precisely this issue and, on the other, to offer a selection of clear examples of what phylogenetic methods can contribute to archaeological research. This introduction will briefly present the field of phylogenetics through its key concepts and will discuss its potential applications in the human sciences in general and archaeology in particular.|1|- 2193|Straffon2016a|Basically, a phylogeny is a hypothetical reconstruction of those evolutionary relationships, built by identifying the distribution of characters among species and inferring their development. That is, a phylogeny constitutes a hypothesis of an evolutionary history. Phylogenetic systematics, also known as cladistics, refers to the methods of phylogenetics. 
The aim of these methods is to infer which organisms share ancestry with others and the amount of evolutionary changes that may have occurred within lineages. In this manner, cladistics organizes taxa in groups according to relatedness (clades), by identifying shared characters among them, which have been inherited from a common ancestor (shared derived characters, or synapomorphies). :comment:`Why do they describe phylogenetics only as phylogenetic systematics, and where are all the other approaches included in phylogenetics, especially the more recent network approaches? Phylogenetics is not only about the simple kind of relatedness by inheritance.`|2|phylogenetics, introduction, archaeology 2194|Straffon2016a|The result of a phylogenetic analysis is a cladogram, a branching diagram that groups taxa by shared descent. Phylogenies may be represented as trees, lines, or networks, which help visualize the processes of divergence, branching episodes, and convergence, as well as continuity or extinction. Cladograms then serve to test hypotheses about origin, relatedness, change, and, when coupled with a comparative approach, adaptation.|3|- 2195|Straffon2016a|As we have noted, the mechanisms that allow for a phylogenetic classification and analyses are common descent and variation from the ancestral form. 
Therefore, phylogenetic methods are applicable to any trait or entity, whether genetic or cultural, as long as it undergoes descent with modification (@Levinson<2012> and Gray 2012, 167).|3|- 2196|Straffon2016a|But biological and cultural evolution also differ in that the latter includes much higher rates of horizontal transfer, hybridization, and borrowing than its genetic counterpart, and its effects can occur much faster, in “Lamarckian” fashion (Gould 1996, 355).|3|- 2197|Straffon2016a|It is fair to say that researchers involved in cultural evolution are nowadays quite aware of these discussions and their challenges and, like the authors in this book, have taken to finding ways of integrating evolutionary methods into their studies [pb] while taking into account that these cannot be simply translated across fields but must consider the particular properties of their subject. :comment:`Curious to see whether this is actually true.`|3f|- 2198|Straffon2016a|Tree-shaped representations of relatedness, for instance, have been common in linguistics since the nineteenth century, when the study of language family trees became a standard analytical tool in that field (see @Kressing<2016> and Krischel [...]). :comment:`This is of course not true, as we have early networks as well, and we usually assumed that languages evolve by exchange, rather than inheritance and descent with modification.`|4|- 2199|Straffon2016a|The transmission and accumulation of learned knowledge and behavior across generations is what allows human culture to thrive (Tennie et al. 2009). Learning from others and the ability to share information underlie our species’ success in exploiting a vast variety of environments, which has allowed it to colonize the globe in a mere hundred thousand years, or so. 
The quick spread and development of modern human culture that have been made possible by cumulative learning may well be “the greatest transformation in the shortest time that our planet has experienced since its crust solidified nearly four billion years ago” (@Gould<1996> 1996, 354).|4|- 2200|Gould1996|“the greatest transformation in the shortest time that our planet has experienced since its crust solidified nearly four billion years ago” :comment:`Quoted after` @Straffon2016a |354|nice quote 2201|Straffon2016a|The methods of phylogenetics can be put to use in archaeological research in different ways. First, like in biology, they may be used to group related artifacts or series of them (O’Brien et al. 2001). Second, they may be used to address what is known as “Galton’s problem,” which refers to the issue of homology vs. analogy: cultural similarities, like homologies in organisms, are unlikely to always be the result of independent invention; rather, they may be attributed to a series of other factors, such as common history, borrowing, diffusion, and coevolution. Constructing cultural phylogenies can help make inferences about those factors most likely to have shaped the traits or artifact traditions under study (Mace and Pagel 1994; Mesoudi and O’Brien 2009). Third, like in linguistics, phylogenetic methods can add to the chronological arrangement of cultural traditions (Gray and Atkinson 2003; Holden and Shennan 2005, 23). [...] Finally, the patterns of relatedness that emerge from such classifications and arrangements can then be used to interpret and explain archaeological cultures. |5|- 2202|Straffon2016a|However, while there are important monographs addressing the uses of phylogenetic methods in archaeology (e.g., García Rivero 2013; Mesoudi 2011; O’Brien and Lyman 2003), there are but few edited volumes dedicated to specific case studies that exemplify the potential that cultural phylogenetics holds for our field (e.g., Lipo et al. 
2006). In this sense, the present book is a welcome addition.|5|- 2203|Kressing2016|This contribution shows that evolutionary thought which dominated the discourse on the development of human populations, cultures, and languages in the nineteenth century (1) dates back to pre-Darwinian concepts that emerged in the times of the Enlightenment, (2) was only possible due to an ongoing interdisciplinary exchange between different branches of anthropology, and (3) was bound to the idea that lateral exchange of “racial,” linguistic, or cultural traits would contribute to degeneration instead of “progressive” development. Specifically, we would like to draw the reader’s attention to two quite contradictory strains in the history of science: Evolutionary thought dominated the discourse on the development of human populations, cultures, and languages in the nineteenth century. According to this “leitmotif,” inheritance took place in unilinear trees of descendence, with selection and processes of vertical descent leading to development in consecutive stages. Horizontal or lateral transfer, on the contrary, for example, of words between languages, or interbreeding between different species, populations, or “races” would ultimately lead to degeneration instead of development, spoiling the supposedly “pure” lineages of descent. On the other hand, the development of evolutionary theory that had come to dominate scholarly thought in biology, anthropology, linguistics, and sociology could only emerge due to an ongoing interdisciplinary exchange between different branches of the sciences and the humanities, with a decisive role played by anthropology and allied disciplines. This means evolutionary theory favoring pure lines of vertical descent could only develop due to frequent and ongoing “interbreeding” between different scholarly disciplines, thus “spoiling” the pure lines of scientific descent! 
This interdisciplinary, “horizontal” descent is illustrated by the fact that the idea of biological evolution dates back to pre-Darwinian concepts that emerged in the Enlightenment and was first introduced to sociology and the humanities before being applied to the newly emerging discipline of biology in the early nineteenth century. While natural history can be traced back much further, the term “biology” was only established at that time by physicians and naturalists like Beddoes (1799), Burdach (1800), and Lamarck (1802). This “horizontal transfer” of ideas transgressing the borders between the sciences and humanities persisted even in periods of rejection of evolutionism in both biology and cultural anthropology. We refer to “anthropology” in a broad sense, combining sociocultural anthropology with biology-derived physical anthropology and also including the neighboring disciplines of archaeology and linguistics in accord with the four field approach of North American anthropology. While the borders of “anthropology” in this sense prove to be hard to define, we understand this as just another indication for the transgression of academic borders and interdisciplinary networking between scholars – a central topic to be put forward in our paper.|000|history of science, anthropology, linguistics, family tree, biological parallels, 2204|Kressing2016|The paper tries to show that biologists, linguists, and anthropologists alike thought only in trees, without ever realizing that they were exchanging ideas across disciplines in a web-like fashion.|000|- 2205|Kressing2016|[...] we intend to reconstruct the development of evolutionary thought in both the physically and culturally orientated branches of anthropology and to point out mutual relations in the history of both the sciences and humanities.|20|- 2206|Kressing2016|We have identified the terms “development” and “degeneration” as central to this discourse. 
The two terms were by no means exclusively used in anthropology, but also in neighboring disciplines, such as evolutionary biology, comparative linguistics, and sociology. Exactly for this reason, we have structured this paper around these two terms, in order to capture the [pb] many-faceted influences on theories of classification and evolution of human populations and languages.|20f|- 2207|Kressing2016|We will further show how the idea of evolution declined in biology, linguistics, and cultural anthropology from the beginning of the twentieth century onward, achieving fresh attention with the emergence of human genetics and linguistic long-range comparison in the last quarter of the twentieth century. Again, we claim that this new focus on the idea of coevolution of human languages and genetic clusters is due to perpetuated “lateral transfer,” that means mutual exchange between different academic disciplines in both the sciences and humanities [...]. :comment:`This is just wrong: long-range comparison was only taken by the people, and it essentially predates the last quarter of the 20th century. And linguists never abandoned evolutionary thinking; they just did not compare their trees with biology too much, as biologists did not really have anything to show.`|21|- 2208|Kressing2016|By “primordialism” we refer to the conceptualization of ethnicity as “based in biology and determined by genetic and geographical factors” which is rooted in Herder’s neoromantic concept of the Volk|20f|primordialism, definition 2209|Kressing2016|However, primordialist ideas experienced a Renaissance in the shape of the “new synthesis” between genetic, linguistic, and archaeological data during the 1980s, leading to a model of global phylogeny. The aim of this research program was to investigate if a connection between linguistic macro-phyla and genetic clusters of humankind could be identified. 
Finding such a connection would lead credibility to primordialist thinking in the sense that a close link between the vertical transfer of human genes and languages would be implied. :comment:`Again wrong: first, nobody listened to Cavalli-Sforza and others back then, and second, they were aware of the separation between human race, population, and culture.`|22|- 2210|Kressing2016|Even though the first phylogenetic tree showing the relationship between different languages is often attributed to Schleicher (1861), a monophyletic representation by Gallet that shows the development of human languages from a common “langue primitive” can be found as early as 1800 (Arroux 1990). :comment:`Much earlier we find both trees and networks in the history of linguistics, and the arbre is also essentially a network, compare` @List2016g|24|- 2211|Kressing2016|Early models of morphological language typology, dating back to Smith (1762), Herder (1772), Schlegel (1808), and Humboldt (1836), emphasize the linear development from analytic, presumably “simple” built languages (e.g., Chinese) to “higher,” more complex forms like the agglutinative idioms (e.g., Native American languages) and finally to inflective languages like Semitic or Indo-European. :comment:`While this may be partially true, it does not reflect the thoughts of August Schleicher, who believed that languages would grow old and die.`|25|- 2212|Kressing2016|As a conclusion, we might point out that – in the nineteenth-century evolutionist’s view – evolution works according to a pedigree model, involving vertical transfer. Lateral or horizontal transfer, on the other hand, does not contribute to development to higher stages on the evolutionary chain, but to an evolutionary backlash, i.e., degeneration, spoiling the lineage of the evolutionary tree. The discourse on degeneration by admixture of both “races” and languages presumes an acceptance of the possibility of lateral transfer. 
This is illustrated by the fact that fear of degeneration was a topic of concern in both racial theory and linguistics. :comment:`Interesting again, that degeneration was what Schleicher's model of language development would see as the last stage of every language. Schleicher admitted admixture of languages, but we cannot find any value he attributes to this.`|29|- 2213|Kressing2016|The origin of the term degeneration makes clear that the concept was primarily developed and used in the medical field, but by referring to Schleicher (1850), we have already seen that the term degeneration provides a fine example for the transfer of concepts between sciences (in this case medicine as a basis of physical anthropology) and the humanities and occurred in linguistics as well. :comment:`Again, because he believed in the decay of languages, a metaphor stemming from biology.`|30|- 2214|Kressing2016|All the other branches of the Indo-European family had been corrupted due to migration and intense contact with other non-European languages (Schlegel 1808; Schleicher 1850; Müller 1855). :comment:`It is extremely annoying that the authors do not provide page numbers to substantiate their claims. Furthermore, it is clear that especially Schleicher was the first one to point out that Aryan languages were not the origin, and that the Indo-European language predated Sanskrit.`|30|- 2215|Schleicher1850|Es ist eine bemerkenswerte Erscheinung, daß um die Untere Donau und weiter nach Südwesten sich eine Gruppe aneinandergrenzender Sprachen zusammengefunden hat, die beim stammhafter Verschiedenheit nur darin übereinstimmen, daß sie die verdorbensten ihrer Familien sind. Diese mißratenen Söhne sind das Walachische in der romanischen, das Bulgarische in der slavischen und das Albanische in der griechischen Familie. 
Das Verderbnis zeigt in der nördlichsten Sprache, der zuerst genannten, noch in einem geringen Grade, mehr schon in der mittleren, dem Bulgarischen, und hat in der südlichen, der albanesischen einen ihrer Herkunft fast völlig verdunkelnden Grad erreicht. :translation:`It is a remarkable fact that at the lower Danube and further southwest, a bunch of neighboring languages is to be found which – apart from their different origin – only share one feature in the sense that they are the most spoiled representatives of their respective families. These wayward sons of languages are Vlah in the Romance family, Bulgarian in the Slavonic family, and Albanian in the Greek family of languages. The degeneration has achieved a minor degree in the northernmost language (mentioned at first); a higher degree in the central one, Bulgarian; and a degree almost completely obscuring its origin in the southernmost language, i.e., Albanian.` Translation by @Kressing2016 (p 31)|143|Sprachbund, language union, degeneration, Bulgarian, Romanian, 2216|Kressing2016|The emergence of the idea of unilinear human evolution that we outlined before was closely connected to a tradition of constructing tree models of both the biological and the linguistic evolution of humans which, too, can be traced back to the eighteenth century. In both physical anthropology and linguistics, treelike models of descent concerning languages and human biodiversity proved to be equally attractive until the present day. :comment:`Completely wrong, as network models were prevailing in linguistics.`|34|- 2217|Kressing2016|:comment:`Broad attack on scholars working on evolutionary data, including Gray et al, Renfrew, etc., which claims that scholars are isolated in the field and only want to think in terms of trees. 
This is essentially not true, as what interests most scholars in population and language studies is the degree to which the two coincide, which makes this a research question rather than a definite axiomatic statement.` All these scholars form a scientific community (in the sense of Kuhn 1962) within anthropology, sharing an evolutionary approach to their work. The research program of “global phylogeny” does not necessarily imply the idea of cultural, linguistic, or physical degeneration by admixture (vertical transfer) of genes, language features (phonemes, morphemes), or memes, but emphasizes vertical transfer of languages, culture, and genes as favored and prevailing feature of human evolution. [...] [pb] Comparative linguists’ criticism is even more fundamental: They attack the data sets, mainly based on the Swadesh Lists used for lexicostatistical and glottochronological comparison from the 1950s onward that many of the linguistic macro-families are based on claiming chance resemblances and cultural bias of these lists.|35f|- 2218|Heggarty2010|Linguists have traditionally represented patterns of divergence within a language family in terms of either a ‘splits’ model, corresponding to a branching family tree structure, or the wave model, resulting in a (dialect) continuum. Recent phylogenetic analyses, however, have tended to assume the former as a viable idealization also for the latter. But the contrast matters, for it typically reflects different processes in the real world: speaker populations either separated by migrations, or expanding over continuous territory. Since history often leaves a complex of both patterns within the same language family, ideally we need a single model to capture both, and tease apart the respective contributions of each. The ‘network’ type of phylogenetic method offers this, so we review recent applications to language data. Most have used lexical data, encoded as binary or multi-state characters. 
We look instead at continuous distance measures of divergence in phonetics. Our output networks combine branch- and continuum-like signals in ways that correspond well to known histories (illustrated for Germanic, and particularly English). We thus challenge the traditional insistence on shared innovations, setting out a new, principled explanation for why complex language histories can emerge correctly from distance measures, despite shared retentions and parallel innovations.|000|SplitsTree, Neighbor-Net, wave theory, family tree 2219|Heggarty2010|This article claims that the split processes in linguistics differ, and that NeighborNets are better at representing dialect continua than family trees, which follows a bit upon the different processes assumed in @Ross1988, where dialect divergence is distinguished from language divergence, although here, in this paper, SplitsTree and NeighborNets do not make a difference in the display, as was done by Ross.|000|- 2220|Morrison2014b|Exploratory data analysis (EDA) involving both graphical displays and numerical summaries of data, is intended to evaluate the characteristics of the data as well as providing a form of data mining. For multivariate data, the best-known visual summaries include discriminant analysis, ordination, and clustering, particularly metric ordinations such as principal components analysis. However, these techniques have limiting mathematical assumptions that are not always realistic. Recently, network techniques have been developed in the biological field of phylogenetics that address some of these limitations. They are now widely used in biology under the name phylogenetic networks, but they are actually of general applicability to any multivariate dataset. Phylogenetic networks are fast and relatively easy to calculate, which makes them ideal as a tool for EDA. This review provides an overview of the field, with particular reference to the use of what are called splits graphs. 
There are several types of splits graph, which summarize the multivariate data in different ways. Example analyses are presented based on the neighbor-net graph, which seems to be the most generally useful of the available algorithms. This should encourage the more widespread use of these networks whenever a summary of a multivariate dataset is required.|000|data display networks, exploratory data analysis, phylogenetic network 2221|Kalyan2016|One could be tempted to represent this thorny situation by resorting to the diagram in Figure 5, which does not necessarily commit us to any subgrouping hypothesis. This sort of diagram (cf. Ross 1997: 213) is sometimes used as an “agnostic” representation, which Pawley (1999) calls a “rake-like” structure, and van Driem (2001) likens to “fallen leaves”. (In phylogenetics this is known as “(soft) polytomy”: see Page & Holmes 2009: 13.) Yet it too is unsatisfactory, as it could be interpreted as claiming that there are no exclusively shared innovations between Spanish and French, between French and Italian, or between Spanish and Italian, when – as we have seen – there is in fact solid, positive evidence for all of these.|a6|conflicting data, family tree, wave theory, historical glottometry 2222|Rivero2016|This paper is a review of evolutionary thought in archaeology. It explains why and how the application of Darwinian evolutionary theory to archaeology is possible and, moreover, useful. It expounds what this scientific field gains from considering the study of material culture and, by extension, of cultural change from this perspective. 
After explaining the main theoretical principles, it develops a history of the application of this epistemology in archaeology, focusing particularly on the tasks of classification and sequencing of data and thus entering into the current field of cultural phylogenetics.|000|archaeology, review, cultural evolution, phylogenetics 2223|Rivero2016|Paper discusses the importance of evolutionary thinking in archaeology. Interestingly, this is stated as Darwinian thinking, although it is not clear what that actually means. Often, the terminology is hard to grasp, and it is not clear why this approach would be the solution rather than others. Phylogenies can contribute, but only if scholars find ways to deal with the methodological problems resulting from lateral transfer and homology. In contrast to linguistics, archaeology seems to have fewer clues to resolve problems of potential relatedness in an evolutionary way, as multiple and independent origins are always possible.|000|- 2224|Rivero2016|Archaeology is a historical science that benefits from an understanding of other fields of knowledge (such as anthropology, biology, and psychology, among others) for the purposes of a better explanation of our past and our present. In its case, the theoretical positions and paradigms of reasoning that have been developed are many and, moreover, diverse. Whereas some of these may be described as epistemologies, with theoretical bodies grounded on different fundaments, others correspond to paradigms and more aseptic and specific explicative models.|43|- 2225|Rivero2016|Leaving aside the issue of why there is an unequal acceptance of diverse epistemologies in archaeology, it is clear that this variability in reasoning is natural, logical, and, in every way, positive for our field of knowledge. 
It is natural and logical because, following philosophical principles, someone who holds that life and the world obey objective universal laws could not persuade someone who sustains that each subject has its own perception and therefore rejects the search for general laws, and vice versa.|44|- 2226|Rivero2016|Despite all this theoretical diversity, there is a common denominator to almost all strands of archaeological thought: the untouchable philosophical cornerstone of human intentionality as the only or, at the very best, the main mechanism driving cultural change. Archaeological theories, therefore, widely accept that history is constituted by progressive directional changes that are predetermined by people themselves. As a result, all fields of knowledge concerned with the study of human culture continue to be based mainly on an underlying Lamarckian model of change.|44|- 2227|Rivero2016|Biology has scientifically validated the Darwinian model of change, which is empirically supported across diverse aspects of life. Not only is it fully supported in the fields of organic biology, but also its likelihood has been recently demonstrated by the data available in areas of knowledge concerned with the study of behavior and cultural change, particularly in certain fields of the social sciences and in archaeology [...]. Moreover, this model fulfills the scientific values established by philosophers of science in relation to epistemology: simplicity, unifying power, fertility, internal coherence and external consistency, Popperian falsifiability, and predictive precision [...]. :comment:`Is it really true that the Darwinian model of change is falsifiable? I do not really see that point.`|45|- 2228|Rivero2016|The Darwinian model does not exclude the possible influence that the variation and plasticity of human behaviors may produce upon change. 
Rather, it sees these forces as functioning together with other mechanisms to provoke change.|45|- 2229|Rivero2016|For Darwinists, the concept of evolution is based on the notion that the variability of populations changes over time and through space. Thus, the relative frequencies of the different traits (both somatic and cultural) that constitute a population (or a system comprised by several populations) do not remain constant, but rather vary in time. The forces that drive change do not reside exclusively inside the organisms, as may have been suggested by the pre-Darwinian concept of transformism. Apart from the internal variation of organisms, produced by genetic mutation and behavioral innovation, for Darwinists, an important means of change is found outside of the organisms.|46|- 2230|Rivero2016|Selection therefore constitutes the main mechanism due to which the relative frequencies of traits and organisms are modified over time. The traits or organisms that, for whatever reason, fail to reproduce or replicate will not leave descendants or copies of themselves. :comment:`Well, this is not true in language evolution, where selection is not the force, but rather drift in the biological sense.`|46|- 2231|Rivero2016|The features that are required for a system to be analyzed from a Darwinist perspective are: (1) variation, (2) differential reproduction of variation, and (3) inheritance, or the genetic or cultural replication of inherited traits.|46|- 2232|Rivero2016|In biology, the term genotype is applied to the DNA that synthesizes the molecular information expressed in the phenotypic traits (eye color, hair color, etc.). In the case of culture, the genotype should be understood as all of the information that is stored and culturally transmitted – by various mechanisms – between human minds, for instance, the processes of pottery production that are codified in the potter’s mind. 
In this case, the information is expressed phenotypically in the pottery vessels, upon which selection operates. Thus, behaviors and material culture are [pb] phenotypes expressed through the minds of individuals and function in a similar way to that of the genetic phenotype.|46f|biological parallels 2233|Rivero2016|However, given that this model has enabled great advances in the understanding of the organic world, it would be useful to establish an initial scheme to begin to study how cultural information may be structured, stored, and transmitted. :comment:`But that's the very problem, that it is not as simple as with genes!`|47|- 2234|Rivero2016|In our analyses, if we think in terms of populations and we create cultural units, similar to phonemes and morphemes in linguistics, measurable and appropriate for our hypotheses and methodological tools, then we will be able to see how such cultural variants are distributed within and between populations across different periods. Different relative frequencies will enable us to formulate and test hypotheses – essentially inspired by Darwinian principles – on, for instance, the differential reproduction of cultural traits in populations through time. Such an approach will help to understand the reasons behind the proliferation and decline of different archaeological materials. In other words, it will provide explanations of cultural change.|47|analogy 2235|Rivero2016|But how would we proceed if we were dealing with archaeological materials from the same continent and chronology? It is practically impossible, or at least ambiguous, to establish whether similar traits are due to parallelism or kinship based on the observation of similarity alone. 
This is the main shortcoming of methods based on the analogy of resemblances and similarities, since these are insufficient and incorrect for the determination of groups and classifications.|51|- 2236|Rivero2016|:comment:`Mentions the typical sources of similarity, namely (1) chance [pb] (2) convergence, (3) lateral transfer` |50|- 2237|Rivero2016|As I will argue below, phylogenetics is the only method of classification that enables us to confront and analyze this issue in detail and therefore to put forward more rigorous and consistent classifications and models of historical sequence. :comment:`But what is phylogenetics here? That's the crucial point, as I don't see it, and we can do without phylogenies in linguistics, if only the change pattern is strong enough.`|51|- 2238|Rivero2016|Archaeological methods and interpretations often lack such testing, due to the perceived absence of means of direct (internal) testing. Evolutionary methods and phylogenetics, particularly, enable us to test sequences of classification and the underlying historical hypotheses using a statistical numerical basis in accord with the principle of refutation and the systematization and accumulation of knowledge.|52|- 2239|Rivero2016|The only taxonomic methodology that currently enables us to study transmission is phylogenetics. 
Phylogenetic methodologies are based on transmission, whether genetic or cultural.|53|- 2240|Rivero2016|Synapomorphies constitute what are technically known as “monophyletic clades.” This unit is the only criterion considered in cladistics for the ordering of taxa under the Darwinian taxonomic model.|55|- 2241|Rivero2016|While evolutionary taxonomy contemplates model and process simultaneously, searching for and analyzing genealogical elements in ecological niches, cladistics is concerned solely with the model, accepting Darwinian theory as the backdrop in which all processes are proposed and explained|55|- 2242|Rivero2016|One of the most common and serious objections is that related to the problem of homoplasy, and it is often underlined that cultural evolution is different in nature to organic evolution, since the former is highly reticulated (cf. O’Brien et al. 2008: 48).|57|- 2243|Rivero2016|Regardless, it is worth noting that this suggested problem is not exclusive to the application of phylogenetics to culture (including material culture). Molecular biologists and researchers involved in the study of microorganisms and plants deal with this issue on a daily basis. The importance of horizontal transmission between genomes and bacteria has been recognized (Margulis and Sagan 2002), and high rates of horizontal transmission between many species and families of plants and animals have also been recorded. As noted by M. J. O’Brien and R. L. Lyman – with reference to relevant literature (cf. 2003: 104–105) – even hybridization appears to be well documented in biological evolution, particularly among plants, where it may account for up to 20 % of variation, especially in angiosperms. 
In the animal kingdom, albeit with lower rates, the case of birds may be illustrative (Laskowski and Fitch 1989).|57|- 2244|Rivero2016|The matter of horizontal transmission and homoplasy may thus be considered a methodological problem, rather than a theoretical issue (Bellwood 1996), that can be tackled by the study and refinement of the phylogenetic techniques themselves.|58|epistemological problem, methodological problem, definition, nice quote 2245|Rivero2016|In the field of linguistics, the evolutionary conception was not constrained to the study of the evolution of spoken languages. As early as the first half of the nineteenth century, scholars and analysts of manuscripts such as K. Lachmann and C. Gottlieb Zumpt claimed that the systematic study of variants (changing elements) in script traditions is genealogical in nature. Manuscript traditions have been addressed more recently in several works (cf. Spencer et al. 2006: 67; Lipo et al. 2006b: 5).|59|stemmatics, history of science 2246|Rivero2016|Very soon after the publication of The Origin of Species, a genealogical model for Indo-European languages was put forward by A. Schleicher (1863). As is evident [pb] from the title, Schleicher’s work was strongly influenced by Darwin’s fundamental theory.|58f|biological parallels, analogy, history of science, August Schleicher 2247|Rivero2016|In this first phase of cultural evolution studies, the techniques and methods applied, specifically to the evolution of material culture, aimed at creating classifications or lineages that would show the relationships between archaeological objects over time. For example, “seriation” was founded on the assumption of historic continuity and some sense of inheritance and enabled the construction of chronologically ordered series of materials, under the assumption that the degree of similarity between two objects was related directly to their temporal proximity.|60|- 2248|Rivero2016|A. L. 
Kroeber’s 1916 work on the pottery assemblages of the Zuni people of New Mexico is the first example of “frequency seriation” (cf. O’Brien and Lyman 2003: 11–12).|62|history of science, cultural evolution, language evolution, 19th century, 2249|Rivero2016|Some of the most recognized specialists in this line of enquiry have suggested (cf. O’Brien et al. 2008: 40) that the phylogenetic studies carried out in anthropology and archaeology can be divided into three categories: (1) studies that track lines of transmission and descent back in time in search of common ancestors (prototypes) to examine the processes underlying the geographical distribution and cultural development of the descendants; (2) approaches that first create nested groups of related taxa, or clades, and then track those taxa geographically; and (3) comparative studies that depend on the understanding of models of descent in order to examine the distribution of functionally adaptive traits.|63|- 2250|Rivero2016|In sum, the special relevance of cladistics may be synthesized in four points: (1) epistemic alignment with the Linnaean taxonomic model and the theory of evolution by descent with modification; (2) systematic application of objective and coherent principles, leading to the reconstruction of phylogenetic and kin relationships by means of nested groups of taxa based on shared derived characters; (3) possibility to test and to refute, i.e., the criterion of Popperian falsifiability; and (4) objective and systematic assessment of hypotheses and results by means of a statistic base that reduces the uncertainties and ambiguities of other archaeological methods and models.|65|- 2251|Rivero2016|Therefore, the potential and advantages of phylogenetic methodologies for archaeology should not be overlooked, and the (mistaken) criticisms of this line of enquiry can no longer be sustained. 
The only point that may constitute a founded objection to this method is the problem caused by horizontal transmission and homoplasy. As stated above, this is not a philosophical or theoretical issue but a strictly methodological question.|65|lateral transfer, homoplasy, methodological problem, epistemological problem 2252|Josephson2013|The question of archaism or innovation in the Anatolian branch is of fundamental importance for Indo-European reconstruction. The “Indo-Hittite” hypothesis of Sturtevant and others, which implies early separation of Proto-Anatolian from the rest of Indo-European, can only be verified if a sufficient number of innovations common to the non-Anatolian branches can be demonstrated.|85|Anatolian, retention, innovation, cladistics 2253|Josephson2013|The Anatolian chains of Wackernagel clitics, which comprise sentence connectives and a modal particle in the first slot, a citation particle, a modal particle, enclitic pronouns, and a reflexive particle in a fixed order, with one of the originally five local/directional particles in the final slot, were listed by Watkins (2001: 55) as one of several traits of Anatolian that may have been caused by Hurrian and Hattic influence in a more extensive linguistic area.|88|Anatolian, inheritance, borrowing, cladistics 2254|Josephson2013|The Anatolian area can be characterized as a “spread zone” of an already established subgroup of languages, whose members had originated in some unknown close-knit area before a long period of separation followed by the renewed meeting of Luvian and Hittite in Central Anatolia where close contact was reestablished between them. There was some areal pressure from languages of different structure in the larger macro-area as a result of intensive symbiosis. 
Some early Hattic influence is found in Hittite; Hurrian affected Luvian language and culture and spread to Hittite because of the strong Luvian influence on Hittite culture and ritual and Hittite-Luvian bilingualism.|90|Anatolian, contact area, Sprachbund, linguistic area, 2255|Josephson2013|:comment:`Interesting summary of innovations and proposed retentions between Anatolian and the rest of Indo-European.`|95f|shared innovation, shared retention, Indo-European, Anatolian 2256|Josephson2013|Interesting summary on current state of research in Indo-European linguistics on the status of Anatolian and the rest of the Indo-European languages|000|Indo-European, subgrouping, cladistics, evidence 2257|Lycett2016|Recent years have seen substantial growth in the application of evolutionary approaches to spatial and temporal variation exhibited in archaeological data. As is now well known, the application of this approach rests on the basis that artifacts are an expression of a genuine evolutionary system mediated by transmission (via social learning), variation in transmitted elements, and differential replication of transmitted elements across time. While this provides the necessary fundamental basis for the application of an evolutionary approach to artifactual variation, application of the term “evolution” still provides a source of confusion for some archaeologists. Part of this confusion may stem from an underdeveloped body of theory that conceptually makes explicit the link between the evolution of socially transmitted information and the expression of that evolutionary process in terms of physical artifacts. This is especially the case given that artifactual variation is inevitably influenced by several different factors (e.g., raw material properties and/or post manufacture attrition), not all of which are necessarily heritable in systems of social learning. 
In order to resolve these difficulties and make more clear the case for an evolutionary approach to artifactual variation, there is a need for an explicit quantitative body of theory that links statistical variation in artifactual traits to factors such as selection and drift when (1) sources of artifact variation are multiple and not all necessarily heritable, (2) the proximate socially transmitted elements are unknown, and (3) many artifactual traits will be influenced simultaneously by multiple aspects of socially transmitted practices. Here, it is argued that a “quantitative genetic” approach can resolve these problems.|000|artifact evolution, genotype, phenotype, archaeology, cultural evolution 2258|Lycett2016|As others have noted (e.g., O’Brien and Lyman 2000; Shennan 2006), the roots of applying evolutionary principles to archaeological data go back to the late 1800s and early 1900s, during what some would now refer to as archaeology’s “cultural historical” phase of intellectual development (Trigger 1989).|74|- 2259|Lycett2016|However, between the late 1970s and early 1990s, a series of what have been rightly referred to as “benchmark” papers were published (O’Brien 1996a: xiii), which explicitly began the brave task of trying to convince an archaeological audience that principles originally developed to study biological evolution might also be applicable to the archaeological record.|74|- 2260|Lycett2016|However, it would be difficult for any archaeologist to be unaware of the considerable recent expansion of archaeological studies now explicitly utilizing evolutionary principles and methods. 
Importantly, one feature of this new generation of studies is their highly empirical nature, something which some of its earliest proponents have pointed out was sorely needed if the field was to progress (O’Brien and Lyman 2000: 22).|74|- 2261|Lycett2016|It is perhaps important to note that these evolutionary studies of material culture have covered time periods from the Paleolithic through to ethnographically and historically recorded items and have dealt with artifact classes from across the globe as diverse as pottery, stone tools, baskets, carpets, house architecture, and watercraft. These above-listed studies, which have largely been published in scientific journals, have been joined by a series of edited volumes that either in whole or in part also deal with the application of evolutionary theory and methods to archaeological data (e.g., Hurt and Rakita 2001; Mace et al. 2005; Lipo et al. 2006; O’Brien 2008; Shennan 2009; Lycett and Chauhan 2010b; O’Brien and Shennan 2010; Ellen et al. 2013). The present volume, of course, adds to this growing list.|75|- 2262|Lycett2016|Firstly, the diversity of important anthropological questions to which evolutionary studies of artifacts can, and indeed must be, applied. Secondly, in so doing, they have illuminated the powerful potential of the archaeological record to shed direct light on issues of social inheritance, the existence of variation, and the differential replication and persistence of those variants over the dimensions of time and space.|75|- 2263|Lycett2016|However, despite such apparent attainment of legitimacy, it is still not unusual to encounter strong viewpoints arguing that such approaches are, in effect, baseless. 
Indeed, personal experience at major international conferences has led to recent situations where even highly accomplished colleagues within the profession are not necessarily coy about making statements such as “phylogenetics cannot be applied to artifacts because artifacts don’t have genes” or, similarly, “artifacts cannot evolve because they are inanimate objects.” Such criticisms will be familiar to anyone seriously engaged in the evolutionary analysis of artifactual variation, who have repeatedly pointed out the fallacious (and jaded) character of such arguments, which can actually be traced back quite some time to at least Brew’s (1946) assertion that “pots don’t breed” (e.g., see O’Brien and Lyman 2000: 9).|75|nice quote, archaeology, scepticism, anecdote 2264|Lycett2016|One particular reason why evolutionary approaches to artifactual variation may cause confusion is the practical disconnect between the evolutionary process and the expression of that evolutionary process in the form of artifacts.|75|genotype, phenotype, expression, idea, archaeology, terminology 2265|Lycett2016|In strict terms, it is genetic information systems that evolve in the case of biological evolution. 
Hence, put bluntly, the skull of an individual horse can no more evolve than a pot used to carry water, but that does not repudiate that both are expressions of evolving information systems or, most importantly, that tracking their variation over time and space cannot reveal important information about that underlying evolutionary process|76|- 2266|Lycett2016|In the case of biology, that information system is coded at the molecular level in “genes,” while the cultural information system is comprised of socially transmitted ideas, concepts, beliefs, and/or practices that either consciously or otherwise influence the form of the archaeological record at a particular time and geographic locality.|76|- 2267|Lycett2016|For example, information about some element of an activity involving tools may be drawn to the attention of another because of usage, causing that latter individual to also adopt that behavior (so-called *stimulus enhancement*). Alternatively, it might be information about the manner in which a particular artifact looks and that form is later copied (*emulation*), and/or it might involve repeating actual behavioral details (e.g., hand position during pottery production) which are then, in turn, copied (*imitation*). It may also, of course, involve an individual directly and deliberately guiding the attention and/or behavior of another such that the particular artifact is replicated more readily (what many would term *teaching*) (e.g., Thornton and Raihani 2010). 
Combinations of these mechanisms are also feasible.|76|stimulus enhancement, emulation, artifact evolution, imitation, teaching, 2268|Lycett2016|In particular, three key points were increasingly recognized: (1) that traits of organisms are influenced by both heritable and nonheritable factors (e.g., “environmental” factors such as nutrition), (2) that individual quantitative traits of organisms (e.g., “height”) were simultaneously influenced by several different heritable elements, and (3) that while traits were influenced by particulate inheritance according to Mendelian principles, patterns of variation displayed by the majority of phenotypic characters were continuously distributed in form (Provine 1971; Roff 1997).|77|environment, environmental factors, simultaneous influence, trait evolution 2269|Lycett2016|However, given the circuitous relationship between the evolution (i.e., descent with modification) of the underlying information system and its physical expression in the form of artifacts, a fully developed evolutionary approach requires a precise (i.e., quantitative) framework that outlines – in specific terms – how patterns observed in physical forms over time and space respond to evolutionary forces such as drift and selection when individual objects are not themselves the entities that are evolving.|77|- 2270|Lycett2016|In biology, these matters were resolved by the development of a field that eventually became known as “quantitative genetics” (e.g., Falconer and Mackay 1996; Roff 1997).|77|- 2271|Lycett2016|Recently, a similar framework has been proposed and begun to be developed as a means of tackling analogous problems inherent to the evolutionary analysis [pb] of material artifacts (Lycett and von Cramon-Taubadel 2015).|77f|- 2272|Lycett2016|Thereafter, these factors will be expanded upon, in quantitative terms, to show more precisely how patterns of variation observed in the physical attributes of artifacts over time and space can be linked 
directly to evolutionary forces such as selection and stochastic factors when (1) sources of artifact variation are multiple and not all necessarily heritable, (2) the proximate socially transmitted elements are unknown, and (3) many artifactual traits will be influenced simultaneously by multiple aspects of socially transmitted practices.|78|- 2273|Lycett2016|This was compounded as experiments began to demonstrate for the first time that variation between individuals was caused not just by heritable components but also by “environmental” factors, such as the soil conditions within which a plant grew (e.g., Johannsen 1909).|78|history of science, biology, phenotype 2274|Lycett2016|What is now known as the field of “statistical quantitative genetics” provided the urgently needed resolution to these problems (Roff 1997, 2007). A key element in the development of quantitative genetics was Ronald Fisher’s (1918) insight that unlike variations between different types of simple differences in classic “Mendelian” traits (such as differences in flower color or between round versus wrinkled forms of pea), which were caused by allelic differences at a single genetic locus, variation in quantitative traits was caused by segregation of their heritable properties across [pb] multiple genetic loci. That is, most traits observed in organisms (especially quantitative traits) are produced by the aggregate effects of two or more genes, leading them to be described as “multifactorial” or “polygenic” features (Mayr 1982: 791–792). 
:comment:`Rather wordy, but the point is that phenotype variation is not caused mono-causally by changes in genes, but by the interaction of multiple genes.`|78f|- 2275|Lycett2016|When looking at quantitative variation in a phenotypic trait across a collection (“population”) of individuals, this model expresses the total variance (:math:`V_P`) observed as: :math:`V_P = V_G + V_E` where :math:`V_G` is the proportion of phenotypic variance within the population controlled by genetic factors and :math:`V_E` is the proportion of the variance caused by environmental factors (see, e.g., Falconer 1960).|79|- 2276|Lycett2016|In sum, quantitative genetics explicitly takes account of the materiality of the data it is using to study evolution, with the built-in recognition that it is looking at a physical expression of an underlying information system that evolves via a process of descent with modification.|80|- 2277|Lycett2016|There are many analogies between the problems that required the development of quantitative genetic approaches to phenotypic traits and certain difficulties currently facing the evolutionary analysis of artifactual traits (Lycett and von Cramon-Taubadel 2015). [...] Hence, observable archaeological traits will invariably be influenced by multiple unobservable and “pleiotropically” operating cultural elements, while also simultaneously being influenced by “environmental” effects.|80|- 2278|Lycett2016|Just as in the biological case, a key point to note here is the reconciliatory effect this framework can have on several different lines of research looking at patterns of artifactual variability, which might otherwise seem in conflict. 
For instance, just because a series of stone projectile points may have aspects of their variation influenced by factors of raw material, or because of resharpening effects, does not negate the fact that statistically meaningful patterns of variation resulting from important evolutionary factors are also detectable in those artifacts and indeed allow evolutionary questions to be addressed (Lycett and von Cramon-Taubadel 2015).|82|- 2279|Lycett2016|Evolution is undoubtedly a process of “descent with modification” in which the three factors of variation, inheritance of variation, and its differential replication are both the necessary and sufficient factors for its operation. Nevertheless, however much of this may be correct, in the strictest of terms, a bird’s “wingspan” is not inherited by its offspring, nor does one pot “inherit” the shape of its base from another pot; both of these physical features are the material expression of information that is inherited within an evolving information system.|83|- 2280|Lycett2016|In terms of a metric biological phenotypic trait (e.g., “length” of a skull, or beak “shape”), the heritability of the trait (notated as :math:`H^2`) is the proportion of the total phenotypic variance across the population (i.e., :math:`V_P`), expressed as a ratio of the variance attributable to genetic factors (i.e., :math:`V_G`). Hence, simply: :math:`H^2=\frac{V_G}{V_P}` The use of the squared symbol here is a reminder that the parameter of heritability is based on the descriptive statistic of variance (i.e., the standard deviation squared) in terms of the two variables used to compute it.|83|heritability, formula, mathematical equation, phenotype, 2281|Lycett2016|:comment:`Gives a larger example on the evolution of handaxes for which hypothetical measures are reported, such as heritability, response to variation, and the amount of change that might be caused by drift. 
The example has one major flaw: although it seems to be very mathematical and correct, it merely deals with the size of hand axes, and glosses over the fact that this is a strong simplification.`|85-88|- 2282|Lycett2016|Linking the raw statistical data of artifactual variation to evolutionary processes is, however, by no means straightforward, especially when some of that variation is caused by factors such as raw material.|88|- 2283|Lycett2016|Quantitative genetics helped resolve these difficulties by more explicitly quantifying the dynamics of these factors and their interactions. It is one of the central arguments of this chapter that a similar framework can help to resolve similar problems in the case of linking variations in the physical traits of artifacts to evolutionary dynamics such as stochastic factors (drift) and selection biases.|88|- 2284|Lycett2016|The practical benefit is that it provides a statistical basis for studying artifactual variation and evolutionary patterning, framed expressly in material terms. [...] [pb] The approach moves beyond the primary elements of “descent with modification” to express, in more precise terms, the statistical mechanics of the relationship between selective biases, stochastic factors, and the patterns they create in measureable data. As an outgrowth, the theoretical benefit is that it will help to refocus the way we frame and explain what is being studied and how it is being used to examine the evolution of the underlying information system—something which is entirely unobservable.|88f|- 2285|Lycett2016|The article mostly explains how ideas from phenotype studies in biology can be used to assess things like heritability and the influence of drift, selection, or environmental factors in artifact evolution. This is illustrated in a small example. The interesting point is that the author mainly sticks to phenetics in the evolutionary considerations (as basically does the whole book). 
This is interesting, as it shows that archaeologists do not even try to come close to anything like the stochastic frameworks developed for genetics, but rather stick to those things that are more interesting for them, namely the distinction of the factors that shape the objects they investigate, be it selection, drift, or environment (material in the case of artifacts).|000|- 2286|Tripp2016|Numerous studies have interpreted the anthropomorphic “Venus” statuettes of the Gravettian. However, few of these studies have scrutinized the figurines at an individual level or used quantitative analyses in order to understand similarities within sites or between regions. This study tests two hypotheses. The first one, by Leroi-Gourhan, suggests that the Gravettian statuettes share core similarities regardless of where they were created. If correct, statuettes should not be grouped according to the region that they were made. The second hypothesis, by Gvozdover, suggests a Kostenki-Avdeevo unity. Her hypothesis suggests blending among cultures in the Russian Plains and that there are “types” of statuettes that are not restricted to a particular site. Here cladistics methods are used in order to understand whether ethnogenesis (blending) or cladogenesis (branching) has occurred in the production of “Venus” making. Results confirm and extend Gvozdover’s hypothesis suggesting cultural and ideological connections for “Venus” making in the Russian Plains and also support the uniqueness of a few European statuettes.|000|cladistics, archaeology, cultural evolution, quantitative analysis 2287|Tripp2016|The article basically tests two different hypotheses on the origin of the Venus figurines, which some scholars think go back to a similar tradition, while others think they were developed independently. The article uses cladistics to see whether the two hypotheses show expected branching patterns on a set of binary characters for which the data was coded. 
This analysis is somewhat disappointing, since it is not at all clear why phylogenetic analyses should be used, given that it is not even clear that the data reflects related traditions. Hypothesis 2 wins, which claims that there was a coherent tradition in the Russian region while not in others, but this could equally easily have been obtained by using any kind of cluster analysis, such as fuzzy clusters, link communities, or other methods.|000|archaeology, cladistics, Venus figurines, cultural evolution 2288|Tripp2016|More than 50 such figurines, frequently termed “Venuses,” were [pb] discovered at sites from France to Siberia and as far south as Italy and are dated to the Gravettian.|179f|- 2289|Tripp2016|Since the majority of the statuettes is female and is nude, this has led many researchers to suggest that the figurines were created for a uniform same purpose (Soffer and Praslov 1993).|181|- 2290|Tripp2016|At present, there has not been a single study demonstrating that the aforementioned artifacts are a homogenous group based on any particular combination of shared stylistic features. The figurines are only assumed to represent a cohesive unit because many of the statuettes are nude and female and have exaggerated sexual features. This however masks the diversity that is apparent when examining individual figurines. Not all figurines are female, many appear to be wearing clothing, and a diversity of body shapes is also apparent (apple, pear, reversed triangle). In fact, from the discovery and initial interpretation of Gravettian female statuettes, they were [pb] never considered homogeneous.|181f|- 2291|Tripp2016|He believed that the figurines represented a realistic interpretation of two different Paleolithic races (White 2006), one of which, he believed, was inferior and characterized by having greater fat deposits, especially in the stomach, hip, and thigh regions. 
He labeled these as “Venuses” and directly connected these figurines with modern day Bushman, specifically with Saartjie Baartman. [...] This terminology has caused much unwarranted confusion because it assumes affiliation. In fact, many archaeologists simply presuppose that we are dealing with a cohesive group of objects, since they are all called “Venuses” (Nelson 1990). The application of the term also reinforced the idea that the Gravettian female figurines share a single function (Nelson 1990). The notion of shared usage did not come from microscopic analysis or contextual analysis but from the term “Venus.”|182|- 2292|Tripp2016|In the present paper, I will argue that in order to understand regional connections, we first need to look at individual differences. To investigate this, I will use cladistics to test two contrasting hypotheses, one by Leroi-Gourhan (1982) and the other by Gvozdover (1989).|183|- 2293|Tripp2016|Both of these hypotheses can be tested through the use of a cladistics analysis and each would lead to the creation of different diagrams. Leroi-Gourhan’s hypothesis, henceforth referred to as hypothesis 1, suggests that regardless of their [pb] location, they share major features. This would suggest that statuettes from each region would be spread across the tree indiscriminately, i.e., there would be no clear geographic clustering in the dataset. This would support Leroi-Gourhan’s hypothesis as he does not suggest that statuettes from one region, or site, share more similarities with each other than those from another group. Hypothesis 2, by Gvozdover (1989), argues for similarities in artistic industries among the sites in the Russian Plains including Kostenki I, Avdeevo, New Avdeevo, Gagarino, and Khotylevo II. More specifically, she argued for the presence of Kostenki, Avdeevo, and Gagarino-type statuettes but suggested that they were not limited to a single site. 
This hypothesis suggests contact and blending of Russian artistic cultures.|183|- 2294|Tripp2016|By exploring regional patterning, this analysis will also provide insight into cultural evolution. Specifically, it will clarify whether vertical (branching) or horizontal (blending) transmission had a greater effect on “Venus” making in the Gravettian|185|- 2295|Tripp2016|The sample size for this analysis was composed of 30 discrete traits of equal weight found on 27 statuettes. Only statuettes that were complete and anthropomorphic (i.e., clearly human-like) in nature were included in the study.|188|- 2296|Tripp2016|The retention index associated with the most parsimonious cladogram was 0.443. As the cladogram shows, several figurines do not share direct associations with other statuettes including the Venus of Pavlov, the Lozenge, Moravany, the Bicephalous, and Lespugue. For the most part, the Russian Plains statuettes group together, while the Central, Southern, and Western European statuettes groups spread throughout the tree.|190|- 2297|Tripp2016|The results do not support hypothesis 1. As is evident in the character trait list, there are multiple head shapes and sizes, arm and leg positions and proportions, breast and buttock shapes, as well as decoration types and locations. Additionally, this hypothesis predicts a high amount of homoplasy and that statuettes should group indiscriminately on the tree. If this hypothesis were correct, the RI should be much closer to zero. An RI of .443 indicates that some branching is occurring. Also, the fact that the Venus of Pavlov, the Lozenge, Moravany, the Bicephalous, and Lespugue form terminal branches demonstrates how unique each of these figurines is.|191|- 2298|Tripp2016|Lastly, it is important to note that both hypotheses predicted blending among Gravettian cultures and did not predict strict cultural phylogenesis at the site level. 
The cladogram corroborates this point, as archaeological sites do not form monophyletic groups. Instead, the results of the cladogram demonstrate a blending of the Eastern European sites.|192|- 2299|Tripp2016|Too many hypotheses surrounding the “Venus” figurines have begun with the assumption that the figurines share core similarities but offer little explanation as to what those features are. This analysis has shown that in fact there are regional connections in art making among sites within the Russian Plains but also documents the uniqueness and diversity of individual statuettes from other regions. Importantly, these data also demonstrate that the figurines are not “practically interchangeable” as argued by Leroi-Gourhan. In order to make larger connections about what these figurines were used for, we need to start by analyzing individual statuettes.|194|- 2300|Knappett2016|Although the use of neo-Darwinian models to explain culture change has become quite common in some subfields of archaeology, there remains much resistance within ‘interpretive’ archaeologies to what is perceived as the simplistic ‘biologisation’ of culture. Some recent work has sought to build bridges between evolutionary and interpretive archaeologies, with the topic of ‘learning’ emerging as a useful middle ground between these two standpoints. Yet significant barriers remain to a more thorough integration. Here I identify what appear to be two such barriers: one is the continued commitment in neo-Darwinian approaches to a Cartesian notion of ‘information’ and the second is the related adherence to the idea of distinct cultural ‘traits’. I draw on work in cognitive science and developmental biology that places heavy emphasis on the distributed and contextual nature of learning, such that the uptake of an innovative technology cannot be reduced to a process of information transfer for learning a new trait. 
A distributed and developmental approach is put into play through a case study tackling the variable regional and temporal adoption of the potter’s wheel across the Bronze Age east Mediterranean.|000|learning, cultural evolution, potter's wheel, archaeology 2301|Knappett2016|The article gives a critical account and sells it as a critique of evolutionary thinking in archaeology. In the end, however, the article does not criticize evolutionary thinking, but reductionism, by demonstrating that many supposedly simple variables, like "learning" of things, cannot be reduced to a simple process, but should rather be carefully analysed in multiple ways. This underlines the importance of consilience and cumulative evidence, which is generally important in the historical sciences.|000|cumulative evidence, archaeology, cultural evolution, learning 2302|Knappett2016|In this paper I use an archaeological example to address processes of technological innovation, challenging the validity of most neo-Darwinian biological models for explaining certain important kinds of cultural change. Archaeology can provide useful case studies because of its capacity to identify the spread of artefacts and techniques over broad regions and long time spans.|97|- 2303|Knappett2016|This innovation then provides a challenge to some typical neo-Darwinian approaches to cultural evolutionary change. In such approaches, it seems difficult to get away from the idea that culture consists of units of ‘information’ of some kind. As Jordan (2014, 2) quite explicitly puts it: .. pull-quote:: human technological traditions...consist of information stored in human brains that is then passed on to other individuals through social learning. Jordan goes on to assert that technological traditions are ‘material manifestations of a complex transmission system’ (2014, 2). So we have here a conception of culture as bundles of information in the brain—a set of mental states or ideas. 
This means that the material world is, in effect, epiphenomenal.|98|- 2304|Knappett2016|A second, related limitation concerns the definition of the cultural ‘trait’ as a basic unit of technological traditions. Jordan (2014, 5–6) argues that specific design traits can be combined in various ways, and it is the choices of how to combine them that make up material culture traditions. His use of the term ‘design grammars’ betrays the conception of these combinations as functioning in a modular manner, like language. However, material cultural traits are not created as interchangeable units readily understood according to formal convention. Generally speaking, there does not seem to exist a language of material culture traits such that, within pottery making, for example, one can just swap out coil building for wheel throwing as if they are interchangeable modules.|98|design grammars, cultural evolution, 2305|Knappett2016|As Roux puts it, evolutionary models rely on ‘the hypothesis that contacts between people are necessary and sufficient for social learning to occur’ (Roux 2013, 313). Roux challenges this basic assertion of social Darwinian approaches, noting that there are many documented instances in which people may ‘learn’ about a new trait or technique, but nevertheless choose not to adopt it. The decision to adopt, she stresses, depends not just on the existence of contact but on the nature of that contact.|99|- 2306|Knappett2016|I have found the developmental approach propounded by Wimsatt and Griesemer very useful in this regard, as they explicitly argue for the concept of scaffolding as an important (and largely missing) cog in any successful attempt at evolutionary explanation (Wimsatt and Griesemer 2007; Caporael et al. 
2014).|100|- 2307|Knappett2016|What is especially interesting though is that the wheel technique sees a targeted rather than a widespread adoption [...].|102|- 2308|Knappett2016|We have described above three regions where, for whatever reasons, the wheel technique does see full or almost full adoption. However, there are many other regions where the wheel is taken up as a technology in much more piecemeal fashion.|102|- 2309|Knappett2016|If we were to begin with the assumption that the wheel technique offers obvious advantages over hand-building techniques in terms of both the quality and quantity of output, then we might expect to see a regular, smooth uptake of the innovation as soon as it becomes available to ancient potters. That is, if they are in contact with potters who have the wheel, then they will surely copy that technique. Yet, when we look at the evidence from across the east Mediterranean in the Early and Middle Bronze Age, what we see is anything but a predictable adoption of the technology of the potter’s wheel.|106|- 2310|Knappett2016|The technique, it would seem, takes on a different status depending on precisely how it is learnt—and we must stress that it is a technique that is difficult to acquire, usually requiring some kind of apprenticeship (see Roux and Corbetta 1990; also Wendrich 2012 on apprenticeship more broadly). And given this difficulty, it seems likely that there would exist different strategies for learning, which we might understand in terms of various mixes of declarative and nondeclarative knowledge.|107|- 2311|Knappett2016|If we are then arguing that a variety of strategies may exist for learning a ‘single’ technique such as the potter’s wheel, and that these strategies may rely differentially on knowledge embedded in the environment (i.e. scaffolding), then what are the implications for evolutionary approaches to cultural transmission? 
It should at the very least make us question the notion of the potter’s wheel as a ‘cultural trait’, viewed as part of a design grammar, a module that can be inserted or removed.|107|- 2312|Knappett2016|Well, we might also raise the favoured metaphor in biological models of the branching tree. There are so many localised patterns in adoption and rejection across the Mediterranean that we would have great difficulty in making the pattern of spread look much like a tree—we could more readily imagine a network rather than a tree.|108|- 2313|Knappett2016|My aim has been to recognise that evolutionary approaches do us a great service by asking those broader questions that interpretive archaeologies often fail to. :comment:`This is essential as well in linguistics: it is not the techniques, or the models, that are important, but the questions which we have forgotten to ask ourselves.`|108|nice quote, archaeology, agnosticism, 2314|Knappett2016|Surely we do need reductive approaches (because we learn from comparing cultural traditions across time and space), while also doing justice to the complexities in the data.|108|- 2315|Prentiss2016|There has been significant debate in paleoanthropology and more recently, archaeology, over the concept of mosaic evolution. Essentially, proponents of the concept argue that different aspects of organisms evolve separately while others argue that organisms evolve as integrated entities. Similarly, archaeologists debate the relevance of cultural evolution as a complex multi-scalar process. In this paper we conduct two cladistic analyses of cultural phenomena focusing on skateboard decks and projectile points from an archaeological site to examine variability in the evolutionary process. We find evidence for mosaic evolution in both studies and conclude that modularity likely is an important factor in cultural evolution, at least at the level of artifact design. 
We caution future investigators of evolution in ancient stone tools that modularity could have complicating effects on phylogenetic outcomes unless explicitly considered.|000|skateboard, archaeology, cultural evolution, artifact evolution, networks, mosaic evolution, 2316|Prentiss2016|The article presents what the authors call a cladistic approach to test for mosaic evolution in cultural settings. They use simple cladistic (more precisely: parsimony) analyses to test how well two datasets explain the best tree: one dataset for skateboard decks and one for projectile points. What is surprising in the article is that the authors use family tree models to test not only for mosaic evolution, but also for the evolution of the objects in general, since these are clearly not the typical objects for which one would expect evolution to be tree-like, for the following two reasons: 1. The objects are artifacts and built by humans for specific reasons. So the question is what would constitute the vertical signal here anyway? Is it that X is an apprentice of Y when creating the next generation of the object? Was X influenced by Y? But how can that be tree-like, unless it was, say, a teacher-apprentice relation (which was never the case for skateboard evolution, as I assume, as people would just copy those ideas from each other). 2. The objects may be ancestral to each other. Clearly, in some sense, the first skateboard ever produced is the ancestor of all the other skateboards, unless it was invented twice independently. Why don't the authors simply test their question, namely, how the path of influence went in the evolution of skateboards and other objects, by using network approaches? |000|- 2317|Prentiss2016|Evolutionary studies of artifacts have progressed in a number of significant ways in recent years. 
We recognize that artifact evolution can be understood in strict neo-Darwinian terms as a process by which variation arising from innovation and copying errors is sorted in the long term by selection and drift (Goodale et al. 2011) giving rise to phylogenetic trees characterized by high rates of branching (Jordan and Shennan 2009; Prentiss et al. 2011; Tehrani 2011; Tehrani and Collard 2002).|113|- 2318|Prentiss2016|In this paper, we wish to explore another facet of evolutionary process, common to biological and cultural systems, known as mosaic evolution (Gould 1977).|114|mosaic evolution, definition, modular development, archaeology 2319|Prentiss2016|In this study we test for the effects of mosaic evolution focusing on a well-understood item, the skateboard deck (Prentiss et al. 2011). We apply a cladogenetic approach drawing in particular on the work of Skelton and McHenry (1998) and then seek to develop implications for future studies of more ancient technologies that are far less well known. As an example, we follow with an archaeological case study of projectile point morphology designed to look for effects of mosaic evolution. Projectile points have been frequently studied by archaeologists interested in phylogenetic histories (Buchanan and Collard 2007; Darwent and O’Brien 2006; O’Brien et al. 2001; O’Brien and Lyman 2003).|114|- 2320|Prentiss2016|Briefly, the first professional skateboard deck, known as the Makaha Phil Edwards, developed in 1963, was little more than a small flat piece of wood loosely shaped like a surfboard. This design was highly influential and effectively persists to this day. However, a wide range of variation in more specialized designs developed during the mid-1970s [...]. The explosion in designs in the 1970s has been likened by skateboard historians (and nonbiologists) such as James Weyland (2002), as analogous to the Cambrian explosion. 
An additional form, developed in the early 1990s known as the “popsicle stick board,” was designed for flexible shifting between street and wall riding and featured the now widely recognized popsicle stick shape with double kicktails and convex deck surface.|115|- 2321|Prentiss2016|Studies of projectile point evolution have sought to develop sets of characters from which to measure evolutionary process [...]. Projectile points are designed with haft and blade areas resulting from different design considerations and constraints (Fig. 3). Haft areas are at the proximal or base end of the artifact and are designed to facilitate connecting the stone point to a wood or bone arrow or dart shaft. |118|archaeology, cultural evolution 2322|Prentiss2016|We offer a preliminary test of mosaic evolution in projectile points using a sample of arrow points from the Bridge River housepit village in British Columbia. The Bridge River site is a large housepit village occupied most intensively by complex fisher-hunter-gatherer people during the period of ca. 1800–1100 and 500–100 years ago [...].|119|- 2323|Prentiss2016|:comment:`They basically use 8 different skateboards in their data, which they analyze in three different variants. Trees do not show a high resolution, and they include the first skateboard as outgroup.`|122-124|- 2324|Prentiss2016|:comment:`Results on the projectile-point data are a bit less clear, but it seems that the trees do not differ too much.`|124-126|- 2325|Prentiss2016|First, while the results of our study are clearly impacted by homoplasy or other confounding processes, we do recognize a consistent and coherent branching pattern. Projectile points from the second oldest floor, Stratum IID, appear on every branch, implying that the nearly full range of variation appeared early in the history of this house. Second, details of clade membership are affected by choice of characters whether base/haft or blade oriented. 
It should not be surprising that the major distinctions between groups of points, when measured using blade characteristics, are likely the result of differential resharpening activities. Given the fact that all data sets indicate an early pattern of branching, we can tentatively conclude that some degree of descent with modification did occur and that limited mosaic evolution occurred in haft forms.|127|- 2326|Prentiss2016|Our study of Bridge River projectile points offers an example of how projectile points could potentially evolve in a modular fashion. Our trees derived from the combined base and blade form data set actually differed relatively little from the base-only trees. An implication is that data derived from blade form will not necessarily confound an analysis that primarily emphasizes haft area form. However, this does not mean that analysts should not be aware that variation in blade form can be substantially modified for reasons unrelated to the logic behind overall point form, especially haft area form.|128|- 2327|Prentiss2016|This implies that mosaic evolution did play a major role in skateboard evolution. A major implication is that analyses of prehistoric artifact histories could be heavily affected by our choice of variables.|127|- 2328|Caridi2016|Decorative patterns have long been considered suitable for determining descent, since they are categorized as homologous and adaptively neutral. Rock art, for its part, has often been left aside due to a lack of chronological control. In this paper, we propose a way to treat rock art in order to track Cultural Transmission Paths by means of motif distribution using Northwestern Patagonia as a case study. We present a theoretical and methodological framework for modeling Cultural Transmission Archaeological Paths by constructing a Mutual Information Network between motifs, identifying clusters and defining their associated Site Networks. 
The results allow us to suggest a hypothetical nuclear region, well known and transited by hunter-gatherers, with few connections to the more distant parts of the study area. This pattern may be related to Patagonia’s population models and fit the suggestion from other fields of inquiry that a sparsely connected and not unnecessarily complex network will be robust enough to sustain information flux.|000|rock art, cultural evolution, Patagonia, hunter gatherers, mutual information 2329|Caridi2016|Since Dunnell’s seminal work (Dunnell 1978), several researchers have considered stylistic characters as adaptively neutral. Decorative patterns, in particular, have been considered as nonfunctional or selectively neutral, since they are not [pb] tied to functional constraints.|131f|- 2330|Caridi2016|For instance, in pottery, decorative patterns are considered so complex that the probability of duplication by chance would be small. So, if two vessels with the same decorative pattern are found “(...), the more parsimonious explanation of such phenomenon is that the vessels share a common developmental history and are from the same tradition” (O’Brien and Lyman 2003: 19).|132|- 2331|Caridi2016|In this work we present a theoretical and methodological framework for modeling Cultural Transmission Archaeological Paths by constructing a Mutual Information Network between motifs, identifying clusters, and defining their associated Site Networks. We will develop this proposal applying it to our study area, NW Patagonia, which includes nine regions (see Fig. 1).|132|- 2332|Caridi2016|Rock art is one of humankind’s most ancient channels of visual communication. In fact, among prehistoric hunter-gatherers, few others existed beside face-to-face interaction (e.g., smoke signals, stylistic messages conveyed in artifacts, etc.; see Wobst 1977). Archaeologists have always been aware of the communicative role of rock art and its information storing capacity. 
This idea has been formalized in the “information storage model” (Barton et al. 1994; Conkey 1978; Gamble 1991; Mithen 1988; among others), which argues that rock art can store different kinds of information: from social interaction (e.g., Conkey 1978; Gamble 1991) to potential resources (e.g., Mithen 1988) or altered states of consciousness (“shamanic approach”; see Dowson 1998; Lewis-Williams and Dowson 1988).|135|- 2333|Caridi2016|As mentioned above, Information Theory allows us to treat information content without any concern for meaning. When we observe two sources of messages, we are interested in detecting whether there is any type of correlation between the messages sent from both of the sources. Mutual Information is a measure of the amount of information that one message contains about the other. It measures the average reduction in uncertainty about a message from the first source that results from [pb] knowing the message from the second source, or vice versa.|135f|mutual information, rock art, archaeology 2334|Caridi2016|As an example of how to apply Mutual Information to rock art, let us introduce variable X, representing a particular motif, which can take two possible values: 0 means the absence of the motif in a particular site and 1 its presence. The same occurs with a second motif (variable Y). Mutual Information between these two motifs, I(X,Y), quantifies their correlation: greater values of I(X,Y) means that the presence (or absence) of motif X occurs simultaneously with the presence (or absence) of motif Y.|136|mutual information, examples 2335|Caridi2016|To visualize the patterns of Mutual Information, we will construct a network defined by two sets: a set of nodes (rock art motifs) and a set of links (Mutual Information between them).|136|mutual information, networks, 2336|Caridi2016|But sometimes connections are not explicit. 
In such cases, connections may be inferred from the attributes of the nodes, by formalizing a correlation network. In genome research, gene coexpression networks were formalized by detecting strong correlations that may exist between gene expression patterns (Torkamani et al. 2010). In this area, Mutual Information is used to detect correlations in order to formalize the Mutual Information Network of coexpression of proteins (Simonetti et al. 2013).|136|correlation network, mutual information, evolutionary biology, gene expression 2337|Caridi2016|Rock art is a product of Cultural Transmission. In a systemic context (sensu Schiffer 1972), we assume that the process leading to the distribution of motifs in the landscape begins when someone paints a motif on one site. That very same person, or another, could store it in his/her memory and reproduce it at the same site or elsewhere (immediately or at a later moment). There, the motif is seen by others who repeat it and can add other motifs. This process iterates for days, years, and/or centuries. As a result, some motifs will not be reproduced, while others will be distributed in a wide area. In this manner, we obtain a process of Cultural Transmission in which a social network replicates a set of motifs. The product of that process is a differential pattern of motif distribution in the landscape. In archaeology, we cannot track that social network because it is gone, but we have a “fossil” pattern, a relic of that process. Since the pattern does not mirror the network that produced it (reasons below), we need to introduce a new concept in order to separate the process (related to social network activities) and its (patterned) material evidence. 
On this basis, we can model a Cultural Transmission Path (CTP), a pattern left as a rock art motif distribution in the landscape (space) corresponding to a social network (social relationships) and a certain moment (time).|138|- 2338|Caridi2016|From these three situations, archaeologically, we only can establish the Cultural Transmission Archaeological Paths (CTAPs) that are represented in Fig. 2 below. CTAPs are not related to a specific time, but they are the sum of many CTP. We prefer the term CTAPs for the latter since what we have is an archaeologically determined channel of Cultural Transmission left by the sum of past moments in the same area.|139|- 2339|Caridi2016|Our picture of Northwestern Patagonia in late Holocene times is one of hunter-gatherers trying to maintain links between different places. Most researchers agree that during that period, a steppe-based population had been incorporating forested environments, a process in which rock art would have been part of the colonizing social repertoire, and shared graphics would be present across thousands of kilometers (a similar case as the one posited for the initial settlement of the arid zone of Australia, McDonald and Veth 2011). The establishment and maintenance of regional social ties has been recognized as an important part of hunter-gatherer adaptations to uncertain environments, in terms of creating a “safety net” of contacts and relations that can be critical to survival (Whallon 2006).|142|- 2340|Caridi2016|We have analyzed 49 rock art archaeological sites from northwestern Patagonia, located in the steppe, the forest, and their ecotone in a study area located between the 40° 10′ and 45° 50′ parallels (600 lineal km from North to South; see Fig. 1).|143|- 2341|Caridi2016|Also, as is characteristic of decorative patterns, in this dataset we have a quantity of motifs which in their majority are absent from most sites (Scheinsohn et al. 
2009, 2015; see also Shennan and Bentley 2008 for ceramic decoration) and whose frequencies are distributed “with a large number of variants occurring only in small numbers but a small number being copied frequently and thus occurring a large number of times” (Shennan and Bentley 2008: 170). Since decorative patterns are unconstrained, variability is high. This results in a database with many “zero” data which posits an analytical challenge and leads us to a specific data treatment.|144|sparse data, archaeology, rock art, motif, 2342|Caridi2016|We defined a variable associated with each of the motifs which received value 1 when the motif was present (no matter the state of character of the motif) and value 0 when it was absent from a particular site.|144|data coding, archaeology 2343|Caridi2016|We used these 24 pairs in order to construct the MIN and define clusters in it. The threshold defines the links we use for defining the MIN. If we decreased or increased the threshold, we would define more or less links. This would modify the cluster structure on the network.|145|clustering, partitioning, connected components, 2344|Caridi2016|The presented results hinge on the threshold for Mutual Information between motifs that was established in order to construct the MIN (therefore, the clusters of defined motifs) and are based on the requirements set for defining the SN (at least two character states had to be shared to establish a link between sites, a very strong condition).|159|- 2345|Caridi2016|Furthermore, this analysis also allows us to relate rock art sites to Borrero’s model for the population of Patagonia. We suggest that the strong connectivity between the middle and northern regions of the study area reveals a hypothetical nuclear region, well known and transited by the hunter-gatherer population who made the rock art. 
The few links with the extreme north and the extreme south regions allow us to maintain that those areas were in an exploration/colonization phase.|161|- 2346|Caridi2016|:comment:`Rather precise explanation of the computation of mutual information for the purpose of their study.`|162-164|mutual information, entropy, tutorial, 2347|Straffon2016a|It is foreseeable that the theoretical debates surrounding cultural evolution will continue to discuss traditional problems such as the compatibility of biological and cultural processes and the nature of the mechanisms of transmission. Within archaeology, however, it seems that the major questions will continue to be the usefulness and potential contribution of the phylogenetic methods to the field. It is our hope that this volume will become a part of those discussions, as an effective example of the successful application of the concepts and techniques of cultural phylogenetics in archaeological research.|13|- 2348|Morrison2015a|:comment:`Very nice tabular overview of early evolutionary networks and phylogenetic trees. Definitely worth to be quoted and to be mentioned.`|894|history of science, family tree, biology, evolutionary networks 2349|Silva2016|Ancient population expansions and dispersals often leave enduring signatures in the cultural traditions of their descendants, as well as in their genes and languages. The international folktale record has long been regarded as a rich context in which to explore these legacies. To date, investigations in this area have been complicated by a lack of historical data and the impact of more recent waves of diffusion. In this study, we introduce new methods for tackling these problems by applying comparative phylogenetic methods and autologistic modelling to analyse the relationships between folktales, population histories and geographical distances in Indo-European-speaking societies. 
We find strong correlations between the distributions of a number of folktales and phylogenetic, but not spatial, associations among populations that are consistent with vertical processes of cultural inheritance. Moreover, we show that these oral traditions probably originated long before the emergence of the literary record, and find evidence that one tale (‘The Smith and the Devil’) can be traced back to the Bronze Age. On a broader level, the kinds of stories told in ancestral societies can provide important insights into their culture, furnishing new perspectives on linguistic, genetic and archaeological reconstructions of human prehistory.|000|cultural evolution, fairy tales, Indo-European, phylogenetic reconstruction 2350|Silva2016|Article uses phylogenetic methods to find out which of the many different fairy tales spoken in Indo-European cultures might go back to the ancient Indo-European society. They use @Bouckaert<2012> et al's tree as reference tree and check for phylogenetic signal by plotting characters on the tree, feeding it the geographic distribution of fairy tales across Indo-European languages. This article is also quoted in @Pagel2016a's overview on the matter.|000|- 2351|List2014b|The idea that language history is best visualized by a branching tree has been controversially discussed in the linguistic world and many alternative theories have been proposed. The reluctance of many scholars to accept the tree as the natural metaphor for language history was due to conflicting signals in linguistic data: many resemblances would simply not point to a unique tree. Despite these observations, the majority of automatic approaches applied to language data has been based on the tree model, [pb] while network approaches have rarely been applied. 
Due to the specific sociolinguistic situation in China, where very divergent varieties have been developing under the roof of a common culture and writing system, the history of the Chinese dialects is complex and intertwined. They are therefore a good test case for methods which no longer take the family tree as their primary model. Here we use a network approach to study the lexical history of 40 Chinese dialects. In contrast to previous approaches, our method is character-based and captures both vertical and horizontal aspects of language history. According to our results, the majority of characters in our data (about 54%) cannot be readily explained with the help of a given tree model. The borrowing events inferred by our method do not only reflect general uncertainties of Chinese dialect classification, they also reveal the strong influence of the standard language on Chinese dialect history.|000|Chinese dialects, minimal lateral network, phylogenetic network, algorithms 2352|Popa2011a|Gene acquisition by lateral gene transfer (LGT) is an important mechanism for natural variation among prokaryotes. Laboratory experiments show that protein-coding genes can be laterally transferred extremely fast among microbial cells, inherited to most of their descendants, and adapt to a new regulatory regime within a short time. Recent advance in the phylogenetic analysis of microbial genomes using networks approach reveals a substantial impact of LGT during microbial genome evolution. Phylogenomic networks of LGT among prokaryotes reconstructed from completely sequenced genomes uncover barriers to LGT in multiple levels. 
Here we discuss the kinds of barriers to gene acquisition in nature including physical barriers for gene transfer between cells, genomic barriers for the integration of acquired DNA, and functional barriers for the acquisition of new genes.|000|lateral gene transfer, overview 2353|Piantadosi2016|Compositional “language of thought” models have recently been proposed to account for a wide range of children’s conceptual and linguistic learning. The present work aims to evaluate one of the most basic assumptions of these models: children should have an ability to represent and compose functions. We show that 3.5–4.5 year olds are able to predictively compose two novel functions at significantly above chance levels, even without any explicit training or feedback on the composition itself. We take this as evidence that children at this age possess some capacity for compositionality, consistent with models that make this ability explicit, and providing an empirical challenge to those that do not.|000|compounding, compositionality, language acquisition, child language 2354|Piantadosi2016|This literature may be interesting in the context of compounding as a general linguistic strategy.|000|- 2355|Willems2016| Background Curious parallels between the processes of species and language evolution have been observed by many researchers. Retracing the evolution of Indo-European (IE) languages remains one of the most intriguing intellectual challenges in historical linguistics. Most of the IE language studies use the traditional phylogenetic tree model to represent the evolution of natural languages, thus not taking into account reticulate evolutionary events, such as language hybridization and word borrowing which can be associated with species hybridization and horizontal gene transfer, respectively. More recently, implicit evolutionary networks, such as split graphs and minimal lateral networks, have been used to account for reticulate evolution in linguistics. 
Results Striking parallels existing between the evolution of species and natural languages allowed us to apply three computational biology methods for reconstruction of phylogenetic networks to model the evolution of IE languages. We show how the transfer of methods between the two disciplines can be achieved, making necessary methodological adaptations. Considering basic vocabulary data from the well-known Dyen’s lexical database, which contains word forms in 84 IE languages for the meanings of a 200-meaning Swadesh list, we adapt a recently developed computational biology algorithm for building explicit hybridization networks to study the evolution of IE languages and compare our findings to the results provided by the split graph and galled network methods. Conclusion We conclude that explicit phylogenetic networks can be successfully used to identify donors and recipients of lexical material as well as the degree of influence of each donor language on the corresponding recipient languages. We show that our algorithm is well suited to detect reticulate relationships among languages, and present some historical and linguistic justification for the results obtained. Our findings could be further refined if relevant syntactic, phonological and morphological data could be analyzed along with the available lexical data. |000|lateral gene transfer, lexical borrowing, phylogenetic reconstruction, phylogenetic network, galled network, hybridization network 2356|Blasi2016|It is widely assumed that one of the fundamental properties of spoken language is the arbitrary relation between sound and meaning. Some exceptions in the form of nonarbitrary associations have been documented in linguistics, cognitive science, and anthropology, but these studies only involved small subsets of the 6,000+ languages spoken in the world today. 
By analyzing word lists covering nearly two-thirds of the world’s languages, we demonstrate that a considerable proportion of 100 basic vocabulary items carry strong associations with specific kinds of human speech sounds, occurring persistently across continents and linguistic lineages (linguistic families or isolates). Prominently among these relations, we find property words (“small” and i, “full” and p or b) and body part terms (“tongue” and l, “nose” and n). The areal and historical distribution of these associations suggests that they often emerge independently rather than being inherited or borrowed. Our results therefore have important implications for the language sciences, given that nonarbitrary associations have been proposed to play a critical role in the emergence of cross-modal mappings, the acquisition of language, and the evolution of our species’ unique communication system. |000|sound symbolism, cross-linguistic study 2357|Birchall2016|The Chapacuran language family, with three extant members and nine historically attested lects, has yet to be classified following modern standards in historical linguistics. This paper presents an internal classification of these languages by combining both the traditional comparative method (CM) and Bayesian phylogenetic inference (BPI). We identify multiple systematic sound correspondences and 285 cognate sets of basic vocabulary using the available documentation. These allow us to reconstruct a large portion of the Proto-Chapacuran phonemic inventory and identify tentative major subgroupings. The cognate sets form the input for the BPI analysis, which uses a stochastic Continuous-Time Markov Chain to model the change of these cognate sets over time. We test various models of lexical substitution and evolutionary clocks, and use ethnohistorical information and data collection dates to calibrate the resulting trees. 
The CM and BPI analyses produce largely congruent results, suggesting a division of the family into three different clades.|000|concept list, Chapacuran, lexicostatistics, Bayesian approaches 2358|Lewis2016|Are the forms of words systematically related to their meaning? The arbitrariness of the sign has long been a foundational part of our understanding of human language. Theories of communication predict a relationship between length and meaning, however: Longer descriptions should be more conceptually complex. Here we show that both the lexicons of human languages and individual speakers encode the relationship between linguistic and conceptual complexity. Experimentally, participants mapped longer words to more complex objects in comprehension and production tasks and across a range of stimuli. Explicit judgments of conceptual complexity were also highly correlated with implicit measures of study time in a memory task, suggesting that complexity is directly related to basic cognitive processes. Observationally, judgments of conceptual complexity for a sample of real words correlate highly with their length across 80 languages, even controlling for frequency, familiarity, imageability, and concreteness. While word lengths are systematically related to usage—both frequency and contextual predictability—our results reveal a systematic relationship with meaning as well. They point to a general regularity in the design of lexicons and suggest that pragmatic pressures may influence the structure of the lexicon.|000|speech norms, word length, correlational studies, linguistic complexity 2359|Heled2013| Background The multispecies coalescent model has become popular in recent years as a framework to infer a species phylogeny from multilocus genetic data collected from multiple individuals. The model assumes that speciation occurs at a specific point in time, after which the two sub-species evolve in total isolation. 
However in reality speciation may occur over an extended period of time, during which sister lineages remain in partial contact. Inference of multispecies phylogenies under those conditions is difficult. Indeed even designing simulators which correctly sample gene histories under these conditions is non-trivial. Results In this paper we present a method and software which simulates gene trees under both the multispecies coalescent and migration. Our approach allows for both population sizes and migration rates to change over the species lifetime. Also, migration rates are specified in units of fraction of emigrants per time unit, which makes them easier to interpret. Overall this setup covers a wide range of migration scenarios. The software can be used to investigate properties of gene trees under different migration settings and to generate test cases for programs which infer species trees and/or migration from sequence data. Using simulated data we investigate the effect of migrations between sister lineages on the inference of multispecies phylogenies and on post analysis detection. Conclusions Our results indicate that while estimation of species tree topology can be quite robust to the presence of gene flow, the inference and detection of migration is problematic, even with methods based on full likelihood models. |000|gene tree reconciliation, species tree, coalescent model, coalescent theory 2360|Heled2013|The article deals with the multi-species-coalescent model which they explore by writing BEAST simulations to see how well the species trees can be inferred.|000|- 2361|Edwards2016|In recent articles published in Molecular Phylogenetics and Evolution, Mark Springer and John Gatesy (S&G) present numerous criticisms of recent implementations and testing of the multispecies coalescent (MSC) model in phylogenomics, popularly known as “species tree” methods. 
After pointing out errors in alignments and gene tree rooting in recent phylogenomic data sets, particularly in Song et al. (2012) on mammals and Xi et al. (2014) on plants, they suggest that these errors seriously compromise the conclusions of these studies. Additionally, S&G enumerate numerous perceived violated assumptions and deficiencies in the application of the MSC model in phylogenomics, such as its assumption of neutrality and in particular the use of transcriptomes, which are deemed inappropriate for the MSC because the constituent exons often subtend large regions of chromosomes within which recombination is substantial. We acknowledge these previously reported errors in recent phylogenomic data sets, but disapprove of S&G’s excessively combative and taunting tone. We show that these errors, as well as two nucleotide sorting methods used in the analysis of Amborella, have little impact on the conclusions of those papers. Moreover, several concepts introduced by S&G and an appeal to “first principles” of phylogenetics in an attempt to discredit MSC models are invalid and reveal numerous misunderstandings of the MSC. Contrary to the claims of S&G we show that recent computer simulations used to test the robustness of MSC models are not circular and do not unfairly favor MSC models over concatenation. In fact, although both concatenation and MSC models clearly perform well in regions of tree space with long branches and little incomplete lineage sorting (ILS), simulations reveal the erratic behavior of concatenation when subjected to data subsampling and its tendency to produce spuriously confident yet conflicting results in regions of parameter space where MSC models still perform well. S&G’s claims that MSC models explain little or none (0–15%) of the observed gene tree heterogeneity observed in a mammal data set and that MSC models assume ILS as the only source of gene tree variation are flawed. 
Overall many of their criticisms of MSC models are invalidated when concatenation is appropriately viewed as a special case of the MSC, which in turn is a special case of emerging network models in phylogenomics. We reiterate that there is enormous promise and value in recent implementations and tests of the MSC and look forward to its increased use and refinement in phylogenomics.|000|multispecies coalescent model, coalescent theory, gene tree reconciliation, species tree 2362|Rubin1995|Book treats rhyming practice and other things in a quite systematic manner. It gives an overview on 2. the representation of Themes in Memory 3. Imagery 4. Sound 5. Combining Constraints, 6. The Transmission of Oral traditions, 7. Basic Observations of Remembering, etc. It is extremely interesting in the context of rhyming and the poetic function in speech.|000|poetic function, rhyme patterns, oral traditions 2363|Leicht2008|We consider the problem of finding communities or modules in directed networks. In the past, the most common approach to this problem has been to ignore edge direction and apply methods developed for community discovery in undirected networks, but this approach discards potentially useful information contained in the edge directions. Here we show how the widely used community finding technique of modularity maximization can be generalized in a principled fashion to incorporate information contained in edge directions. 
We describe an explicit algorithm based on spectral optimization of the modularity and show that it gives demonstrably better results than previous methods on a variety of test networks, both real and computer generated.|000|community detection, directed network, algorithms 2364|Leicht2008|Algorithm for community detection in directed networks, using modularity optimalization as a basic approach to community detection.|000|- 2365|Emmeche1991|Text investigates semiotic metaphors in biology, specifically the question of the relation between form and function, which is also interesting in the context of linguistics, especially when considering the linguistic sign.|000|form and function, biological parallels, analogy, biology, 2366|Emmeche1991|At the deepest level, this renewed criticism concerns the question of biological form. Is the development of form to be explained simply through the gradual improvement of function? Do organisms and parts of organisms develop their characteristic forms just because such forms were the most functional (the most successful)? This problem is an old one: what is the relation between substance and form? In reflecting on Korzybski's famous statement, Gregory Bateson traces the problem back to Pythagoras: [pb] .. pullquote:: This statement came out of a very wide range of philosophic thinking, going back to Greece, and wriggling through the history of European thought over the last 2000 years. ... It all starts, I suppose, with the Pythagoreans versus their predecessors, and the argument took the shape of 'Do you ask what it's made of — earth, fire, water, etc.?' Or do you ask, 'What is its pattern?' Pythagoras stood for inquiry into pattern rather than inquiry into substance. That controversy has gone through the ages, and the Pythagorean half of it has, until recently, been on the whole the submerged half. 
(Bateson 1972: 449) |1f|form and function, biology, functionality, fitness 2367|Nicolai2016|This paper compares the conceptual framework of Schuchardt’s perspective on language mixing (or at least my representation of it) with that of ‘semiotic dynamics’ as presented in several of my earlier works (Nicolaï, 2011, 2012a). These two approaches entail a same interest in the activities of individuals and groups (communication actors, etc.) who, in their ordinary usage, produce and transform languages. Thus the framework of semiotic dynamics introduces conceptualizations, obviously developed independently of the process which « Slawo-deutsches und Slawo-italienisches » exemplifies, but which, despite differing trajectories, intersects with it. This intersection justifies my assertion as to the work’s modernity and the usefulness of reviewing it. At the same time, this review broadens the scope of research and reflections in this field. In counterpoint, I will look into the justified (or not) propensity of scholars and documenters to consider a priori the objects-languages with which they work as constitutively homogeneous entities, albeit subject to modification and transformation by a (contingent) contact situation.|000|language contact, Hugo Schuchardt, language mixture, semiotic dynamics, 2368|Nicolai2016|Paper gives some interesting overview over Schuchardt's work, bringing it into a modern context. |000|- 2369|Nicolai2016|[...] 
Meillet published an article, Les parentés de langues, in which he lays out the principles of his historical approach to languages, affirming that genealogical continuity is grounded in the permanence of speakers’ desire and willingness to “continue” such or such a language, stressing the methodological importance of his linguistic distinction between native elements [éléments indigènes] and borrowings [emprunts], setting himself apart from Schuchardt.|544|history of science, language mixture, Antoine Meillet, Hugo Schuchardt 2370|Nicolai2016|“There is no completely unmixed language” [Es gibt keine völlig ungemischte Sprache]. This well known aphorism which, rather than affirming the existence of language mixing (which would only be a truism), affirms the non-existence of non-mixed languages, is to be found in the first few pages of sdsi.|549|nice quote, Hugo Schuchardt, language mixture 2371|Gooskens2007|The three mainland Scandinavian languages (Danish, Swedish and Norwegian) are so closely related that the speakers mostly communicate in their own languages (semicommunication). Even though the three West Germanic languages Dutch, Frisian and Afrikaans are also closely related, semicommunication is not usual between these languages. In the present investigation, results from intelligibility tests measuring the mutual intelligibility of Danish, Norwegian and Swedish were compared with results of similar tests of mutual intelligibility between speakers of Dutch, Frisian and Afrikaans. The results show that there are large differences in the level of intelligibility depending on test group and test language. 
Correlations between the intelligibility scores and linguistic distance scores showed that intelligibility can to a large extent be predicted by phonetic distances, while intelligibility is less predictable on the basis of lexical distances.|000|mutual intelligibility, asymmetric intelligibility, dialect chain, West Germanic, North Germanic, 2372|Gooskens2007|Interesting article introducing concepts like asymmetric intelligibility and others, emphasizing the importance of using phonetic rather than lexical distances in intelligibility estimation.|000|- 2373|Daston2016|Objektivität gilt als eines der höchsten Ideale der Forschung. Doch das war nicht immer so. Erst im 19. Jahrhundert trat sie in Konkurrenz zu dem jahrhundertealten Grundsatz der Naturwahrheit. Und noch heute geraten die beiden Leitbilder in Konflikt. Wie unsere Autorin darlegt, lässt sich manche wissenschaftliche Kontroverse besser verstehen, wenn man sich mit der Geschichte der Naturwissenschaften etwas genauer befasst.|000|objectivity, history of science, truth, natural truth 2374|Daston2016|Very interesting article showing that around 1850 and earlier, people switched from believing in "natural truth" to the claim for objectivity. This may also partially be reflected in the abandoning of the tree models in linguistics, as they are an obvious idealization of reality, not an approximation. The wave theory may thus reflect general scientific history.|000|- 2375|Tang2009|Beijing listeners, however, have no advantage of their dialect being similar to the standard language. 
The asymmetry does not invalidate the comparison between results obtained from opinion tests and from functional tests, nor does it affect the comparison of word and sentence intelligibility, because the asymmetry affects all these results to the same degree.|721|asymmetric intelligibility, Běijīng dialect, Chinese dialects, mutual intelligibility 2376|Tang2009|We argue that mutual intelligibility testing is an adequate way to determine how different two languages or language varieties are. We tested the mutual intelligibility of 15 Chinese dialects functionally at the level of isolated words (word-intelligibility) and the level of sentences (sentence intelligibility). We collected data for each dialect by playing isolated words and sentences spoken in 15 Chinese dialects to 15 listeners. Word-intelligibility was determined by having listeners perform a semantic categorization task whereby words had to be classified as one of ten different categories such as body part, plant, animal, etc. Sentence intelligibility was estimated by having the listeners translate a target word in each sentence into their own dialect. We obtained 47,250 data (15 × 150 × 15 for the word part and 15 × 60 × 15 for the sentence part). We also had at our disposal structural similarity measures (lexical similarity, phonological correspondence) for each pair of the 15 Chinese dialects published by Cheng (Computational Linguistics & Chinese Language Processing 1997, 2.1, pp. 41–72). Our general conclusion is that the degree of mutual intelligibility can be determined by both opinion and functional tests. These two subjective measures are significantly correlated with one another and can be predicted from objective measures (lexical similarity and phonological correspondence) equally well. However, functional intelligibility measures, especially at the sentence level, better reflect Chinese dialect classifications than opinion scores. © 2008 Elsevier B.V.
All rights reserved.|000|mutual intelligibility, Chinese dialects, quantitative analysis 2377|Tang2009|The study consists of two basic tests: one on the word level (speakers need to understand a word, 150 words were tested), and one on the sentence level (speakers need to understand the last word of a sentence). Basically their accounts reflect asymmetric intelligibility among dialects, as they should be able to account for directions by understanding and speaking.|000|- 2378|Tang2007|Distance between languages is used as a criterion when arguing about genealogical relationships between languages. The more languages resemble each other, the more likely they are derived from the same parent language, i.e., belong to the same language family. However, it is difficult to quantify the distance between languages one-dimensionally since languages differ along many structural dimensions (e.g. phonetics, phonology, morphology, syntax). It is unclear how the various dimensions should be weighed against each other. Therefore, we select a single criterion: mutual intelligibility. Mutual intelligibility is an overall criterion that may tell us in a psychologically relevant way whether two languages are similar/close.|000|mutual intelligibility, Chinese dialects, experimental study, 2379|Tang2007|They basically conduct a listener experiment, using the North Wind and the Sun fable, recorded in different Chinese dialects, and have listeners rate the story. So this study is essentially based on perception in mutual intelligibility (rather: perception intelligibility, as this is not necessarily symmetric).|000|- 2380|Cheng1997|Some patterns have a large number of words to give an impression of a general rule. Such patterns are communication enhancing signals. On the other hand, some patterns may have only a small number of cognates. Such patterns require us to specifically memorize the exceptional words and thus are considered as disturbing noise.
Furthermore, the elements of a correspondence pattern may be identical or different. :comment:`They use the same principle as envisioned in the asymmetry measure, but with modified statistics. They first count all reflexes, then take the average, and then judge whether a sound is below or above the average, thus distinguishing signal from noise. This measure is surely interesting, as it may provide additional insights. It is also a good idea to actually compute the measure and provide access to it, so that it can also be applied to other data.` [...] [pb] The five patterns listed here have a total of 53 cognate words, and the mean is 10.6. We use the mean to determine whether a pattern is signal or noise. This pattern with three items is smaller than the mean and is considered as noise. Moreover, the corresponding elements are different. Since there are other non-cognate words with the zero initial in Beijing, the zero initial for these three words in Jinan will very likely cause confusion with non-cognate words in Beijing. Here we use the dental nasal of Beijing to view the correspondence. We may call Beijing the source dialect and Jinan the target dialect. :comment:`The basic principle is to rank similarity between non-noise patterns: identical sound, different sound not occurring in the source dialect, and different sound occurring elsewhere in the source dialect. This makes it possible to weight the elements, following a signal-noise schema.` [...] For each item, the target-dialect a. element is the same as that of the source dialect: 1.00 (signal), -0.25 (noise) b. element is different from that of the source dialect: 1. and does not occur in the source dialect: 0.5 (signal), -0.5 (noise) 2.
and occurs elsewhere in the source dialect: 0.25 (signal), -1 (noise) |54f|Chinese dialects, mutual intelligibility, phonetic similarity, DOC, 2381|Cheng1997|We feel that intelligibility is not necessarily symmetrical, and hence for a pair of [pb] dialects we calculate two unidirectional intelligibility indices. Then we take the mean to be the mutual intelligibility value. |55f|asymmetric intelligibility, Chinese dialects, phonetic similarity 2382|Jaeger2016b|Most current approaches in computational phylogenetic linguistics require as input multilingual word lists that are categorized into cognate classes. Cognate classification is currently usually done manually by experts, which is time consuming and so far only available for a small number of well-studied language families. Automatizing this step will greatly expand the empirical scope of phylogenetic methods in linguistics, as raw word lists (in phonetic transcription) are much easier to obtain than cognate-coded ones, especially for under-studied language families. Here we propose a method for automatic cognate classification using supervised learning with a Support Vector Machine. The method outperforms Johann-Mattis List’s SCA and LexStat methods (List, 2012; List, 2014b), the current de facto standard.|000|cognate detection, SVM, support vector machine, 2383|Gibbon2016|First, the structures of language, particularly speech or spoken language, are increasingly grounded in an empirically motivated ‘procedural turn’ (as developed in Christensen and Chater 2016 and @List<2016h> et al. 2016), which thereby fills the traditional linguistic perspective on the primacy of speech data (e.g. over written data) with new life. The notion of ‘procedure’ in this context means, essentially, ‘algorithm’ or ‘computation’ and is not to be confused with ‘performance’ as used in the Chomskyan sense to refer to the vagaries of actual language use.
Procedures can be (but do not have to be) just as abstract and selective as generative and constraint-based competence theories.|1|quantitative turn, grammar, linguistics 2384|List2016h|Background: For a long time biologists and linguists have been noticing surprising similarities between the evolution of life forms and languages. Most of the proposed analogies have been rejected. Some, however, have persisted, and some even turned out to be fruitful, inspiring the transfer of methods and models between biology and linguistics up to today. Most proposed analogies were based on a comparison of the research objects rather than the processes that shaped their evolution. Focusing on process-based analogies, however, has the advantage of minimizing the risk of overstating similarities, while at the same time reflecting the common strategy to use processes to explain the evolution of complexity in both fields. Results: We compared important evolutionary processes in biology and linguistics and identified processes specific to only one of the two disciplines as well as processes which seem to be analogous, potentially reflecting core evolutionary processes. These new process-based analogies support novel methodological transfer, expanding the application range of biological methods to the field of historical linguistics. We illustrate this by showing (i) how methods dealing with incomplete lineage sorting offer an introgression-free framework to analyze highly mosaic word distributions across languages; (ii) how sequence similarity networks can be used to identify composite and borrowed words across different languages; (iii) how research on partial homology can inspire new methods and models in both fields; and (iv) how constructive neutral evolution provides an original framework for analyzing convergent evolution in languages resulting from common descent (Sapir’s drift). 
Conclusions: Apart from new analogies between evolutionary processes, we also identified processes which are specific to either biology or linguistics. This shows that general evolution cannot be studied from within one discipline alone. In order to get a full picture of evolution, biologists and linguists need to complement their studies, trying to identify cross-disciplinary and discipline-specific evolutionary processes. The fact that we found many process-based analogies favoring transfer from biology to linguistics further shows that certain biological methods and models have a broader scope than previously recognized. This opens fruitful paths for collaboration between the two disciplines.|000|analogy, biological parallels, history of science, 2385|Adamou2016|Numerous studies on language contact document the use of content words and especially nouns in most contact settings, but the correlations are often based on qualitative or questionnaire-based research. The present study of borrowing is based on the analysis of free-speech corpora from four Slavic minority languages spoken in Austria, Germany, Greece, and Italy. The analysis of the data, totalling 34,000 word tokens, shows that speakers from Italy produced significantly more borrowings and noun borrowings than speakers from the other three countries. A Random Forests analysis identifies ‘language’ as the main predictor for the ratio of both borrowings and noun borrowings, indicating the existence of borrowing patterns that individual speakers conform to. Finally, we suggest that the patterns of borrowing that prevail in the communities under study relate to the intensity of contact in the past, and to the presence or absence of literary traditions for the minority languages.|000|lexical borrowing, language contact, corpus studies, Slavic languages 2386|Gooskens2013|To test intelligibility, a large number of tests have been developed.
By means of such tests, the degree of intelligibility can be expressed in a single number, often the percentage of input that was correctly recognized by the subject. This chapter presents an overview of methods for measuring the intelligibility of closely related languages, and discusses their advantages and disadvantages. It focuses on spoken-language comprehension, but many tests can also be applied to the comprehension of written language. Methods for investigating mutual intelligibility can be taken from other disciplines, for example in the area of speech technology, second language acquisition, and speech pathology.|000|mutual intelligibility, dialect intelligibility, dialect classification, 2387|List2016b|In this paper, we develop a frame approach for modelling and investigating certain patterns of concept evolution in the history of Chinese as they are reflected in the Chinese writing system. Our method uses known processes of character formation to infer different states of concept evolution. By decomposing these states into frames, we show how the complex interaction between speaking, writing, and meaning throughout the history of the Chinese language can be made transparent.|000|embodiment, Chinese character formation, Chinese characters 2388|Hippisley1998|Recent lexeme-based models have proposed that a lexeme carries an inventory of stems on which morphological rules operate. The various stems in the inventory are associated with different morphological rules, of both inflection and derivation. Furthermore, one stem may be selected by more than one rule. For this reason stems in the inventory are labeled with indexes, rather than being directly associated with a particular morphological function. It is claimed that an indexed-stem approach captures generalizations in the morphological system that would otherwise be missed. We argue that such an approach provides for greater generalization in the Russian morphological system.
One area of Russian derivation that particularly lends itself to an indexed-stem approach is the highly productive system of personal-noun formation. We present a declarative account of Russian personal nouns that assumes indexed stems and show how such an account on the one hand obviates the need to posit either compound suffixes or "concatenators" and on the other dispenses with truncating/deleting rules. The account is couched within network morphology, a declarative lexeme-based framework that rests on the concept of default inheritance and is expressed in the computable lexical knowledge representation language DATR|000|Russian, morphology, meatative, suffixation, productivity 2389|Hippisley1998|This article is interesting as it gives examples for the (i)na-suffix in Russian, which is productive, but generally not regarded as grammatical, thus illustrating that grammaticality and productivity do not necessarily coincide. It is further interesting as it illustrates the idea of *network morphology*, which uses network structures to model morphological processes.|000|productivity, Russian, morphology, grammaticality, network morphology, DATR 2390|Fedorenko2016|The neural processes that underlie your ability to read and understand this sentence are unknown. Sentence comprehension occurs very rapidly, and can only be understood at a mechanistic level by discovering the precise sequence of underlying computational and neural events. However, we have no continuous and online neural measure of sentence processing with high spatial and temporal resolution. Here we report just such a measure: intracranial recordings from the surface of the human brain show that neural activity, indexed by γ-power, increases monotonically over the course of a sentence as people read it.
This steady increase in activity is absent when people read and remember nonword-lists, despite the higher cognitive demand entailed, ruling out accounts in terms of generic attention, working memory, and cognitive load. Response increases are lower for sentence structure without meaning (“Jabberwocky” sentences) and word meaning without sentence structure (word-lists), showing that this effect is not explained by responses to syntax or word meaning alone. Instead, the full effect is found only for sentences, implicating compositional processes of sentence understanding, a striking and unique feature of human language not shared with animal communication systems. This work opens up new avenues for investigating the sequence of neural events that underlie the construction of linguistic meaning. |000|sentences, neurolinguistics, parsing, understanding, semantics, meaning 2391|Crivelli2016|Theory and research show that humans attribute both emotions and intentions to others on the basis of facial behavior: A gasping face can be seen as showing “fear” and intent to submit. The assumption that such interpretations are pancultural derives largely from Western societies. Here, we report two studies conducted in an indigenous, small-scale Melanesian society with considerable cultural and visual isolation from the West: the Trobrianders of Papua New Guinea. Our multidisciplinary research team spoke the vernacular and had extensive prior fieldwork experience. In study 1, Trobriand adolescents were asked to attribute emotions, social motives, or both to a set of facial displays. Trobrianders showed a mixed and variable attribution pattern, although with much lower agreement than studies of Western samples. Remarkably, the gasping face (traditionally considered a display of fear and submission in the West) was consistently matched to two unpredicted categories: anger and threat. 
In study 2, adolescents were asked to select the face that was threatening; Trobrianders chose the “fear” gasping face whereas Spaniards chose an “angry” scowling face. Our findings, consistent with functional approaches to animal communication and observations made on threat displays in small-scale societies, challenge the Western assumption that “fear” gasping faces uniformly express fear or signal submission across cultures. |000|gesture, mimic, anthropology, universality, Papua New Guinea 2392|Gomez2016|The psychological, sociological and evolutionary roots of conspecific violence in humans are still debated, despite attracting the attention of intellectuals for over two millennia 1–11 . Here we propose a conceptual approach towards understanding these roots based on the assumption that aggression in mammals, including humans, has a significant phylogenetic component. By compiling sources of mortality from a comprehensive sample of mammals, we assessed the percentage of deaths due to conspecifics and, using phylogenetic comparative tools, predicted this value for humans. The proportion of human deaths phylogenetically predicted to be caused by interpersonal violence stood at 2%. This value was similar to the one phylogenetically inferred for the evolutionary ancestor of primates and apes, indicating that a certain level of lethal violence arises owing to our position within the phylogeny of mammals. It was also similar to the percentage seen in prehistoric bands and tribes, indicating that we were as lethally violent then as common mammalian evolutionary history would predict. However, the level of lethal violence has changed through human history and can be associated with changes in the socio-political organization of human populations. 
Our study provides a detailed phylogenetic and historical context against which to compare levels of lethal violence observed throughout our history.|000|phylogeny, violence, human prehistory, anthropology 2393|Tresoldi2016|This paper presents a new algorithm, the Modified Moving Contracting Window Pattern Algorithm (CMCWPM), for the calculation of field similarity. It strongly relies on previous work by Yang et al. (2001), correcting previous work in which characters marked as inaccessible for further pattern matching were not treated as boundaries between subfields, occasionally leading to higher than expected scores of field similarity. A reference Python implementation is provided.|000|edit distance, Python, implementation, source code, algorithms 2394|Tresoldi2016|This paper presents an apparently faster and more accurate alternative to the edit distance. There is also an implementation available, and it seems worthwhile to inspect the algorithm more closely. Essentially, the code iterates over one string only and then searches for a pattern in the other string. This is probably a generally good idea to also account for common patterns across multiple languages.|000|pattern recognition, edit distance, sequence comparison 2395|Berg2016|Recent work on language contact between Scandinavian and Low German during the Middle Ages widely assumes that the varieties were linguistically close enough to permit some kind of receptive multilingualism, and hence an example of dialect contact. Two arguments that have been invoked in support of this scenario are the lack of (1) meta-linguistic comments on flawed understanding, and (2) attested bilingualism. However, towards the end of the most intense contact period, in the early sixteenth century, there is indeed meta-linguistic information in the preserved sources suggesting that intelligibility was restricted.
Furthermore, there are also examples of code-switching and active bilingualism indicating that the varieties were clearly perceived as distinct languages. This paper presents such examples from Norwegian primary sources that have not been observed in recent scholarship. Based on this evidence, it is argued that the relationship between the languages by the early sixteenth century was asymmetric, Scandinavians being able to understand Low German more often than vice versa.|000|Scandinavian, Low German, language contact, multilingualism, code-switching, mutual intelligibility 2396|Berg2016|A crucial issue regarding the medieval contact situation has been the question of whether the varieties involved should be considered different languages or dialects. This question might be rephrased as follows (Trudgill 2000): where on the continuum between these extremes was the linguistic reality located, i.e. to what degree were the varieties mutually intelligible?|191|- 2397|Berg2016|The term semi-communication was first used by Haugen (1966) to describe the relationship between contemporary Norwegian, Swedish, and Danish (and dialects of these languages), where speakers of closely related languages understand other varieties based on their own monolingual competence. Other scholars have used different terms, and I shall use receptive multilingualism here (cf. Braunmüller 2012: 95 on the different terms). No speaker of any language is oblivious to variation, as everyone is exposed to different dialects and registers. As an extension of this, incomplete understanding of a similar variety may develop into some form of passive or receptive multilingualism, where speakers learn to understand the other variety through their own competence.
It is, however, very hard to keep this phenomenon apart from bilingualism in written records, as will become evident.|191|receptive multilingualism, semi-communication, definition, terminology, language variation 2398|Berg2016|Three factors can be singled out that have been suggested to affect mutual intelligibility (see e.g. Gooskens 2007: 446): (1) attitude, (2) contact and exposure, and (3) linguistic distance.|191|mutual intelligibility, 2399|Berg2016|One important point that has emerged in recent research on modern languages, however, is that intelligibility between two varieties may be asymmetric (see e.g. Frinsel et al. 2015 with references to Scandinavian; Ciobanu and Dinu 2014: 3316 on Romance). For instance, Norwegian speakers generally understand Swedish and Danish better than vice versa, probably because the widespread use of dialects in Norway makes them used to linguistic variation (cf. Gooskens 2007: 453, 462). The regularity in phoneme mapping between languages can also be asymmetric, as demonstrated by Frinsel et al. (2015) for Swedish and Danish; for instance, Danish /ə/ may correspond to both Swedish /ə/ and /a/. The possibility of asymmetric intelligibility must therefore also be kept in mind when considering earlier contexts.|192|phoneme mapping, asymmetric intelligibility, literature 2400|Ravindranath2014|The dialogue on language endangerment worldwide has largely focused on languages with small speaker populations, in line with Krauss’s (1992) prediction that any language with a speaker population of less than 100,000 is at risk. The relationship between population size and language vitality is particularly relevant in the Indonesian context, where over 700 local languages have speaker populations that range from single digits to tens of millions of speakers.
This paper considers the role of size in determining the fate of these local languages, against the backdrop of the highly successful development of Indonesian as a national language. Using Javanese as a case study, we show that even a language with over 80 million speakers can be at risk, a trend that has serious implications for all of the languages of Indonesia. Although a large population may signal a greater likelihood for official recognition and a more diverse speaker population that is less likely to simultaneously shift away from the L1, size alone cannot predict whether robust intergenerational transmission is occurring. Rather a clearer understanding of the demographic, sociolinguistic, and attitudinal factors that lead to individual and community decisions about intergenerational transmission are essential for assessing risk of endangerment.|000|language endangerment, speaker size, demography, South-East Asian languages 2401|Nepusz2012|We introduce clustering with overlapping neighborhood expansion (ClusterONE), a method for detecting potentially overlapping protein complexes from protein-protein interaction data. ClusterONE-derived complexes for several yeast data sets showed better correspondence with reference complexes in the Munich Information Center for Protein Sequence (MIPS) catalog and complexes derived from the Saccharomyces Genome Database (SGD) than the results of seven popular methods. The results also showed a high extent of functional homogeneity.|000|protein-protein interaction networks, algorithms, networks, clustering 2402|Labov2015|The history of sociolinguistic research leads us to expect certain patterns of variation by social class, age and gender, but as the field expands to new societies with different social structures we often encounter unexpected results. Such findings can have great value in leading to higher level generalizations. 
Cases are cited of unexpected gender variation in studies of rural communities in Spain, Egypt, central India and south China where ethnographic observation leads to a better understanding of what seemed at first to be anomalous results.|000|sociolinguistics, social class, history of science 2403|Flack2016|This short paper, rather than providing a thorough analysis of the very broad theme entailed by its title, aims only to programmatically outline the contours of a general framework for future research on structuralism and its genealogy. In essence, I wish to argue that mainstream approaches to structuralism’s history need to be significantly broadened, not only to better account for the contributions of Eastern and Central European thinkers, but also to take into full consideration structuralism’s deep, complex and rich roots in 19 th Century German thought. To make this point, I will succinctly compare three distinct historiographical models of structuralism (“French”, “East-West”, “Jakobsonian”), each of which provides a very rough and selective, yet highly contrastive map of the intellectual and personal networks that underpinned structuralism's development up to World War II. Thanks to this basic comparative exercise, I hope to highlight the reductionistic, limiting nature of the first two models with regards to the more complete (if not exhaustive or definitive) third one and to cast further light on Jakobson’s crucial function as a communicator, synthesiser and passer of ideas between scholars, disciplines and intellectual traditions.|000|history of science, Roman Jakobson, structuralism, 2404|Barteld2016|When annotating non-standard texts such as historical texts or spoken language, tasks that are normally considered to be pure categorization tasks such as part-of-speech tagging are often combined with correcting errors in the tokenization and even the transcribed text itself along the way. 
As a consequence, inter-annotator agreement measures are needed that measure agreement for categorization by also taking changes in segmentation and the underlying text into account. In this paper, we present the first inter-annotator measure of this kind, text-gamma ( t γ ). Based on γ (Mathet et al., 2015), the inter-annotator agreement is measured using an alignment of the annotations. For this, we consider alignments of the annotations that follow from optimal alignments of the underlying text sequences. Furthermore, we use a specialized function to measure the disorder of the alignment. For chance-correction, we introduce a method that takes the annotation bias introduced by pre-annotation into account when estimating the expected (dis)agreement between annotators.|000|inter-annotator agreement, measure, segmentation, ancient texts 2405|Hippisley1998|Aronoff (1992: 14-16) notes that the formal part of a lexeme carries with it a number of notions that are important to disentangle. First, the "root" is the form that is left when all morphologically added structure has been "wrung out." Second, a lexeme's "citation form" is the special form used in lexicography as a place-holder or address which we can think of as the entire lexeme in shorthand. (For Russian nouns this is the nominative singular form.) Third, the "lexical representation" is the analogue in the mental lexicon of the citation form. Fourth, and finally, the stem is "that form of a lexeme to which a given affix is attached or on which a given realization rule operates" (1992: 14).|1096|Russian, lexeme representation, modeling, word formation 2406|Song2014| Reconstruction studies of Old Chinese (OC hereafter) and Proto Sino-Tibetan (PST hereafter) have yielded numerous significant discoveries related to the phonological histories of these two ancient languages. Despite recent advancements into OC and PST phonological histories, a few mysteries remain yet unsolved.
One such mystery, the ‘stop coda’ problem, is as hotly debated now as it was when it was first raised seventy years ago. This long-running debate focuses on the existence and identity of the ‘stop codas’ in OC and in its parent language, PST. One reason why this debate has failed to reach a satisfactory conclusion is that the reconstruction methodology is limited, which assumes the Neogrammarian law of sound change. This law holds that sound change occurs without exception in every form that meets the structural description. Although this law applies to many sound changes in myriad world languages, it is not the only possible pathway of sound change. In this paper, I will argue that the key to the ‘stop-coda’ problem of OC belongs to another sound change type: lexical diffusion. The organization of the paper is as follows. In Part One, I will introduce the background of the debate over the ‘stop codas’ in OC. Part Two reviews previous opposing analyses of the ‘stop coda’ debate. Part Three details my proposal for a lexical diffusion analysis of the ‘stop coda’ problem based on internal evidence in Chinese. Part Four investigates the problem using the Comparative Method based on external evidence from Tibetan and loan words from Chinese to Sino-Japanese. In Part Five, I will present my solution to the ‘stop coda’ problem, which is based on the analysis in the two preceding sections. Finally, in Part Six, I will discuss the general methodology of phonological reconstruction in light of the sound change mechanism. |000|plosive coda, Old Chinese, lexical diffusion, linguistic reconstruction 2407|Song2014|Nathan Hill has a reply to this article which will soon appear. .. pull-quote:: Song reviews two previous proposals (1) 'the voiced stop coda hypothesis' and (2) 'the open syllable hypothesis' to explain these data and she finds them both wanting.
She offers a new proposal that employs the reconstruction of voiced and voiceless stop finals (exactly opposite in distribution to those in the system of Li Fang-kuei) in the ancestor of Chinese and Tibetan, with the lexical diffusion of the loss of the inherited voiceless series paradoxically both at the time of the script's invention and during the time of the 詩經 Shījīng compilation, explaining the relevant oddities in phonetic series and rhyme practice. The comparison of Tibetan voiced finals with Chinese voiceless finals serves as a keystone to her argument.|000|Old Chinese, linguistic reconstruction 2408|Feder1995|We first consider the problem of partitioning the edges of a graph G into bipartite cliques such that the total order of the cliques is minimized, where the order of a clique is the number of vertices in it. It is shown that the problem is NP-complete. We then prove the existence of a partition of small total order in a sufficiently dense graph and devise an efficient algorithm to compute such a partition and the running time. Next, we define the notion of a compression of a graph G and use the result on graph partitioning to efficiently compute an optimal compression for graphs of a given size. An interesting application of the graph compression result arises from the fact that several graph algorithms can be adapted to work with the compressed representation of the input graph, thereby improving the bound on their running times, particularly on dense graphs. This makes use of the trade-off result we obtain from our partitioning algorithm. The algorithms analyzed include those for matchings, vertex connectivity, edge connectivity, and shortest paths. In each case, we improve upon the running times of the best-known algorithms for these problems.|000|clique, networks, algorithms, NP-complete 2409|Feder1995|A *bipartite clique* is a complete bipartite graph, and its *order* is the number of vertices in it. 
The order of a collection of bipartite cliques is the sum of the orders of the individual cliques. We establish that the problem of [pb] computing a minimum order partition is NP-complete. :comment:`Authors then refer to` @Holyer1981 :comment:`for solving a similar problem on monopartite graphs.`|261f|NP-complete, minimum order partition, networks, algorithms 2410|Holyer1981|We show that for each fixed :math:`n \geq 3` it is NP-complete to determine whether an arbitrary graph can be edge-partitioned into subgraphs isomorphic to the complete graph :math:`K_{n}`. The NP-completeness of a number of other edge-partition problems follows immediately.|000|NP-complete, algorithms, graph theory, networks, clique 2411|Holyer1981|The following problems are now easily seen to be NP-complete. (i) Find the maximum number of edge-disjoint :math:`K_n`'s in a graph (:math:`n \geq 3`). (ii) Find the maximum number of edge-disjoint maximal cliques in a graph. (iii) Edge-partition a graph into the minimum number of complete subgraphs. (iv) Edge-partition a graph into maximal cliques. (v) Edge-partition a graph into cycles :math:`C_m` of length :math:`m`.|3|clique, maximal clique, graph theory, NP-complete 2412|Batagelj2001|In the paper a subquadratic (O(m), m is the number of arcs) triad census algorithm for large and sparse networks with small maximum degree is presented. The algorithm is implemented in the program Pajek. |000|triad census, graph theory, networks, algorithms 2413|Batagelj2001|All possible triads (@Wasserman<1994> and Faust, 1994, p. 244) can be partitioned into three basic types: * the *null* triad 003; * *dyadic* triads 012 and 102; and [pb] * *connected* triads: 111D, 201, 210, 300, 021D, 111U, 120D, 021U, 030T, 120U, 021C, 030C and 120C. |238f|triad, network, graph theory 2414|Wasserman1994|Relationships among larger subsets of actors may also be studied. 
Many important social network methods and models focus on the *triad*; a subset of three actors and the (possible) tie(s) among them. The analytical shift from pairs of individuals to triads (which consist of three potential pairings) was a crucial one for the theorist Simmel [...]. Balance theory has informed and motivated many triadic analyses. Of particular interest are whether the triad is transitive (if actor *i* "likes" actor *j*, and actor *j* in turn "likes" actor *k*, then actor *i* will also "like" actor *k*), and whether the triad is balanced (if actors *i* and *j* like each other, then *i* and *j* should be similar in their evaluation of a third actor, *k*, and if *i* and *j* dislike each other, then they should differ in their evaluation of a third actor, *k*).|19|graph theory, definition, triad 2415|Wasserman1994|The book is very useful for graph theory, also because it contains some easy-to-understand definitions for major terms.|000|graph theory, introduction, 2416|Holyer1981|Given a graph :math:`G = (V, E)`, the problem is to determine whether the edge-set E can be partitioned into subsets :math:`E_1, E_2, \ldots` in such a way that each :math:`E_{i}` generates a subgraph of G isomorphic to the complete graph :math:`K_n` on *n* vertices.|1|problem, NP-complete, edge-partition, graph theory, partitioning 2417|Ryzhkov1976|It is easily shown that the smallest number of colors that can color a certain graph L = (N, V) is equal to the minimal number of subgraphs into which the graph F, complementary to L, can be partitioned.|939|networks, graph theory, graph coloring, subgraph, partitioning 2418|Ryzhkov1976|Partitioning the graph P into the minimal number of complete subgraphs :math:`Γ_{i} = (N_i, U_i)` and associating to each set :math:`N_i` of objects a service point, we thereby solve the problem of determining the optimal number of points, necessary to service all the objects.|1|graph theory, partitioning, minimal number of subgraphs, algorithms 
2419|Ryzhkov1976|It is not yet entirely clear how the algorithm works, but the essential idea is to start from a clique with minimal size (and size i >= 3). The author explains the algorithm on pp 942-943 also with an example figure. Based on this, we can derive the algorithm as follows: 1. find the vertex *v* with the minimal number of edges (minimal degree, with degree >= 3 [???]) 2. get the subgraph *s* from the neighbors of *v* and find all maximal cliques in *s* 3. at least one of the subgraphs will enter into the minimal set of maximal cliques Then, there are some further ideas which I don't understand right now, which will be added, once I understand it.|000|clique, partitioning, problem 2420|Bhasker1991|A clique of a simple graph G = (V, E), where V is the set of vertices in the graph and E the set of edges, is a subset W of V such that for every pair of vertices in W, there is an edge in E. The clique-partitioning of a graph refers to the problem of finding the smallest number of cliques in a graph such that every vertex in the graph is represented in exactly one clique. The coloring of a graph refers to the problem of finding the smallest set of “colors,” such that each vertex of the graph is assigned exactly one color from this set, and that no two vertices that are connected have the same color. The clique-partitioning of a graph G is equivalent to the coloring of the complement graph G’.|1|clique-partitioning problem, graph-coloring problem, problem, clique, graph theory, algorithms 2421|Bhasker1991|We study the problem of clique-partitioning a graph. We prove a new general upper bound result on the number of clique-partitions. This upper bound is the best possible, given information of just the vertices and the number of edges. Next we show that there exists an optimal partition in which one of the cliques is a maximal clique. Finally we present two new efficient methods to clique-partition a graph. 
Since the clique-partitioning of a graph is equivalent to the coloring of the complement of the graph, any coloring algorithm can also be used to clique-partition a graph. We present detailed statistics comparing the performance of our algorithms against two of the best known coloring algorithms and against a recently published clique-partitioning algorithm. Both functional and timing comparisons are given and we show that our algorithms compare very favorably on both counts.|000|clique-partitioning problem, graph-coloring problem, algorithms, problem 2422|Hetland2010|:comment:`Author discusses commonalities between clique cover problem and graph coloring problem.`|256f|graph-coloring problem, clique-partitioning problem, problem, Python, algorithms 2423|Rama2016|In this paper, we introduce a threshold free approach, motivated from Chinese Restaurant Process, for the purpose of cognate clustering. We show that our approach yields similar results to a linguistically motivated cognate clustering system known as LexStat. Our Chinese Restaurant Process system is fast and does not require any threshold and can be applied to any language family of the world.|000|automatic cognate detection, chinese restaurant process, automatic threshold selection 2425|Cerioli2004|A graph G is a unit disk graph if it is the intersection graph of a family of unit disks in the euclidean plane. If the disks do not overlap, then G is also a unit coin graph or penny graph. 
In this work we establish the complexity of the minimum clique partition problem and the maximum independent set problem for penny graphs, both NP-complete, and present two approximation algorithms for finding clique partitions: a 3-approximation algorithm for unit disk graphs and a -approximation algorithm for penny graphs.|000|graph theory, clique-partitioning problem, independent-set problem 2426|Valente2005|Diffusion of innovations theory attempts to explain how new ideas and practices spread within and between communities. The theory has its roots in anthropology, economics, geography, sociology, and marketing, among other disciplines (Hägerstrand 1967; Robertson 1971; Brown 1981; Rogers 2003), and has in some ways been adapted from epidemiology (e.g., Bailey 1975; Morris 1993). The premise, confirmed by empirical research, is that new ideas and practices spread through interpersonal contacts largely consisting of interpersonal communication (Ryan and Gross 1943; Beal and Bohlen 1955; Katz, Levine, and Hamilton 1963; Rogers 1995; Valente 1995; Valente and Rogers 1995).|000|diffusion of innovations, anthropology, network, innovation, network analysis 2427|Brockmann2013|The global spread of epidemics, rumors, opinions, and innovations are complex, network-driven dynamic processes. The combined multiscale nature and intrinsic heterogeneity of the underlying networks make it difficult to develop an intuitive understanding of these processes, to distinguish relevant from peripheral factors, to predict their time course, and to locate their origin. However, we show that complex spatiotemporal patterns can be reduced to surprisingly simple, homogeneous wave propagation patterns, if conventional geographic distance is replaced by a probabilistically motivated effective distance. In the context of global, air-traffic–mediated epidemics, we show that effective distance reliably predicts disease arrival times. 
Even if epidemiological parameters are unknown, the method can still deliver relative arrival times. The approach can also identify the spatial origin of spreading processes and successfully be applied to data of the worldwide 2009 H1N1 influenza pandemic and 2003 SARS epidemic.|000|diffusion of innovations, epidemology, network analysis, contamination, disease spread, flight 2428|Brockmann2013|This paper is extremely interesting with respect to the spread of linguistic innovations, as it shows that innovations are not necessarily complex but that simple network patterns can often give a closer account on how change spreads. What this paper deals with is also the search for innovations in a network, which is useful as a model for anthropology, where we do not deal with trees, as also shown in the short overview by @Valente2005.|000|- 2429|Valente2005|This paper somehow gives more information on the theory of diffusion in anthropology and other parts of science, finding a concrete application in @Brockmann2013 for the spread of diseases.|000|- 2430|Bratceva1994|Предлагаются три переборных метода. Для непосредственного решения поставленной задачи используется метод последовательных расчетов, при этом требуется знание всех клик графа. Для решения задачи путем нахождения всех наименьших разбиений графа на полные подграфы применяется усовершенствованный метод Вэна — Рыжкова, дополненный так называемым «алгоритмом двудольного графа». В этом случае не требуется отыскания всех клик. Предварительно находятся только некоторые наименьшие разбиения. И, наконец, в третьем, индуктивном методе совмещается построение полных подграфов с отысканием наименьших разбиений, т. е. вместо отщепления клик происходит их наращивание. За исходный граф принимается такой подграф, для которого предыдущим методом легко находятся все наименьшие разбиения, после чего последовательно добав­ляются вершины, заданного графа и решаются параметрические; задачи. 
:translation:`Three exact methods are presented. For the direct solution of the problem we use the method of successive calculations, for which we need to know all cliques in the graph. To solve the problem by finding all minimal partitions of the graph into complete subgraphs we use the improved method of Ven and Ryzhkov, complemented by the so-called "bipartite-graph algorithm". In this case, we do not need to find all cliques; only some minimal partitions are determined beforehand. And last but not least, in the third, inductive method, the construction of complete subgraphs is combined with the search for minimal partitions, that is, instead of splitting off cliques, they are grown. [...].`|000|clique-partitioning problem, algorithms, graph theory 2431|Bratceva1994|It seems the authors mention first @Ryzhkov1976 and his algorithm and are aware of some concrete solutions for the clique coverage problem (clique partitioning). Surprisingly, they are not quoted in any other literature, so one wonders why this might be the case.|000|clique-partitioning problem 2432|Kawabata2012|Today, approximately 75,000 CJK ideographs have been encoded in ISO/IEC 10646 and Unicode. However, there are still many more unencoded ideographs, many of which will remain unencoded because they are considered variants of encoded ideographs, and there is still some demand to distinguish subtle minor design differences. GlyphWiki, developed by Koichi Kamichi of Daito-Bunka University, is a wiki-based collaborative glyph design system that meets such demands. By using this system, GlyphWiki published world's first fonts that cover all CJK ideographs in UCS/Unicode. GlyphWiki enables its users to design their own character, name it, and publish it as an OpenType font or SVG file. This paper describes the architecture of GlyphWiki, its glyph-naming system and how the OpenType features of published font correspond to these Wiki-based naming schemes. 
|000|glyph representation, CJK ideographs, Chinese characters, 2433|Ting1980|This paper examines the Danzhou dialect spoken on Hainan island. It contains vocabulary sketches, selected to illustrate phonology, mostly monosyllabic words, and of little use to more thorough investigations. As a collection of sound inventories, however, it may be useful to be taken into consideration.|000|phonological sketch, Chinese dialects, Hǎinán, 2434|Doenges2016|Wissenschaftler von Microsoft Research haben eine Software entwickelt, die Alltagskonversationen mindestens genauso gut in Text übersetzen kann wie menschliche Profis – bei Tests schnitt das Programm sogar insgesamt ein bisschen besser ab als der Mensch. Der Rekord gelang dem Team um Geoffrey Zweig allerdings nur durch den Nachweis, dass die tatsächliche menschliche Fehlerquote viel höher ist als lange angenommen. Bislang hieß es, rund vier Prozent der Wörter oder Ausdrücke in einem Gespräch würden von Menschen falsch verstanden und/oder verschriftet. Die Forscher spannten nun jedoch den microsoftinternen Transkriptionsservice ein und kamen auf eine Fehlerquote von 5,9 bis 11,3 Prozent. :translation:`Researchers at Microsoft Research have developed software that can transcribe everyday conversations into text at least as well as human professionals; in tests the program even performed slightly better overall than humans. The team around Geoffrey Zweig achieved this record, however, only by demonstrating that the actual human error rate is much higher than long assumed. Until now it was said that around four percent of the words or expressions in a conversation are misunderstood and/or mistranscribed by humans. The researchers, however, enlisted Microsoft's in-house transcription service and arrived at an error rate of 5.9 to 11.3 percent.`|12|transcription, conversation, software, evaluation 2435|Doenges2016|This article is interesting in so far as it shows two things: * computers can improve by being trained on enough data * humans overestimate the accuracy of their ability to actually understand speech The latter aspect is more interesting, as it shows that consistency is not necessarily automatically achieved in all tasks just because we are humans.|000|transcription, conversation, accuracy, evaluation 2436|Givon2016|Paper treats Saussure's famous statement saying that one should make a distinction between purely linguistic research and the external factors. 
The author argues against this conception, pointing to the importance of external factors in describing speech.|000|structuralism, Ferdinand de Saussure, external linguistics, 2437|Givon2016|We owe Hanson (1958) the comprehensive elaboration of the process of science, in his integration of Carnap’s inductivism, Popper’s (1934/1959) deductivism, and Peirce’s (1934, 1940) pragmatism. **Science as a multi-method process:** *Abductive phase* a. Puzzling facts F are incompatible with current theory T. b. Facts F are, however, totally compatible with new hypothesis H. c. If hypothesis H were true, facts F would find their natural explanation in it. d. Therefore, by abduction, hypothesis H must be the case. *Deductive phase:* e. Derive a sufficient number of the logical implications LI of hypothesis H. *Inductive phase:* f. Construct experimental or population-statistic tests of the logical implications LI of H. g. Gather the facts concerning those logical implications. Do the facts uphold or falsify logical implications LI of H? *Deductive phase:* h. If you failed to falsify logical implications LI, hypothesis H survives — till some future test may falsify some of its logical implications, or till new facts are discovered that are incompatible with hypothesis H. i. In the interim, hypothesis H prevails. Two important points emerge from Hanson’s integrated description: • Facts (descriptions) are not independent of theory, but rather are theoretical constructs that interact with the theory. 
• The process of hypothesis formation (abduction) is indispensable to science, and constitutes theoretical explanation.|685|philosophy of science, theory development, abduction, induction, deduction 2438|Givon2016|:comment:`Author later mentions three dogmas of structuralism, namely:` **Arbitrariness** As noted above, Aristotle’s doctrine of arbitrariness of the linguistic sign — thus arbitrariness of cross-language diversity — pertained only to the semiotic relation between concepts (words) and sounds (or letters). Latter-day structuralists, with Saussure as their reigning authority, unreflectively extended the doctrine to grammar. [...] **Idealization: Langue vs. parole** In line with a long Platonic tradition, which Saussure again does not acknowledge, he lay down his second firewall between the underlying abstract system — langue — and the manifest behavior — parole. [...] [pb] :comment:`I consider the author's criticism a bit harsh, as the idea of langue is also reflected in the work by Popper, and not uninteresting in this context.` **Segregation: Synchrony vs. diachrony** Saussure’s third firewall, just as essential to the Platonic enterprise, is the doctrine of segregation, this time between synchrony, the product, and diachrony, the process that gave it birth. [...] It is of course worth noting that Saussure’s two idealizations, langue ex parole and synchrony ex diachrony, are hopelessly intertwined, being the Siamese twins of the same Platonic philosophical impulse. :comment:`This also holds for another Saussurean distinction: syntagmatics vs. paradigmatics.` |686f|arbitrariness, langue, parole, diachrony and synchrony 2439|Givon2016|The latter is particularly important because it suggests an overlap, not only analogical but also homological, with language diachrony, the culturally-transmitted developmental trend that, we used to think, had no precedent in biology. 
:comment:`Speaking of developmental biology here!` The source of variation in biological populations is both genetic (genotypic) and non-genetic (phenotypic, epigenetic, behavioral). While both can be adaptive, it was earlier assumed that only genetic variation had direct evolutionary consequences. However, the adaptive interaction of genes with the environment — natural selection — is mediated by the phenotype’s structural and behavioral traits, which are only partially controlled by genes. As a result, non-genetic variation does partake in the actual mechanism of adaptive selection. Put another way, synchronic variation in behavior — the adaptive experimentation of individuals — contributes, in a fashion reminiscent of Lamarck, to the eventual direction of evolution.|691|biological parallels, Jean-Baptiste Lamarck, developmental biology, language evolution 2440|Gschwind2015|In social network analysis (SNA), relationships between members of a network are encoded in an undirected graph where vertices represent the members of the network and edges indicate the existence of a relationship. One important task in SNA is community detection, that is, clustering the members into communities such that relatively few edges are in the cutsets but relatively many are internal edges. The clustering is intended to reveal hidden or reproduce known features of the network, while the structure of communities is arbitrary. We propose decomposing a graph into the minimum number of relaxed cliques as a new method for community detection especially conceived for cases in which the internal structure of the community is important. Cliques, that is, subgraphs with pairwise connected vertices, can model perfectly cohesive communities, but often they are overly restrictive because many real communities form dense but not complete subgraphs. Therefore, different variants of relaxed cliques have been defined in terms of vertex degree and distance, edge density, and connectivity. 
They allow to impose application-specific constraints a community has to fulfill such as familiarity and reachability among members and robustness of the communities. Standard compact formulations fail in finding optimal solutions even for small instances of such decomposition problems. Hence, we develop exact algorithms based on Dantzig-Wolfe reformulation and branch-and-price techniques. Extensive computational results demonstrate the effectiveness of all components of the algorithms and the validity of our approach when applied to social network instances from the literature.|000|clique, community detection, social network analysis (SNA), relaxed clique 2441|Gschwind2015|Classically, cliques have been used to model cohesive groups. They can be seen as extremal cohesive groups in the sense that every member is fully connected with each other. This constraint has been found too restrictive in many applications and, therefore, various relaxations of the clique concept, such as s-clique, s-plex, s-club, s-defective clique, and γ-quasi-clique, have been introduced (see Pattillo et al., 2013a, and references given there). We refer to these structures as relaxed cliques in the following.|1|relaxed clique, social network analysis (SNA), definition 2442|Wennerberg2011|This article provides a brief introduction to Wittgenstein's idea of "family resemblance". 
The author characterizes the idea as mainly intended to oppose the common theory of defining characteristics of groups, which Wittgenstein opposes by pointing to the existence of groups without final defining characteristics, but with enough intra-group similarity among all their members.|000|family resemblance, Ludwig Wittgenstein, Philosophische Untersuchungen (PU), 2443|Nishi1999|This paper aims to examine the various interpretations of the phonological system of Old Burmese (of Burma, now Myanmar) so far made and propose a conceivable framework of the history of Burmese in the light of our recent knowledge of Burmish languages and the regional dialects of Burmese, as well as orthographic variations in, and orthographic changes since, Old Burmese, from the standpoint that Present-day Standard Burmese is a later changed form of Old Burmese.|000|Old Burmese, phonetic description 2444|Sagart2006|Very interesting article which gives some examples of misinterpretation of Chinese character structure due to a wrong identification of the phonetic and the semantic parts. |000|Chinese characters, Chinese character formation, radical, Old Chinese 2445|Savigny2011|Will man sich auf alle diese unterschiedlich ausdrücklichen Formulierungen von argumentativ benutzten Annahmen einen Reim machen, dann bleiben die einfachen Gleichsetzungen von Sinn des Satzes und Bedeutung des Worts mit ihrem Gebrauch in der Sprache als nächstliegende Lösungen übrig. 
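Das heißt freilich nicht, daß auch klar wäre, was man unter dem „Gebrauch in der Sprache“ zu verstehen hat. :translation:`If one wants to make sense of all these variously explicit formulations of assumptions used in argument, the simple equations of the sense of a sentence and the meaning of a word with their use in the language remain as the most obvious solutions. This does not mean, of course, that it is also clear what is to be understood by "use in the language".`|8|Ludwig Wittgenstein, meaning, usage, 2446|Savigny2011|Die für ein Verständnis dieser „Gebrauchstheorie der Bedeutung“ fruchtbarste Vorstellung scheint Wittgensteins Gedanke zu sein, daß sprachliche Ausdrücke ihre Bedeutung ihrer „Rolle im Sprachspiel“ verdanken (nicht etwa der Sprecherabsicht oder den erzielten Wirkungen). :translation:`The most fruitful notion for understanding this "use theory of meaning" seems to be Wittgenstein's idea that linguistic expressions owe their meaning to their "role in the language game" (and not, say, to the speaker's intention or the effects achieved).`|8|meaning, language game, Ludwig Wittgenstein, Philosophische Untersuchungen (PU) 2447|Savigny2011|Wir haben oben gesehen, daß es schwer ist, die Gebrauchstheorie der Bedeutung für Wörter oder Sätze durch Beispiele plausibel zu machen. Für die Bedeutung von Äußerungen haben wir das Problem gelöst. Wir können für einzelne Beispiele konkret angeben, welcher Gebrauch einer Äußerung dafür kennzeichnend ist, daß sie ihre Bedeutung hat; dabei identifizieren wir den Gebrauch mit dem charakteristischen Paar aus Vorbedingung und Ergebnis. Soweit diese Beispiele plausibel sind und verallgemeinerungsfähig aussehen, wird die Gebrauchstheorie plausibel. :translation:`We saw above that it is hard to make the use theory of meaning for words or sentences plausible through examples. For the meaning of utterances we have solved the problem. For individual examples we can state concretely which use of an utterance is characteristic of its having its meaning; in doing so we identify the use with the characteristic pair of precondition and result. To the extent that these examples are plausible and look generalizable, the use theory becomes plausible.`|17|Ludwig Wittgenstein, meaning, usage, Philosophische Untersuchungen (PU) 2448|Savigny2011|Article presents an interpretation of the usage-theory of meaning in Wittgenstein's Philosophische Untersuchungen. |000|Philosophische Untersuchungen (PU), Ludwig Wittgenstein, usage, meaning 2449|Pellegrini2012|Detecting and characterizing dense subgraphs (tight communities) in social and information networks is an important exploratory tool in social network analysis. Several approaches have been proposed that either (i) partition the whole network into “clusters”, even in low density region, or (ii) are aimed at finding a single densest community (and need to be iterated to find the next one). As social networks grow larger both approaches (i) and (ii) result in algorithms too slow to be practical, in particular when speed in analyzing the data is required. 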
In this paper we propose an approach that aims at balancing efficiency of computation and expressiveness/manageability of the output community representation. We define the notion of a partial dense cover (PDC) of a graph. Intuitively a PDC of a graph is a collection of sets of nodes that (a) each set forms a disjoint dense induced subgraphs and (b) its removal leaves the residual graph without dense regions. Exact computation of PDC is an NP-complete problem, thus, we propose an efficient heuristic algorithms for computing a PDC which we christen Core & Peel. Moreover we propose a novel benchmarking technique that allows us to evaluate algorithms for computing PDC using the classical IR concepts of precision and recall even without a golden standard. Tests on 25 social and technological networks from the Stanford Large Network Dataset Collection confirm that Core & Peel is efficient and attains very high precision and recall.|000|community detection, dense subgraph, algorithms, graph theory 2450|Kellerwessel2009|This book is an introduction to Wittgenstein's Philosophische Untersuchungen. I did not read it, but it seems that it is an exhaustive resource providing some interesting information regarding the PU. The content is organized as follows: 1 Introduction 2 Background: Tractatus logico-philosophicus 3 Philosophical Investigations 3.1 Introduction, specifics, approach 3.2 chronological interpretation 3.3 results and conclusion: main goals of the PU 4 Wittgenstein's language philosophy in the context of analytical theories of meaning|000|Philosophische Untersuchungen (PU), Ludwig Wittgenstein, introduction 2451|Pathmanathan2016|**Motivation:** The remodeling of genes by shuffling, fusion, fission of genetic fragments and de novo DNA synthesis contribute to the creation and diversification of gene families. Therefore, genetic sequences show similarity with one another for diverse reasons, i.e. 
common ancestry producing homology, and/or partial sharing of component fragments. These processes must be disentangled to understand the rules and constraints on gene evolution. This task is especially challenging in large molecular datasets, since computational analyses remain a bottleneck. We used microbial environmental data to test whether the evolutionary processes affecting gene remodeling in polluted environments obeyed some detectable rules. **Results:** We developed CompositeSearch, a memory-efficient, fast and scalable method to detect composite gene families in large datasets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data. We applied CompositeSearch to a dataset of 3,906,323 environmental sequences from 3 increasingly polluted sites. 
We report that increasingly polluted samplings sites present increasing percentages of composite genes, while the rules of functional associations of their components remains identical between sites.|000|community detection, composite genes, algorithms, gene similarity network 2452|Pathmanathan2016|Paper mentions importance of community detection algorithms in gene family search, as otherwise, matches may be overexaggerated, as also reported in @Mills2013, where the authors show that a BLAST search may be misleading.|000|community detection, algorithms, gene similarity network 2453|Pathmanathan2016|If at least two neighbors of this node belong to distinct gene families, CompositeSearch takes the sequence corresponding to the node as a reference and maps the matches from all different families along that sequence. Each region with matches from different families along a composite sequence is called a “gene domain” hereafter. For each “gene domain”, CompositeSearch computes an average position for the start of the domain and an average position for the end of the domain.|3|composite genes, algorithms, gene similarity network 2454|Mills2013|MOTIVATION: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low identity (<33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local alignments that extend beyond the homologous regions. RESULTS: We measured local alignment start/stop boundary accuracy using a set of queries where the correct alignment boundaries were known, and found that 7% of BLASTP and 8% of SSEARCH alignment boundaries were overextended. Overextended alignments include non-homologous sequences; they occur most frequently between sequences that are more closely related (>33% identity). 
Adjusting the scoring matrix to reflect the identity of the homologous sequence can correct higher identity overextended alignment boundaries. In addition, the scoring matrix that produced a correct alignment could be reliably predicted based on the sequence identity seen in the original BLOSUM62 alignment. Realigning with the predicted scoring matrix corrected 37% of all overextended alignments, resulting in more correct alignments than using BLOSUM62 alone.|000|scoring matrix, BLOSUM, sequence alignment, BLAST 2455|Ku2016|Eukaryotic sequences that share ≥70 % amino acid identity to prokaryotic homologs are probably not lateral gene transfers at all, but just contaminants.|8|biological evolution, contamination, sequencing error, data problems, evolutionary biology 2456|Ku2016|Background: The literature harbors many claims for lateral gene transfer (LGT) from prokaryotes to eukaryotes. Such claims are typically founded in analyses of genome sequences. It is undisputed that many genes entered the eukaryotic lineage via the origin of mitochondria and the origin of plastids. Claims for lineage-specific LGT to eukaryotes outside the context of organelle origins and claims of continuous LGT to eukaryotic lineages are more problematic. If eukaryotes acquire genes from prokaryotes continuously during evolution, then sequenced eukaryote genomes should harbor evidence for recent LGT, like prokaryotic genomes do. Results: Here we devise an approach to investigate 30,358 eukaryotic sequences in the context of 1,035,375 prokaryotic homologs among 2585 phylogenetic trees containing homologs from prokaryotes and eukaryotes. Prokaryote genomes reflect a continuous process of gene acquisition and inheritance, with abundant recent acquisitions showing 80–100 % amino acid sequence identity to their phylogenetic sister-group homologs from other phyla. By contrast, eukaryote genomes show no evidence for either continuous or recent gene acquisitions from prokaryotes. 
We find that, in general, genes in eukaryotic genomes that share ≥70 % amino acid identity to prokaryotic homologs are genome-specific; that is, they are not found outside individual genome assemblies. Conclusions: Our analyses indicate that eukaryotes do not acquire genes through continual LGT like prokaryotes do. We propose a 70 % rule: Coding sequences in eukaryotic genomes that share more than 70 % amino acid sequence identity to prokaryotic homologs are most likely assembly or annotation artifacts. The findings further uncover that the role of differential loss in eukaryote genome evolution has been vastly underestimated.|000|biological evolution, evolutionary biology, contamination, errors, data problems 2457|Ku2016|The paper discusses two aspects: first, the problem of highly similar sequences as a potential result of contamination, especially between eukaryotes and prokaryotes; second, the idea that there is a natural barrier to lateral gene transfer between eukaryotes and prokaryotes.|000|lateral gene transfer, barriers, contamination, data problems, eukaryotes, prokaryotic evolution 2458|Korn2016|Relations within the Iranian branch of Indo-European have traditionally been modelled by a tree that is essentially composed of binary splits into sub- and sub-subbranches. The first part of this article will argue against this tree and show that it is rendered outdated by new data that have come to light from contemporary and ancient languages. The tree was also methodologically problematic from the outset, both for reasons of the isoglosses on which it is based, and for not taking into account distinctions such as shared innovations vs. shared archaisms. The second part of the paper will present an attempt at an alternative tree for Iranian by proposing a subbranch which I will call “Central Iranian”. Such a branch seems to be suggested by a set of non-trivial morphological innovations shared by Bactrian, Parthian and some neighbouring languages. 
The reconstruction of the nominal system of Central Iranian which will then be proposed aims to show the result one arrives at when trying to reconstruct a subbranch as strictly bottom-up as possible, i.e. using only the data from the languages under study, and avoiding profiting from Old Iranian data and from our knowledge about the proto-languages.|000|Indo-Iranian, family tree, subgrouping, 2459|Boc2010a|Horizontal gene transfer (HGT) is one of the main mechanisms driving the evolution of microorganisms. Its accurate identification is one of the major challenges posed by reticulate evolution. In this article, we describe a new polynomial-time algorithm for inferring HGT events and compare 3 existing and 1 new tree comparison indices in the context of HGT identification. The proposed algorithm can rely on different optimization criteria, including least squares (LS), Robinson and Foulds (RF) distance, quartet distance (QD), and bipartition dissimilarity (BD), when searching for an optimal scenario of subtree prune and regraft (SPR) moves needed to transform the given species tree into the given gene tree. As the simulation results suggest, the algorithmic strategy based on BD, introduced in this article, generally provides better results than those based on LS, RF, and QD. The BD-based algorithm also proved to be more accurate and faster than a well-known polynomial time heuristic RIATA-HGT. Moreover, the HGT recovery results yielded by BD were generally equivalent to those provided by the exponential-time algorithm LatTrans, but a clear gain in running time was obtained using the new algorithm. 
Finally, a statistical framework for assessing the reliability of obtained HGTs by bootstrap analysis is also presented.|000|lateral gene transfer, borrowing detection, gene tree reconciliation 2460|Boc2010a|Interesting paper that also gives a good overview of the similarities between lateral gene transfer detection using gain-loss mapping and gene tree reconciliation.|000|gene tree reconciliation, lateral gene transfer 2461|Ballouz2010|Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. 
The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.|000|gene clusters, selfish operon theory, simulation studies, protein evolution 2462|Cooper2001|The problem of finding the dominators in a control-flow graph has a long history in the literature. The original algorithms suffered from a large asymptotic complexity but were easy to understand. Subsequent work improved the time bound, but generally sacrificed both simplicity and ease of implementation. This paper returns to a simple formulation of dominance as a global data-flow problem. Some insights into the nature of dominance lead to an implementation of an $O(N^2)$ algorithm that runs faster, in practice, than the classic Lengauer-Tarjan algorithm, which has a time bound of $O(E \cdot \log(N))$. We compare the algorithm to Lengauer-Tarjan because it is the best known and most widely used of the fast algorithms for dominance. Working from the same implementation insights, we also rederive (from earlier work on control dependence by Ferrante, et al.) a method for calculating dominance frontiers that we show is faster than the original algorithm by Cytron, et al. The aim of this paper is not to present a new algorithm, but, rather, to make an argument based on empirical evidence that algorithms with discouraging asymptotic complexities can be faster in practice than those more commonly employed. We show that, in some cases, careful engineering of simple algorithms can overcome theoretical advantages, even when problems grow beyond realistic sizes. 
Further, we argue that the algorithms presented herein are intuitive and easily implemented, making them excellent teaching tools.|000|dominators, graph theory, algorithm, directed network 2463|Xing1998|This article is a sequel to 《汉藏语系上古音之支脂鱼四部同源字考》 (“A study of cognates in the four Old Chinese rhyme groups 之, 支, 脂, and 鱼 of Sino-Tibetan”, published in 《民族语文》, 1899 [sic], no. 4). Its purpose is still to show that the hypothesis of a Sino-Tibetan family excluding the Tai-Kadai (侗台) and Hmong-Mien (苗瑶) languages, established by the American scholar Paul K. Benedict (白保罗), is not credible. The method, the phonetic symbols used, and the references are the same as in the earlier article and are not explained again.|000|Sino-Tibetan, Tai-Kadai, Hmong-Mien, 2464|Xing1998|Author uses 15 words to demonstrate commonality between Sino-Tibetan, Hmong-Mien, and Tai-Kadai.|000|Tai-Kadai, Hmong-Mien, Sino-Tibetan 2465|Starostin2016b|The article briefly argues for the need to develop a new type of “linguo-philological” commentary on the classical texts of ancient Chinese civilization, in which the principles of purely philological commentary, going back to the Chinese tradition, would be organically combined with the latest achievements of etymological, lexicological, and grammatical analysis of the Old Chinese language of different periods. In particular, it discusses the mechanism for creating such a commentary for the classical poetic anthology Shījīng (1st millennium BC), 
which has long been an object of both philological and purely linguistic research; a concrete model of what the commentary might look like is demonstrated on the example of several stanzas of the first poem of the anthology.|000|Chinese, Classical Chinese, commentary, philological tradition 2466|Starostin2016b|Author sets up a series of proposed guidelines on how to formalize the comparability of commentaries to ancient Chinese texts and illustrates these for the first passages of the Shījīng.|000|Shījīng, guidelines, commentary, ancient texts, Classical Chinese 2467|Starostin2013c|Both in early epigraphic forms and in the oldest literary monuments of the Chinese language, the general meaning ʽdogʼ has only one indisputable character equivalent: 犬, corresponding to the modern (Mandarin) reading quǎn and Early Middle Chinese khwíen (according to the phonetic interpretation of the Middle Chinese phonological system proposed by S. A. Starostin (1989), which on the whole differs only insignificantly from the alternative models of E. Pulleyblank, W. Baxter, and others).|000|Old Chinese, dog, denotation, semantic reconstruction 2468|Starostin2013c|Author uses a corpus analysis of ancient Chinese texts to show that the development of the word gǒu starts from the original meaning of "edible" vs. quǎn "hunting" dog.|000|dog, Old Chinese, denotation, semantic reconstruction 2469|Song2016|Some researchers, including Wang Li 王力 (1900–1986) and Xu Qing 徐青 (1934–), have claimed, based on evidence provided by cursory observations of tonal arrangement, that pre-Yongming poets also attempted to create tonal contrast effects in pentasyllabic poems. In order to assess the validity of these claims, this study adopted a quantitative approach to the analysis of tonal contrasts in three early pentasyllabic poem collections: the “Nineteen Ancient Poems,” pentasyllabic poems written by Cao Zhi 曹植 (192–232), and pentasyllabic poems written by Xie Lingyun 謝靈運 (385–433). 
Unlike any previous study, the research presented in this article interprets tonal contrast data obtained in these poems in an enriched context. The author compares the tonal contrast rates in the poems written before the Yongming 永明 era (483–93) to tonal contrast rates obtained from narrative texts written in the same period, as well as to the ratios expected under the condition of random tonal arrangement. The results of these comparisons reveal that there is no simple answer to the question of whether tonal contrast existed in early pentasyllabic poems. Tonal contrast appears to have been in an intricate transitional stage that was likely initiated with an intuitive act of creativity rather than an intentional manipulation of tones to obtain a particular known effect.|000|poetic function, tonal contrast, poetry, Hàn time, Chinese poetry, quantitative analysis 2470|Stamatakis2006|RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+G yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥ 4000 taxa it also runs 2–3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. 
The program has been used to compute ML trees on two of the largest alignments to date containing 25 057 (1463 bp) and 2182 (51 089 bp) taxa, respectively.|000|software, maximum likelihood, ancestral state reconstruction 2471|Starostin2016a|In several of my previous publications (Starostin 2010, 2013a, 2013b), I have repeatedly stressed the importance of combining, rather than opposing the classic comparative method, elaborated by several generations of historical linguists over the past two hundred years, and lexicostatistical methodology, originally introduced by Morris Swadesh and his colleagues in the 1950s and once again popular these days due to a massive influx of computational phylogenetic methods from adjacent branches of science. It was with that precise purpose, to integrate the two approaches, that a team of Moscow-based historical linguists set up The Global Lexicostatistical Database (GLD), a large-scale project that aims at applying a uniform, maximally formalized lexicostatistical methodology to all the languages of the world in order to arrive at a reasonable genetic classification, while at the same time corroborating the results with knowledge gained from traditional historical linguistics and philology.|000|Tower of Babel, Global Lexicostatistical Database, methodology, phylogenetic reconstruction, ancestral state reconstruction, 2472|Starostin2016a|Very interesting description of a guideline for semantic / onomasiological reconstruction in lexicostatistics, apart from a good presentation of the GLD, so it could also be used to quote the GLD as a dataset.|000|Global Lexicostatistical Database, description, semantic reconstruction 2473|Szabo2012|Anything that deserves to be called a language must contain meaningful expressions built up from other meaningful expressions. How are their complexity and meaning related? 
The traditional view is that the relationship is fairly tight: the meaning of a complex expression is fully determined by its structure and the meanings of its constituents: once we fix what the parts mean and how they are put together we have no more leeway regarding the meaning of the whole. This is the principle of compositionality, a fundamental presupposition of most contemporary work in semantics.|000|definition, compositionality, philosophy, denotation, expression 2474|Nuopponen2014|The paper discusses factors that are relevant when constructing a typology of concept relations for terminology work by focusing especially on ISO 704:2009 Terminology work - Principles and methods and ISO 1087-1:2000 Terminology work - Vocabulary - Part 1: Theory and application standards and their future revisions. At first prerequisites for a concept relation typology are discussed generally. The standards are then scrutinized as to how they introduce, define and classify concept relation types, and modifications are suggested. A concept relation typology is presented as an example of a comprehensive, generalizable and extendable typology.|000|standardization, concept relations, typology 2475|Nuopponen2014|Important paper on the notion of concept relations, also relevant for the investigation of polysemies and colexifications.|000|standardization 2476|Fontanari2007|Evolutionary language games have proved a useful tool to study the evolution of communication codes in communities of agents that interact among themselves by transmitting and interpreting a fixed repertoire of signals. Most studies have focused on the emergence of Saussurean codes (i.e., codes characterized by an arbitrary one-to-one correspondence between meanings and signals). 
In this contribution we argue that the standard evolutionary language game framework cannot explain the emergence of compositional codes - communication codes that preserve neighborhood relationships by mapping similar signals into similar meanings – even though use of those codes would result in a much higher payoff in the case that signals are noisy. We introduce an alternative evolutionary setting in which the meanings are assimilated sequentially and show that the gradual building of the meaning-signal mapping leads to the emergence of mappings with the desired compositional property.|000|compositionality, evolutionary theory, game theory, simulation studies 2477|Hill2016|In a recent article Chenqing Song (2014) draws renewed attention to the problem of groups of Chinese words in which the character used to write one of the words has a stop final reading in Middle Chinese but the character used to write another of the words has an open syllable reading in Middle Chinese, although the two words in question either rhyme in the poems of the 詩經 Shījīng or are members of the same phonetic series, in either case implying that they shared a rime in Old Chinese. She exemplifies this problem with the rhyme of 莫 MChi. mɒk / maek, mâk / mak, muo C / muH ‘end’ and 除 MChi. d� jwo / drjo, d� jwo C / drjoH ‘to pass’ in poem 114 and the inclusion of 乍 MChi. dẓa C / dzraeH ‘suddenly’ and 作 MChi. tsâk / tsak ‘to act’ (Song 2014: 99) in the same phonetic series.|000|plosive coda, Old Chinese, linguistic reconstruction, lexical diffusion 2478|Hill2016|Paper is a comment on the proposal by @Song2014.|000|Old Chinese, plosive coda, linguistic reconstruction 2479|Dasgupta2007|The first step of our morpheme induction method involves extracting a list of candidate prefixes and suffixes. We rely on a fairly simple idea originally proposed by @Keshava<2006> and Pitler (2006) for extracting candidate affixes. 
Assume that a and b are two character sequences and ab is the concatenation of a and b. If ab and a are both found in the vocabulary, then we extract b as a candidate suffix. Similarly, if ab and b are both found in the vocabulary, then we extract a as a candidate prefix.|157|suffix detection, morpheme detection, automatic approach 2480|Keshava2006|We present a simple, psychologically plausible algorithm to perform unsupervised learning of morphemes. The algorithm is most suited to Indo-European languages with a concatenative morphology, and in particular English. We will describe the two approaches that work together to detect morphemes: 1) finding words that appear as substrings of other words, and 2) detecting changes in transitional probabilities. This algorithm yields particularly good results given its simplicity and conciseness: evaluated on a set of 532 human-segmented English words, the 252-line program achieved an F-score of 80.92% (Precision: 82.84% Recall: 79.10%).|000|morpheme detection, automatic approach, suffix detection 2481|Keshava2006|This approach is further used by @Dasgupta2007 and represents the simple strategy of decomposition which we already tested on the CLICS data.|000|morpheme detection, automatic approach 2482|Hajnal2016|The method of comparative reconstruction is considered as the best instrument to reveal the prehistory of individual languages or linguistic families. It is inductive (data-driven) and seems to be much more reliable as the concurring method of internal reconstruction. Still it has been overlooked that each comparative reconstruction - to be innovative - has to be based on an initial premise. Such a premise is always the result of an abductive conclusion as used in internal reconstruction. Thus each comparative reconstruction has a core consisting of internal reconstruction. 
For syntactical reconstruction the following premise seems to be suitable: Linguistic structures and their changes obey the principles of universal grammar (e.g. the Minimalistic Program) and of grammaticalization. This premise proves fruitful for the explanation of the Early Greek augment */e-/< *h1e-. The augment can be determined as an original particle in the CP layer of the sentence. Later on it is reanalysed as a temporal particle, becomes part of the vP and underlies univerbation with the verb.|000|comparative method, reconstruction methodology, historical linguistics 2483|Holman2016|Since the early 1970s biologists have debated whether evolution is punctuated by speciation events with bursts of cladogenetic changes, or whether evolution tends to be of a more gradual, anagenetic nature. A similar discussion among linguists has barely begun, but the present results suggest that there is also room for controversy over this issue in linguistics. The only previous study correlated the number of nodes in linguistic phylogenies with branch lengths and found support for punctuated equilibrium. We replicate this result for branch lengths but find no support for punctuated equilibrium using a different, automated measure of linguistic divergence and a much larger dataset. With the automated measure, segments of trees containing more nodes show no greater divergence from an outgroup than segments containing fewer nodes. |000|punctuational change, phylogenetics, language change, 2484|Heath2014|Time-calibrated species phylogenies are critical for addressing a wide range of questions in evolutionary biology, such as those that elucidate historical biogeography or uncover patterns of coevolution and diversification. Because molecular sequence data are not informative on absolute time, external data (most commonly, fossil age estimates) are required to calibrate estimates of species divergence dates. 
For Bayesian divergence time methods, the common practice for calibration using fossil information involves placing arbitrarily chosen parametric distributions on internal nodes, often disregarding most of the information in the fossil record. We introduce the “fossilized birth–death” (FBD) process, a model for calibrating divergence time estimates in a Bayesian framework, explicitly acknowledging that extant species and fossils are part of the same macroevolutionary process. Under this model, absolute node age estimates are calibrated by a single diversification model and arbitrary calibration densities are not necessary. Moreover, the FBD model allows for inclusion of all available fossils. We performed analyses of simulated data and show that node age estimation under the FBD model results in robust and accurate estimates of species divergence times with realistic measures of statistical uncertainty, overcoming major limitations of standard divergence time estimation methods. We used this model to estimate the speciation times for a dataset composed of all living bears, indicating that the genus Ursus diversified in the Late Miocene to Middle Pliocene.|000|fossils, gain-loss models, birth-death models, gain-loss mapping 2485|Jacques2016c|‘Pronominalization’ was used by Hodgson (1857–8) as a label to describe the presence of person indexation in some Trans-Himalayan languages, such as the Kiranti languages, of which he had first-hand fieldwork experience. This pioneering insight attracted attention to an important characteristic of these languages. On the other hand, the analysis encapsulated in this term (that person indexation is the result of the accretion of pronouns on the verb complex) may need to be re-examined in greater detail, making use of newly available evidence. 
To preview the result of the re-analysis set out below, it would seem best to recognize that the term ‘pronominalized’ is now outdated and potentially misleading: it should be retired, for at least three reasons. :comment:`Interesting point that should be kept in mind, namely, the tendency to use explanative rather than descriptive terminology, also mentioned on page 32 in` @List2014d|2|person indexation, pronominalization, Kiranti, terminology, 2486|Sejane2013|Interesting study presenting basic properties of real-world networks, which often share similar characteristics: they tend to be scale-free, with a few hubs and short average shortest paths. The authors cite further literature that will be interesting in this context.|000|network approaches, graph theory, real world networks, small world network, graph properties 2487|Sejane2013|In contrast, investigations into lexical semantic similarities between languages are usually – but not exclusively – based upon the concept of a network comprising linguistic entities such as words and semantic relations between these entities such as synonymy, hypernymy, etc. Semantic or concept networks for languages have been pioneered in the WordNet (cf. Fellbaum 1998) and EuroWordNet projects (cf. Vossen 1999). A whole research strand in computational linguistics uses such word networks to determine semantic similarity between words in the same language (e.g. Resnik 1995; Jiang and Conrath 1997) or across languages (e.g. Hassan and Mihalcea 2009; Eger and Sejane 2010). 
More ambitiously, Mehler, Pustylnikov, and Diewald (2011) and in part Gaume, Duvignau, and Vanhove (2008) (at least indirectly) exploit language networks to induce semantic typologies in analogy to the classical (phonological or morphological) genealogies based on the comparative method.|448|semantic change, graph theory, network approaches, semantic similarity 2488|Sejane2013|:comment:`Illustration of the use of bipartite graphs of two languages to induce semantic relations by projecting the bipartite graph onto one language. Essentially the same as what we do for CLICS, with the difference that CLICS uses a larger bipartite network and a stricter semantic alignment.`|455f|semantic network, colexification, bipartite network 2489|Nerbonne2013|This paper applies a measure of linguistic distance to differences in pronunciation which have been observed as a consequence of modern speakers orienting themselves to standard languages and larger regions rather than local towns and villages, resulting in what we shall call regional speech. We examine regional speech, other local “varieties” in the Dutch of the Netherlands and Flanders, and also standard Netherlandic Dutch and Belgian Dutch.|000|phonetic distance, sequence alignment, dialectometry, 2490|Nerbonne2013|Paper offers a good introduction to ideas of spread of innovations in networks of speakers in a speech community. This is quite interesting and could be quoted in the relevant contexts.|000|language change, sociolinguistics, dimensions of variation, language variation, Dachsprache 2491|Hammarstroem2013|Paper investigates dependencies of grammatical features (typological features) in large datasets such as WALS. 
It is interesting in so far as they seem to have found a way to handle those dependencies and identify them, so whenever working on dependencies, one should give their study a look.|000|dependencies, typological features, typology, WALS, similarity, universals, 2492|Snoek2013|Furthermore, I will demonstrate that a particular semantic domain, namely that of fluids related to the body, presents a better hunting ground in the search for cognates than others, because it is diachronically more stable.|231|body parts, stability, body fluids, cognate detection, semantic similarity, semantic field 2493|Snoek2013|The universality of human anatomy and physiology as referents to linguistic expression make the body and its products an excellent referential context within which semantic variation and polysemy can be dealt with in a principled way.|232|body parts, body fluids, semantic change 2494|Snoek2013|As a conceptual domain, the human body is made up of not only those parts whose removal would constitute great loss to the organism, but also those that are dispensed with regularly (such as bodily excretions) or periodically (such as fingernails). These differences in permanence may have lexical ramifications.|232|semantic change, alienability, disposability, body parts 2495|Snoek2013|Paper provides an interesting list of body part and body fluid terms, also classifying those into different domains (body parts, effluvia, and ephemera). It also presents results of a cognate search in these domains. 
It is one of the rare papers that provide a **concept list** for body part terminology.|000|concept list, body parts, effluvia, ephemera, 2496|Borin2013|The present volume collects contributions addressing different aspects of the measurement of linguistic differences, a topic which probably is as old as language itself – see the Bible quotation introducing this chapter – but which has acquired renewed interest over the last decade or so, reflecting a rapid development of data-intensive computing in all fields of research, including linguistics.|000|anthology, linguistic differences, distance measures, comparative linguistics 2497|Cysouw2013|The difficulty people have in learning a foreign language strongly depends on how different this language is from their native tongue (Kellerman 1979). Although this statement seems uncontroversial in the general form as it is formulated here, the devil lies in the detail, namely in the problem how to define differences between languages. In this paper, I investigate various factors that quantify differences between languages, and explore to which extent these factors predict language learning difficulty. This investigation results in concrete predictive formulas that derive the learning difficulty for native English speakers depending on a small selection of linguistic factors of the language to be learned.|000|language learning, bilingualism, language comparison, 2498|Heylen2013|The goal of the current study is to show how aggregated lexical variation can be studied by means of corpus-based techniques, which differ in their amount of semantic control. In the current variationist field, one finds many studies of phonological or morphological variation on the basis of corpora. Remarkably at first sight, though, studies of lexical variation in corpora are rare, especially in comparison with dialectology, where the study of lexical variation is part of the main research goal. 
In contrast to the other corpus-based variationist studies, however, the dialectological account of lexical variation is very much restricted to elicited data, as stored in well-known dialect atlases. Therefore, the current study sets out to show how this void of corpus-based studies of lexical variation can be filled, while taking into account possible issues with lexical semantic complexity.|000|lexical variation, corpus studies, automatic approach 2499|Kondrak2013|The focus of this presentation is the following observation: words that are phonetically similar across different languages are more likely to be mutual translations. This phenomenon has been exploited in the past to improve various tasks in Natural Language Processing (NLP). However, to the best of my knowledge, the proposition has never been explicitly stated or justified. The term mutual translations should be understood here as words that can be used to express the same meaning. In particular, words that correspond to each other on two sides of a sentence in a bilingual corpus (bitext) are considered translations, as well as words that are used to define each other in a bilingual dictionary.|000|translational equivalence, phonetic similarity, cognacy, automatic approach 2500|Kondrak2013|Study is interesting in so far as the author measures similarity of words with identical and different meanings across languages, thereby finding that words with similar meanings seem to be also more similar in pronunciation. He attributes this to frequency effects, not to sound symbolism, as @Wichman<2010> et al. (2010) have been claiming.|000|sound symbolism, cognacy, word frequency, translational equivalence 2501|Kondrak2013|Here, I propose a different explanation of the phenomenon. My intention is not to deny the influence of sound symbolism, which is clearly a factor, but to suggest another reason for the observed divergence.
I posit the correlation between the following word characteristics: translatability, frequency, length, and similarity. Below, I consider these in order.|382|word frequency, word length, phonetic similarity 2502|Hajnal2016|Interesting article, especially since it explicitly deals with questions of induction and abduction in historical linguistics. It may be quoted in the context of discussing linguistic reconstruction methodology.|000|abduction, induction, reconstruction methodology 2503|Gesner1555|This text, which is interesting in many respects, contains a word list of Rotwelsch and German, which may be worth adding to the Concepticon.|000|concept list, Rotwelsch, German, history of science 2504|Morin2016|This discussion paper responds to two recent articles in Biology and Philosophy that raise similar objections to cultural attraction theory, a research trend in cultural evolution putting special emphasis on the fact that human minds create and transform their culture. Both papers are sympathetic to this idea, yet both also regret a lack of consilience with Boyd, Richerson and Henrich’s models of cultural evolution. I explain why cultural attraction theorists propose a different view on three points of concern for our critics. I start by detailing the claim that cultural transmission relies not chiefly on imitation or teaching, but on cognitive mechanisms like argumentation, ostensive communication, or selective trust, whose evolved or habitual function may not be the faithful reproduction of ideas or behaviours. Second, I explain why the distinction between context biases and content biases might not always be the best way to capture the interactions between culture and cognition. Lastly, I show that cultural attraction models cannot be reduced to a model of guided variation, which posits a clear separation between individual and social learning processes.
With cultural attraction, the same cognitive mechanisms underlie both innovation and the preservation of traditions.|000|cultural evolution, cultural attraction theory, evolutionary theory, 2505|Ferrada2008|Recent laboratory experiments suggest that a molecule’s ability to evolve neutrally is important for its ability to generate evolutionary innovations. In contrast to laboratory experiments, life unfolds on time-scales of billions of years. Here, we ask whether a molecule’s ability to evolve neutrally—a measure of its robustness—facilitates evolutionary innovation also on these large time-scales. To this end, we use protein designability, the number of sequences that can adopt a given protein structure, as an estimate of the structure’s ability to evolve neutrally. Based on two complementary measures of functional diversity—catalytic diversity and molecular functional diversity in gene ontology—we show that more robust proteins have a greater capacity to produce functional innovations. Significant associations among structural designability, folding rate and intrinsic disorder also exist, underlining the complex relationship of the structural factors that affect protein evolution.|000|robustness, protein structure, protein evolution 2506|Divjak2016|Most studies in Cognitive Linguistics are based on data from one language. There is a strong bias towards Indo-European languages, and to English in particular. At the same time, there have been quite a few notable exceptions. Particularly fruitful has been the collaboration between Cognitive Linguistics and semantic and lexical typology, which goes back to the famous study of Basic Colour Terms by Berlin and @Kay<1969> (1969). Abundant cross-linguistic co-lexification data, which have become available recently (e.g., List et al. 2014), allow the linguist to identify the most common semantic extensions and compare how languages “carve up” different semantic domains.
A concise overview of this research area is presented in @KoptjevskajaTamm<2015> (2015). The grammatical pole has enjoyed less attention. Notable exceptions are Talmy’s (1985) influential typology of verb-framed and satellite-framed languages, which differ with regard to the expression of motion events, and Newman’s (1996) cognitive linguistic study of GIVE-verbs and the corresponding constructional patterns in a large sample of typologically diverse languages.|10|colexification, cognitive linguistics, data-driven research, 2507|KoptjevskajaTamm2015|Article in handbook introduces major ideas in semantic typology, including a definition of the topic, selected major examples (including color, cognition and perception, motion event, and body (body parts)), methodological challenges, and take-home messages (lessons) from semantic typology, as well as further research questions.|000|semantic typology, body parts, cognitive linguistics 2508|Hilpert2015|At first blush it may seem odd that researchers in Cognitive Linguistics should have an interest in the historical development of language. After all, analyzing the relationship between language and cognition seems a much more feasible task if there are speakers whose behavior can be observed in the here-and-now. This of course is not possible with languages or language varieties that are no longer spoken. To give just two examples, one cannot conduct a lexical decision task with speakers of Old English, nor is it possible to study the metaphorical underpinnings of gestures that accompanied conversations in Hittite. How can a cognitive approach to language history be anything but utter speculation? This chapter will make the case that looking at language change is not only perfectly in line with the cognitive linguistic enterprise, but that furthermore an understanding of how language change works is a necessary prerequisite for an adequate theory of how language and cognition are related in synchrony.
The key idea underpinning this argument is the usage-based approach to language, that is, the hypothesis that language use shapes speakers’ cognitive representation of language.|000|cognitive linguistics, historical linguistics, introduction 2509|Hilpert2015|Article argues that investigations in historical linguistics are a key to understanding how cognition works. Definitely good to be quoted in a context where this point should be made.|000|cognitive linguistics, introduction, language-use, historical linguistics 2510|Ludusan2016|We propose in this paper a language-independent method for syllable segmentation. The method is based on the Sonority Sequencing Principle, by which the sonority inside a syllable increases from its boundaries towards the syllabic nucleus. The sonority function employed was derived from the posterior probabilities of a broad phonetic class recognizer, trained with data coming from an open-source corpus of English stories. We tested our approach on English, Spanish and Catalan and compared the results obtained to those given by an energy-based system. The proposed method outperformed the energy-based system on all three languages, showing a good generalizability to the two unseen languages. We conclude with a discussion of the implications of this work for under-resourced languages.|000|syllable, syllable segmentation, sound classes, language-independent approach, automatic approach 2511|Ludusan2016|They use a sound class system of 8 classes, based on sonority for their speech syllabifier (taking sound waves as input).|000|sound classes, syllable segmentation, 2512|Ludusan2016|We use here a sonority scale similar to the one proposed by Clements [6] (vowels>glides>liquids>nasals>obstruents), by further dividing the obstruent class in three sub-classes (fricatives>affricates>plosives), for a better modelling of the obstruent phonemes.
Thus, we use a 7-steps sonority scale, with the value 7 corresponding to the vowel class and plosives having a sonority value of 1. The silence intervals were given a sonority value equal to 0.|?2|sonority hierarchy, sound classes, syllable segmentation 2513|Hinze2010|Background: Much work in systems biology, but also in the analysis of social network and communication and transport infrastructure, involves an in-depth analysis of local and global properties of those networks, and how these properties relate to the function of the network within the integrated system. Most often, systematic controls for such networks are difficult to obtain, because the features of the network under study are thought to be germane to that function. In most such cases, a surrogate network that carries any or all of the features under consideration, while created artificially and in the absence of any selective pressure relating to the function of the network being studied, would be of considerable interest. Results: Here, we present an algorithmic model for growing networks with a broad range of biologically and technologically relevant degree distributions using only a small set of parameters. Specifying network connectivity via an assortativity matrix allows us to grow networks with arbitrary degree distributions and arbitrary modularity. We show that the degree distribution is controlled mainly by the ratio of node to edge addition probabilities, and the probability for node duplication. We compare topological and functional modularity measures, study their dependence on the number and strength of modules, and introduce the concept of anti-modularity: a property of networks in which nodes from one functional group preferentially do not attach to other nodes of that group. 
We also investigate global properties of networks as a function of the network's growth parameters, such as smallest path length, correlation coefficient, small-world-ness, and the nature of the percolation phase transition. We search the space of networks for those that are most like some well-known biological examples, and analyze the biological significance of the parameters that gave rise to them. Conclusions: Growing networks with specified characters (degree distribution and modularity) provides the opportunity to create surrogates for biological and technological networks, and to test hypotheses about the processes that gave rise to them. We find that many celebrated network properties may be a consequence of the way in which these networks grew, rather than a necessary consequence of how they work or function.|000|network modularity, measure, 2514|Hinze2010|Paper introduces an additional notion of modularity, namely "functional modularity", which they distinguish from Newman's modularity measure (@Newman2004). |-|network modularity, measure, 2515|Starostin2015|This is a collection of interviews of Sergej Starostin about language diversity, and it may be very interesting to look at some of his ideas from the interviews, as they are surely interesting as quotes in certain contexts.|000|interview, S. A. Starostin, language diversity 2516|Ginsburgh2016|The paper is devoted to an econometric analysis of learning foreign languages in all parts of the world. Our sample covers 193 countries and 13 important languages. Four factors significantly explain learning: world population of speakers of home language, trade with speakers of foreign language, linguistic distance between home and foreign language and literacy. Trade may well deserve more emphasis than the other three factors, not only for its significance, but also because its direction can change faster and by a larger order of magnitude. 
Controlling for any of the 13 target languages, including English, is of no particular importance.|000|foreign language, bilingualism, second language learning 2517|Gaizauskas1977|Given an information extraction (IE) system that performs an extraction task against texts in one language, it is natural to consider how to modify the system to perform the same task against texts in a different language. More generally, there may be a requirement to do the extraction task against texts in an arbitrary number of different languages and to present results to a user who has no knowledge of the source language from which the information has been extracted. To minimise the language-specific alterations that need to be made in extending the system to a new language, it is important to separate the task-specific conceptual knowledge the system uses, which may be assumed to be language independent, from the language-dependent lexical knowledge the system requires, which unavoidably must be extended for each new language. In this paper we describe how the architecture of the LaSIE system, an IE system designed to do monolingual extraction from English texts, has been modified to support a clean separation between conceptual and lexical information. This separation allows hard-to-acquire, domain-specific conceptual knowledge to be represented only once, and hence to be reused in extracting information from texts in multiple languages, while standard lexical resources can be used to extend language coverage. Preliminary experiments with extending the system to French are described.|000|multilingual information extraction, concepticon, lexicon 2518|Gaizauskas1977|We discuss several alternative approaches to constructing a multilingual IE system, and then describe in detail the strategy we have chosen.
Our choice is based on the assumption that it is possible, when dealing with a particular domain or application, to construct a language-independent representation of concepts relevant to the domain. We call this representation a domain model or concepticon. This unaesthetic neologism is advanced on rhetorical grounds: our proposal is that it is both possible and sensible to represent some (natural) language-independent conceptual content separately from language-specific information about particular lexical items. If the latter belongs in a lexicon (which [pb] must, by any reasonable definition, be language-specific) then the former belongs in its own repository for which it is useful to employ an equally compelling term. Decoupling what many authors would refer to as "lexical semantics" from the lexicon, makes it possible to construct a system in which entries in multiple language-specific lexicons stand in a many-to-many relation with entries in a common concepticon. It is this architecture that makes multilingual IE possible. It also facilitates the extension of an IE system to perform monolingual extraction in a new language.|29f|concepticon, lexicon, definition, 2519|Yiu1989|Paper treats tonal disruption in Cantonese and conducts several experiments for tone perception and production of aphasic people. It also contains a nice survey on different interpretations of the Cantonese tonal system by different authors.|000|aphasia, tone, tone language, Cantonese, 2520|Castelvecchi2016|Article reports problems of dealing with deep learning and machine learning, as one does not know how the results are created.|000|blackbox methods, machine learning, problem 2521|Castelvecchi2016|Twenty-five years later, deciphering the black box has become exponentially harder and more urgent. The technology itself has exploded in complexity and application.
Pomerleau, who now teaches robotics part-time at Carnegie Mellon, describes his little van-mounted system as “a poor man’s version” of the huge neural networks being implemented on today’s machines. And the technique of deep learning, in which the networks are trained on vast archives of big data, is finding commercial applications that range from self-driving cars to websites that recommend products on the basis of a user’s browsing history.|21|deep learning, machine learning, artificial intelligence, problem 2522|Castelvecchi2016|Faced with such challenges, AI researchers are responding just as Pomerleau did — by opening up the black box and doing the equivalent of neuroscience to understand the networks inside. Answers are not insight, says Vincenzo Innocente, a physicist at CERN, the European particle-physics laboratory near Geneva, Switzerland who has pioneered the application of AI to the field. “As a scientist,” he says, “I am not satisfied with just distinguishing cats from dogs. A scientist wants to be able to say: ‘the difference is such and such’.”|22|machine learning, deep learning, insight, answer, philosophy of science, 2523|Luo2011|Hierarchies occur widely in evolving self-organizing ecological, biological, technological, and social networks, but detecting and comparing hierarchies is difficult. Here we present a metric and technique to quantitatively assess the extent to which self-organizing directed networks exhibit a flow hierarchy. Flow hierarchy is a commonly observed but theoretically overlooked form of hierarchy in networks. We show that the ecological, neurobiological, economic, and information processing networks are generally more hierarchical than their comparable random networks. We further discovered that hierarchy degree has increased over the course of the evolution of Linux kernels.
Taken together, our results suggest that hierarchy is a central organizing feature of real-world evolving networks, and the measurement of hierarchy opens the way to understand the structural regimes and evolutionary patterns of self-organizing networks. Our measurement technique makes it possible to objectively compare hierarchies of different networks and of different evolutionary stages of a single network, and compare evolving patterns of different networks. It can be applied to various complex systems, which can be represented as directed networks.|000|flow hierarchy, maximum flow, graph theory, directed network 2524|Ding2016|Classification of signs into various kinds is a vital enterprise in semiotic research. As early as over a century ago, the American semiotician Charles Sanders Peirce laid down a solid foundation for this work by proposing his famous trichotomy of signs. Later scholars have been mostly applying Peirce’s theory to their own semiotic studies rather than challenging the inadequacies that exist therein, thus giving rise to a great number of confusions or even contradictions. The present article modifies Peirce’s theory from the perspective of sign emergence and evolution and emphasizes the importance of understanding sign transformations.|000|sign evolution, sign emergence, semiotics, semiosis, semiotic dynamics, Charles Sanders Peirce, icon, index, symbol, 2525|Ding2016|Article revises Peirce's trichotomy of index, icon, and symbol, by proposing an evolutionary viewpoint. The article brings examples from Chinese literature and writing, which is, as we know, rich in semiotic relations, given the semantic character of the writing system.|000|icon, index, symbol, Charles Sanders Peirce, semiotics, Chinese 2526|Ding2016|The only signs exclusive to humans belong to Peirce’s third category: symbols.
The interpretation of this group of signs does not rely on the temporal contiguity between two events or part-and-whole relationship between two things; nor is it dependent on the similarity that exist between the representamen and its object; rather, it is based on the habitual associations between forms and meanings of signs prescribed by a linguistic community.|172|symbol, semiotics, language, 2527|Ding2016|In the beginning, humans, just like other animals, tried to make connections between things and events in their living environment, turning [pb] natural phenomena into indexical signs of one another. For example, they could attribute a particular sound to a specific kind of birds because the former is produced by the latter, giving rise to an indexical sign. Later on, when a need arose for a person to mention a bird of that kind which was absent from the scene, he/she could imitate its warbling so that his/her conversation partner could make a deduction through this iconic sign. However, an icon always loses its ability to evoke a similar image over time because repeated deductions from a certain representamen to a similar object make the association between them “automatic,” turning the iconic sign into a symbol whose interpretation is rule-governed rather than similarity-based.|172f|icon, index, symbol, semiotics, sign evolution, sign emergence 2528|Ding2016|Nearly all Chinese characters have gone through this process of “symbolification.” :comment:`Explains symbolification with help of the character 休, showing a man leaning on a tree.`|173|Chinese writing system, symbolification, symbol, sign evolution 2529|Ding2016|Peirce’s sentence “Nothing is a sign unless it is interpreted as a sign” can only be constituted by noises rather than symbols. All of these point to the fact that the identity of a sign depends very much on how it is interpreted by the user.
:comment:`This is not necessarily new and we find, I assume, clearer accounts in` @Keller1995|174|sign emergence, interpretation, 2530|Somov2016|Codes can be viewed as mechanisms that enable relations of signs and their components, i.e., semiosis is actualized. The combinations of these relations produce new relations as new codes are building over other codes. Structures appear in the mechanisms of codes. Hence, codes can be described as transformations of structures from some material systems into others. Structures belong to different carriers, but exist in codes in their “pure” form. Building of codes over other codes fosters their regulation. There are several ways to add codes: by types of transformation of structures involved in codes; by dimensions of pragmatics, semantics, and syntactics; through “abstract universals versus precise forms” relations; and by regulation levels in the “organism – environment” relations. More complicated codes are formed based on the interrelations of codes built over. These interrelations are presented as a conceptual chart, which reflects the way typical semiotic formations emerge in mind based on the interrelations of various codes. It also presents the related sociocultural semiotic systemities: motives, needs, aspirations, moral values, purposes, language-like systemities, fundamental frames, patterns of culture, etc.|000|code, semiotics, structure, transformation, 2531|Somov2016|This article may become interesting in the context of semiotics in biology and the differences in biological semiotic systems and human language. The text is not extremely easy to understand due to the use of a specific terminology, but the basic idea that heterogeneity (let's say: distinctivity) is created by transformation of structures, etc., is obvious and clear (think of morphology recombining morphemes to create distinctivity).
More careful reading may reveal useful thoughts for biological and linguistic parallels and analogies.|000|biological parallels, semiotics, distinctivity, double articulation 2532|Somov2016|The fundamental universals that are typical for codes arise from the relations between something and something. It is nothing but structures. The ideas of [pb] structures are cognate in a number of sciences. At the same time, the category of structure is interpreted in different ways in the systems thinking approach, in various branches of structuralism, and in various concepts of languages and texts.|558f|code, universals, structure, semiotics, 2533|Rupp2016|I propose the specific words used by a community define that community, yet at the same time the community is defined by those words. This ever-changing lexicon of communal metaphor is the storehouse of all the meanings and their usages used by a given group. By looking at the metaphors that permeate any communal language, we see that all language is metaphoric. With the use of Conceptual Metaphor Theory and Conceptual Blending Theory, I investigate how new meanings enter our lexicon and become social meaning. This investigation also provides a closer understanding of “literal” meanings. We come to see they are just stale metaphors or neglected blendings devoid of potency. The process by which meanings are created illuminates how they become “literal.” Thus, showing us the danger that accompanies us in the modern, literal age.|000|metaphor, semiotics, cultural evolution, meaning, 2534|Rupp2016|We make sense of the world via language and in particular through the use of metaphors. This is a basic principle of human rational thought. As Huang et al. (2013: 106) state, “Metaphor understanding is an intricate task in language understanding.” The meaning of a metaphor is shaped in part by how we, a people of a certain age or community, impose our meaning on a given metaphor (A is like B).
However, this structure of metaphor already exists as a fundamental axiomatic pattern for human analogical thinking and imposes its structure on us (Rupp 2015). We cannot help but use metaphors to build our understanding of the world.|420|metaphor, universals, cognition 2535|Rupp2016|The Communal Lexicon is a subconscious world that all the members of a certain group can access. Furthermore, it is access to this lexicon that defines a group. As words in the lexicon acquire new meaning or as new words or metaphors enter the lexicon, the essential quality of the accessers changes.|422|communal lexicon, external language, metaphor, 2536|Rupp2016|Barfield (2013: 70) calls the act of saying-one-thing-and-meaning-another “tarning,” taken from the German Tarnung, and I will use it in turn. This is another way words enter our Communal Lexicon. The essential quality of tarning is the fact that it says one thing but intends something else. The power of tarning comes from the fact that the words are not the meanings themselves, but rather point at meanings. This is precisely what a strong metaphor does.|427|semantic change, tarning, metaphor, definition 2537|Rupp2016|An issue that arises with the use of so called literal words is that a literal meaning can be nothing more than stale metaphor or blending so old that all subordinate metaphors have been forgotten or neglected (as shown above in the case of sexist language).|429|tarning, blending, metaphor, semantic change, 2538|Gherlone2016|Shortly before his death, Yuri Lotman (1922–1993), by now blind, dictated some considerations on the concept of ‘alien,’ ‘stranger’ (chuzhdoe): a concept that de facto weaves all of his thirty-year reflections on the relationship between language, meaning, and culture and that, until the end, appears as the mark of a speculative orientation focused on the ethics of otherness.
A profound influence on Lotman’s thinking in this direction was exercised by two leading figures of the Russian intellectual tradition: the psychologist Lev Vygotsky (1896–1934) and the philosopher, critic, and literary theorist Mikhail Bakhtin (1895–1975). It is no wonder the Tartu-Moscow Semiotic School dedicated to them volumes IV (1969) and VI (1973), respectively, of the Trudy po znakovym sistemam, the review on sign systems launched in 1964 by the Department of Russian Literature of the University of Tartu. The horizon of otherness, and the consequent emphasis on the relational nature of man, fill in fact as much of Vygotsky’s theoretical reflection on the human mind as does Bakhtin’s on literary creation (slovesnost’). This article intends to explore the concept of “dialog” as thematized in Vygotsky’s and Bakhtin’s studies, theoretical roots of the Lotmanian idea of communication as a dialogical semiotic act.|000|Lev Vygotsky, semiotics, Mikhail Bakhtin, history of science, 2540|Cheng2009b|The first proposal of systematic approach to handle rhyme data can be traced back to Chen Li’s (陳澧, 1810-1882) Qieyun Kao 《切韻考》(Chen 1995). As the name tells, Chen’s subject of investigation is Qieyun 《切韻》, an authoritative rhyming dictionary representing the phonology of Middle Chinese, so precisely speaking, his approach was proposed for the study of the pronunciation indicators, namely fanqie (反切), contained in the dictionary. 
However, as Zhu (1989: 14) points out, fanqie can be seen as rhyming in strict sense, and this kind of approach has actually been used in the study of rhyming in verse such as Shijing 《詩經》 (Chu 1999: 519-521), therefore it worth discussing in the present context.|29|shījīng, fǎnqiè, statistics, Old Chinese, Middle Chinese, 2541|Cheng2009b|The approach proposed by Chen is called the xilian (系聯) approach, which means to conjoin rhyme words.|31|xìlíanfǎ, Chinese historical phonology, yīnyùnxué, Old Chinese 2542|Cheng2009b|:comment:`mentions limitations of the xìliánfǎ, including problems with errors, and problems with broken rhymes, thereby confusing that all rhymes are impure in some sense, and that it is not possible to really distinguish the two.` The second problem, which is also related to the first one, is this approach being insensitive to the frequency factor. Theoretically speaking, under this approach, one case of common usage/symmetric usage/transitive usage is enough for establishing rhyming relationship, and 100 or 1000 more cases of the same kind will not have any effect on the analysis. By common sense we know that only a few examples are not trustworthy while abundant cases of the same kind may increase the credibility of our judgement, but it is not considered under this approach. This is also the reason why random errors can not be dealt with properly by this approach: despite of their relative low frequencies, erroneous cases will be accepted indifferently with the normal cases.|32|xìlíanfǎ, Chinese historical phonology, 2543|Cheng2009b|Bai (1931) is said to be the first work that makes use of descriptive statistics openly for the study of Middle Chinese fanqie.|34|statistics, fǎnqiè, 2544|Chen2009|This thesis deals with the phonological behavior and developments of the Nantong dialect, which belongs to the Tongtai group of Jianghuai Mandarin. 
Logical entailment, when applied to dialect comparison, can lead to discoveries of the gradual spread of particular sound changes lost in history. Mainly through this discovery process, the following phonological phenomena can be noted in Nantong or Tongtai: 1. The partial merger of *ŋ-, *n-, *l- occurred before high front vowels in Nantong; 2. Of all the consonant initials, nasal consonants are most likely (m- being more likely than n-) to lower the following high back vowel u; 3. In terms of phonological condition, the syllable initial u became the voiced fricative v in the following order: first in front of mid front vowels, then u# itself becoming vu#, and finally in front of the low back vowel /A/ in closed syllable while u/A/# remains unaffected; 4. -i is more likely than -y to maintain the difference between ts- and th-; 5. -i is more likely than -y to palatalize t and tʰ; 6. The combination of l- and -y- is most stable in open syllable, while in closed syllable a following glottal stop is a more likely environment than a nasal final for ly- to become li-. This thesis also attempts to provide explanation for the sound changes of the Ri initial, the origin of the categorical combination of th- and -y- at the expense of ts- and -u-, and the probable vowel shift ə > /E/ > a > o in Nantong.|000|Chinese dialects, 南通方言, Nántōng dialect, phonetic description 2545|Chen2009|Phonetic description of the Nántōng dialect of the Jiānghuái Mandarin group, which shows some specific phonetic developments but unfortunately does not offer any lexical accounts.|000|Nántōng dialect, Jiānghuái, Chinese dialects, phonetic description, phoneme inventory 2546|Bai1931|This is an early study on the statistics of fǎnqiè readings, this time in Jíyùn, not in Qièyùn or Guǎngyùn.
It should be quoted whenever one investigates the statistics of fǎnqiè readings.|000|fǎnqiè, statistics, Jíyùn, Qièyùn, Middle Chinese 2547|Luo1931|This is a study on distinctions of rhymes in the Qièyùn, based on fǎnqiè readings. It is therefore highly relevant for all kinds of investigations of the networks underlying fǎnqiè readings in Middle Chinese and Chinese linguistics.|000|fǎnqiè, Qièyùn, Middle Chinese, statistics, 2548|Csardi2006|The igraph software package provides handy tools for researchers in network science. It is an open source portable library capable of handling huge graphs with millions of vertices and edges and it is also suitable to grid computing. It contains routines for creating, manipulating and visualizing networks, calculating various structural properties, importing from and exporting to various file formats and many more. Via its interfaces to high-level languages like GNU R and Python it supports rapid development and fast prototyping.|000|igraph, software, graph theory 2549|List2014d|Many [pb] of the broad terms which cover a large range of distinct processes are “explanatory” rather than descriptive, since they also offer an explanation why the respective changes happened or happen.|31f|terminology, explanation, description, 2550|Woese1998|Lateral gene transfer, which has long been recognized as a secondary evolutionary mechanism, becomes primary in this primitive evolutionary dynamic. It is through lateral transfer, not vertical inheritance, that systems primarily evolve at the progenote stage. As a result of genetic mixing, organismal lineages, consensus histories of an organism’s genes, did not exist, although short-term “cell lines” necessarily did. The universal ancestor does have an evolutionary history, but that history is physical, not genealogical. 
|6858|universal ancestor, tree of life 2551|Woese1998|A genetic annealing model for the universal ancestor of all extant life is presented; the name of the model derives from its resemblance to physical annealing. The scenario pictured starts when “genetic temperatures” were very high, cellular entities (progenotes) were very simple, and information processing systems were inaccurate. Initially, both mutation rate and lateral gene transfer levels were elevated. The latter was pandemic and pervasive to the extent that it, not vertical inheritance, defined the evolutionary dynamic. As increasingly complex and precise biological structures and processes evolved, both the mutation rate and the scope and level of lateral gene transfer, i.e., evolutionary temperature, dropped, and the evolutionary dynamic gradually became that characteristic of modern cells. The various subsystems of the cell “crystallized,” i.e., became refractory to lateral gene transfer, at different stages of “cooling,” with the translation apparatus probably crystallizing first. Organismal lineages, and so organisms as we know them, did not exist at these early stages. The universal phylogenetic tree, therefore, is not an organismal tree at its base but gradually becomes one as its peripheral branchings emerge. The universal ancestor is not a discrete entity. It is, rather, a diverse community of cells that survives and evolves as a biological unit. This communal ancestor has a physical history but not a genealogical one. Over time, this ancestor refined into a smaller number of increasingly complex cell types with the ancestors of the three primary groupings of organisms arising as a result. 
|000|genetic annealing model, universal ancestor, tree of life, lateral gene transfer 2552|Yanai2002|**BACKGROUND** Comparative genomics provides at least three methods beyond traditional sequence similarity for identifying functional links between genes: the examination of common phylogenetic distributions, the analysis of conserved proximity along the chromosomes of multiple genomes, and observations of fusions of genes into a multidomain gene in another organism. We have previously generated the links according to each of these methods individually for 43 known microbial genomes. Here we combine these results to construct networks of functional associations. **RESULTS** We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods. They have a dominant cluster that contains approximately 80%-90% of the genes, independent of genome size, and the dominant clusters show the small world behavior expected of a biological system, with global connectivity that is nearly random, and local properties that are highly ordered. **CONCLUSIONS** When the information on functional linkage provided by three emerging computational methods is combined, the integrated network uncovers large numbers of conserved pathways and identifies clusters of functionally related genes. 
It therefore shows considerable utility and promise as a tool for understanding genomic structure, and for guiding high throughput experimental investigations.|000|biological evolution, gene fusion, protein structure, functional networks, 2553|Hurles2003|Human expansion into the far reaches of the Pacific has occurred within the past 3000–4000 years. This is so recent that it is arguably the best opportunity to test models of the origin and dispersal of human groups and their domesticated plants and animals, cultural and linguistic evolution, human impacts on a pristine environment, and the lower limits for a long-term sustainable population. Multidisciplinary research is essential because these models must account for archaeological, ecological, cultural, historical, social, linguistic and (both mitochondrial and nuclear) genetic data. This synthesis has not yet been achieved for any settlement in the world, but there has been considerable progress recently on integrating these disciplines with respect to the settlement of Polynesia.|000|Neighbor-Net, splits networks, Pacific settlement, Polynesian 2554|Falk1999|Women have been a dynamic force in American linguistics, yet this has not always been apparent in current histories of linguistics. For twentieth century linguistics, Julia S.Falk argues, the same story has been told over and over: a story of leading men, their followers, and their interest in language as a structure observable in patterns of the distribution of forms. This book challenges this received history by presenting a much-needed reevaluation of twentieth century American linguistics which focuses on the contributions of women to our modern understanding of language. This book relates an account of linguistics as perceived and experienced by three American women in the first half of the twentieth century. 
Alice Vanderbilt Morris dreamed of creating a new auxiliary language for international communication, and brought together professional linguists and members of the New York upper class to achieve her goal. Gladys Amanda Reichard devoted her life to the study of Native American languages; despite opposition from men who claimed the territory, her studies still survive. And E. Adelaide Hahn brought themes of modern linguistics to the study of Latin and Hittite, keeping her colleagues mindful of ancient and classical languages. Rather than the standard American story of an increasingly triumphant march of scientific inquiry towards structural phonology, Women, Language and Linguistics reveals a linguistics where the purpose of language was communication; the appeal of languages lay in their diversity; and the authority of language resided with its speakers and writers. Julia S. Falk explores the vital part which women have played in preserving a linguistics based on the reality and experience of language; this book finally brings to light a neglected perspective for those working in linguistics and the history of linguistics.|000|feminism, linguistics, history of science 2555|Trovato2013|This paper is concerned with an aspect of Bédier’s legacy, possibly the least known in the English-speaking world. Bédier’s works of 1913 and 1928–29 did not just create a schism in the apparently peaceful context of textual scholarship: through his statements, critical editions produced with a single copy-text regained the academic prestige that Gaston Paris’ adaptations of stemmatic method had taken away from them. 
Since then, Bédier’s objections have also forced meticulous textual critics to rethink their editorial practice: though retaining the method of shared errors, such scholars (often scarcely known outside Italy) have brought important progress in the methods of textual criticism.|000|stemmatics, history of science, lateral transfer, family tree 2556|Tucker1908|This book gives an introduction to historical linguistics, building on the work of the 19th century, and apparently considering different language families. It has an extra chapter on genealogy and shows how trees are reconstructed. This is definitely worth a closer read, as it may show some more general aspects of tree thinking (or tree criticism) in linguistics. Interestingly, there is even a chapter on general phonetic tendencies in languages. Extremely interesting in the context of the search for preference laws of sound change.|000|handbook, language history, family tree, introduction, sound change 2557|Timpanaro2005|The book examines the history of stemmatics as a discipline and points to general problems of reconstructing ancient documents. This is really interesting in the context of family trees in biology and linguistics, as stemmatics surely played its role there, especially by influencing linguistics. One specific chapter is specifically devoted to a comparison of textual criticism (stemmatics) and comparative-historical linguistics (pp 119ff). This may provide very substantial information regarding the mutual influence of the disciplines during the time when they were established.|000|history of science, stemmatics, family tree, lateral transfer, introduction, historical linguistics, analogy 2558|Hale2007|Interesting "classical" account on historical linguistics, touching many interesting points from the perspective of a trained historical linguist. Especially the discussion of "descent relationships" in linguistics (27-48) is extremely interesting, and should be thoroughly read. 
The chapter on "reconstruction methodology" (223) may likewise be very interesting in the context of linguistic methodology.|000|methodology, historical linguistics, genetic relationship, linguistic reconstruction, genetic descent 2559|Gil2016|Many linguists maintain that the grammars of different languages are incommensurable. This poses the problem of how to compare them. One proposed solution is to distinguish between descriptive categories for individual languages and comparative categories for crosslinguistic comparisons. At the same time, it is also commonly assumed that language-internal variation can be described in a unitary manner, thereby presupposing that different dialects of the same language are commensurable. However, it is well known that the language-dialect distinction is not categorical but rather forms a continuum. This raises the question: Where lies the boundary between commensurability and incommensurability? This question is best addressed in terms of the notion of languoid, a cover term that includes languages, smaller entities such as dialects and registers, but also larger assemblages such as genealogical and areal groupings. This article proposes replacing the notion of language-specific descriptive category with that of languoid-associated descriptive category. Since languoids can be of arbitrary size, such categories may form the basis for crosslinguistic comparisons, alongside comparative categories. 
What this means is that different languoids, regardless of how close or distant they are to each other, may be commensurable with respect to some linguistic features but incommensurable with regard to others.|000|language comparison, comparison concepts, comparative categories, typology, languoid, doculect, language variation, 2560|Gil2016|This article argues that the presence of a continuum between dialects and languages poses a serious challenge to the notion of linguistic incommensurability as argued for by scholars such as Dryer (1997), Croft (2001), Lazard (2006), Haspelmath (2007), and many of the contributions to this debate in LT. In particular, the language-dialect continuum points towards a refinement of the distinction between language-specific descriptive categories and comparative concepts proposed by Haspelmath (2010, 2015, 2016), allowing for the possibility that descriptive categories observable in particular languages may also form the basis for crosslinguistic comparisons. This article thus goes some of the way towards answering the questions posed by Moravcsik (2016), in this issue, especially that pertaining to the “validity domain of categories and languages”.|440|dialect continuum, comparative categories, comparison concepts, language comparison 2561|Hartnett2016|Einer der größten Mathematiker der Gegenwart findet in einer eigenen Arbeit einen Fehler – und stürzt sich in ein Projekt mit dem Ziel, das Beweisen gänzlich dem Computer anzuvertrauen. Dazu muss er nichts weniger als die Grundlagen der Mathematik neu fassen.|000|proof, formal mathematics, 2562|Hartnett2016|This article gives a popular overview of current attempts to formalize the way in which proofs are done in mathematics. The ultimate goal is to have proofs done automatically, by a computer. 
While many people remain sceptical, it shows that even, or especially, a field like mathematics can no longer rely only on human intuition or insights, especially when the problems become more and more complex.|000|formal mathematics, proof, logic, mathematics, computer evaluation, evaluation 2563|Jacques2004|Dissertation on the phonology and morphology of Japhug introduces the archaic Sino-Tibetan language and sets it into the bigger context of Sino-Tibetan, thus offering rich material for general questions of Sino-Tibetan, Old Chinese, and the general history of the Sino-Tibetan languages.|000|Japhug, Sino-Tibetan, language history, phonology, morphology, phoneme inventory, introduction 2564|Skoglund2016|The appearance of people associated with the Lapita culture in the South Pacific around 3,000 years ago marked the beginning of the last major human dispersal to unpopulated lands. However, the relationship of these pioneers to the long-established Papuan people of the New Guinea region is unclear. Here we present genome-wide ancient DNA data from three individuals from Vanuatu (about 3,100–2,700 years before present) and one from Tonga (about 2,700–2,300 years before present), and analyse them with data from 778 present-day East Asians and Oceanians. Today, indigenous people of the South Pacific harbour a mixture of ancestry from Papuans and a population of East Asian origin that no longer exists in unmixed form, but is a match to the ancient individuals. Most analyses have interpreted the minimum of twenty-five per cent Papuan ancestry in the region today as evidence that the first humans to reach Remote Oceania, including Polynesia, were derived from population mixtures near New Guinea, before their further expansion into Remote Oceania. 
However, our finding that the ancient individuals had little to no Papuan ancestry implies that later human population movements spread Papuan ancestry through the South Pacific after the first peopling of the islands.|000|peopling of South-West Pacific, genome analysis, genetic study 2565|Yanai2016|There is an ancient society of genes that is inextricably linked to our human society. The members of this society shaped your body and your brain, your instincts and your desires. They have brought humanity to the present, but they don’t necessarily dictate our future. To understand how these genes influence us—and how humanity can rise above them—you might imagine that we need to find out what each individual gene does. But this approach won’t work, because we are not the simple sum of our genes. The members of the society of genes do not live in isolation. Working together, forming rivalries and partnerships, is the only way they can form a human body that can sustain them for a few decades and propel them into the next generation of humanity. Almost 250 years ago, Adam Smith realized that it is the self-interested interactions of individuals that make the marketplace efficient. Similarly, it is the competition and cooperation among genes striving for their own long-term survival that promotes the persistence of humanity as a whole. [...] If we want to understand our genome, the key is an understanding of the strategies of these genes. A genome—we will find—is best seen as a conglomerate of selfish genes, held together by an intricate network of cooperation. This is the story of the society of genes, of the triumphs and failures of its members, and their eternal conflicts and partnerships.|000|protein-protein interaction networks, genome, biological evolution, networks, domain promiscuity 2566|Yanai2016|Interesting popular scientific account of evolution, which focuses on the interaction of genes rather than their "selfish replication". 
The stories may be interesting in the context of word formation in linguistics, as the organisation of our lexica may show similar patterns as the organisation of genes in a genome.|000|biological parallels, genome evolution, protein-protein interaction networks, selfish gene theory, evolutionary theory 2567|Sihler2000|In contrast to its title, a rather standard introductory textbook to historical linguistics. The topics touched on in the book, however, are interesting: * Changes in pronunciation * Sound laws * Analogy * Semantic change * Reconstruction * External aspects of language change * Written records The book also contains a glossary of terms in German, and information on different systems of transcription.|000|handbook, historical linguistics, introduction, semantic change, analogy, linguistic reconstruction, sound change 2568|OHara1996|"The Natural System" is the name given to the underlying arrangement present in the diversity of life. Unlike a classification, which is made up of classes and members, a system or arrangement is an integrated whole made up of connected parts. In the pre-evolutionary period a variety of forms were proposed for the Natural System, including maps, circles, stars, and abstract multidimensional objects. The trees sketched by Darwin in the 1830s should probably be considered the first genuine evolutionary diagrams of the Natural System — the first genuine evolutionary trees. Darwin refined his image of the Natural System in the well-known evolutionary tree published in the Origin of Species, where he also carefully distinguished between arrangements and classifications. Following the publication of the Origin, there was a great burst of evolutionary tree building, but interest in trees declined substantially after 1900, only to be revived in recent years with the development of cladistic analysis. 
While evolutionary trees are modern diagrams of the Natural System, they are at the same time instances of another broad class of diagrams that may be called "trees of history": branching diagrams of genealogical descent and change. During the same years that Darwin was sketching his first evolutionary trees, the earliest examples of two other trees of history also appeared: the first trees of language evolution and of manuscript genealogy. Though these were apparently independent of evolutionary trees in their origin, the similarities among all these trees of history, and among the historical processes that underlie them, were soon recognized. Darwin compared biological evolution and language evolution several times in the Origin of Species, and both Ernst Haeckel and the linguist August Schleicher made similar comparisons. Both linguists and stemmaticists (students of manuscript descent) understood the principle of apomorphy — the principle that only shared innovations provide evidence of common ancestry — more clearly than did systematists, and if there had been more cross-fertilization among these fields the cladistic revolution in systematics might well have taken place in the nineteenth century. Although historical linguists and stemmaticists have in some respects had sounder theory than have systematists, at least until recently, they have also had the practical problem of very large amounts of data, a problem not often faced by systematists until the advent of molecular sequencing. The opportunity now exists for systematists to contribute to the theory and practice of linguistics and stemmatics, their sister disciplines in historical reconstruction, through application of our commonly used computer programs for tree estimation. 
Preliminary results from the application of numerical cladistic analysis to a large stemmatic data set have been very encouraging, and have already generated much discussion in the stemmatics community.|000|history of science, stemmatics, family tree, systematics, cladistics, philology, 2569|Montemurro2016|In written language, the choice of specific words is constrained by both grammatical requirements and the specific semantic context of the message to be transmitted. To a significant degree, the semantic context is in turn affected by a broad cultural and historical environment, which also influences matters of style and manners. Over time, those environmental factors leave an imprint in the statistics of language use, with some words becoming more common and other words being preferred less. Here we characterize the patterns of language use over time based on word statistics extracted from more than 4.5 million books written over a period of 308 years. We find evidence of novel systematic oscillatory patterns in word use with a consistent period narrowly distributed around 14 years. The specific phase relationships between different words show structure at two independent levels: first, there is a weak global phase modulation that is primarily linked to overall shifts in the vocabulary across time; and second, a stronger component dependent on well defined semantic relationships between words. In particular, complex network analysis reveals that semantically related words show strong phase coherence. 
Ultimately, these previously unknown patterns in the statistics of language may be a consequence of changes in the cultural framework that influences the thematic focus of writers.|000|word frequency, Google N-Grams, lexical change, computational linguistics, automatic approach 2570|Faudree2014|Linguistic anthropologists have dedicated considerable effort to exploring how understandings of the past are the result of semiotic processes in present micro-contexts of interaction. They have also examined how trajectories of past events lead to formations of semiotic complexes (idioms, institutions, ideologies, identities) that structure and define particular pasts while influencing the present as well. Yet while these two approaches have contributed enormously to our understanding of the causal and conditional relations between language and history in social contexts, they have remained largely distinct from each other. We suggest that researchers working at the intersection of language, society, and history would benefit from an approach that more fully integrates the insights of both lines of inquiry.|000|linguistic anthropology, cultural evolution, history, 2571|Hultman2015|Over 20% of Earth’s terrestrial surface is underlain by permafrost with vast stores of carbon that, once thawed, may represent the largest future transfer of carbon from the biosphere to the atmosphere. This process is largely dependent on microbial responses, but we know little about microbial activity in intact, let alone in thawing, permafrost. Molecular approaches have recently revealed the identities and functional gene composition of microorganisms in some permafrost soils and a rapid shift in functional gene composition during short-term thaw experiments. However, the fate of permafrost carbon depends on climatic, hydrological and microbial responses to thaw at decadal scales. 
Here we use the combination of several molecular ‘omics’ approaches to determine the phylogenetic composition of the microbial communities, including several draft genomes of novel species, their functional potential and activity in soils representing different states of thaw: intact permafrost, seasonally thawed active layer and thermokarst bog. The multi-omics strategy reveals a good correlation of process rates to omics data for dominant processes, such as methanogenesis in the bog, as well as novel survival strategies for potentially active microbes in permafrost.|000|permafrost, bacterial evolution, ancestral transfer, lateral transfer, Jurassic Park 2572|Hultman2015|The interesting aspect of this paper is that it could theoretically allow for ancestral transfer in biology: If ancient bacteria provide genes to new bacteria that descend from them, we could have a situation similar to Latin and the European languages.|000|ancestral transfer, biological parallels, nice paper 2573|Frankland2015|The 18th-century Prussian philosopher Wilhelm von Humboldt famously noted that natural language makes “infinite use of finite means.” By this, he meant that language deploys a finite set of words to express an effectively infinite set of ideas. As the seat of both language and thought, the human brain must be capable of rapidly encoding the multitude of thoughts that a sentence could convey. How does this work? Here, we find evidence supporting a long-standing conjecture of cognitive science: that the human brain encodes the meanings of simple sentences much like a computer, with distinct neural populations representing answers to basic questions of meaning such as “Who did it?” and “To whom was it done?” Human brains flexibly combine the meanings of words to compose structured thoughts. For example, by combining the meanings of “bite,” “dog,” and “man,” we can think about a dog biting a man, or a man biting a dog. 
Here, in two functional magnetic resonance imaging (fMRI) experiments using multivoxel pattern analysis (MVPA), we identify a region of left mid-superior temporal cortex (lmSTC) that flexibly encodes “who did what to whom” in visually presented sentences. We find that lmSTC represents the current values of abstract semantic variables (“Who did it?” and “To whom was it done?”) in distinct subregions. Experiment 1 first identifies a broad region of lmSTC whose activity patterns (i) facilitate decoding of structure-dependent sentence meaning (“Who did what to whom?”) and (ii) predict affect-related amygdala responses that depend on this information (e.g., “the baby kicked the grandfather” vs. “the grandfather kicked the baby”). Experiment 2 then identifies distinct, but neighboring, subregions of lmSTC whose activity patterns carry information about the identity of the current “agent” (“Who did it?”) and the current “patient” (“To whom was it done?”). These neighboring subregions lie along the upper bank of the superior temporal sulcus and the lateral bank of the superior temporal gyrus, respectively. At a high level, these regions may function like topographically defined data registers, encoding the fluctuating values of abstract semantic variables. This functional architecture, which in key respects resembles that of a classical computer, may play a critical role in enabling humans to flexibly generate complex thoughts.|000|semantics, neurology, neurolinguistics, sentence meaning, meaning, conceptualization 2574|Frankland2015|Article provides evidence for some regularity in storing meanings in human brains. 
This may likewise be taken as evidence that the processes of semantic change could be used to investigate basic processes of reasoning, provided we follow a strictly cross-linguistic paradigm.|000|meaning, semantics, neurology, neurolinguistics 2575|Lameli2010|Article discusses the history of linguistic cartography and linguistic maps, pointing to early isogloss maps and other interesting aspects.|000|history of science, linguistic atlas, dialect geography, isogloss, isogloss map, wave theory, language in space, 2576|Dingemanse2012|Ideophones are noted for their special forms, distinct grammatical behaviour, rich sensory meanings, and interactional uses related to experience and evidentiality. This review surveys recent developments in ideophone research. Work on the semiotics of ideophones helps explain why they are marked and how they realise the depictive potential of speech. A true semantic typology of ideophone systems is coming within reach through a combination of language-internal analyses and language-independent elicitation tools. Documentation of ideophones in a wide variety of genres as well as sequential analysis of ideophone use in natural discourse leads to new insights about their interactional uses and about their relation to other linguistic devices like reported speech and grammatical evidentials. As the study of ideophones is coming of age, it sheds new light on what is possible and probable in human language.|000|ideophones, classification, sound symbolism, 2577|Plechac2016|The following paper describes the algorithms of phonetic and metrical components of the Czech verse processing system KVĚTA, updating information contained in previous reports (Ibrahim and Plecháč 2011; Plecháč et al. 2013a; Ibrahim and Plecháč 2014). The system is being used in the building of the Corpus of Czech Verse (hereinafter CCV), which at present contains 1 689 Czech books of poetry (over 2.5 million lines) from the nineteenth and early twentieth centuries. 
In contrast to standard language corpora, each lexical unit is assigned not only the lemma and morphological tag attributes but also a phonetic transcription; furthermore, the attributes metre (iamb, trochee…), length (number of feet), ending (feminine, masculine…) and metrical pattern are assigned to each verse line. At higher levels rhyme pairs (or n-some) and fixed forms (sonnet, rondel, etc.) are annotated. Here we will focus on components providing phonetic and metrical annotation: (1) the F-component, whose task is to derive the phonetic transcription from the input data, (2) the G-component, whose task is to generate a set of all possible metrical interpretations of these data, and (3) the M-component, whose task is to select from this set the final interpretation. Automatic analysis has so far been limited to accentual-syllabic (hereinafter AS) and monometric poems – i. e., poems consisting of repetitions and variations of a single metrical pattern (though AS imitations of some quantitative meters are recognized).|000|metrics, verse, Czech, rhyme patterns, corpus studies 2578|Hoefler1956|Unter diesen Umständen aber bleibt zu erwägen, ob das Nacheinander-Auftauchen dieser Lautgesetze bei den verschiedenen deutschen Stämmen in der Tat als ein "Ausbreitungs"-Vorgang im Sinn der Wellentheorie zu deuten ist. Es könnte wohl auch sein, daß das hier scheinbar so unmittelbar vor uns sich entrollende Bild einer wellenförmigen Ausbreitung uns täuschte. Könnte es, zumal bei dem weitgehenden Mangel früher schriftlicher Belege, nicht möglich sein, daß auch in dem scheinbar so geschlossenen binnendeutschen Raum sich diese Wandlung in Wahrheit an verschiedenen Stellen unabhängig angebahnt und durchgesetzt hätte, z. B. bei den Quaden in Mähren und bei den Alemannen im Westen, von denen uns Ammianus Marcellinus im 4. Jh. eine Anzahl von Namen mitteilt? 
Hier könnte wohl einer der Fälle vorliegen, wie sie Lessiak für neuere Mundartenentwicklungen angenommen hat, wo sprachliche "Polygenese" die gleiche Neuerung an verschiedenen Punkten auch innerhalb eines engeren, relativ sehr geschlossenen Mundartgebietes entstehen läßt.|471|wave theory, homoplasy, drift, language change, family tree 2579|Urban2014|The introductory example showed that German *bein* underwent semantic change from the original basic meaning 'bone' to 'leg'. In the older stages of other West and North Germanic languages, 'bone' is uniformly found as the general meaning of *bein*'s cognates (@Orel2003: 32). If this is true, positing an original meaning 'leg' requires the assumption of a change at many nodes of the family tree, while positing an original meaning 'bone' necessitates only one change at a low node of the tree. By the principle of parsimony, then, reconstructing 'bone' would be preferred on methodological grounds.|286|family tree, semantic reconstruction, evidence, directionality 2580|Kilgarrif2015|Dictionaries are supposed to list all the words of a language. But how many words are there? We would like a nice answer—a number—but, I’m sorry to say, I am not going to give you one. The question does not stand up to scrutiny. 
There are many reasons why, from the language of scientific specializations to that of restaurant menus, and the chapter talks through a number of them, drawing data from chemistry, my local vegetarian restaurant, and very large databases of English to illustrate what happens when you start to enter the murky zone where things that might possibly be English words start to be outnumbered by things that you think really can’t be.|000|word frequency, dictionary, human language, lexicon 2581|Kilgarrif2015|Article discusses the interesting question of how many words a language contains.|000|word frequency, dictionary, lexicon, human language, 2582|Guy1989|An analysis of recent proposals concerning the typologies of language change attempts to provide a synthesis identifying the major types of change that need to be distinguished. The three major types of language change discussed are spontaneous change, borrowing, and imposition. Upon analysis, it is concluded that these three types of change adequately incorporate all the analytical distinctions examined, and that the model allows comparison of a variety of characteristics associated with the change types and the making of testable predictions for particular situations. The ramifications are seen as potentially far-reaching, although much additional work is needed. It is proposed that clear and systematic treatment of change types makes possible more precise statements of the domains and conditions under which the laws of historical linguistics apply, and may suggest principled explanations of why they take the forms they do. Finally, the model is seen to aid in keeping diachronic linguistics rooted in social history. (MSE)|000|language change, typology, typology of language change, spontaneous change, borrowing, imposition, 2583|Guy1989|Spontaneous changes, therefore, are those that arise from within a single speech community, uninfluenced by an external linguistic model or target. 
That is, there is no other language or dialect available to speakers in the community which serves as the structural source or goal of such a change. Such changes are often considered the unmarked case in historical studies, and hence they are frequently dignified with the term 'natural change'. However, since there is nothing unnatural about the other types of change that involve contact, I prefer to avoid the use of the term 'natural', relying instead on the term 'spontaneous', which I adopt from Bickerton 1980.|2|spontaneous change, language change, definition 2584|Guy1989|However, we must also recognize cases where 'contact' occurs entirely within a single community, such as contact between contemporary and archaic forms of the same language in diglossic situations like those of Arabic and Sinhala. In all contact-induced changes some degree of bilingualism by some fraction of the population must occur; these speakers will be the principal agents of the change, and the locus of the contact.|2|ancestral transfer, lateral transfer, borrowing, bilingualism, definition 2585|Guy1989|In the borrowing case, which Van Coetsem labels 'recipient language agentivity', native speakers import into their language features from another language. In the imposition case, formally termed 'source language agentivity' by Van Coetsem, speakers who are learning a second language impose onto it features of their first language, usually in the course of language shift.|3|imposition, borrowing, definition, typology of language change, 2586|Weingarten2016|The present article explores uses of verse in direct speech attributed to Confucius (551–479 BCE) within works compiled mainly during the Han dynasty (206 BCE–220 CE). Analysing short prose narratives and dialogues, the article investigates a poeticized debate, in which versified proverbs or apothegms are employed as tools of argumentation. 
In addition I examine the mnemonic function of reiterated rhymes on politics, and emotive song as an expression of thwarted ambition, purportedly revealing glimpses at Confucius’s inner life; and libretto-like records of dramatic encounters whose participants exchange verse. The goal of the investigation is two-fold. First, it demonstrates how, in early imperial China, the image of Confucius was remoulded to fulfil different functions and satisfy diverse needs beyond his by now familiar role as philosopher and as patron saint of an intellectual tradition and state ideology. Second, it draws attention to the riches of stylistic nuance and functional variety exhibited by early Chinese writings, which have so far hardly been tapped.|000|Confucius, rhyme patterns, Chinese, Old Chinese, rhymes 2587|Goebl1983|Ausgehend von einer möglichst allgemeinverständlichen Darstellung der in der Numerischen Taxonomie üblichen Behandlung der meßtheoretisch und statistisch-mathematisch relevanten Begriffe 'Ding' (Element), 'Eigenschaft' (Merkmal) und 'Relation' (Ähnlichkeit) wird anhand konkreter Beispiele vorgeführt, inwiefern der seit mehr als einem Jahrhundert behauptete Gegensatz zwischen den Generalmetaphern „Stammbaum" und „Welle" gegenstandslos ist, und auch gezeigt, daß die beiden Schemata zum Standardinstrumentarium moderner Klassifikationsmethoden gehören. Zahlreiche erklärende Strichzeichnungen, Dendrogramme, Choroplethenkarten und numerische Kartogramme begleiten die Darstellung. Datenbeispiele aus dem französischen Sprachatlas ALF.|000|wave theory, family tree, language history, genetic relationship, dialectology, dialectometry 2588|Goebl1983|Classical article on taxonomy and dialectometry, emphasizing the "objectivity" of the approach a bit too much, as it does not discuss the selection of features. 
But the article has interesting visualizations and quotes interesting literature regarding the discussion of wave and family tree.|000|dialectology, dialectometry, taxonomy, wave theory, family tree 2589|Bostoen2015|This article reviews evidence from biogeography, palynology, geology, historical linguistics, and archaeology and presents a new synthesis of the paleoclimatic context in which the early Bantu expansion took place. Paleoenvironmental data indicate that a climate crisis affected the Central African forest block during the Holocene, first on its periphery around 4000 BP and later at its core around 2500 BP. We argue here that both phases had an impact on the Bantu expansion but in different ways. The climate-induced extension of savannas in the Sanaga-Mbam confluence area around 4000–3500 BP facilitated the settlement of early Bantu-speech communities in the region of Yaoundé but did not lead to a large-scale geographic expansion of Bantu-speaking village communities in Central Africa. An extensive and rapid expansion of Bantu-speech communities, along with the dispersal of cereal cultivation and metallurgy, occurred only when the core of the Central African forest block was affected around 2500 BP. We claim that the Sangha River interval in particular constituted an important corridor of Bantu expansion. With this interdisciplinary review, we substantially deepen and revise earlier hypotheses linking the Bantu expansion with climate-induced forest openings around 3000 BP.|000|Bantu languages, Bantu expansion, phylogenetic reconstruction, review 2590|Darnell1971|Boas's view was that at a certain time depth it was impossible to distinguish results of borrowing from those of common origin. Under certain socio-cultural conditions, he even believed that ‘mixed languages’ might develop. 
Instead of attempting to separate the two kinds of cause, Boas backed away from any kind of generalization, repudiating his own genetic connections on the Northwest Coast in the 1890's. [...] In 1920, Boas formally stated his disagreement with his former pupils, claiming that because language was similar to the rest of culture, history would typically obscure past relationships. Genetic relationship was an acceptable explanation only if lexical, morphological and phonetic results [pb] pointed to the same conclusion. .. pull-quote:: In other words, the whole theory of an "Ursprache" for every group of modern languages must be held in abeyance until we can prove that these languages go back to a single stock and that they have not originated, to a large extent, by the process of acculturation. [Franz Boas, 1920, in Race Language and Culture, New York, Free Press: 215-6 (1940).]|25|linguistic diffusion, language change, wave theory, family tree, genetic relationship 2591|Darnell1971|Morris Swadesh is thought of by most linguists as someone who tried to arrive at greater and greater linguistic time depth, including the possible reconstruction of the origin of language. However, Swadesh also continued Sapir's interest in tracing areal influences. In A Structural Trend in Nootka in 1948, Swadesh pointed out that, in the recent history of Nootka, many old post-posed particles have become suffixes, under the influence of neighboring 'suffixing' languages of the Northwest Coast.|28|linguistic diffusion, Morris Swadesh, family tree, language contact 2592|Abramov2011|**Network Theory applied to Linguistics – New Advances in Language Classification and Typology** This is a very interesting and very strange thesis, as it treats genetic classification (but with really poor results), but also word formation, using methods for simulation. 
It seems, given all these topics, and especially also the network-aspect, this thesis may deserve a closer read, as it seems to contain some interesting insights.|000|networks, network approaches, word formation, biological parallels, phylogenetic reconstruction, quantitative analysis 2593|Daniels2016|In this paper I introduce Magɨ, a previously undocumented speech variety of central Madang Province, Papua New Guinea. Magɨ is closely related to the Aisi language; however, I argue that it should not be considered an Aisi dialect but rather a separate language. I present arguments from various domains in support of this position, including lexicon, phonology, morphology, syntax, historical change, mutual intelligibility, and language attitudes. The facts provided as evidence for these arguments also double as an outline of Magɨ structure, and I conclude that Magɨ is a separate language. The first appendix contains Magɨ and Aisi wordlists, and the second contains a short Magɨ text.|000|Magɨ, Madang, Papua New Guinea, language documentation, concept list 2594|Weinreich1968|1. Linguistic change is not to be identified with random drift proceeding from inherent variation in speech. Linguistic change begins when the generalization of a particular alternation in a given subgroup of the speech community assumes direction and takes on the character of orderly differentiation.|187|drift, linguistic change, random drift, language change, explanation of language change 2595|Weinreich1968|2. The association between structure and homogeneity is an illusion. Linguistic structure includes the orderly differentiation of speakers [pb] and styles through rules which govern variation in the speech community; native command of the language includes the control of such heterogeneous structures.|187f|heterogeneity, structure, language change, explanation of language change 2596|Weinreich1968|4. 
Not all variability and heterogeneity in language structure involves change; but all change involves variability and heterogeneity.|188|language change, explanation of language change, language variation 2597|Weinreich1968|5. The generalization of linguistic change throughout linguistic structure is neither uniform nor instantaneous; it involves the covariation of associated changes over substantial periods of time, and is reflected in the diffusion of isoglosses over areas of geographical space.|188|linguistic diffusion, language change, explanation of language change 2598|Weinreich1968|6. Linguistic change is transmitted within the community as a whole, it is not confined to discrete steps within the family. Whatever discontinuities are found in linguistic change are the products of specific discontinuities within the community, rather than inevitable products of the generational gap between parent and child.|188|transmission, linguistic diffusion, language change, 2599|Campbell1999|:comment:`Problems of language change (following` @Weinreich1968 :comment:`):` (1) *The constraints problem*: what are the general constraints on change that determine possible and impossible changes and directions of change. [...] [pb] (2) *The transition problem*: how (or by what route or routes) does language change? [...] (3) *The embedding problem*: how is a given language change embedded in the surrounding system of linguistic and social relations? [...] (4) *The evaluation problem:* how do speakers of the language (members of speech community) evaluate a given change, and what is the effect of their evaluation on the change? [...] (5) *The actuation problem*: why does a given linguistic change occur at the particular time and place that it does?|194f|problem, language change, 2600|Campbell1999|Mutual intelligibility: when speakers of different linguistic entities can understand one another. 
This is the principal criterion for distinguishing dialects of a single language from distinct languages (which may or may not be closely related). Entities which are totally incomprehensible to speakers of other entities clearly are mutually unintelligible, and for linguists they therefore belong to separate languages.|193|mutual intelligibility, definition 2601|Campbell1999|Language: the definition of 'language' is not strictly a linguistic enterprise, but sometimes is determined more by political or social factors. For this reason, Max Weinreich's definition of language is very frequently reported: a language is a dialect which has an army and a navy. This emphasises that the definition of a 'language' is not merely a linguistic matter.|193|language, definition 2602|Campbell1999|This sort of change is sometimes called dialect borrowing. Most importantly, this example shows that neither model is sufficient to explain all of linguistic change and all the sorts of relationships that can exist between dialects or related languages. Without accepting the sound change, we would not be able to recognise these dialect forms as exceptions, and without the information from dialectology, our knowledge of how some changes are transmitted would be incomplete. Clearly, both are needed.|191|family tree, wave theory, Neogrammarian sound change, word history 2603|Labov2007|The view I present here is that the primary source of diversity is the transmission (and incrementation) of change within the speech community, and that diffusion is a secondary process, of a very different character. Such a clear dichotomy between transmission and diffusion is dependent on the concept of a speech community with well-defined limits, a common structural base, and a unified set of sociolinguistic norms.|348|language change, family tree, linguistic diffusion, transmission 2604|Dyen1953|A shared innovation is one which cannot be due to chance (i.e. 
to independent linguistic change) or to separate borrowing. We speak of exclusively shared innovations because a conclusion drawn from a shared innovation applies equally to all languages that share the innovation. The need to consider more than one exclusively shared innovation in establishing a subgroup arises when we attempt to demonstrate that a language boundary appeared in the dissolution of the proto-language of the family, a boundary that included what today appear as different languages. It is of course conceivable that a single exclusively shared innovation is the sole remnant of such a boundary. But the demonstration that such a boundary actually existed depends generally on the appearance of a set of exclusively shared innovations; the mass of such innovations, however, need only be greater than that which could be due to the dialectal spread of innovations (whose borders now coincide as a result of later developments) within the proto-language. Thus the mass and the identity of range of sets of exclusively shared innovations in related languages determine the probability of the former existence of a language boundary.|580f|shared innovation, cladistics, subgrouping, family tree 2605|Banczerowski1986|Linguistics, similarly to other sound disciplines, has already survived several catastrophes, as for instance the catastrophe of neogrammarianism and the catastrophe of classical structuralism. It is now surviving, unobtrusively, the catastrophe of transformational-generative grammar. But it has not yet survived the catastrophe of macroscopism, which should lead straight to glottotronics.|23|generative grammar, Neogrammarians, sound change, nice quote 2606|Campbell1986|I wish to dispel two misleading assumptions which involve loan words and the interpretation of sound correspondences in languages which are only distantly related. 
The first is the widely-held belief that in cases of distant genetic relationship one should expect unusual, exotic, non-identical sound correspondences. The second, also often repeated, is the claim that one should expect sounds in loan words to be identical or nearly so to sounds in the word of the donor language. These two go together, apparently, as a strategy for attempting to sort loans from potential cognates in proposed cases of remote genetic relationship. Thus it is believed that cases bearing unusual or exotic correspondences are more apt to be real cognates, given the opinion that loans bear sounds similar to the source language. And conversely, identical or very similar sound matchings suggest borrowing and are less trustworthy as evidence of a relationship, since true sound correspondences due to a genetic connection are expected to be exotic. Both assumptions are false in an absolute sense, and therefore it is of value to warn against too much allegiance to either.|000|sound correspondences, methodology, comparative method, phonetic similarity, individual-identifying evidence 2607|Campbell1986|A quick survey of once-disputed but now established remote genetic relationships reveals that identical (or very similar) sound correspondences are not that unusual.|222|sound correspondences, phonetic similarity, lexical borrowing 2608|Lehrer1985|A semantic field is a set of lexemes which cover a certain conceptual domain and which bear certain specifiable relations to one another.|283|semantic field, definition 2609|Strauss1986|Most lexicologists working with the notion of the 'semantic field' agree that we have to distinguish the 'lexical field' (Lehrer's 'set of lexemes') from the 'conceptual field' (Lehrer's 'conceptual domain'). 
The present article comments on the relations that exist between the two fields and, above all, on how 'borderline cases' of the lexical field like loan-words, neologisms or metaphors can be integrated into the concept of field structure.|000|semantic field, meaning, semantics, 2610|Austerlitz1986|Internal reconstruction is at its best and therefore at its most useful when applied to isolates — languages without congeners (related languages). The reason for this is obvious: there is no temptation and (leaving aside dialects for the moment) there is no mechanism for introducing the comparative method into reconstruction simply because, in the case of isolates, comparative evidence is not available. Internal reconstruction should therefore be ideally viewed as a tool primarily for recapturing the past history of isolates or of stages of languages which cannot be recaptured by means of the comparative procedure. And when internal reconstruction is applied to a language which does have congeners, the semblance of propriety — namely, playing the game as if the language in question were an isolate — should be observed.|000|internal reconstruction, dialect data, methodology, comparative method 2611|Austerlitz1986|Interesting text, did not really grasp the main content, but it seems to bring up some interesting aspects about internal reconstruction. Maybe worth a closer reading.|000|internal reconstruction, methodology 2612|Kastovsky1986|Contributions to historical word-formation tend to concentrate on the morphological and semantic aspects of the field, i. e. they are primarily concerned with the loss of old word-formation patterns and the rise of new ones, or with semantic shifts within one and the same morphological pattern. There is, however, another, much more general aspect which so far does not seem to have attracted much attention, viz. the interaction between such changes and the semiotic-communicative functions of word-formation. 
It turns out that the rise or decline of a pattern very often also involves a shift of its communicative function.|000|word formation, communicative function, historical linguistics, language change 2613|Kastovsky1986|Obviously, the derivatives faker, pottiness, chuckle, birther, split, murderee, patting, patter and shadowee do not primarily serve as designations. Rather, they have the function of referring back to something contained in the previously occurring text: they thus condense information. In the case of the examples just quoted, this condensing takes the form of a noun, i. e. we have to do with nominalizations. But this is not the only possibility, [...]|411|communicative function, communicative efficiency, word derivation, word formation, language change 2614|Kastovsky1986|Interesting article brings some evidence that word formation (especially simultaneous word formation) may serve a communicative function in so far as it increases the back-reference to the content of what is being discussed.|000|word formation, communicative efficiency, communicative function 2615|Lass1986|Much of what we know (or think we know) about language history is based on our belief in a central claim of comparative method: that regular phonetic change yields regular correspondences, and that this entails the regular development (and hence comparability and reconstructability) of morphemes and words. At least this is so in the 'normal' cases, where l'arbitraire du signe is maintained, and analogy and similar disruptions don't supervene.|473|regular sound change, language change 2616|Lass1986|'Etymology' is essentially a set-theoretical notion. The etymology of a form is a set of segment-to-segment functions from one time-indexed set into one or more other time-indexed sets. I. e. 
a formative at any time :math:`t_i` is an ordered set of segments :math:`\{s^i_1, s^i_2, \ldots\}`, and there is a function :math:`f^k_i` for every :math:`s^i_k \in F_i` that maps it into its image :math:`s^j_k \in F_j` (where :math:`F_j = F_{i+1}`), in some (possibly null) context C. The set of all functions from segments in :math:`F_i` onto their images in all succeeding time-sets up to the formative whose etymology is the goal of the sequence constitutes the 'etymology' of that formative. :comment:`An alignment illustration follows.`|474|etymology, sequence alignment, phonetic alignment, sound change 2617|Lass1986|Thus the 'dialect-splits' we reconstruct to yield the daughter forms — since they are not mappings between fully specified and determinate forms — will not be 'sound changes' in the usual sense. They will rather be 'choices' of vowel qualities from some stipulated (morphosyntactic) set. We must, that is, assume that at least the earliest splits from Proto-Indo-European carried with them (to some unspecified and probably unspecifiable post-Proto-Indo-European period) some relics of a nominal ablaut system — as Germanic [pb] clearly carried a large part of the verbal ablaut. So when we say (in the customarily rather loose way that professionals are prone to in the right circles) that Goth. tunþus 'represents a Proto-Indo-European zero-grade', we cannot in fact really mean that. We are not talking strictly about descent in time of an inherited 'vowelled' root */dn̥t-/ or */tunθ-/ or whatever, but rather of a differential dissolution of the ablaut system in a particular direction (through regularization of a specific alternant as paradigm-base). 
And in this case, an 'irregular' dissolution, since the Germanic pattern is fixation of o-grade in this lexeme everywhere else.|478f|incomplete lineage sorting, examples, Germanic, Indo-European 2618|Lass1986|Article reflects the general problem of handling morphology as representing regular sound change, using the example for "tooth" with an irregular (unexpected) vocalism in Gothic.|000|incomplete lineage sorting, Germanic, Indo-European, morphological change, phonetic alignment 2619|Brugmann1884|In dieser Polemik gegen die Spaltungstheorie traf Schmidt im wesentlichen zusammen mit Max Müller, der in demselben Jahre (Über die Resultate der Sprachwissenschaft. Strassburg, 1872, S. 18ff.), unter Hinweise darauf, daß eine Anzahl von Übereinstimmungen jedesmal zwischen je zwei benachbarten Sprachen, Slavisch und Deutsch, Deutsch und Keltisch, Keltisch und Italisch, u. s. w., aufgefunden worden sei, erklärte, die Frage nach den Graden der Verwantschaft zwischen den verschiedenen Sprachen, sofern man darauf historische Folgerungen in Bezug auf Verzweigung der Völker baue, sei eine [pb] der Natur der Sache nach fruchtlose. Max Müller begnügte sich mit der Abweisung des Stammbaums. Nicht so Schmidt.|227f|Max Müller, Johannes Schmidt, Wellentheorie, wave theory, family tree, Stammbaumtheorie 2620|Brugmann1884|Nun bedurfte es aber einer Erklärung, daß in historischer Zeit vielfach doch unleugbar scharf trennende Sprachgrenzen bestehen, wie zwischen Baltisch-Slavisch und Arisch, zwischen Germanisch und Baltisch-Slavisch u. s. f. Von unmerklichen Übergängen, wie sie Schmidt für die vorhistorischen Zeiten voraussetzte, kann hier nirgends die Rede sein. Schmidt deutete diese Thatsache so. Auf verschiedenen Punkten des ganzen indogermanischen Sprachgebietes gewannen einzelne kleinere Stämme über ihre nächsten Nachbarn das Übergewicht und nötigten ihnen ihre Sprachvarietät auf. 
Dadurch kamen die Vermittelungsdialekte, die bis dahin von dem Dialekt des sich ausbreitenden Stammes zu den Mundarten der ferner wohnenden Stämme allmählich hinübergeleitet hatten, in Wegfall, und es berührten sich nunmehr Dialekte, die sich stärker von einander unterschieden, es waren jetzt wirkliche Sprachgrenzen vorhanden.|228|linguistic borders, diversification, wave theory, family tree, language split 2621|Brugmann1884|Schmidt's Hypothese fand mehr Widerspruch als Zustimmung [...] bis Leskien in der Einleitung seiner Schrift über "Die Deklination im Slavisch-Litauischen und Germanischen" (Leipzig, 1876) zeigte, daß die Spaltungs- und die Übergangstheorie einander gar nicht so durchaus ausschließen, wie man zuerst annahm. [...] :comment:`Keeps on talking about Leskien's article, mentioning problem of assuming Indo-European as a static distribution that never expanded, and the like.` Die von vielen vertretene Spaltungstheorie aber könne aufrecht erhalten werden, wenn man die Spracherscheinungen, die für Schmidt der Grund waren, die bis dahin geltende Spaltungshypothese und die Sonderverbände wie den Baltischslavisch-Germanischen abzulehnen, als dialektische Varietäten in die vor jeder Ausbreitung des indogerman. Volkes auf einem nur engen Terrain gesprochene Grundsprache verlege. [...] :comment:`Keeps on giving examples for shared isoglosses due to dialectal diffusion.`|229|August Leskien, wave theory, family tree, language split, linguistic diffusion 2622|Brugmann1884|Auch darin hatte Schmidt recht -- was er wiederholt hervorhob --, seine Gegner hätten noch kein einziges positives zwingendes Kriterium für den Stammbaum beigebracht. 
Aber ebenso unrecht hatte er einerseits darin, daß er meinte, die Annahme eines 'Stammbaumes' als unzulässig und die speziellen Beziehungen zwischen je zwei aneinander grenzenden Hauptsprachen als nur durch seine Übergangshypothese erklärbar erwiesen zu haben, und andrerseits darin, daß er behauptete, das Zusammentreffen von zwei Sprachen in etwas der Ursprache fremdem, erst später entwickeltem könne nicht auf Zufall beruhen. :comment:`Essentially picks up the problem of homoplasy that has been ignored by Schmidt.`|231|wave theory, homoplasy, shared innovation, family tree, language split 2623|Brugmann1884|Darüber ist man heute allgemein einig oder sollte es doch sein: wirkliche Beweisgründe für die engere Zusammengehörigkeit zweier oder mehrerer Sprachen können nur solche Übereinstimmungen sein, welche Abweichungen von den übrigen Sprachen desselben Stammes sowie zugleich von der allgemeinen Grundsprache sind, also gemeinsam vollzogene Neuerungen. Aber sollte hier das Zusammentreffen der beiden Sprachen nicht oft lediglich auf Zufall beruhen, d. h. jede der Sprachen unabhängig auf dieselbe Neuerung verfallen sein? :translation:`On this point there is, or at least should be, general agreement today: true proof of the closer relationship of two or more languages can only come from commonalities which are deviations from the remaining languages of the same family and, at the same time, from the common proto-language, that is, shared innovations. But could the agreement between the two languages not often be due to mere coincidence, that is, each language having arrived at the same innovation independently?` :comment:`Goes on by pointing to similar environment of languages which may give rise to similar pathways of change. 
Gives more examples up to the following page.`|231|shared innovation, homoplasy, subgrouping, wave theory, family tree 2624|Brugmann1884|Ich glaube nun, auf die Erfahrung mich stützend, daß oft Sprachen unabhängig von einander denselben Weg der Neuerung betreten, daneben auch auf Grund einer anderen unten zu erörternden Thatsache, zeigen zu können, daß ein engerer Zusammenhang irgend welcher der sieben durch starke und deutliche Grenzen von einander geschiedenen idg. Hauptsprachen, sei es im Sinne der Spaltungstheorie oder in dem der Übergangshypothese, bis jetzt nicht irgend wahrscheinlich gemacht ist und, mit Ausnahme vielleicht des Italo-keltischen, vermutlich nie wird wahrscheinlich zu machen sein.|232|wave theory, conflicting data, family tree, homoplasy 2625|Brugmann1884|Daß derartige :comment:`As mentioned earlier, mostly sound changes.` Übereinstimmungen zufällig sein können, wird nun klar, wenn man sieht, wie oft Sprachen unzweifelhaft unabhängig von einander denselben Lautwandel vornehmen.|235|homoplasy, sound change, subgrouping 2626|Brugmann1884|:comment:`Problems of using lexical evidence (pairwise comparison for shared cognates): (a) the lexicon is very prone to change, (b) dependence upon the researcher's selection and knowledge, (c) differential loss of words in later times, (d) patchy distributions which cannot be reconciled, neither by wave (geography) nor by family tree, (e) borrowing in ancient times. Goes then on to discuss word for 1000 in Germanic and Slavic, and shows that Franz Bopp was actually thinking it was a borrowing.`|250f|family tree, lexical borrowing, wave theory 2627|Brugmann1884|Rekapitulieren wir nun kurz. Bei der Frage, ob zwei Sprachen innerhalb eines größeren Sprachenverbandes eine engere Einheit bilden, können nur solche Übereinstimmungen in Betracht kommen, die sich als gemeinsam vollzogene Neuerungen darstellen. 
Hier ist man aber auf Schritt und Tritt der [pb] Gefahr ausgesetzt irre zu gehen, einerseits weil verwandte Sprachen oft ganz unabhängig von einander denselben Weg der Neuerung betreten, anderseits, und das betrifft besonders die lexikalischen Übereinstimmungen, weil oft eine Sprache von der andern entlehnt und uns die Mittel fehlen, um alle in vorhistorischen Zeiten geschehenen Entlehnungen als solche zu erkennen.|252f|genetic classification, subgrouping, lexical borrowing, homoplasy, shared innovation 2628|Brugmann1884|:comment:`Talks on assumed subgroupings of Baltic and Slavic, etc.` Es ist hier nicht eine einzelne und sind nicht einige wenige auf zweien oder mehreren Gebieten zugleich auftretende Spracherscheinungen, die den Beweis der näheren Gemeinschaft erbringen, sondern nur die große Masse von Übereinstimmungen in lautlichen, flexivischen, syntaktischen und lexikalischen Neuerungen, die große masse, die den Gedanken an Zufall ausschließt.|253|cumulative evidence, cladistics, shared innovation, family tree, evidence, 2629|Brugmann1884|:comment:`Talks in further length about the problem of projecting forms and features back to the proto-language, pointing to problems of language contact and again homoplasy (coincidence). Later then also turns against the majority rules argument, as certain changes can be determined by sheer directionality based on physiological evidence, especially in sound change. So the directionality helps to project features back to the proto-language.`|254f|directionality, sound change, proto-form, linguistic reconstruction, 2630|Brugmann1884|In sehr vielen Fällen wird man sich vorläufig einfach mit einem non liquet bescheiden und auf die Aufschlüsse rechnen müssen, die unsere rüstig fortschreitende Wissenschaft künftig geben wird.|256|agnosticism, evidence, progress 2631|Walker2011|Phylogenetic inference based on language is a vital tool for tracing the dynamics of human population expansions. 
The timescale of agriculture-based expansions around the world provides an informative amount of linguistic change ideal for reconstructing phylogeographies. Here we investigate the expansion of Arawak, one of the most widely dispersed language families in the Americas, scattered from the Antilles to Argentina. It has been suggested that Northwest Amazonia is the Arawak homeland based on the large number of diverse languages in the region. We generate language trees by coding cognates of basic vocabulary words for 60 Arawak languages and dialects to estimate the phylogenetic relationships among Arawak societies, while simultaneously implementing a relaxed random walk model to infer phylogeographic history. Estimates of the Arawak homeland exclude Northwest Amazonia and are bi-modal, with one potential homeland on the Atlantic seaboard and another more likely origin in Western Amazonia. Bayesian phylogeography better supports a Western Amazonian origin, and consequent dispersal to the Caribbean and across the lowlands. Importantly, the Arawak expansion carried with it not only language but also a number of cultural traits that contrast Arawak societies with other lowland cultures.|000|Arawakan, lexicostatistics, concept list, phylogenetic reconstruction 2632|Pisani1952|:comment:`Problem of distinguishing inheritance from borrowing.` Nous passons maintenant à examiner la catégorie des concordances historiques, pour voir, si et dans quelle mesure on pourra ou on devra distinguer les deux sous-catégories, la parenté et l'emprunt. 
|8|lexical borrowing, inheritance, language change, linguistic reconstruction, Romance, Latin 2633|Pisani1952|En d'autre mots, nous constatons une parenté secondaire de ces langues avec les langues celtiques, le grec moderne, les langues germaniques: dans le cas de l'osco-ombrien on ne trouve pas de parentés secondaires, parce que les éléments de cette langue survivent seulement dans le roman.|11|tree of one percent, language change, lexical borrowing, Latin, Romance 2634|Pisani1952|[...] le terme de parenté [...] porte à une vision erronnée comme l'autre, qui y est connexée, de l'arbre généalogique. Une image beaucoup plus correspondante à la réalité des faits linguistiques pourrait être suggérée par un système hydrique compliqué: un fleuve qui rassemble l'eau de torrents et ruisseaux, à un certain moment se mêle avec un autre fleuve, puis se bifurque, chacun des deux bras continue à son tour à se mêler avec d'autres cours d'eau, peut-être aussi avec un ou plusieurs dérivés de son compagnon, il se forme des lacs dont de nouveaux fleuves se départent, et ainsi de suite.|13|family tree, wave theory, language history, modeling 2635|Lass2014|People have been conscious that language has a temporal dimension at least as long as they have been writing about it. In the West, perhaps the earliest ‘serious’ recognition of language change is the collection of speculative etymologies and discussion of the meaning of letters, sounds and names in Plato’s *Cratylus*. One primary issue there is the essentialist question of whether words have meaning by nature or by convention. Semantic change and dialect difference are invoked as a partial argument for conventionality. There is also a claim of monogenesis by an act of creation – names were given by some figure in the distant past (the ‘Legislator’). Change and variation are then seen as betrayals of this original creation. The notion of ‘originality’ reappears in different forms later on. 
This is a fairly isolated example; by and large language in time was not a focal concern in the Classical traditions, Greek or Latin, though language as a philosophical object certainly was|45|Cratylus, Platon, language change, awareness of language change, history of science 2636|Lass2014|By the sixteenth century, monogenesis, though still discussed, became less important. The introduction of Semitic grammatical works into Europe seems to have stimulated the serious adoption of the idea of there being many families of related languages, which are nonetheless not related to other linguistic families. Thus polygenesis came to be a foundation concept. An early example is Theodor Bibliander’s work on Semitic (1548), which identifies Arabic as a descendant of Hebrew (this idea was already current in the ancient Semitic grammatical tradition), but not related to any European languages. The idea of families of independent origin, descending from unrelated ‘mother’ languages, was commonplace: @Gessner<1555> (1555) allowed for both mother languages that were unrelated and those that were related (cognatae).|-|language awareness, awareness of language change, history of science, linguistics, 2637|Lass2014|By the seventeenth century we can see the outlines of what was to become the modern view of language filiation. Mother languages generate later dialects the way plants produce branches or shoots (Scaliger 1610: 119: “multi dialecti tamquam propagines deductae sunt”). 
A more historical way of thinking (of a type now ordinary but then conceptually new) can be seen in the same work: Scaliger visualises an ‘original’ persisting through time and space in changed forms, so that Italian genero, Spanish yerno, French gendre ‘son-in-law’ < L gener could be said still to ‘be Latin words’, in a special historical sense.|46|family tree, history of science, 17th century, linguistics 2638|Lass2014|The fact that there existed an intelligentsia who read widely outside of their own fields, and were interested in evolutionary and reconstructive topics may have been an important factor in the development of a distinct historical linguistics. Even much later in the century this same kind of wide reading and interest persisted: Darwin (1871: Ch. III) has an elaborate discussion of language, in which he cites (fn 55) an English translation of a book on Darwinism and linguistics by the German linguist Schleicher (1863), which appeared only four years after the publication of On the Origin of Species.|48|Darwin, August Schleicher, biological parallels, language evolution, biological evolution 2639|Lass2014|Before I look in detail at certain nineteenth-century conceptual innovations which still shape our subject, it might be useful to give an overall view: what did later pre-modern historical linguistics accomplish that is part of our lasting heritage? Perhaps the most important concepts that developed in and came to dominate nineteenth-century historical linguistics were the following: i Genetic relationship (strictly, common origin) can lead to sets of regular sound correspondences, and these can be turned into arboriform genealogies. [...] ii The fact that these trees are rooted allows the construction of sequences of ancestors receding toward the root, and eventually, in the ideal case, to recovering the root or ‘archetype’ as well as the intermediates. 
This is done by finding a latest common ancestor for a related group and then working backwards through a sequence of sub-genealogies. This notion is still part of text criticism as well as linguistic and zoological filiation, though now we have computational tree-making tools of huge sophistication and can handle more data and evaluate it statistically. [...] iii Correspondences between categories can be remetaphorised as ‘actions’, in particular within systems, which themselves emerge as metaphorisations over sets of actions. Thus one major domain of change becomes the shift of entire systems, not just individual movement from segment to segment. [...][pb] iv The establishment of actions in time, i.e. reconstruction, as a necessary basis for filiation. The outputs of pathways of action are new language states. In fact it is reconstruction that validates the correspondences or trees, by giving predicted data as output. This was accepted from the 1850s onwards. [...] v The objects that are reconstructed may be ‘imaginary’ in that they are invented by the reconstructor, but this does not make them ontologically ‘unreal’. Reconstructing the ‘imaginary’ in this sense is simply the insertion of unattested but argumentatively justified objects into histories on the basis of procedural imperative, in other words allowing the disciplinary praxis to generate objects in the past. The crucial test for the validity of a reconstructed object increasingly becomes whether or not it works, ideally with maximal parsimony and naturalness, and can be rationally defended. [...] vi The correspondences that lead to reconstruction are by and large ‘regular’: this stipulation allows the trustworthy (rational) reconstruction of protolanguages and earlier states of given languages. It also allows construction of reliable genetic trees for language families, the establishment of families and subfamilies, and reliable etymologies. [...] vii Following from (vi). 
As a procedural imperative all irregularities should, if possible, be expunged by the creation of subregularities. |49f|historical linguistics, overview, achievements, linguistic reconstruction, sound change, sound correspondences, family tree 2640|Lass2014|:comment:`Overview on the Neogrammarian Manifesto.`|53f|Neogrammarian Manifesto, overview 2641|Vincent2014|Natural languages are held in a tension between their use as vehicles of communication, which implies consistency between speaker and hearer, and the fact that such use inevitably begets change. A language system is a finite resource of items – affixes, words, categories, constructions, idioms, set phrases (the exact inventory depends on one’s theoretical stance) – out of which speakers can produce, and hearers can interpret, a potential infinity of messages. For this to be possible, there has to be some way in a given sentence or text for the whole to be understood on the basis of its parts. This is the principle of compositionality, which in its simplest form states: “The meaning of a complex expression is a function of the meanings of its constituents and the way they are combined” (Szabó 2012: 64). Countervailing this is the fact that with recurrent use some combinations may get routinised or conventionalised and come to have a value different from what we might expect on the basis of our knowledge of their parts. This process can be seen at work in different ways in the genesis of new grammatical constructions, or grammaticalisation, and in the development of various types of fixed expression, or what Wray (2002) has called ‘formulaic language’.|103|compositionality, language, semantics, definition 2642|Hale2014|In an important methodological paper, @Lichtenberk<1994> (1994: 1–2) distinguishes between an ‘instrumentalist’ and a ‘realist’ view of the nature of proto-languages: .. 
pull-quote:: Do we assume that at some time in the past there really was a language that had the properties that we have reconstructed (the realist view), or is such an assumption irrelevant to our concerns (the formalist view)? [...] On the realist view, reconstructed proto-languages are viewed not as formal devices but as real entities, as real as the languages around us. Lichtenberk’s paper is an extended argument in favour of the ‘realist’ view, the orientation which is certainly dominant in contemporary reconstruction methodology. The implication of that view which he is most interested in exploring in that paper is the following: if the proto-language represents our best hypothesis about the properties of a ‘real language’, then shouldn’t our reconstructed proto-languages have dialects and show the type of variability and diversity that we find when we examine the languages around us?|149|nature of the proto-language, scientific construct, proto-language, methodology, linguistic reconstruction 2643|Hale2014|Thus while it is certainly true that at present those working on syntactic reconstruction cannot perhaps claim the large body of successful applications of the Method that we have seen in phonological reconstruction, we should be aware first that the reconstruction of bound morphology generally speaking directly entails the reconstruction of aspects of the syntax of the languages being compared – and thus much more syntactic reconstruction has been done than is generally acknowledged – and that doubts about the applicability of the concepts of ‘cognancy’ (sometimes called the ‘what do we compare?’ problem) and ‘regularity’ in the syntactic domain may be somewhat overstated.|158|syntactic reconstruction, linguistic reconstruction, methodology 2644|Hamann2014|In this chapter, we will focus on changes in phonological systems across generations of speakers of the same language community. 
In contrast to phonetic changes, which have been dealt with in the previous chapter, phonological changes are not directly observable or measurable, because they involve a change in the mental representations of sounds.|249|phonological change, sound change, definition 2645|Hamann2014|The very first example of a near-merger, from Labov, Yaeger and Steiner (1972), is the vowel contrast in the word pair *source* and *sauce*, which was thought to be pronounced the same in the vernacular of New York City. Contrary to these expectations, Labov et al. (1972) found a statistical difference between the acoustic realisations of the two vowels, though the speakers could not perceptually differentiate them. Based on their findings, Labov et al. suggest that such a sound change is happening when two sounds perceptually merge into one, but due to the consistent articulatory differentiation the phonemes stay separate, and their phonetic realisations move apart again in the following generations. In a later study, Labov et al. (1991: 45) observed considerable individual variation within the community that shows such near-merges.|251|phoneme merger, sound change, perception, language change, 2646|Hamann2014|Computer simulations are useful tools to test hypotheses on the acquisition of phonological systems. The types of changes we can observe in such simulations can be tested against real occurrences of sound changes and provide insight into the accuracy of the underlying assumptions. Usually, however, computer simulations proceed the other way around, i.e. 
they start with existing data on changes and try to model the data as closely as possible, sometimes irrespective of whether the simulation is realistic in mimicking the knowledge of a speaker/listener or not.|259|sound change, simulation studies, phonological change 2647|Barddal2014|The goal of this chapter is to present an overview of earlier and current research on syntactic reconstruction and illustrate how such a reconstruction may be carried out. Earlier structuralist and generative scholars of the last decades of the twentieth century took a strong aversion to syntactic reconstruction, an aversion which is still found among many contemporary historical syntacticians. This view is gradually changing, however, and more and more concurrent linguists are arguing for the feasibility of syntactic reconstruction. Particularly within the framework of Construction Grammar it has been argued that syntactic reconstruction is not only achievable, but may also be quite successful. In addition, historical syntacticians from a diversity of frameworks have started coming forward to propose different methods of syntactic reconstructions within their respective models.|343|syntactic reconstruction, linguistic reconstruction, language change, syntax, introduction 2648|Urban2014|The topic of this chapter is lexical semantic change, the change of the meaning of languages’ lexical items through time. To give a first idea of this process by example, the basic meaning of Old High German bein was ‘bone’ (the original or source meaning), while in Modern German, it is ‘leg’ (the innovative or target meaning). In this example, there is a complete change in meaning, with the original meaning disappearing almost entirely. 
However, arguably, semantic change is also present if the original meaning persists and a new one is merely added, since already then there is a difference from the original state.|374|semantic reconstruction, introduction, semantic change, lexical change 2649|Urban2014|The introductory example showed that German bein underwent semantic change from the original basic meaning ‘bone’ to ‘leg’. In the older stages of other West and North Germanic languages, ‘bone’ is uniformly found as the general meaning of bein’s cognates (Orel 2003: 32). If this is true, positing an original meaning ‘leg’ requires the assumption of a change at many nodes of the family tree, while positing an original meaning ‘bone’ necessitates only change at a low node of the tree. By the principle of parsimony, then, reconstructing ‘bone’ would be preferred on methodological grounds. Therefore, if we did not already know from the available internal evidence regarding the development of German that ‘bone’ is the likely original meaning, the comparative evidence would direct us towards the same conclusion.|386|semantic change, semantic reconstruction, parsimony, family tree 2650|Mailhammer2014|This chapter gives an overview of what etymology is, what it engages in and what issues are under debate, focusing on the following key points. First, etymology is both the foundation of historical linguistics, the starting point of the historical investigation of a language and at the same time one of its subfields. In order to establish the etymology of an item one must backtrack to the chronological stage at which it was transparently formed for the first time, and account for the changes it subsequently underwent. Hence, etymologies give us a view of a language’s history. They make more general statements possible, and they are historical accounts of individual items as well. Second, etymology is distinct from describing synchronic processes, such as word-formation. 
It is an inherently diachronic perspective in relating one chronological stage of a language to earlier ones. Third, the research questions of etymology can, in principle, be applied to bigger and smaller linguistic units with sufficient stability, as etymology is concerned with the origin and history of linguistic elements.|423|etymology, introduction, language change, evolutionary scenario 2651|Stanford2014|Language acquisition has long held an important place in theories of language, including questions about how language is structured and which aspects of language are acquired or innate. Historical linguistics is another area where language acquisition enters the discussion. This chapter reviews major lines of thinking about the role of language acquisition in language change, outlining perspectives from classic studies as well as recent work on less commonly studied language communities.|466|language change, language acquisition, reasons for language change, introduction 2652|Stanford2014|Therefore, while it is clear that children have a role in language change, it would be good to cast a wider net to catch other possible sources as well. To this end, we now turn to language variation in the speech of adults, adolescents, and children, as well as in-migrating speakers and other ‘leaders of linguistic change’ in the community (@Labov<2001> 2001: 385–411).|470|language change, language acquisition, reasons for language change 2653|Stanford2014|@Labov<2007> (2007) exemplifies the way in which synchronic variation can be used to integrate language acquisition and language change. In this approach, transmission is defined in terms of adult-to-child contact, i.e. first language acquisition. Transmission is the vertical dimension along the tree in Figure 21.2. 
Systematic features of language are faithfully transmitted from generation to generation because of children’s innate ability in language learning.|471|transmission, diffusion, language acquisition, language change 2654|Stanford2014|By contrast, diffusion is the horizontal dimension, and it is defined in terms of adult–adult contact, i.e. speakers who come into contact after their childhood language acquisition is complete.|471|diffusion, transmission, definition, language change, 2655|Bybee2014|Recent work in historical linguistics has benefited greatly from a better understanding of the cognitive processes underlying language production and perception, and from the study of the way change takes place during language use. The view of language change that is emerging from recent studies is that change is gradual and takes place in small increments during usage events, as language users apply cognitive processes to the task of communicating with one another. As each usage event has an impact on users’ cognitive representations, change can take place in both children and adults. We outline this usage-based perspective in section 2 and apply it in subsequent sections as we survey the known psycholinguistic or cognitive processes that serve as the mechanisms of language change.|503|language change, cognition, language use, 2656|Lucas2014|Despite this relatively greater interest in internally caused change, there is of course a large and ever-expanding descriptive literature on the synchronic outcomes of language contact in various situations. 
Arising out of this descriptive literature, there has also been a considerable amount of work that tries to generalise across language-contact situations, the main focus of this work being on the putative limits of contact-induced change: which linguistic elements can and cannot be ‘borrowed’ (see section 3 for further discussion of this term), and in which order, and how sociolinguistic factors can act as constraints on contact-induced change.|000|contact-induced language change, introduction 2657|Simpson2014|People change the way they talk for social reasons, wanting to make themselves understood, and wanting to speak like the people they admire (accommodation) or to differentiate themselves from the people they don’t want to be like (divergence). People also change aspects of the way they talk in order to express new ideas, new practices, and new things. This affects mostly vocabulary, but syntactic and morphological structures may also change under the pressure of new language technologies, such as writing, and the need to be concise for speed of production.|537|language attrition, language change, introduction 2658|Hull1976|The processes which contribute to the evolution of biological species take place at a variety of levels of organization; e.g., genes give rise to other genes, organisms give rise to other organisms, and species give rise to other species. All of these processes require continuity through descent. If species are to be units of evolution, they need not be composed of similar organisms; instead they must be made up of organisms related by descent. Taxonomists do not impose this requirement on the phenomena; rather it follows from the nature of the evolutionary process itself. In addition to spatiotemporal continuity, species must also possess a certain degree of unity to function as units of evolution. Gene exchange is one means by which such unity can be promoted. 
The mechanisms by which asexual species maintain a similar unity are problematic; higher taxa pose an even more serious problem. However, if species are chunks of the genealogical nexus, they cannot be viewed as classes. Instead they possess all the characteristics of individuals—that is, if organisms are taken to be paradigm individuals. The major difference between organisms and species as individuals is that organisms possess a largely fixed genetic makeup which constrains their development, whereas species do not. If species are individuals, then their names are most naturally viewed as proper names, names which denote particular individuals but do not possess any intensional meaning. |000|species, definition, individuals, biology, methodology, biological theory 2659|Honeybone2016|One question that historical phonology should reasonably seek to answer is: are there impossible changes? That is: are there plausible changes that we could reasonably expect to occur in the diachrony of languages’ phonologies, but which nonetheless do not ever occur? In this paper I seek to spell out what it really means to consider this question and what we need to do in order to answer it for any specific case. This will require a consideration of some fundamental issues in historical phonology, including the distinction between exceptionless and lexically-specific/sporadic changes (which I call ‘N-changes’ and ‘A-changes’), and the connection between that distinction and the ‘misperception’ model of phonological change. It will involve an analysis of aspects of the phonological history of Pulo Annian, Arabic, Italic, Spanish and several varieties of English. 
I argue that the current state of evidence indicates that there are indeed impossible changes (which I symbolise using ‘x ≯ y’ to represent that ‘x cannot change into y’) in a very specific but phonologically real way, and that f ≯ θ is one.|000|sound change, regular sound change, methodology, phonological theory, pathways 2660|Honeybone2016|The paper contains an interesting historical review of pathways of sound change, with pointers to where the author has written more on the topic. Very interesting at first sight; should be read more thoroughly.|000|pathways, sound change, regular sound change, sporadic sound change 2661|Jepson1991|This paper explores the diachronic predictions of the typological models of Greenberg, Vennemann, and Hawkins, by matching their universals against attested change in Chinese. The proposal made by some linguists is that Chinese has undergone a long-term typological change from VO to OV; Vennemann's and Hawkins's models predict large-scale changes. Greenberg's universals predict that Chinese should have remained typologically stable between the periods of Archaic Chinese and Modern Standard Chinese. The evidence presented here suggests that Greenberg's universals are accurate in their predictions: Chinese appears to have undergone no sweeping typological changes. Reasons for the failure of Vennemann's and Hawkins's models accurately to predict the stability of Chinese word order are discussed.|000|Chinese, universals, Joseph Greenberg, Theo Vennemann, language change, word order 2662|Bradley1982|Este trabajo presenta una nueva reconstrucción del proto-mixteco, con base en el material léxico recolectado en veinte pueblos representativos de diferentes variedades del mixteco. Proponemos para el protomixteco un sistema fonológico y una serie de reglas que pueden representar el desarrollo histórico de las variedades modernas. 
:translation:`This work presents a new reconstruction of Proto-Mixtec, based on lexical material collected in 20 villages representative of the different varieties of Mixtec. We propose for Proto-Mixtec a phonological system and a series of rules which can represent the historical development of the modern varieties.`|000|Proto-Mixtec, linguistic reconstruction, etymology, wave theory, family tree, 2663|Bradley1982|Paper shows an interesting simulation (based on six images) which shows how the authors think the languages developed.|000|Mixtec, Proto-Mixtec, linguistic reconstruction, wave theory, family tree 2664|Moran2011|This paper presents the design and implementation of the Ontology for Accessing Transcription Systems (OATS), a knowledge base that supports interoperation over disparate transcription systems and practical orthographies. OATS uses RDF, SPARQL and Unicode to facilitate resource discovery and intelligent search over linguistic data. The knowledge base includes an ontological description of writing systems and relations for mapping transcription system segments to an interlingua pivot, the IPA. It includes orthographic and phonemic inventories from 203 African languages, which were mined from the Web. OATS is motivated by four use cases: querying data in the knowledge base via IPA, querying it in native orthography, error checking of digitized data, and conversion between transcription systems. The model in this paper implements each of these use cases.|000|phonetic transcription, transcription, transcription systems, ontology 2665|Albert2016|Interesting paper introducing a specific speech variety of the Yanomami in South America used to imitate voices of the forest. |000|sound symbolism, voice imitation, animal voices, Yanomami, Southern American languages 2666|DeLaet2005|About 10 years ago, @Maddison<1993> (1993; see also @Platnick<1991> et al. 
1991) drew attention to problems that can arise in parsimony analyses when data sets contain characters that are not applicable across all terminals. Examples of such characters are tail color when some terminals lack tails, or positions in DNA sequences in which gaps are present. Maddison (1993) examined various ways of coding such characters for various parsimony algorithms and concluded that no general solution was available. Since then, the problem of inapplicables has been rediscussed repeatedly (e.g. Lee and Bryant 1999; Strong and Lipscomb 1999; Seitz et al. 2000), but Maddison’s conclusion still holds.|000|inapplicables, maximum parsimony, sequence data, gaps, missing data, problem 2667|DeLaet2005|This article is interesting in the context of the discussion of homology and homoplasy: normally, this is downplayed in biology, and homology is simply assumed, but the possibility of homoplasy arises in sequence alignments and sequence data, as sequence evolution may re-invent a character state, and we could not necessarily speak of homology here, nor when talking about gaps in an alignment. This apparently has consequences for parsimony principles in phylogenetic reconstruction and in biology in general, and it may have similar implications in linguistics, since in many cases what we discuss are also "inapplicables": we often tend to compare data points that have simply never been present in a given lineage. |000|maximum parsimony, inapplicables, homology, homoplasy, 2668|DeLaet2005|@Hennig<1966> (1966, p. 89) introduced the terms symplesiomorphy and synapomorphy to describe the presence of plesiomorphies and apomorphies among terminals. As above, these terms are defined with respect to true evolutionary history, but are often used to refer to inferences as well. 
Such context-dependent shifts in meaning of these and similar terms are widespread in the literature, Hennig (1966) being a prime example.|85|description, ontology, epistemology, explication, terminology, cladistics, maximum parsimony, symplesiomorphy, synapomorphy 2669|DeLaet2005|A crucial assumption in the above interpretation of a single character is Hennig’s auxiliary principle, stating ‘that the presence of apomorphous characters in different species . . . is always reason for suspecting kinship [i.e. that the species belong to a monophyletic group], and that their origin by convergence should not be assumed a priori’ (@Hennig1966, p. 121; square brackets present in original). In this quote, the term ‘character’ refers to a ‘special character’ (Hennig 1966, p. 89), which is a character state as used in this chapter, whereas an apomorphous (special) character refers to a special character that ‘can certainly or with reasonable probability be interpreted as apomorphous’ (Hennig 1966, p.121), i.e. an hypothesis of apomorphy or a putative apomorphy; monophyly is used in its true historical meaning.|86|homology, auxiliary principle, maximum parsimony 2670|DeLaet2005|Such additional statements are implicit in Farris’ (1983, p. 
8) formulation of Hennig’s auxiliary principle: ‘homology should be presumed in absence of evidence to the contrary’, where homology refers to similarities among organisms that have arisen historically through inheritance from a common ancestor, irrespective of these similarities being apomorphic or plesiomorphic.|86|auxiliary principle, homology, parsimony, 2671|DeLaet2005|Considering all this, the Hennig–Farris auxiliary principle can be phrased as the following rule for erecting character hypotheses and interpreting their optimizations on trees: ‘features that on the basis of empirical evidence are deemed sufficiently similar to be called the same at some level of generality should be treated as putative homologues in phylogenetic analysis (even if they may be true homoplasies instead).’|88|auxiliary principle, homology, homoplasy, parsimony, 2672|DeLaet2005|To remove such hard-coded higher-level assumptions, @Pleijel<1995> (1995) proposed to use absence/presence coding of character states, which is formally identical to non-additive binary coding, a technique that stems from phenetics (see, e.g., @Sokal1986). Whether it is feasible or desirable to exclude such assumptions from the analysis will be examined below. 
:comment:`This is the coding proposed in` @Gray2003|94|presence-absence coding, gain-loss models, parsimony, homology, 2673|DeLaet2005|But whatever the answer, the use of absence/presence coding as a means of doing so can lead to internal inconsistencies in the phylogenetic explanation of data, a result that is particularly relevant for this paper because Pleijel (1995) advanced absence/presence coding as a promising way to deal with inapplicables.|94|presence-absence coding, inconsistencies, maximum parsimony, homology 2674|Ellis2016|The causes of Earth's transition are human and social, write Erle Ellis and colleagues, so scholars from those disciplines must be included in its formalization.|000|anthropocene, discussion 2675|DeLaet2005|Similarities as coded in characters can very well be true homoplasies rather than true homologies. Likewise, it cannot be ruled out that character similarities that can be explained as homologies on most-parsimonious cladograms are true homoplasies instead, even when using single-character data sets as above. Combined with the observation that parsimony minimizes putative homoplasy, such observations are sometimes taken to mean that it is an assumption of parsimony analysis that homoplasy is rare in evolutionary history. However, even if rarity of homoplasy may be a sufficient condition to prefer most-parsimonious trees (see, e.g., Felsenstein 1981), it is definitely not a necessary condition.|87|parsimony, homology, homoplasy, problem 2676|Melamed2016| **Significance** Our knowledge of the diet of early hominins derives mainly from animal skeletal remains found in archaeological sites, leading to a bias toward a protein-based diet. We report on the earliest known archive of food plants found in the superimposed Acheulian sites excavated at Gesher Benot Ya‘aqov, Israel. These remains, some 780,000 y old, comprise 55 taxa, including nuts, fruits, seeds, vegetables, and plants producing underground storage organs. 
They reflect a varied plant diet, staple plant foods, seasonality, and hominins’ environmental knowledge and use of fire in food processing. Our results change previous notions of paleo diet and shed light on hominin abilities to adjust to new environments and exploit different flora, facilitating population diffusion, survival, and colonization beyond Africa. **Abstract** Diet is central for understanding hominin evolution, adaptation, and environmental exploitation, but Paleolithic plant remains are scarce. A unique macrobotanical assemblage of 55 food plant taxa from the Acheulian site of Gesher Benot Ya‘aqov, Israel includes seeds, fruits, nuts, vegetables, and plants producing underground storage organs. The food plant remains were part of a diet that also included aquatic and terrestrial fauna. This diverse assemblage, 780,000 y old, reflects a varied plant diet, staple plant foods, environmental knowledge, seasonality, and the use of fire in food processing. It provides insight into the wide spectrum of the diet of mid-Pleistocene hominins, enhancing our understanding of their adaptation from the perspective of subsistence. Our results shed light on hominin abilities to adjust to new environments, facilitating population diffusion and colonization beyond Africa. We reconstruct the major vegetal foodstuffs, while considering the possibility of some detoxification by fire. The site, located in the Levantine Corridor through which several hominin waves dispersed out of Africa, provides a unique opportunity to study mid-Pleistocene vegetal diet and is crucial for understanding subsistence aspects of hominin dispersal and the transition from an African-based to a Eurasian diet. |000|paleo diet, hominid evolution, nutrition, 2677|Holstege2016|Vocalizations such as mews and cries in cats or crying and laughter in humans are examples of expression of emotions. 
These vocalizations are generated by the emotional motor system, in which the mesencephalic periaqueductal gray (PAG) plays a central role, as demonstrated by the fact that lesions in the PAG lead to complete mutism in cats, monkeys, as well as in humans. The PAG receives strong projections from higher limbic regions and from the anterior cingulate, insula, and orbitofrontal cortical areas. In turn, the PAG has strong access to the caudal medullary nucleus retroambiguus (NRA). The NRA is the only cell group that has direct access to the motoneurons involved in vocalization, i.e., the motoneuronal cell groups innervating soft palate, pharynx, and larynx as well as diaphragm, intercostal, abdominal, and pelvic floor muscles. Together they determine the intraabdominal, intrathoracic, and subglottic pressure, control of which is necessary for generating vocalization. Only humans can speak, because, via the lateral component of the volitional or somatic motor system, they are able to modulate vocalization into words and sentences. For this modulation they use their motor cortex, which, via its corticobulbar fibers, has direct access to the motoneurons innervating the muscles of face, mouth, tongue, larynx, and pharynx. In conclusion, humans generate speech by activating two motor systems. They generate vocalization by activating the prefrontal-PAG-NRA-motoneuronal pathway, and, at the same time, they modulate this vocalization into words and sentences by activating the corticobulbar fibers to the face, mouth, tongue, larynx, and pharynx motoneurons.|000|human speech, language capacity, neurology, neurolinguistics 2678|Fitch2016|For four decades, the inability of nonhuman primates to produce human speech sounds has been claimed to stem from limitations in their vocal tract anatomy, a conclusion based on plaster casts made from the vocal tract of a monkey cadaver. 
We used x-ray videos to quantify vocal tract dynamics in living macaques during vocalization, facial displays, and feeding. We demonstrate that the macaque vocal tract could easily produce an adequate range of speech sounds to support spoken language, showing that previous techniques based on postmortem samples drastically underestimated primate vocal capabilities. Our findings imply that the evolution of human speech capabilities required neural changes rather than modifications of vocal anatomy. Macaques have a speech-ready vocal tract but lack a speech-ready brain to control it.|000|language capacity, speech acoustics, monkeys, language origin 2679|Cser2016|In the first half of the nineteenth century comparative and historical linguistics focused mainly on morphological structure. Although important phonological discoveries were made, phonology played a subsidiary role to morphology. What could be called the models of language were all theories of morphology. These speculations were targets the Neogrammarians attacked vigorously, mainly in the spirit of uniformitarianism. Phonology was different in terms of abstractness. Sounds were treated in a superficially abstract manner, but this was based on the phonetically imprecise littera-tradition, the emphasis on correspondences, the focus on dead languages, and the impact of the Indian tradition. The Neogrammarians, by contrast, strove to make phonology more phonetic and more rigorous and, paradoxically, earned the contempt of their opponents for introducing a different kind of abstractness by reconstructing a segment not attested in unchanged form in any of the Indo-European languages. In turn, while the Neogrammarians admitted that de Saussure’s analysis in the Mémoire is highly logical, they dismissed it as lacking sufficient empirical motivation. 
It appears that the argument reminded them of the analyses of the previous generation, and de Saussure’s formulation, which they found unduly abstract, was superficially just the kind they wanted to purge linguistics of at last.|000|history of science, phonology, morphology, 19th century, Neogrammarian sound change, ancestral states, unattested character states, 2680|Sen2016|The life cycle of phonological processes (e.g. @BermudezOtero<2015> 2015) provides an account of how a sound change might develop over the history of a language, from its beginnings in the pressures of speaking and hearing, through its progress to a cognitively-controlled process and maturation into a categorical phenomenon, to its final resting-place as a lexical or morphological pattern. It has been the subject of increased research in recent times, but has faced strikingly few challenges to its diachronic aspects, notably its predictions of unidirectionality and cycle-based dialectal splits. Furthermore, the cognitive mechanisms rooted in morpheme-based learning which are required to predict domain narrowing (phrase > word > stem) rather than broadening need to be tested through child (and adult) acquisition studies. This paper examines how a historical phonologist might go about interrogating the life-cycle model using extensive historical data spanning several centuries, and methodically ascertaining what the model predicts in order to know what to look for. The paper concludes by briefly addressing some of the many other questions raised by the model which have faced comparatively little investigation given the purported pervasiveness of the life cycle.|000|life cycle, pathways, phonological change, sound change, 2681|Gao2016|This study focuses on the on-going disappearance of low tone breathiness in Shanghai Chinese. 
In the change from a voicing contrast to a tone register contrast in Sinitic languages, the ancient voiced series was characterised by a breathy voice quality, which remained as a secondary and redundant cue of low tones in Shanghai Chinese. This study, using transversal production data from 12 young and 10 elderly speakers, shows that low tone breathiness is better preserved by elderly than young speakers, and by male than female speakers. We predict a future loss of this secondary cue, which is speeding up due to the interference with Standard Chinese. We also found that the disappearance is more advanced in female speakers, which might be explained by female speakers’ stronger adherence to Standard Chinese as the prestigious form. Indeed, our young female speakers reported more frequent usage of Standard Chinese than Shanghai Chinese and higher competence in Standard Chinese than in Shanghai Chinese, whereas young male speakers were more confident in their usage of Shanghai Chinese.|000|tonal change, tone language, Shanghainese, breathy voice, speech acoustics, sociolinguistic variation 2682|Seinhorst2016|In the acquisition of phonological patterns, learners tend to considerably reduce the complexity of their input. This learning bias may also constrain the set of possible sound changes, which might be expected to contain only those changes that do not increase the complexity of the system. However, sound change obviously involves more than just pattern learning. 
This paper investigates the role that inductive biases play by assessing the differences in system complexity of a small number of attested sound changes: the evolution of the obstruent and vowel inventories from Old English to Modern English, and the First Germanic Consonant Shift.|000|sound change, pathways, restrictions, language change, phonological change, language acquisition 2683|Persi2016|Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.|000|orthology, paralogy, protein evolution, protein repeats, protein functions 2684|Clackson2002|In recent years there has been a surge of interest in Indo-European word-formation in general, and in particular in compounding. In the field of Latin linguistics alone there have been three books exclusively on compounds in the past 15 years (Benedetti 1988, Oniga 1988 and Lindner 1996), not to mention many more articles on individual words or aspects of compounding. 
This interest partly reflects the inherent attractions of the field: compounds often exhibit peculiar developments in phonology (see further Rasmussen's paper in this volume), suffixation and meaning, and the internal syntax of compounds can sometimes reveal much about the syntactic patterns of a language. Compounds thus offer a transitional area between word-formation and other areas of linguistic study. Furthermore, for many of the Indo-European languages, which have been the subject of intense philological scrutiny for the last 150 years, they offer a relatively untrammelled field. Another reason for the rise in interest in compounding among Indo-Europeanists can be largely attributed to the groundbreaking work of two scholars: Ernst Risch (see in particular Risch 1944/1949 and 1974) and Jochem Schindler (see Schindler 1986, 1987 and 1997), whose publications and teaching have inspired a generation of students to take up topics involving the diachronic and synchronic study of compounds.|000|compounding, Indo-European, composition, compositionality, literature, introduction 2685|Clackson2002|Paper is an introduction to a special issue in the journal, thus containing interesting hints to literature on the topic.|000|compounding, Indo-European, compositionality, composition 2686|Koonin2012|My major incentive in writing this book is my belief that, 150 years after Darwin and 40 years after Monod, we now have at hand the data and the concepts to develop a deeper, more complex, and perhaps, more satisfactory understanding of this crucial relationship. I make the case that variously constrained randomness is at the very heart of the entire history of life.|000|biological evolution, randomness, random variation, 2687|Koonin2012|The book is interesting in several respects, as it offers not only an overview on state-of-the-art ideas in biology, but also on the history of ideas. 
Here's the table of contents: Chapter 1: The fundamentals of evolution: Darwin and Modern Synthesis Chapter 2: From Modern Synthesis to evolutionary genomics: Multiple processes and patterns of evolution Chapter 3: Comparative genomics: Evolving genomescapes Chapter 4: Genomics, systems biology, and universals of evolution: Genome evolution as a phenomenon of statistical physics Chapter 5: The web genomics of the prokaryotic world: Vertical and horizontal flows of genes, the mobilome, and the dynamic pangenomes Chapter 6: The phylogenetic forest and the quest for the elusive Tree of Life in the age of genomics Chapter 7: The origins of eukaryotes: Endosymbiosis, the strange story of introns, and the ultimate importance of unique events in evolution Chapter 8: The non-adaptive null hypothesis of genome evolution and origins of biological complexity Chapter 9: The Darwinian, Lamarckian, and Wrightean modalities of evolution, robustness, evolvability, and the creative role of noise in evolution Chapter 10: The Virus World and its evolution Chapter 11: The Last Universal Common Ancestor, the origin of cells, and the primordial gene pool Chapter 12: Origin of life: The emergence of translation, replication, metabolism, and membranes—the biological, geochemical, and cosmological perspectives Chapter 13: The postmodern state of evolutionary biology |000|neutral evolution, biological evolution, introduction, handbook, history of science 2688|Lupyan2016|Why are there different languages? A common explanation is that different languages arise from the gradual accumulation of random changes. Here, we argue that, beyond these random factors, linguistic differences, from sounds to grammars, may also reflect adaptations to different environments in which the languages are learned and used. 
The aspects of the environment that could shape language include the social, the physical, and the technological.|000|adaptationism, language evolution, neutral evolution, 2689|Lupyan2016|All languages share several basic design features, such as productivity, categorical denotation, and compositionality (see Glossary), which distinguish linguistic systems from both nonhuman communication systems and nonverbal human communication.|649|compositionality, productivity, denotation, 2690|Lupyan2016|A standard account of how different languages form is uncannily similar to the traditional folk tales related by the opening quote. Languages change over time. If everyone always spoke to everyone else, these changes might spread evenly to the entire speech community. However, people are more likely to communicate with their neighbors, thereby ‘inheriting’ their ways of speaking. This asymmetry means that variation within a group will become increasingly decoupled from variation between groups, leading to the eventual formation of dialects and languages [13,14]. An analogous situation arises in biology. Genomes are undergoing constant changes and these changes are more likely to be inherited within a population than between populations. We refer to this divergence due to accumulation of changes as drift (see Glossary for a distinction between the biological and traditional linguistic senses of the word).|650|language change, random drift, adaptation 2691|Lupyan2016|Adaptation: in biological evolution, adaptations are changes to an organism that lead to an increase in the frequency of a trait, generally through an increase in the reproductive success of the organism. In cultural evolution, adaptations are changes that improve the transmissibility (and, hence, frequency) of a cultural trait. 
In the domain of language, such improvements can be achieved by increased fidelity of transmission, greater learnability, more efficient comprehension, and so on.|650|definition, adaptation, biology, linguistics, cultural evolution 2692|Lupyan2016|Categorical denotation: unlike sensorimotor experiences, which are always specific, words and larger expressions can denote categories rather than specific goals or perceptual events. This property may allow language to ‘transcend the tyranny of the specific’ [84].|650|reference potential, categorical denotation, definition 2693|Lupyan2016|Compositionality: the possibility of recombining a smaller set of units (morphemes) into a larger set of expressions (words, utterances) through structured recombination. Along with combinatoriality (the combining of meaningless segments into meaningful morphemes), compositionality and combinatoriality comprise the ‘duality of patterning’ [85].|650|compositionality, definition 2694|Lupyan2016|Drift: the linguistic notion of drift [14] (directed drift) concerns processes such as phonological shifts and grammaticalization. These processes have predictable direction that is to some extent predictable from principles discovered by variationist linguists [11,13,86]. Such directed drift does not explain linguistic diversity because its directedness implies that languages would converge to a common form. That they do not implies a random component akin to the biological notion of drift (random drift), which refers to changes in a trait caused by random sampling among its variants.|650|drift, Sapir's drift, definition, random drift, biological parallels, 2695|Lupyan2016|Grammaticalization: a process of language change wherein items change from lexical to grammatical meanings; for example, ‘going to’ changing from its original meaning of literal motion to having a grammatical function of marking intention and/or the future. 
In becoming grammaticalized, items often become reduced in form (hence, ‘going to’ frequently becoming ‘gonna’) [73].|650|grammaticalization, definition, language evolution 2696|Lupyan2016|Productivity: the ability of natural language to convey novel information. Together with compositionality, the productive capacity of language allows hearers to understand and produce novel utterances.|651|productivity, definition, terminology 2697|Lupyan2016|Natural language is strongly constrained by what can be learned by infants [27,28]. An aspect of language that can only be learned by 50% of infants will propagate considerably less well than one that can be learned by all infants. However, not all languages are similarly constrained by what can be effectively learned by adults. Some languages have large populations of non-native (L2) speakers. For example the majority (64%) of English-speakers and most of (90%) of Swahili-speakers are L2 speakers. In contrast, most smaller languages (and some larger ones, such as Turkish and Japanese) are learned almost exclusively by children as native languages [29].|651|language evolution, second language learning, learnability, adaptation 2698|Lupyan2016|Despite the challenges, we believe that viewing languages as adaptations to different environments will help advance our understanding of linguistic diversity, language origins, and the ways in which cultural evolution contributes to shaping the human mind.|658|linguistic diversity, adaptation, adaptationism, 2699|DeLancey2000|Paper discusses the preliminaries for a universal basis of case across languages.|000|case, grammar, universals, typology, 2700|Smith2016|The present study begins with an introduction to “sound symbolism” from a cognitive linguistic perspective. 
Such frameworks provide that while the relationship between sound and meaning within the lexicon is in general arbitrary, languages may also feature specific, statistically significant relationships between particular sounds and particular semantic domains, whether language specific or motivated by more general cognitive tendencies. An example from modern Mandarin leads to a broader consideration of this phenomenon within Old Chinese (OC), with particular reference to the reduplicative vocabulary, or dieyin ci 疊音詞 , of the Book of Odes corpus. The author presents a series of persistent associations between sound and meaning in this subset of the OC lexicon, with statistical evidence adduced in support of their cognitive reality for OC speakers. Finally, this article offers a tentative exploration of the role of such “expressive” or “ideophonic” vocabulary in producing particular poetic effects relating to point of view and to conceptual metaphor.|000|sound symbolism, Shījīng, phonaestheme, reduplication, rhyme patterns, Old Chinese 2701|Meillet1931|La grammaire comparée doit se faire en utilisant les anomalies—c’est-à-dire les survivances—bien plus que les formes régulières. Les formes qui, à date historique, sont normales sont celles qui ont subi le plus de réfections. Au contraire, les formes fortes et, plus encore, les formes anomales portent témoignage d’états de langue plus lointains; donc plus une forme est anomale, plus il y a chance qu’elle soit une survivance de l’époque de communauté. :comment:`Quoted after` @Viti2014 |194|Antoine Meillet, shared aberrancies, exceptions, internal reconstruction 2702|Viti2014|This paper discusses the problem of linguistic reconstruction in the Indo-European languages with particular attention to syntax. 
While many scholars consider syntactic reconstruction as being in principle impossible, other scholars simply apply to syntax the same tenets of the Comparative Method and of Internal Reconstruction, which were originally used in Indo-European studies for reconstructing phonology and morphology. Accordingly, it is assumed that synchronically anomalous syntactic structures are more ancient than productive syntactic constructions; the former are considered as being residues of an early stage of Proto-Indo-European, where they were also more regular and took part in a consistent syntactic system. Various hypotheses of Proto-Indo-European as a syntactically consistent language, which in the last years have witnessed resurgence, are here discussed and criticized. We argue that syntactic consistency is nowhere attested in the Indo-European languages, which in their earliest records rather document an amazing structural variation. Accordingly, we reconstruct Proto-Indo-European as an inconsistent syntactic system in the domains of word order, agreement, configurationality, and alignment, and we consider inconsistency and structural variation to be an original condition of languages. Moreover, we make some proposals for the appropriate use of typology in linguistic reconstruction, with some examples of what can or cannot be reconstructed in syntax.|000|syntactic variation, syntactic reconstruction, Indo-European, methodology 2703|Viti2014|A basic principle of linguistic reconstruction is that synchronically anomalous forms represent the relics of an older linguistic stage and that these forms, therefore, are more revealing for a comparativist.|74|linguistic reconstruction, irregular forms, internal reconstruction, methodology, 2704|Viti2014|Meillet (1931) formulated the anomaly principle while comparing thematic and athematic stems in various early IE languages. 
He expressed the (nowadays commonly accepted) observation that thematic stems are more recent than athematic stems on the basis of two considerations.|75|Antoine Meillet, anomaly principle, linguistic reconstruction, internal reconstruction 2705|Viti2014|This principle has also had the more or less tacitly assumed consequence that synchronically anomalous forms were “normal” in older stages, and that the historical linguist must reconstruct an original linguistic stage in which the later diverse and idiosyncratic forms are rather inserted into a net of interlocking regularities within a homogenous system.|74|anomaly principle, methodology, comparative method, irregular forms, linguistic reconstruction 2706|Cromar2015|Motivation: Network biology has emerged as a powerful tool to uncover the organizational properties of living systems through the application of graph theoretic approaches. However, due to limitations in underlying data models and visualization software, knowledge relating to large molecular assemblies and biologically active fragments is poorly represented. Results: Here, we demonstrate a novel hypergraph implementation that better captures hierarchical structures, using components of elastic fibers and chromatin modification as models. These reveal unprecedented views of the biology of these systems, demonstrating the unique capacity of hypergraphs to resolve overlaps and uncover new insights into the subfunctionalization of variant complexes. Availability and implementation: Hyperscape is available as a web application at http://www.compsysbio.org/hyperscape. Source code, examples and a tutorial are freely available under a GNU license. |000|hypergraph, graph theory, network approaches, bipartite network 2707|Hill2014|As is well known, PIE possessed several distinct sigmatic formations with modal or future-like semantics. 
The paper deals with two sigmatic formations which must be reconstructed for PIE and obviously possessed a similar semantic value. First: a full grade -si̯e/o-formation which is attested in Indo-Iranian, Continental Celtic and Balto-Slavonic; and second, an athematic -s-formation which is attested in Italic and in the Eastern branch of Baltic. The diverging morphology of these formations implies that they originally also differed in their semantics. The problem is that both formations are reflected as simple future tense in all daughterlanguages which preserved them. However, it seems possible to detect the original semantic difference between these formations by using the evidence of the only IE branch which preserved both formations side by side, i.e. Baltic. The paper investigates the morphology of the sigmatic future tense in dialects of Lithuanian and Latvian and shows that for the common prehistory of East Baltic dialects a secondary conflation of originally independent PIE formations—-si̯e/o-formation and -s-formation—in one single paradigm must be assumed. The particular distribution of both formations within the unified paradigm of Proto-East-Baltic makes it possible to obtain information on the lost semantic difference between them. Possible traces of the -si̯e/o-formation in the only recorded West Baltic language, Old Prussian, seem to confirm the conclusions drawn on the basis of the East Baltic evidence.|000|suppletion, semantic reconstruction, Indo-European, 2708|Chirkova2013|This article is an overview of issues in language classification, in particular in connection with three subgroups of the Tibeto-Burman language family: Tibetic, Sinitic and Qiangic. First, I discuss the practical application of currently prevalent classifications of Tibetic, Sinitic and Qiangic languages. 
Then, with reference to insights from classification practices in biology, I review alternatives for a practical classification in linguistics.|000|language classification, genetic classification, Sino-Tibetan, Tibeto-Burman, methodology 2709|Chirkova2013|To return to the field of linguistics, we can observe that linguistics does not have in its arsenal any analogy to natural classification in biology. Instead, genetic classification has come to be used as an explanatory concept and a general linguistic classification. It is, however, a secondary classification, because it selects few criteria (essentially basic vocabulary in the domain of Tibeto-Burman languages) and it pursues one specific domain of inquiry (the common ancestry of languages). As such, it falls short of accounting for the versatility of empirical data, and it is unable to accommodate new data, as discussed above on the basis of Tibetic, Sinitic, and Qiangic languages. While a significant and intriguing facet of linguistic inquiry, indispensable for the purposes of historical linguistics, it is unsuitable as general linguistic classification.|727|natural classification, biological parallels, genetic classification, dialect classification, Sino-Tibetan, 2710|Lenclud2013|Les langues se plient-elles au test du dénombrement? Non, à l’évidence. Le terme de langue n’est pas un authentique nom nombrable. Cela empêche-t-il le linguiste de leur appliquer la grammaire du même et de l’autre? En théorie oui, dans la pratique non. Les langues représenteraient-elles une catégorie d’entités à part? Aucunement: on ne sait pas davantage compter les cultures ou les nations, ni même, comme on va le voir, les arbres, les bateaux ou les personnes. Faudrait-il alors réviser notre ontologie de sens commun et lui substituer une ontologie de parties temporelles selon laquelle êtres et choses seraient tout ce qui leur arrive? Peut-être. C’était, au demeurant, le point de vue de Saussure pour qui une langue est son histoire. 
L’adoption généralisée d’une telle ontologie entraînerait toutefois des conséquences déconcertantes. :translations:`Are the languages amenable to enumeration? No, clearly not. [...]`|000|methodology, number of languages, language variation, dimensions of variation 2711|Mair2013|Terminological imprecision, particularly with regard to the Chinese word fangyan and its translation by the English word “dialect,” has resulted in a situation whereby Sinitic language taxonomy may variously be described as chaotic, impenetrable, or functionally absent. For such a large, diverse agglomeration of languages as Sinitic, this is an unacceptable state of affairs. Through rigorous definition and careful analysis, it is possible to arrive at a clearer understanding of the nature of the relationships among the constituent languages of the Sinitic Language Group/Family (SLG/F).|000|genetic classification, Chinese dialects, dimensions of variation, language variation 2712|Chappell2013|In Chinese languages, when a direct object occurs in a non-canonical position preceding the main verb, this SOV structure can be morphologically marked by a preposition whose source comes largely from verbs or deverbal prepositions. For example, markers such as kā 共 in Southern Min are ultimately derived from the verb ‘to accompany’, pau 11 幫 in many Huizhou and Wu dialects is derived from the verb ‘to help’ and bǎ 把 from the verb ‘to hold’ in standard Mandarin and the Jin dialects. In general, these markers are used to highlight an explicit change of state affecting a referential object, located in this preverbal position. This analysis sets out to address the issue of diversity in such object-marking constructions in order to examine the question of whether areal patterns exist within Sinitic languages on the basis of the main lexical fields of the object markers, if not the construction types. 
The possibility of establishing four major linguistic zones in China is thus explored with respect to grammaticalization pathways.|000|bǎ-construction, object marking, Chinese dialects, cross-linguistic study 2713|Bottero2013|C’est dans la perspective de l’histoire des dictionnaires chinois que j’aborderai ici l’étude du Qièyùn 切韻 et du Kānmiù bǔquē Qièyùn 刊謬補缺切韻. En tant qu’ouvrage fondateur de la tradition des livres de rimes, le Qièyùn occupe avant tout une place fondamentale dans l’histoire de la phonologie chinoise, mais il m’a paru intéressant d’étudier le rôle qu’il a pu jouer dans l’histoire des dictionnaires, étant donné la nouvelle méthode de classement qu’il offrait pour organiser les caractères. J’ai donc tout d’abord cherché à dégager les particularités de ce texte fondateur en partant des plus anciens fragments connus du Qièyùn original et en les comparant aux versions ultérieures. J’ai ensuite étudié le système des gloses propres aux Qièyùn original et celles des versions augmentées jusqu’au Kānmiù bǔquē Qièyùn, afin de mieux comprendre comment ont évolué les livres de rimes à cette époque et comment ils en sont venus à ressembler de plus en plus à de véritables dictionnaires.|000|history of science, Qièyùn, Chinese traditional phonology, yīnyùnxué, historical overview 2714|Zhang2013|This paper begins by defining poetic prosody and discussing various building elements of Chinese poetic prosody, such as rhythmic grouping, tonal assignment, and rhyming patterns. It then distinguishes between poetic prosody itself and the performance of metered texts in order to provide a better understanding of tonal prosody and the nature of rhyming.|000|Chinese poetry, prosody, poetic function, meter, poetry 2715|Li2013|There are markers and processes in hominid’s evolution and cognitive capability that are relevant to the emergence of language. 
After a long process of creation and concatenation of symbolic signals in their speech, grammatical structures emerged shortly, in a swift, graded and ongoing manner. This dynamic process defies any attempt to impose on it a non-arbitrary discrete marker as the point of origin of language.|000|language origin, language evolution, sign emergence 2716|Wang2013|In this essay I will discuss some aspects of the phylogeny and the ontogeny of language within an evolutionary perspective. The first part of the essay stresses the heterogeneity of language, a view proposed by Weinreich et al. (1968). In contrast to Generative Grammar, which represents linguistic structures in terms of deep derivations, I submit that these structures can be better studied in terms of shallow memorial processes, cf. Bolinger (1961), and Tomasello (2003). This view of shallow memorial processes is consistent with an observation von Neumann (1958) made when he compared the computer with the brain, and with the recent proposals toward flat structure by Culicover & Jackendoff (2005, Chapter 4). The second part begins with a very brief review of recent developments in cognitive neuroscience, and especially of EEG methods. Several experiments are then discussed which reveal the remarkably rich biological resources for learning that the infant brings to the challenge of language acquisition. These include its ability to imitate facial expressions almost immediately at birth, and to distinguish the phonetic features of its native language from non-native languages, etc. These developments are demystifying language acquisition, and steadily laying a solid empirical foundation upon which our understanding of language ontogeny can build.|000|language evolution, neurology, human brain, phylogeny, ontogeny, 2717|Coblin2003|The Chinese dialogues in the Qïngwén qïméng, a Manchu textbook for Chinese readers, provide an extended sample of spoken northern Guânhuà from the mid-eighteenth century. 
And one version of this text, to be examined here, adds Manchu transcriptional forms for the Chinese text. In the present paper certain phonological, lexical and syntactic features of the form of Chinese underlying the text are examined with specific reference to the development of northern Guânhuà as a koine and to the relationship of this koine to its more prestigious counterpart, the southern Guânhuà of the Nanking area. The paper ends with some thoughts about the route followed by northern Guânhuà as it became the dominant koine variety during the nineteenth century.|000|18th century, Mandarin, Early Mandarin, language history 2718|Coblin2007|Interestingly, however, besides the abovementioned logographic sources, there exists for certain varieties of standard Chinese a corpus of systematic alphabetic records, commencing in the thirteenth century and continuing to the present day. (We exclude here the sizable corpus of Tibetan, Uyghur, and other transcriptional materials of Tang times, since these do not employ systematic orthographic systems.) This alphabetic material begins in 1269 with Chinese recorded in the 'Phags-pa alphabet.|000|Phags-pa alphabet, introduction, handbook, Chinese dialects, language history, linguistic reconstruction, yīnyùnxué 2719|Mathur2016|Phylogenies are the commonly used tools for the prediction of ancestry of present day organisms from the past decades. Several methods have been developed to construct phylogenetic trees that predict the history of species by direct linkage of edges. Very few studies have been developed for the phylogenetic networks (which is the generalization of trees). Presently, the methods used to determine phylogenetic networks are based on distance measures or character measures of the sequences of species. It is a very challenging task for computational biologists to find the exact method that can predict the accurate networks of organisms. 
In this study, a phylogenetic network construction model based on basic graph theory concepts is reported. This model finds the distance matrix of every sequence considered in the study. The two features (positioning and stack interactions) of every DNA sequence and their combined effect have been taken into account to calculate the distances. Results suggested that reticulate events can be observed by using the distances obtained by the proposed method and no such event is predicted by using the distances calculated by the previous method. The important results obtained in the form of distances are 1.637300, 2.000000, 0.932700, 2.331300, 2.829200 and the significance of these values is to represent the different reticulation events among the sequences for different features. Hence distances calculated by this model gives better insights to study the phylogenetic networks. |000|phylogenetic network, distance measures, sequence similarity, graph theory 2720|Mathur2016|The authors propose essentially a way to quickly compare two sequences without aligning them, but by modeling them as a graph with the nodes representing letters, and the edges between the letters representing the "interaction" between the letters. That is, they essentially use the syntagmatic information in their graph model. This should definitely be tested, as it seems to be straightforward to apply this in the context of linguistics, and it may work actually better than the classical model by @Guy1994, where the author counts all possibilities of interaction without using the graph approach.|000|phylogenetic network, sequence comparison, graph theory 2721|Corel201X|Complex microbial gene flows affect how we understand virology, microbiology, medical sciences, genetic modification and evolutionary biology. Phylogenies only provide a narrow view of these gene flows: plasmids and viruses, lacking core genes, cannot be attached to cellular life on phylogenetic trees. 
Using bipartite graphs that connect thousands of gene families with thousands of related and unrelated genomes, we can show that biological evolution is a modular process only partially constrained by cells and vertical inheritance. Gene families are recycled by lateral gene transfer, and evolution abundantly copies gene families between completely unrelated genomes, i.e. viruses, plasmids and prokaryotes. In particular among Bacteria, a process of ‘gene externalization’ takes place where genes are copied to mobile elements, mainly driven by gene function. Bipartite graphs give us a view of vertical and horizontal gene flow beyond classic taxonomy on a single infinitely-expandable graph that goes beyond the cellular Web of Life.|000|network approaches, bipartite network, microbial gene flow, 2722|Corel201X|Paper introduces two simple and straightforward approaches to bipartite network analysis: articulation points and twins.|000|articulation point, bipartite network, twins, 2723|Benito2016|In this paper we present a tool aimed to improve the comprehension of that massive amount of data through visualization means, thus trying to help in the reach of meaningful conclusions and the acquisition of valuable insights in easy and fast ways. With it, analysts can discover cultural issues and access them through means of language and visualization. This is possible thanks to a multidimensional approach to data analysis based on the use of maps, projections and other visualization artifacts. To reach our goal, a team of experts with different backgrounds worked together trying to close the gap between the Humanities and Computer Sciences fields through the creation of our prototype and its multiple iterations.|000|visualization, interactive visualization, historical linguistics, dictionary 2724|Haynie2016|The naming of colors has long been a topic of interest in the study of human culture and cognition. 
Color term research has asked diverse questions about thought and communication, but no previous research has used an evolutionary framework. We show that there is broad support for the most influential theory of color term development (most strongly represented by Berlin and Kay [Berlin B, Kay P (1969) (Univ of California Press, Berkeley, CA)]); however, we find extensive evidence for the loss (as well as gain) of color terms. We find alternative trajectories of color term evolution beyond those considered in the standard theories. These results not only refine our knowledge of how humans lexicalize the color space and how the systems change over time; they illustrate the promise of phylogenetic methods within the domain of cognitive science, and they show how language change interacts with human perception.|000|color terms, ancestral state reconstruction, Australian languages, denotation 2725|Au2008|In the simulation results of the model, two controversial sound change transition patterns, Neogrammarian regularity and lexical diffusion, can both be found under different conditions. During a shift without fusion of sounds, the pronunciation of the lexical items changes regularly as described in the Neogrammarian hypothesis; during a merger, the spoken forms display a regular pattern as in a shift at the beginning. Then the changing patterns become irregular lexically as described in lexical diffusion, when the two perceptual categories are fusing together. These conditions primarily match with the empirical data reported in literature. 
Besides the coexistence of these two controversial patterns, the simulation results also support the existence of another controversial phenomenon, the near-merger, in which individual speakers in a population cannot perceptually distinguish two sounds but can produce them differently.|000|lexical diffusion, Neogrammarian sound change, simulation studies, transmission, language change, sound change 2726|Gong2009|In this monograph, I presents a multi-agent computational model to explore a key question in language emergence, i.e., whether syntactic abilities result from innate, species-specific competences, or they evolve from domain-general abilities through gradual adaptations. The model simulates a process of coevolutionary emergence of two linguistic universals (compositionality, in the form of lexical items; and regularity, in the form of constitute word orders) in human language, i.e., the acquisition and conventionalization of these features coevolve during the transition from a holistic signaling system to a compositional language. It also traces a “bottom-up” process of syntactic development, i.e., agents, by virtue of reiterating local orders between two lexical items, can gradually form global order(s) to regulate multiple lexical items in sentences. These results suggest that compositionality, regularity, and correlated linguistic abilities could have emerged as a result of some domain-general abilities, such as pattern extraction and sequential learning.|000|language emergence, syntax, simulation studies, language acquisition, 2727|Jacques2016f|Chinese historical phonology differs from most domains of contemporary linguistics in that its general framework is based in large part on a genuinely native tradition. 
The non-Western outlook of the terminology and concepts used in Chinese historical phonology make this field extremely difficult to understand for both experts in other fields of Chinese linguistics and historical phonologists specializing in other language families.|000|introduction, Chinese traditional phonology, yīnyùnxué, handbook 2728|Bartlett2009|Syllables play an important role in speech synthesis and recognition. We present several different approaches to the syllabification of phonemes. We investigate approaches based on linguistic theories of syllabification, as well as a discriminative learning technique that combines Support Vector Machine and Hidden Markov Model technologies. Our experiments on English, Dutch and German demonstrate that our transparent implementation of the sonority sequencing principle is more accurate than previous implementations, and that our language-independent SVM-based approach advances the current state-of-the-art, achieving word accuracy of over 98% in English and 99% in German and Dutch.|000|phoneme syllabification, syllabification, syllable segmentation, automatic approach, natural language processing 2729|Coblin1983|This book treats Eastern Han sound glosses and provides an early attempt to settle how words were glossed and transcribed in Hàn Chinese.|000|Hàn time, Old Chinese, sound glosses, dúruò, dataset, 2730|Honkola2016|In the thesis, I have studied linguistic divergence with a multidisciplinary approach, applying the framework and quantitative methods of evolutionary biology to language data. With quantitative methods, large datasets may be analyzed objectively, while approaches from evolutionary biology make it possible to revisit old questions (related to, for example, the shape of the phylogeny) with new methods, and adopt novel perspectives to pose novel questions. My chief focus was on the effects exerted on the speakers of a language by environmental and cultural factors. 
My approach was thus an ecological one, in the sense that I was interested in how the local environment affects humans and whether this human-environment connection plays a possible role in the divergence process. I studied this question in relation to the Uralic language family and to the dialects of Finnish, thus covering two different levels of divergence. However, as the Uralic languages have not previously been studied using quantitative phylogenetic methods, nor have population genetic methods been previously applied to any dialect data, I first evaluated the applicability of these biological methods to language data.|000|Finnish, Uralic languages, quantitative analysis, phylogenetic reconstruction, biological parallels, 2731|Honkola2016|Judging from introduction and content a rather standard-analysis of language data in a rather naive biological approach.|000|biological parallels, phylogenetic reconstruction, Uralic languages 2732|Jacques2016g|Un des problèmes les plus fondamentaux et des plus difficiles de la linguistique historique du sino-tibétain est la question de reconstruction de la morphologie. En effet, la famille sino-tibétaine est peut-être celle présentant la plus grande diversité typologique de toutes les langues du monde. A côté de langues presque prototypiquement isolantes, telles que le chinois, le karen ou le lolo-birman, on trouve des langues polysynthétiques comme le rgyalronguique ou le kiranti. 
On trouve aussi des langues d’un degré de complexité morphologique intermédiaire, comme le tibétain ancien, qui bien que dépourvu d’indexation personnelle, a une morphologie très irrégulière et synchroniquement opaque.|000|Sino-Tibetan, typology, linguistic reconstruction, methodology 2733|Jacques2016g|Article discusses the nature of the Sino-Tibetan proto-language, based on evaluating the current nature of its descendants.|000|Sino-Tibetan, nature of the proto-language, 2734|Nakhleh2016|*Bipartition* Division of the complete set of taxa into two nonoverlapping groups (sometimes also called a split).|264|bipartition, definition 2735|Nakhleh2016|*Character* A characteristic or trait being measured on a set of taxa for use in a phylogenetic analysis, which displays at least two mutually exclusive character states (e.g., present vs. 
absent).|264|definition, character, 2736|Nakhleh2016|*Gene duplication* The duplication of a block of genetic material, often involving a complete gene or even a whole chromosome|264|gene duplication, definition 2737|Nahkleh2016|*Gene tree* A generic term for a phylogenetic tree derived from data for a single non-recombining sequence block (sometimes loosely referred to as a locus).|264|definition, gene tree, 2738|Nahkleh2016|*Graph* In phylogenetics, the model used is a graph consisting of nodes (representing taxa) connected by edges (representing their inferred relationships).|264|definition, graph, 2739|Nahkleh2016|*Horizontal gene transfer* Movement of a small piece of a genetic material between unrelated organisms by means other than sexual reproduction.|264|definition, horizontal gene transfer, lateral gene transfer 2740|Nahkleh2016|*Hybridization* The merging of distinct population lineages to produce a new hybrid lineage, achieved by combining the genomes within each organism.|264|hybridization, definition 2741|Nahkleh2016|*Incomplete lineage sorting* Retention of allelic polymorphisms through one or more speciation events, followed by selective loss of some of the alleles (sometimes also called deep coalescence, or ancestral polymorphism).|264|incomplete lineage sorting, definition 2742|Nahkleh2016|*Introgression* Movement of a small piece of a genetic material between related organisms by means of sexual reproduction.|264|introgression, definition 2743|Nahkleh2016|*Most recent common ancestor* The ancestor most recently shared between two or more taxa.|264|definition, most recent common ancestor 2744|Nahkleh2016|*Recombination* The formation of composite genetic material within an individual by the mixing of parental genes via processes such as crossing-over or re-assortment.|264|definition, recombination 2745|Nahkleh2016|*Sequence alignment* An arrangement of DNA or protein sequences to indicate which nucleotides or amino acids are related by inheritance 
from a common ancestor, usually with the sequences running horizontally and the related molecules aligned vertically.|264|sequence alignment, definition 2746|Nahkleh2016|*Taxon (plural taxa)* A generic term for any level of the biological hierarchy (e.g., individual, population, species, genus, etc.)|264|taxon, definition 2747|Nahkleh2016|Biological processes that result in scenarios where the evolutionary history of a sequence alignment cannot be captured by a single tree can be divided into two categories: treelike discord processes and reticulation processes. Treelike discord processes correspond to ‘trees within a tree’ scenarios (Figure 3 (a)), in which the gene histories are different even though the species history is treelike. Processes such as incomplete lineage sorting (ILS), heterogeneity in mutation rates across characters, and gene duplication/loss belong to this category. Modeling evolutionary histories in the presence of these processes does not require networks, though these processes could be explored using data-display networks, as we discuss below.|266|treelike discord processes, reticulation processes, incomplete lineage sorting, differential loss, definition 2748|Nahkleh2016|Reticulation processes correspond to ‘trees within a network’ scenarios (Figure 3(b)), in which the species history itself is not treelike. Processes such as (meiotic) recombination, re-assortment, HGT, introgression, and hybridization belong to this category. When such processes occur, the evolutionary history of the sequence alignment takes the form of a phylogenetic network. 
Here, both the data-display and evolutionary networks could be used to explore and model the data (Morrison, 2005; Huson and Bryant, 2006).|266|reticulation processes, treelike discord processes, biological evolution, lateral gene transfer, introgression, hybridization 2749|Nahkleh2016|It is important to note here that the two characters in the example could very well have evolved down a single tree. For example, if each character represents a different site in the genomes of the four taxa, where 0 is A and 1 is C, then, the evolution of the sequence data can be explained by invoking multiple mutations at one or both of the characters. This is a central issue that underlies the construction, analysis, and interpretation of this type of network. The fact that the graph is not treelike does not mean reticulation has occurred during the evolution of the taxa; rather, the non-treelike components of the network call for a closer inspection of those parts to understand the conflicts in the data.|267|conflicting data, family tree, reticulation, 2750|PatEl2013|In this paper I have suggested that there are well-tested historical linguistic tools to help us distinguish between internally and externally motivated changes: i) Intermediate stages, where the various stages of a process are attested in one language, while the other shows only the final result. I claimed that the source language is likely the one where the various stages of the process are attested. ii) Consistency across categories, where a change is observed in various patterns in one language, but is much more restricted in another. I claimed that a language where a pattern is used broadly and its development is motivated is the more plausible source language. More generally, it is argued here that historical linguistic tools are adequate to distinguish between internal and external change and to help identify the source of a change, if it is external. [...] 
With genetically related languages, contact may be harder to detect due to the similarity between the systems; therefore, particular caution should be exercised. Without an in-depth and careful historical and comparative evaluation the true picture eludes us and we are left with false impressions. External change is always a clear possibility and should be considered alongside other more traditional historical scenarios.|325|criteria, lexical borrowing, borrowing detection, genetically related languages, 2751|PatEl2013|Several prominent scholars have recently doubted whether it is possible to differentiate borrowing from internal change, to the point that in some cases subgrouping is not feasible or is restricted (Dench, 2001; Dixon, 2001). Since a situation of prolonged and intense contact between closely related languages is very common, language contact and its results are a major problem if not a real hazard to historical linguistics. The main practical problem is how to differentiate internal changes, changes motivated by internal processes, from external changes, changes due to language contact, when the structure of the languages is so similar. In other words, how do we know which linguistic form is the source of the change: one of the attested languages, or the mother of both of them? In this paper, I suggest two preliminary criteria to isolate the source language in cases of contact: 1) the existence of intermediary stages, and 2) an even spread of the change across categories. I will show, using test cases from the Semitic language family that these criteria can help us distinguish between internal and external changes.|000|lexical borrowing, borrowing detection, Semitic languages, 2752|Coblin2004a|In this paper an effort to reconstruct a phonological proto-system for five central Jiang-Huai dialects is expanded to include a wider range of dialects found to the east and west of the central area. 
The result, called "common Jiang-Huai", is designed to account for all currently reported members of the family. Some discussion of subgrouping problems in the family is also included.|000|Jiānghuái, Mandarin, sound system, linguistic reconstruction, Chinese dialects, 2753|Coblin2004a|.. image:: static/img/Coblin2004.png :width: 400px :name: img A diagram of this type adequately represents the overlapping nature of the pertinent sound changes, but it gives no hint of the relative chronology or time depth of these developments. Can anything be said in this respect on the basis of non-linguistic evidence?|764|Venn diagram, isoglosses, shared innovation, genetic classification, Jiānghuái, Chinese dialects, subgrouping 2754|Oettl2015|This study investigated whether formal complexity, as described by the Chomsky Hierarchy, corresponds to cognitive complexity during language learning. According to the Chomsky Hierarchy, nested dependencies (context-free) are less complex than cross-serial dependencies (mildly context-sensitive). In two artificial grammar learning (AGL) experiments participants were presented with a language containing either nested or cross-serial dependencies. A learning effect for both types of dependencies could be observed, but no difference between dependency types emerged. These behavioral findings do not seem to reflect complexity differences as described in the Chomsky Hierarchy. This study extends previous findings in demonstrating learning effects for nested and cross-serial dependencies with more natural stimulus materials in a classical AGL paradigm after only one hour of exposure. 
The current findings can be taken as a starting point for further exploring the degree to which the Chomsky Hierarchy reflects cognitive processes.|000|Chomsky hierarchy, artificial language learning, language learning, 2755|Oettl2015|Study contains a good explanation on the different types of regular grammar along with their different dependencies, as outlined (but less comprehensible) in @Chomsky1959. .. image:: static/img/oettlimage.png :width: 400px :name: bla The image is useful for understanding the implications here. |000|Chomsky hierarchy, introduction, 2756|Pisani1957|A travers la reconstruction on parvient à des formes dont descendent celles qui nous sont attestées historiquement. Ces formes reconstruites, si elles apparaissent dans deux ou dans plusieurs diverses traditions et ne sont pas surgies indépendemment dans chacune d'elles, constituent des isoglosses qui en partie sont dues à des relations entre parlers i.-e. postérieurement à la dissolution de l'unité primitive, en partie remontent à cette unité. Mais les secondes ne nous donnent pas le tableau d'un système linguistique rigidement un (comme p. ex. le latin classique), dont on puisse partir pour atteindre les origines ultimes de l'i.-e. (racines monosyllabiques, etc.): ce sont là des phantaisies qui se réclament des théories de Schleicher, fondées sur une vision de la langue comme organisme naturel. 
L'indo-européen est un phénomène historique comparable au “latin vulgaire”; comme celui-ci n'est pas une langue unique, mais un ensemble de dialectes réunis par des isoglosses partielles ou totales et au fond une “ligue linguistique” née de la rencontre de langues diverses qui en ont déterminé l'aspect général et les traits locaux, ainsi on devra penser de l'indo-européen, dans qui, comme le latin de Rome a donné la partie la plus importante et substantielle des traits propres aux dialectes du “latin vulgaire”, une des langues qui y sont confluées (peut-être une phase plus ancienne du sanskrit) peut avoir fourni le schéma fondamental et une bonne partie des formes, qui suivant les régions ont pu être acceptées dans une mesure plus ou moins large. Notre tâche est précisément de chercher à rétablir les variétés de l'unité i.-e. et de retracer les lignes fondamentales selon qui cette unité relative s'est constituée, non de façonner un “indo-européen” d'une seule pièce, qui est un postulat indémontrable.|000|Indo-European, family tree, isoglosses, language variation, nature of the proto-language 2757|Pisani1957|Interesting article discussing the nature of the proto-language as a language of high diversity, and no unity, as reconstruction may make linguists think.|000|comparative method, methodology, nature of the proto-language 2758|Watters2004|This is a comprehensive grammatical documentation of Kham, a previously undescribed language from west-central Nepal, belonging to the Tibeto-Burman language family. The language contains a number of grammatical systems that are of immediate relevance to current work on linguistic theory, including a functionally transparent split ergative system, a well developed system of mirativity, restrictive and non-restrictive noun phrases based on word order, a rich class of derived adjectivals, and extensive transitivity alternations in the verb. 
Its verb morphology has implications for the understanding of the history of the entire Tibeto-Burman family. The book, based on extensive fieldwork, deals with all major aspects of the language including segmental phonology, tone, word classes, noun phrases, nominalizations, transitivity alterations, tense–aspect–modality, non-declarative speech acts, and complex sentence structure. It provides copious examples throughout the exposition and includes three short native texts and a vocabulary of more than 400 words, many of them reconstructed for Proto-Kham and Proto-Tibeto-Burman. This book will be a valuable resource for typologists and general linguists alike.|000|Kham, Sino-Tibetan, dataset, concept list, grammar, 2759|Coupe2007|This book describes the grammar of Mongsen, one of two major dialects of the Ao language. According to the 1991 Census of India, Ao is spoken by 170,000 people. It is estimated that perhaps forty percent of that number speak Mongsen as their first language, fifty percent speak the prestige Chungli dialect as their first language and the remaining ten percent speak other Ao dialects.|000|Mongsen, Ao, Sino-Tibetan, grammar, word list, 2760|Jatteau2016|Dissimilation is classically considered as a phonetically categorical sound change. In contrast to this assumption, this paper presents evidence for a phonetically gradient pattern of aspiration dissimilation found in Aberystwyth English (Wales): an aspiration feature is consistently reduced in the vicinity of another aspiration feature. Two other patterns of gradient aspiration dissimilation have been reported, in Halh Mongolian and in Georgian, which suggests that it may actually be a more general phenomenon. The Aberystwyth data are however better controlled for phonological contexts and lexical regularity than the Mongolian and the Georgian data. 
The results can then be discussed in light of the two available theories of dissimilation, Ohala’s (1981) hypercorrection theory, and the traditional link with speech errors. Importantly, a number of arguments support Garrett’s (2015) hypothesis that gradient dissimilation might be a(nother) precursor to complete dissimilation. The pattern thus shows how the use of careful phonetic inspection can lead to a reanalysis of our understanding of well-established diachronic processes. |000|dissimilation, lexical diffusion, Neogrammarian sound change, English, 2761|Petzel2015|This article presents three linguistically analysed and annotated stories in the Kagulu language, together with Kagulu–English and English–Kagulu word lists. Kagulu is a Bantu language, classified as G12, spoken by approximately 340,000 people in the Morogoro region of Tanzania. The objective of the article is to make these linguistic data and stories public, for several reasons. First, there is very little published material in the Kagulu language at all, and no modern English–Kagulu word list. Second, the anthropological stories that are published do not come with annotations, glossing or even a word-for-word translation into Swahili or English, which do not make these texts very meaningful from a linguistic perspective. Thirdly, these stories tell us about Kagulu traditions and can thus be a tool in helping us to understand the culture and identity associated with the language. Finally, it is important that every language is written down, described and published. Undescribed languages run the risk of disappearing, while documenting a language forestalls its loss.|000|Bantu languages, Kagulu, wordlist, grammar sketch, glossed text 2762|Rama2016b|In this paper, we present phoneme level Siamese convolutional networks for the task of pair-wise cognate identification. We represent a word as a two-dimensional matrix and employ a siamese convolutional network for learning deep representations. 
We present siamese architectures that jointly learn phoneme level feature representations and language relatedness from raw words for cognate identification. Compared to previous works, we train and test on larger and realistic datasets; and, show that siamese architectures consistently perform better than traditional linear classifier approach.|000|neural network, cognate detection, machine learning 2763|Gawne2013|Lamjung Yolmo is an endangered dialect of the Tibeto-Burman Yolmo language, spoken in Nepal. In this thesis I focus on three grammatical features of Lamjung Yolmo: The encoding of modality on copula verbs, question structures and reported speech structures. I explore the grammatical structure of these features and focus on their function from the perspective of social cognition. Social cognition recognises that human language is deeply embedded in interaction, and that this interaction is situated within a larger interpersonal and cultural context.|000|Lamjung Yolmo, Sino-Tibetan, grammar sketch 2764|Handel201X|An article on Chinese character classification, discussing the role of the Huìyì category in Xǔ Shèn's work.|000|Chinese characters, Chinese character formation, Huìyì, Xǔ Shèn, liùshū, Shuōwén Jiězì 2765|Gawne2016|This book provides the first grammatical description of the Lamjung variety of Yolmo, a Tibeto-Burman language spoken in Nepal. The volume outlines key ethnographic information about the speakers of Lamjung Yolmo, including an account of the historical migration from the Melamchi Valley to low hills in the Lamjung District. The relationship to other Yolmo varieties, including that spoken by the main population in the Melamchi Valley, and the Syuba variety spoken in Ramechhap, is outlined, as well as its place within the Central Bodic branch of Tibeto-Burman. The focus of the volume is the grammatical description, which encompasses the major features of the language. 
The chapter on phonetics and phonology includes discussion of the vowel and consonant inventories, as well as the lexical tone system. The parts of speech chapter includes argumentation for the existence of word classes including nominals, verbs, adjectives, adverbs, postpositions, interjections, discourse markers and honorifics. The chapter on the noun phrase includes discussion of pronominal forms, articles and case-marking. The verb phrase chapter includes discussion of tense, aspect and modality, including the evidential distinctions made in the language. The final chapter looks at features of clause structure, including relative clauses, complement clauses, nominalisation, clause combining, questions and reported speech. A collection of interlinearised texts is also included.|000|grammar, Lamjung Yolmo, Sino-Tibetan, 2766|Bradley2002|A rather lengthy and probably informative description of general problems in Sino-Tibetan (Tibeto-Burman) language classification and subgrouping.|000|Sino-Tibetan, subgrouping, genetic classification, Tibeto-Burman 2767|Beckwith2002a|Potentially interesting article that introduces irregularity as a major problem of Sino-Tibetan etymologies.|000|Sino-Tibetan, etymology, irregularity of sound change, sound change, regular sound change, regularity, methodology 2768|Willis2007|This dissertation is a description of Darma, an under-documented Tibeto-Burman language spoken in the eastern corner of the state of Uttarakhand, India. With fewer than 2,600 speakers and no writing system, Darma is considered endangered. This is the most comprehensive description of Darma to date.|000|Darma, Sino-Tibetan, Tibeto-Burman, grammar, glossary, word list 2769|Hyow2013|The origin of Hyow, one of the Kuki-Chin languages, in the current study of the language family is not a conclusive one. This study aims to compare the previous findings and the research data of Hyow that is collected from Chittagong Hill Tracts in Bangladesh. 
Based on the shared innovations of the Kuki-Chin languages and the findings of my research, I have tried to draw a conclusive origin of Hyow in the Southern Kuki-Chin branch. The comparative study reveals that Hyow should be aligned with Khumi and Cho-Asho rather than be kept under Asho of the Southern-Plains-Chin group.|000|Hyow, Kuki-Chin, Sino-Tibetan, subgrouping, 2770|Blackburn2008|The stories in this book come from a high valley in the Indian state of Arunachal Pradesh. Located at five thousand feet in the eastern Himalayas, the Apatani valley is about four miles long and two across, although thin fingers of land stretch a little further on the eastern perimeter. Paddy fields cover every available square foot, with islands of gardens, bamboo groves and millet patches on higher ground. Hundreds of brown wooden granaries, propped up on thick posts, squat like square boxes in the fields. The mountains crowd in on all sides, their dark green slopes rising two or three thousand feet above the valley floor. To the north, the snow peaks of the high Himalayas are visible only from outside the valley and only on a clear day. To the south, beyond the bustling administrative centre hacked out of the jungle in the 1970s, tall ridges stand between the fertile valley and the flood plain of Assam.|000|orality, Sino-Tibetan, Apatani Valley, tales, tribal tales, Himalayan tales, 2771|Coblin2000|Article presents the phonological system of Míng time Guānhuà. |000|Míng time, Guānhuà, Mandarin, Early Mandarin, Chinese dialects 2772|Epps2013|The most important shift in this field has been the attempt to identify and isolate what motivates and facilitates the transfer of linguistic features in the languages or speaker populations involved. 
One of the major issues discussed in the context of contact is the question of linguistic structure and what influence typological and structural similarity has on the extent of borrowing.|210|genetic relationship, language contact, lexical borrowing, borrowability 2773|Epps2013|These considerations highlight the two-fold problem raised by the issue of contact among genetically related languages: on the one hand, how are we to distinguish between the outcomes of inheritance and contact; on the other, how might the dynamics of contact-induced change actually vary according to the degree of language relationship?|210|borrowability, genetic relationship, 2774|Epps2013|Paper introduces a special issue in the Journal of Language Contact, focusing on language contact among closely genetically related languages.|000|genetic relationship, language contact, borrowability, special issue, introduction 2775|Marck2000|Book introduces general issues in the subgrouping of Polynesian languages.|000|Polynesian, introduction, subgrouping, handbook 2776|Gerber2016|Presentation presents evidence for genetic links between Kiranti languages and Lhokpu, a Tibeto-Burman isolate.|000|Lhokpu, Kiranti, Sino-Tibetan, genetic classification, subgrouping 2777|Goldwater2005|Unsupervised learning algorithms based on Expectation Maximization (EM) are often straightforward to implement and provably converge on a local likelihood maximum. However, these algorithms often do not perform well in practice. Common wisdom holds that they yield poor results because they are overly sensitive to initial parameter values and easily get stuck in local (but not global) maxima. We present a series of experiments indicating that for the task of learning syllable structure, the initial parameter weights are not crucial. Rather, it is the choice of model class itself that makes the difference between successful and unsuccessful learning. 
We use a language-universal rule-based algorithm to find a good set of parameters, and then train the parameter weights using EM. We achieve word accuracy of 95.9% on German and 97.1% on English, as compared to 97.4% and 98.1% respectively for supervised training.|000|syllable segmentation, syllabification, phoneme syllabification, German, English, Dutch, CELEX, 2778|Stephenson2016|New compilations of records of ancient and medieval eclipses in the period 720 BC to AD 1600, and of lunar occultations of stars in AD 1600–2015, are analysed to investigate variations in the Earth’s rate of rotation. It is found that the rate of rotation departs from uniformity, such that the change in the length of the mean solar day (lod) increases at an average rate of +1.8 ms per century. This is significantly less than the rate predicted on the basis of tidal friction, which is +2.3 ms per century. Besides this linear change in the lod, there are fluctuations about this trend on time scales of decades to centuries. A power spectral density analysis of fluctuations in the range 2–30 years follows a power law with exponent −1.3, and there is evidence of increased power at a period of 6 years. There is some indication of an oscillation in the lod with a period of roughly 1500 years. Our measurements of the Earth’s rotation for the period 720 BC to AD 2015 set firm boundaries for future work on post-glacial rebound and core–mantle coupling which are invoked to explain the departures from tidal friction.|000|earth rotation, ancient texts, importance of historical linguistics, outreach, popular science, nice quote 2779|Whelan2017|Molecular evolution can reveal the relationship between sets of homologous sequences and the patterns of change that occur during their evolution. An important aspect of these studies is the inference of a phylogenetic tree, which explicitly describes evolutionary relationships between homologous sequences. 
This chapter provides an introduction to evolutionary trees and how to infer them from sequence data using some commonly used inferential methodology. It focuses on statistical methods for inferring trees and how to assess the confidence one should have in any resulting tree, with a particular emphasis on the underlying assumptions of the methods and how they might affect the tree estimate. There is also some discussion of the underlying algorithms used to perform tree search and recommendations regarding the performance of different algorithms. Finally, there are a few practical guidelines, including how to combine multiple software packages to improve inference, and a comparison between Bayesian and Maximum likelihood phylogenetics.|000|phylogenetic reconstruction, introduction, handbook, evolutionary biology, 2780|Whelan2017|Very detailed introduction to tree inference, explaining important concepts in biology.|000|phylogenetic reconstruction, family tree, introduction, handbook 2781|Whelan2017|Both approaches share the limitations of their assumptions about models and data, so that there is no guarantee about what will happen when things go wrong. Furthermore, conflicting minority signals in the data are ignored by virtue of the fact that a tree is produced as the final estimate of the phylogeny. Biological processes that follow a non-tree-like structure, such as hybridization and gene flow, can create such signals, which may be important.|370|Bayesian approaches, Bayesian inference, maximum likelihood, problem 2782|Suzuki2011|Sogpho Tibetan is a Khams Tibetan variety spoken in Danba County in western Sichuan, China, a region that borders the Tibetan and rGyalrong speaking area and Han China. 
This paper explores the phonological system and provides an analysis of its peculiar phonetic and lexical features.|000|Sogpho Tibetan, Sino-Tibetan, Tibetan dialects, Tibetan, sound inventories, 2783|Whorf1950|Article introduces Whorf's radical idea of linguistic relativity.|000|Sapir-Whorff hypothesis, Benjamin Lee Whorf, Hopi, linguistic relativity, founding paper 2784|Tsang2017| Background Archaeobotanical remains of millet were found at the Nan-kuan-li East site in Tainan Science Park, southern Taiwan. This site, dated around 5000–4300 BP, is characterized by remains of the Tapenkeng culture, the earliest Neolithic culture found so far in Taiwan. A large number of millet-like carbonized and charred seeds with varied sizes and shapes were unearthed from the site by the flotation method. Since no millet grain was ever found archaeologically in Taiwan previously, this discovery is of great importance and significance. This paper is in an attempt to further analyze these plant remains for a clearer understanding of the agricultural practice of the ancient inhabitants of the Nan-kuan-li East site. Result We used light and scanning electron microscopy to examine the morphological features of some modern domesticated and unearthed seeds to compare and identify the archaeobotanical remains by three criteria: caryopsis shape, embryo notch, and morphology of lemma and palea. We also developed a new methodology for distinguishing the excavated foxtail and broomcorn millet seeds. Conclusion Two domesticated millet, including broomcorn millet (Panicum miliaceum) and foxtail millet (Setaria italica), as well as one wild millet species, yellow foxtail (Setaria glauca), were identified in the unearthed seeds. Together with the millet remains, rice was also cultivated in the area. Archaeological evidence shows that millet and rice farming may have been important food sources for people living about 5000 years ago in southern Taiwan. 
|000|domestication, nutrition, Taiwan, Austronesian, foxtail millet 2785|Jurafsky1996|The problems of access—retrieving linguistic structure from some mental grammar —and disambiguation—choosing among these structures to correctly parse ambiguous linguistic input—are fundamental to language understanding. The literature abounds with psychological results on lexical access, the access of idioms, syntactic rule access, parsing preferences, syntactic disambiguation, and the processing of garden-path sentences. Unfortunately, it has been difficult to combine models which account for these results to build a general, uniform model of access and disambiguation at the lexical, idiomatic, and syntactic levels. For example, psycholinguistic theories of lexical access and idiom access and parsing theories of syntactic rule access have almost no commonality in methodology or coverage of psycholinguistic data. This article presents a single probabilistic algorithm which models both the access and disambiguation of linguistic knowledge. The algorithm is based on a parallel parser which ranks constructions for access, and interpretations for disambiguation, by their conditional probability. Low-ranked constructions and interpretations are pruned through beam-search; this pruning accounts, among other things, for the garden-path effect. 
I show that this motivated probabilistic treatment accounts for a wide variety of psycholinguistic results, arguing for a more uniform representation of linguistic knowledge and for the use of probabilistically-enriched grammars and interpreters as models of human knowledge of and processing of language.|000|access, disambiguation, algorithms, parsing, probabilistic model, language and communication 2786|List2017a|The Fāngyán 方言 (‘Dialect[s]’ or ‘Topolect[s]’), usually attributed to Yáng Xióng 揚雄 (53 BCE–18 CE), a famous fù 賦-poet and philosopher, is a collection of dialectal and regional (including non-sinitic) expressions compiled during the end of the Western Hàn period (206 BCE–9 CE, Norman 1988:185). It is the first attested study on linguistic geography and dialectology in China, possibly even worldwide (Wáng Lì 1980 [2006]:17, Lǐ Shùháo 2004:1). Published under the baroque title “Dialectal Expressions from Foreign States and Glosses on Words from Extinct Eras Collected by the ‘Light Chart Officials’” (Yóuxuān shǐzhě juédài yǔ shì biéguó fāngyán 輶軒使者絕代語釋別國方言, for details regarding the title see Behr 2005:23 and n. 36), the work is a remarkable witness of early linguistic diversity in China, and it is usually assumed that the modern Chinese term fāngyán 方言 ‘dialect’ goes back to the title of the work.|000|Yáng Xióng's Fāngyán, Chinese dialectology, Chinese dialects, introduction 2787|List2017b|Contraction refers to phonological processes by which a sequence of sounds that constitutes one or more words is reduced or fused (Trask 1996:92). The reduction may be accompanied by additional changes of the sound segments; these changes usually belong to the family of lenition processes (Kuo 2010; Bauer 1988). 
From the perspective of the process itself, contraction in Chinese does not differ notably from contraction in other languages.|000|introduction, Chinese, contraction, phonology, phonetics, 2788|Meyer2017|The date of the first permanent human occupation of the high Tibetan Plateau has been estimated at about 3600 years ago, when agriculture became established. Meyer et al. used several dating techniques to analyze sediments at a high-altitude site (4270 m) where human handprints and footprints have been found. Their analysis indicates occupation of the plateau 7400 years ago and possibly earlier. These dates are consistent with the genetic history of Tibetans and suggest that a permanent preagricultural peopling of the plateau was enabled by the wetter regional climate at that time. Current models of the peopling of the higher-elevation zones of the Tibetan Plateau postulate that permanent occupation could only have been facilitated by an agricultural lifeway at ~3.6 thousand calibrated carbon-14 years before present. Here we report a reanalysis of the chronology of the Chusang site, located on the central Tibetan Plateau at an elevation of ~4270 meters above sea level. The minimum age of the site is fixed at ~7.4 thousand years (thorium-230/uranium dating), with a maximum age between ~8.20 and 12.67 thousand calibrated carbon-14 years before present (carbon-14 assays). Travel cost modeling and archaeological data suggest that the site was part of an annual, permanent, preagricultural occupation of the central plateau. These findings challenge current models of the occupation of the Tibetan Plateau.|000|peopling of the Tibetan Plateau, peopling of South-East Asia, population, population genetics, Sino-Tibetan 2789|Calloway2017|Interesting article about the origin of freed slaves on an island between Southern America and Africa. It turns out that the genetic analyses do not reveal the origin of the freed slaves, which has been forgotten by now. 
This could be due to missing genetic coverage on Africa in general, as the authors think, but why should it not be due to errors in the methods? Or do geneticists really work that fail-proof?|000|slavery, population genetics, archaeogenetics 2790|Ao1993|Thesis presents the Nántōng 南通 variety of Chinese, belonging to the Wú dialect group. The thesis presents the phoneme inventory and offers interesting analyses and thoughts on phonological theory in general.|000|Chinese, Chinese dialects, Nántōng dialect, phonology, phoneme inventory 2791|Ao1993|One problem with the distributionalist approach to phonemic analysis involves the non-uniqueness condition first noted by Chao (@1934). For example, since the Mandarin alveopalatals are in complementary distribution with the velars, one might assume that they are allophones of the velars, so that surface [tc\i], [tc\y], etc. are underlyingly /ki/, /ky/ etc. However, since the alveopalatals are also in complementary distribution with the alveolar sibilants, one might also assume that they are allophones of the alveolar sibilants, so that surface [tc\i], [tc\y] etc. are underlyingly /tsi/, /tsy/ etc. Since the putative derivation of the alveopalatals is motivated by their distribution but not supported by any morphophonemic alternation, it is not possible to determine which analysis is correct. As a result, there can be no unique underlying representation of morphemes containing alveopalatals. Such a condition is undesirable, because it implies that phonological representations could be arbitrarily determined, and thereby undermines the validity of phonemic analysis. We can avoid this condition by assuming that the alveopalatals are underlying, not derived. Obviously, doing so would require a departure from the distributionalist tradition. 
|50|distributional phonology, complementary distribution, phonological theory, Nántōng dialect, Chinese, 2792|Ao1993|:comment:`Argues at length against the distributionalist position on phonology, interesting and worth a read.`|49-53|complementary distribution, distributional phonology, phonological theory, Chinese 2793|Chao1934|In reading current discussions on the transcription of sounds by phonemes, one gets the impression of a tacit assumption that given the sounds of one language, there will be one and only one way of reducing them to a system of phonemes which represent the sound-system correctly. Since different writers do not in fact agree in the phonemic treatment of the same language, there arise then frequent controversies over the 'correctness' or 'incorrectness' in the use of phonemes. The main purpose of the present paper is to show that given the sounds of a language, there are usually more than one possible way of reducing them to a system of phonemes, and that these different systems or solutions are not simply correct or incorrect, but may be regarded only as being good or bad for various purposes. |000|phonology, phonological theory, complementary distribution, distributional phonology, Chinese, 2794|Chao1934|Very valuable paper that draws attention to one huge problem of distributional phonology, namely the tendency to make things as abstract as possible. |000|Chinese, phonology, phonological theory, complementary distribution, distributional phonology 2795|Kniebe2016|Interesting review of the 2016 movie "Arrival". Could be used for a lecture on the topic.|000|lecture material, movie, linguistics, introduction, language documentation 2796|Lu2009|Having a set of valid and scientific methods is the essential condition to the development and prosperity of academics. 
The ancient rhyme study in Manchu dynasty was a scientific and systematical study and the breakthrough made at that time was decided by a set of methods induced by the scholars in Manchu dynasty. This article combines the research of the rhyme of Jinwen Shangshu to introduce a series of scientific and valid methods of the ancient rhyme study.|000|introduction, Chinese historical phonology, rhyme analysis, 2797|Peust2014|Rhyme, like other characteristics of poetic language, belongs to the least explored fields within linguistics. I suggest that these topics would profit from being explored by linguists and that information on them should be routinely included into the grammatical description of any language. This article attempts to outline a typology of poetic rhyme. “Rhyme” is defined as the phonetic identity of sections within text strings (“lines”). Languages vary in whether the identity is conventionally located in the beginning, the middle or the end of lines. The latter choice (end rhyme) is now the by far most common type, but its present near-global distribution seems to be the result of recent language contact. Major typological parameters of end rhyme include the size of the sections at the line ends that are required to be identical, as well as the partition of the sound space implied in the notion of “identity”, here called “rhyme phonology”, which can differ from the partition of the sound space by ordinary phonology. Finally, end rhyme in Egyptian is discussed, where this technique became current only by the Late Coptic period. Being a tradition relatively independent from the better known European rhymes, Coptic rhyme provides some exotic features which are of considerable typological interest.|000|rhyme patterns, Egyptian, cross-linguistic study, end rhyme, 2798|Peust2014|Very valuable overview on different end rhyme systems across different languages. 
The author claims that end rhyme developed late through the history of languages, and that apparently transfer across cultures played a role during its spread.|000|end rhyme,rhyme patterns, cross-linguistic study, introduction 2799|Minkova1993|Article is interesting in so far as it gives some overview on how rhyme evidence is being used in the historical reconstruction of Middle English and other stages of the language.|000|Middle English, English, rhyme evidence, rhyme patterns, 2800|Lu2014|Tangshu Shiyin was produced in BeiSong dynasty,it’ s phonetic certainly embodies the actual voice phenomenon of Song dynasty. In this paper,we obtain most of the phonetic annotation performance reflects on speech characteristics of the Song Dynasty in Jiangxi dialect through a detailed study of Tangshu Shiyin.|000|rhyme patterns, Běisōng dynasty, Chinese, Chinese dialects, rhyme evidence 2801|Lu2014|Whether this paper is valuable has to be checked, but it may present an interesting resource to study certain rhyme patterns in later stages of Chinese.|000|rhyme evidence, Chinese, Běisōng dynasty, rhyme patterns 2802|Cristofaro2010|Semantic maps are usually assumed to describe a universal arrangement of different conceptual situations in a speaker's mind as determined by perceived relations of similarity between these conceptual situations. This paper provides a number of arguments that challenge this view, based on various types of evidence from processes of semantic change and synchronic implicational universals. The multifunctionality patterns described by semantic maps may originate from processes of form-function recombination in particular contexts rather than any perceived similarity between individual conceptual components. 
These patterns may also originate from the fact that a particular functional principle leads to the association of a particular construction type with different conceptual situations, independently of any specific relation between these conceptual situations as such. A number of synchronic and diachronic phenomena pertaining to the very structure of individual semantic maps further reveal that, even if one assumes that these provide a representation of similarity relations between different conceptual situations, they do so only to a limited extent.|000|semantic map, colexification, methodology, mental representation 2803|Cristofaro2010|It is useful to give this paper a proper read in the context of colexification studies, as it argues against a certain interpretation of semantic maps, namely, that they reflect mental states, and the like.|000|mental representation, mental state, semantic map, colexification 2804|Jaeger2005|I define a slip of the tongue as a one-time error in speech production planning; that is, the speaker intends to utter a particular word, phrase, or sentence, and during the planning process something goes wrong, so that the production is at odds with the plan. It is not simply a misarticulation (e.g. stuttering or mumbling), a lack of knowledge or memory slip (where the speaker doesn't know the correct word or can't remember it at the moment), or a false start (where the speaker changes his or her mind about the propositional content of the utterance). Speakers themselves will consider the utterance to contain an error, and will often correct it immediately, sometimes with commentary on the silliness of what they have said. Extending this definition to child language, the crucial premise of the current study is that a slip of the tongue cannot be made on a structure unless that structure has already been learned or acquired. 
This is because an utterance cannot be considered to be an error from the child's point of view, unless the child has a standard within his or her own system by which to judge the utterance. Note that this definition of 'error' is different from one commonly used in child language studies, in which the standard by which the child's utterance is judged is the adult model (e.g. Locke 1980). Of course children produce many utterances which are different from the adult model because they haven't yet mastered the adult forms. However, if slips of the tongue made by children are to be considered the same phenomenon as adult slips, then only the child's current knowledge is relevant as a standard by which to judge the utterance.|000|slips of the tongue, child language, introduction 2805|Jaeger2005|Book investigates and presents a corpus of examples for slips of the tongue in early child language. Given that, as the author notes in the preface, it was long considered that children could not produce slips of the tongue, this book seems to offer ample evidence against this claim.|000|child language, slips of the tongue, introduction 2806|Arakawa2001|There is a unique Tangut material in Institute of Oriental Studies (St.-Petersburg branch), inventory No. 4166 titled San shi shu ming yan ji wen. It has some rhyming verses which would contribute to the study of Tangut phonology. The verses were accompanied by the comment 'pay attention to sound (rhyme)'. This kind of material has not been well-known except the report of Nishida (1976). The aim of this paper is linguistic reanalysis of Tangut rhyming poetry in view of the Tangut phonology based on the rhyme dictionaries. |000|Tangut, rhyme patterns, rhyme analysis, rhyme evidence 2807|Yamauchi2016|Linguistic typology provides features that have a potential of uncovering deep phylogenetic relations among the world’s languages. 
One of the key challenges in using typological features for phylogenetic inference is that horizontal (spatial) transmission obscures vertical (phylogenetic) signals. In this paper, we characterize typological features with respect to the relative strength of vertical and horizontal transmission. To do this, we first construct (1) a spatial neighbor graph of languages and (2) a phylogenetic neighbor graph by collapsing known language families. We then develop an autologistic model that predicts a feature’s distribution from these two graphs. In the experiments, we managed to separate vertically and/or horizontally stable features from unstable ones, and the results are largely consistent with previous findings.|000|typology, phylogenetic reconstruction, 2808|Yamauchi2016|Yet another paper that makes the mistake of treating typological identity as sufficient evidence for assessing the homology of items.|000|typology, phylogenetic reconstruction, 2810|Francois2016a|The Comparative method is commonly hailed as a solid methodology for comparing genetically related languages, and for reconstructing the history of their linguistic systems. Equally common is the assumption that the results of its analyses are best displayed in the form of a tree, or *Stammbaum:* starting from a common protolanguage, its linguistic descendants should form neatly separated branches and subgroups, each of which should be defined by a set of exclusively shared innovations. The expectation -- or at least the hope -- is that the historical innovations reflected in modern members of a family should be distributed in nested patterns, so as to fit a cladistic representation of that family. This belief is reflected in the vast popularity of the tree model in works of historical linguistics up to this day. 
The present paper aims at separating these two lines of thought, by showing that the strength of the Comparative Method does not necessarily entail the validity of the tree model which has been so often associated with it since the Neogrammarians. In fact, I will even propose that the CM provides precisely the analytical tools necessary to demonstrate the limitations of the tree model. Indeed the method rests on principles of consistency and regularity of sound change, which allow the linguist to conduct rigorous demonstrations in the identification of innovations for each language, and in the reconstruction of words' histories. As each innovation is assigned a set of modern languages, it becomes possible to assess how nested (and thus how tree-like) their distribution is in the family.|000|family tree, historical glottometry, tree-likeness, conflicting data, wave theory 2811|Oestling2015|In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology. In the first part of the thesis, I show how Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of different languages, particularly for languages with few digital resources available —which is unfortunately the state of the vast majority of languages today. Furthermore, I explore how different variations to the models and learning algorithms affect alignment accuracy. Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve word alignment accuracy. I apply these models to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world. 
Finally, I present a model for multilingual word alignment which learns an intermediate representation of the text. This model is then used with a massively parallel corpus containing translations of the New Testament, to explore word order features in 1001 languages.|000|word alignment, parallel corpus, bible corpus, word order 2812|Poulisse1999|This book provides a large corpus on L1 and L2 slips of the tongue as well as some introduction into the theoretical background.|000|slips of the tongue, second language learning, corpus studies, dataset 2813|Schuessler2002|Paper gives an overview on apparent inconsistencies of aspirated sounds in the system of Old Chinese, claiming that aspirates only evolved as a secondary process.|000|aspirated initial, Old Chinese, linguistic reconstruction, phonology, internal reconstruction 2814|Hill2017|Article gives an overview on the life of Walter Simon (1893–1981).|000|biography, sinology, Walter Simon 2815|Ruse2016|Cultural evolution has always been a bit iffy. Other than those poisoned by religious extremism, everyone accepts biological evolution. Usually people are prepared to extend this to cosmology. We speak without tension of the evolution of the galaxy and that sort of thing. But culture is different. There is of course change. In the Middle Ages there were serfs. Today there are graduate students. But does one want to speak of the change as “evolutionary” and even more does one want to speak of the change as “Darwinian”? And then there is the question of progress. Biological evolutionists tend to be a bit tense on the topic, and when it comes to culture we are even more so. There was a change from the world of serfs to the world of graduate students. 
But was it progress?|509|progress, evolution, cultural evolution, biological evolution, 2816|Ruse2016|Cambridge philosopher Tim Lewens’s spritely little book, Cultural Evolution, although a little ahistorical, is a splendid introduction to the topic, looking in careful detail at much of the discussion today. That said, I detect an odor of anti-Darwinism that seems almost fashionable among philosophers these days, although indeed it is a tradition that goes back a long way to such major thinkers of a century ago like G. E. Moore and Bertrand Russell. Lewens is rightly critical of such crude analogizing from biological selection as one finds in so-called memetic theory where cultural units battle with cultural units. This fails if only because it seems to lead to no new unexpected predictions, the mark of great science.|509|cultural evolution, biological parallels, biological evolution, analogy 2817|Shapiro2016|One significant difference is that Biblical scholars in particular are not really sure what they are trying to do. For the believer, the goal would seem to be to work back through the additions and emendations to arrive at an “original” or “pure” text. Short of finding the actual written notes of the Evangelists—there almost certainly never were any, and the Gospels were first set down years after Jesus’ death—this is an impossible task in the formal sense. Putting aside Jeffersonian biases about what Jesus should have said, is phylogenetic reconstruction likely to get us closer to what he really did say? We reconstruct phylogeny in the firm belief that there is a true phylogeny to reconstruct, and our project is thus much more self-contained and self-defined than that of the philologist poring over ancient papyri.|495|ontology, epistemology, reconstruction, stemmatics, methodology 2818|Erfani2013|This is a study of the effects of language contact on the structure of Azeri, a minority language spoken in Iran. 
Azeri, the second largest language in Iran, is a Turkic language, but it is heavily influenced by the national language Persian, an Indo-European language. Turkic languages are head-final: in noun phrases, modifiers appear before head nouns. In contrast, Persian is head-initial: modifiers follow head nouns. Notably, Azeri allows both head-final and head-initial structures. A field study conducted with ten Azeri speakers in Tabriz, Iran, revealed that in noun compounds the two types of structures are used almost equally. However, older and monolingual speakers prefer the head-final structure, while younger, educated bilingual speakers prefer the head-initial structure. This shows that Azeri is becoming persified in this domain, as predicted in such situations of language contact involving a politically-dominant language. However, all speakers accept head-final structure, showing the persistence of Turkic morphosyntax despite a millennium of intense social and cultural contact with Persian.|000|Turkish, Persian, loan word, compound words, compounding 2819|Hale2003|A method is presented for calculating the amount of information conveyed to a hearer by a speaker emitting a sentence generated by a probabilistic grammar known to both parties. The method applies the work of Grenander (1967) to the intermediate states of a top-down parser. This allows the uncertainty about structural ambiguity to be calculated at each point in a sentence. Subtracting these values at successive points gives the information conveyed by a word in a sentence. 
Word-by-word information conveyed is calculated for several small probabilistic grammars, and it is suggested that the number of bits conveyed per word is a determinant of reading times and other measures of cognitive load.|000|information content, speaker-listener-model, cognitive load, algorithms, probabilistic model 2820|Hintze2010|Background: Much work in systems biology, but also in the analysis of social network and communication and transport infrastructure, involves an in-depth analysis of local and global properties of those networks, and how these properties relate to the function of the network within the integrated system. Most often, systematic controls for such networks are difficult to obtain, because the features of the network under study are thought to be germane to that function. In most such cases, a surrogate network that carries any or all of the features under consideration, while created artificially and in the absence of any selective pressure relating to the function of the network being studied, would be of considerable interest. Results: Here, we present an algorithmic model for growing networks with a broad range of biologically and technologically relevant degree distributions using only a small set of parameters. Specifying network connectivity via an assortativity matrix allows us to grow networks with arbitrary degree distributions and arbitrary modularity. We show that the degree distribution is controlled mainly by the ratio of node to edge addition probabilities, and the probability for node duplication. We compare topological and functional modularity measures, study their dependence on the number and strength of modules, and introduce the concept of anti-modularity: a property of networks in which nodes from one functional group preferentially do not attach to other nodes of that group. 
We also investigate global properties of networks as a function of the network's growth parameters, such as smallest path length, correlation coefficient, small-world-ness, and the nature of the percolation phase transition. We search the space of networks for those that are most like some well-known biological examples, and analyze the biological significance of the parameters that gave rise to them. Conclusions: Growing networks with specified characters (degree distribution and modularity) provides the opportunity to create surrogates for biological and technological networks, and to test hypotheses about the processes that gave rise to them. We find that many celebrated network properties may be a consequence of the way in which these networks grew, rather than a necessary consequence of how they work or function.|000|growing network, graph theory, modularity, assortativity, 2821|Hintze2010|This paper might be useful for simulation studies, but it is not yet clear what they actually mean by "growing networks".|000|growing network, graph theory, 2822|Prokics2012|A SHIBBOLETH is a pronunciation, or, more generally, a variant of speech that betrays where a speaker is from (Judges 12:6). We propose a generalization of the well-known precision and recall scores to deal with the case of detecting distinctive, characteristic variants when the analysis is based on numerical difference scores. We also compare our proposal to Fisher’s linear discriminant, and we demonstrate its effectiveness on Dutch and German dialect data. It is a general method that can be applied both in synchronic and diachronic linguistics that involve automatic classification of linguistic entities.|000|shibboleth, dialectology, automatic approach, German dialects, Dutch dialects 2823|Rama2016c|Computational approaches for dialectometry employed Levenshtein distance to compute an aggregate similarity between two dialects belonging to a single language group. 
In this paper, we apply a sequence-to-sequence autoencoder to learn a deep representation for words that can be used for meaningful comparison across dialects. In contrast to the alignment-based methods, our method does not require explicit alignments. We apply our architectures to three different datasets and show that the learned representations indicate highly similar results with the analyses based on Levenshtein distance and capture the traditional dialectal differences shown by dialectologists.|000|dialectology, sequence alignment, autoencoder, machine learning, German dialects, Dutch dialects, 2824|@Lewis2016a|Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. 
We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data.|000|Bayesian inference, phylogenetic information content, 2825|Lewis2016a|Note that this approach is discussed in this blog post: * http://phylonetworks.blogspot.de/2017/01/why-do-we-need-bayesian-phylogenetic.html Note also that Morrison says that the usefulness of this approach is questionable, as SplitsTree and NeighborNet do about the same in less time.|000|phylogenetic information content, Bayesian inference 2826|Robbeets2017|Article gives an interesting overview on different kinds of evidence (linguistic, archaeological) for the peopling of South East Asia, with a specific focus on Altaic.|000|Altaic, Trans-Eurasian, archaeology, peopling of South-East Asia 2827|Onysko2016|This article explores the hypothesis that bilingual knowledge of different compounding patterns can influence the interpretation of a set of novel English noun-noun compounds. The focus of the study is on bilingual speakers who are fluent in two typologically diverse languages: te reo Māori (postmodifying) and English (premodifying). A comparison of bilingual and monolingual participant groups indicates that Māori-English bilingual speakers more frequently rely on the Māori structure of left-headed compounding in their meaning interpretation of English compounds. The implications of these results are discussed in terms of the cognitive process of transfer and additional means of meaning association in bilingual speakers.|000|language contact, English, compounding, word formation 2828|Jacques2017a|Chinese historical phonology differs from most domains of contemporary linguistics in that its general framework is based in large part on a genuinely native tradition. 
The non-Western outlook of the terminology and concepts used in Chinese historical phonology make this field extremely difficult to understand for both experts in other fields of Chinese linguistics and historical phonologists specializing in other language families.|000|Chinese traditional phonology, introduction 2829|Callies2016|In recent years, there has been a growing interest in contrastive and comparative approaches to word-formation. While typological and cross-linguistic research on lexical morphology has been on the research agenda for quite some time and is perhaps now revitalized by the publication of Word-formation in the world’s languages. A typological survey (Štekauer et al. 2012), it is the field of contrastive word-formation, a hitherto underexplored if not neglected research area, that has recently caught researchers’ attention (Fernández-Domínguez et al. 2011). Lefer (2011) provides a general overview of studies in contrastive word-formation and addresses some key issues relevant to the field, focusing specifically on the proposal of a new contrastive methodology for the cross-linguistic study of word-formation based on a dynamic conception of the tertium comparationis and the use of empirical data drawn from both comparable and translation corpora.|000|overview, cross-linguistic study, word formation, contrastive linguistics, 2830|Hume2016|In this paper we propose that insight into the unmarked nature of these patterns can be gained when we take seriously the view of language as a system of information transmission. In particular, we suggest that perceptually weak and strong unmarked patterns are those that effectively balance two competing properties of effective communication: (a) the contribution of the phonological unit in context to accurate message transmission, and (b) the resource cost of the phonological unit (see Hall, Hume, Jaeger & Wedel, forthcoming). 
In order to demonstrate this, we begin in §2 by describing key properties of language as a system of information transmission. In the following section, we turn to the predictions that follow from such a system for perceptually weak and strong sound patterns and show why it makes sense for them both to be described as unmarked.|000|markedness, predictability, systemic processes, context 2831|Hall2016|Based on a diverse and complementary set of theoretical and empirical findings, we describe an approach to phonology in which sound patterns are shaped by the trade-off between biases supporting message transmission accuracy and resource cost. We refer to this approach as Message-Oriented Phonology. The evidence suggests that these biases influence the form of messages, defined with reference to a language's morphemes, words or higher levels of meaning, rather than influencing phonological categories directly. Integrating concepts from information theory and Bayesian inference with the existing body of phonological research, we propose a testable model of phonology that makes quantitative predictions. Moreover, we show that approaching language as a system of message transfer provides greater explanatory coverage of a diverse range of sound patterns.|000|sound patterns, phonology, information theory, language and communication, phonological theory, markedness 2832|Muenning2017|Lù Zhìwéi was an important linguist, psychologist, and educator who laid the foundations of psychology in China, a Christian convert involved in the governance of Yenching University, and a member of the Chinese Academy of Science. His main achievements in linguistics lie in historical phonology, as well as in lexicology and morphology.|000|Lù Zhìwéi 陸志韋, biography 2833|Harbsmeier2016|The book under review summarises and develops many decades of painstaking research in the early history of the pronunciation of the Chinese language. 
It is the result of the collaboration between two influential linguists. An examination of the methodology deployed in this book and the philological evidence it is based on reveals very serious shortcomings of many kinds that invite further discussion. For example, the very nature of Bernhard Karlgren’s contribution to the field is misconstrued as being concerned with phonology, when in fact Karlgren was a vociferous opponent of phonology throughout his long life; there is a complete failure to problematise and properly consider the very concept of “Old Chinese”: the literature on Dialectology of Old Chinese is never considered; the analysis of derivation by tone change is quite inadequate; the discussion of Old Chinese first person pronouns is basically ill-informed. Most importantly, the methodology is unacceptably conjectural throughout.|000|review, Old Chinese, phonology, linguistic reconstruction 2834|Harbsmeier2016|Review of @Baxter2014 with a generally negative tone.|000|Old Chinese, phonology, linguistic reconstruction, phonological reconstruction, review 2835|Kuhn2010|Dissertation describes and introduces "controlled English", an idea to make language both human- and machine-readable. In this context it may be interesting as an example for general human-machine interaction, computer-assisted frameworks, and also for programming languages.|000|controlled English, knowledge organization, programming languages, human- and machine-readable, human-machine interaction, computer-aided approaches 2836|Estivalet2016|Stem processing is an essential phase in word recognition. Most modern Romance languages, such as Catalan, Italian, Portuguese, Romanian, and Spanish, have three theme vowels that define verbal classes and stem formation. However, French verbal classes are not traditionally described in terms of theme vowels. In this work, stem formation from theme vowel and allomorphic processes was investigated in French verbs. 
Our aim was to define the verbal stem formation structure processed during mental lexicon access in French. We conducted a cross-modal experiment and a masked priming experiment on different French stem formation processes from the first and third classes. We compared morphology-related priming effects to full priming obtained through identity priming, as well as to no priming obtained through a control condition. Stems from the first and third classes with a theme vowel presented full priming, whereas stems from the third class with allomorphy presented partial priming in both experiments. Our results suggest root-based stem formation for French. Verbs are recognized through word decomposition into stem and inflectional suffixes, and stem processing is based on root, theme vowel, and allomorphic processes. These results support a single-mechanism model with full decomposition and pre-lexical access defined by morphological rules.|000|stem formation, morphology, word formation, verbal inflection, French, Romance, 2837|Schindelin2005|Good general overview on some of the most frequently addressed quantitative aspects of Chinese language and writing. Table of contents: 1. Vorbemerkung zur chinesischen Sprache und Schrift 2. Zur Anzahl der chinesischen Schriftzeichen 3. Zur Zahl der Morpheme des modernen Chinesisch 4. Wortschatz und Wortartenverteilung im modernen Chinesisch 5. Zur Verteilung von Strichzahl, Graphemzahl und Strukturtyp im Schriftzeicheninventar 6. Zur Verteilung von Wort- und Satzlängen im modernen Chinesisch 7. Das Zipfsche Gesetz und die chinesische Schrift 8. Die Entropie der chinesischen Schrift 9. Das Menzerathsche Gesetz und die chinesische Sprache und Schrift 10. Köhlers Basismodell der Lexik und das chinesische Schriftzeichensystem 11. Schlusswort 12. 
Literatur (in Auswahl)|000|Chinese, Chinese writing system, quantitative analysis, introduction, overview 2838|Balasubrahmanyan2005|Introduction to entropy, information, and complexity from a linguistic viewpoint. Table of contents: 1. Introduction 2. Origin of the concept of entropy Entropy in communication theory 3. Language discourses, Zipf’s law, and other entropies 4. Gell-Mann’s characterization of complexity 5. Schroedinger’s work on the extension of Boltzmann-Gibbs-Shannon entropy 6. Entropy in isolated systems and systems far from equilibrium 7. Entropy, complexity, and the physics of information: some general remarks 8. Literature (a selection)|000|complexity, entropy, information theory, introduction, overview 2839|Dermatas2005|Introduction to grapheme-phoneme conversion. Table of contents: 1 Introduction 2 Dictionary look-up methods 3 Rule-based methods 4 Hidden Markov models 5 Neural networks 6 Hybrid systems 7 Literature (a selection)|000|grapheme-to-phoneme, automatic approach, overview, introduction 2840|Anderl2017|This is an overview on medieval Chinese syntax.|000|language history, Chinese, syntax, Middle Chinese, overview, introduction 2841|Anderl2017b|“Northwestern Medieval Chinese” (NWMC) here refers to the variety (or possibly varieties) of Chinese spoken in and around the Héxī 河西 Corridor (situated in today’s Gānsù Province) in the northwest of the Yellow River in late- and post-Táng times (roughly 9th–12th centuries CE). Connecting the Tarim Basin with Northern China, the corridor constituted an important part of the Northern Silk Route, with Dūnhuáng 敦煌 (or Shāzhōu 沙州 ) as its most important center. 
Consequently, the variety of Chinese spoken throughout this area is also known as the Shāzhōu Dialect (e.g., Coblin 1988) or alternatively Héxī Dialect (e.g., Takata 1988a).|000|Northwestern Medieval Chinese, Chinese, language history, syntax 2842|Shen2016|This paper discusses the origin of Middle Chinese *j*, more exactly 愈声母 in 4th division 四等.|000|Old Chinese, linguistic reconstruction, phonology, Middle Chinese, Middle Chinese divisions, 2843|MacklinCordes2015a|This study extracts a finer-grained level of phonotactic variation using segment frequencies and the Markov chain probability of transitions between segments. Using a number of modern, phylogenetic indices, the study demonstrates that phonotactics do, in spite of the traditional Australianist view, reflect the phylogenetic history of languages.|000|phonotactics, Australian languages, computational approaches, automatic approach, phylogenetic reconstruction 2844|MacklinCordes2015a|The study describes the use of a script that phonemicises text, which is very interesting, but unfortunately not published officially. It further seems to rely on simple segmental transitions, thus ignores larger dependencies in the sound sequences.|000|Australian languages, phonotactics, transition, phylogenetic reconstruction, automatic approach 2845|Si2016|Traditional ecological knowledge recorded as part of a language documentation program can include valuable information on the presence or absence of plant and animal species in a given locality. Such data have the potential to inform biodiversity surveys at a local or landscape scale. In this study, bird names were recorded in six languages spoken around the town of Aungban in Shan State, Myanmar. A checklist of local birds was first compiled using online sources, and pictures and recordings of the calls of over 250 species were presented to native speakers to elicit bird names. 
A statistically significant correlation was found between the number of languages in which a bird was named, and the frequency with which it was sighted by ornithologists in a recently published study at a nearby location. Native speakers provided historical information on birds that were once present near their villages, and it was also possible to obtain indications of small-scale differences in the ranges of some birds. While there were some noteworthy mismatches between the number of sightings of some birds and the number of names recorded in the target languages, the findings indicate that overall, a language-documentation-based survey of bird species occurrence can provide valuable biodiversity information in a quick and cost-effective manner.|000|flora and fauna, concept list, denotation 2846|Starostin2016c|The paper presents a brief assessment of “Nostratic” – the controversial, but promising hypothesis on deeper linguistic connections of the Indo-European family, as envisaged by Vladislav Illich-Svitych and his contemporaries (particularly Aharon Dolgopolsky). We discuss some of the most important developments of the theory since the 1960s, and explain how emphasis on “quantity over quality” of data in the new huge corpora of “Nostratic” comparanda is less useful for advancing the hypothesis than a narrowly targeted emphasis on identifying the “core” evidence for the macrofamily. Identifying this “core” evidence, consisting of a small, but generally stable layer of the basic lexicon, is necessary to lend a more historically realistic flavor to the hypothesis, and its statistical evaluation will also help better understand the place of Indo-European among the other potential constituents of “Nostratic”. We argue that, in weighing the evidence, typological plausibility of semantic shifts and absence of topological conflicts in the tree are no less important than regularity of sound changes. 
We also show how the credibility level of various theories on the external connections of Indo-European can be arranged along a gradient – from “Indo-Uralic” to a general “Nostratic”, and indicate implications that such an arrangement may hold for future studies.|000|Nostratic, Indo-European, long-range comparison, historical overview, 2847|Starostin2016c|Paper is an interesting methodological contribution, as they use the short 50-item vocabulary list to compare contemporary languages and check how well their relatedness can be proven without further evidence.|000|concept list, methodology, comparative method, proof of relationship 2848|Hill2015b|Tangut is among a handful of Trans-Himalayan languages with an early date of attestation and a vast literature. First recorded in 1042 C.E., Tangut is more recent than Chinese (c. 1200 B.C.E) and Tibetan (650 C.E.), but older than Burmese (1113 C.E.). With the loss of the Tangut polity to the Mongols in 1227 C.E., the language gradually declined, with the most recent known text from 1499 C.E. The decipherment of Tangut became possible after Pyotr Kozlov excavated a sizable number of documents at Khara-Khoto in 1909 and transferred them to St Petersburg. The language and its literature are now reasonably well understood and actively researched. Nonetheless, the diachronic development of the language has garnered scant attention. The work under review treats the phonology and morphology of Tangut within a comparative context. Jacques makes particular reference to Japhug Rgyalrong, a spoken language of our day. By any standard, the methodological rigor and philological sophistication of this work is outstanding. The author has mastery over Tangut philology and its attendant secondary literature, written in French; the work consults research in Russian, Chinese, and Japanese. Tangut texts are cited at first hand and lucidly presented. 
In addition, Jacques brings the results of his extensive fieldwork on Japhug Rgyalrong to bear throughout.|000|review, Tangut, Sino-Tibetan, 2849|Wang2004a|Text elaborates on the distinctions between rhymes from the Qièyùn and later rhyme dictionaries, like Guǎngyùn, etc. This offers a solid basis for a more fine-grained study on *fǎnqiè* phonology using network approaches.|000|fǎnqiè, Middle Chinese, Guǎngyùn, Qièyùn, rhyme patterns 2850|Blevins2013|Over the past decade, information theory has been applied to the analysis of a successively broader range of morphological phenomena. Interestingly, this tradition has arisen independently of the linguistic applications of information theory dating from the 1950’s. Instead, the point of origin for current work lies in a series of studies of morphological processing in which Kostić and associates develop a statistical notion of ‘morphological information’ based on ‘uncertainty’ and ‘uncertainty reduction’. From these initial studies, analyses based on statistical notions of information have been applied to general problems of morphological description and typological classification, leading to a formal rehabilitation of the complex system perspective of traditional WP models.|000|paradigms, word formation, word derivation, morphology, inflection, 2851|Blevins2013|Word paradigm models are the target of this article, which gives an overview on their development. Concretely, this means classical grammar, which assembles words into classes for specific paradigms (inflection classes, like "o-declension" in Latin). 
The author contrasts this approach, which treats word paradigms as complex systems with a leading exemplar (think of the ablative in Latin) from which other forms are derived, with other approaches to morphology.|000|word paradigm morphology, WP morphology, morphology, paradigms, inflection 2852|Blevins2016|Book gives an overview of word paradigm morphology and develops an apparently powerful model out of it, which may make it possible to handle inflection more properly, especially with regard to linguistic practice and computer-assisted approaches.|000|word paradigm morphology, overview, morphology, paradigms, inflection 2853|Blunden2012|Book gives an introduction to various aspects of current theory on concepts across, apparently, different scientific branches. *TOC* * The psychology of concepts * classical theory of concepts * reflections on Aristotle * prototypes, exemplars, and ideals * theory theory and semantic networks * analytical approaches * problems of analytical approaches * analysis * sociocultural turn * Narratives and Metaphors * narrative turn * metaphors, models, analogy * analogy in creating concepts * conceptual change and linguistics * Piaget * Thomas Kuhn's sociology of science * misconceptions and conceptual change * linguistics * Wittgenstein * Robert Brandom on concepts * introduction * Brandom's theory of concepts * Brandom's critique of the psychology of concepts * critique of Brandom's theory * conclusion * Where are we now with concepts * thought-forms and mental images * networks, plots, categories, theories and institutions * conclusion * Hegel on concepts (long part of the book) * From philosophy to the human sciences * Vygotsky Book is definitely worth a thorough read, containing many interesting aspects around the concept of concepts.|000|Lev Vygotsky, concepts, semantics, philosophy of science, overview, introduction, 2854|Lee2012|In 1994, two batches of unearthed Chu 楚 bamboo slips from the Warring States period (Zhanguo shidai 戰國時代, 475–221 
B.C.), totaling 1200 pieces and featuring over 35000 characters, appeared in the cultural relic market of Hong Kong. These slips, highly valuable for philological research, cover nearly 100 kinds of ancient Chinese classics. They were soon collected by the Shanghai Museum for restoration and analysis, and have, since 2001, been published in successive volumes entitled Shanghai bowuguan cang Zhanguo Chu zhushu 上海博物館藏戰國楚竹書, eight volumes of which have been issued to date.|000|Warring States, Zhànguó shídài 战国时代, excavated manuscripts, ancient texts, Old Chinese 2855|Yong2008|An interesting book on Chinese lexicography, which gives a historical overview and seems to be quite complete, worth consulting when discussing the history of linguistics.|000|Chinese lexicography, history of science, lexicography, introduction, overview, Chinese linguistics, 2856|Hoehna2016|Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. 
Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com.|000|Bayesian approaches, Bayesian inference, phylogenetic reconstruction, software 2857|Schackow2015|This work is the first comprehensive description of the Yakkha language (iso-639: ybh), a Kiranti language spoken in Eastern Nepal. The primary focus of this work is on the dialect spoken in Tumok village. The grammar is intended to serve as a reference to scholars interested in linguistic typology and comparative studies of Tibeto-Burman and Himalayan languages in general, and also as a foundation for members of the Yakkha community to aid future research and activities aiming at documenting and preserving their language. The grammar is written in a typological framework. Wherever possible I have tried to incorporate a historical perspective and comparative data in explaining how a particular subsystem of the grammar works. For the sake of reader friendliness and to ensure long-term comprehensibility, the analyses are not presented within any particular theoretical framework, and terms that strongly imply a particular theory have been avoided as far as this was possible. 
Preparing a grammar can be a simultaneously satisfying and frustrating task, both for the same reason: the sheer abundance of topics one has to deal with, which makes grammars very different from works that pursue more specific questions. Necessarily, a focus had to be set for this work, which eventually fell on morphosyntactic issues. Verbal inflection, transitivity, grammatical relations, nominalization, complex predication and clause linkage are dealt with in greater detail, while other topics such as phonology, the tense/aspect system and information structure leave much potential for further research. Since this is the first grammatical description of Yakkha, I have decided to include also the topics that are analyzed in less detail, in order to share as much as possible about this complex and intriguing language.|000|Sino-Tibetan, grammar, description, Yakkha, Kiranti, introduction 2858|Schaefer2017|Die hier vorliegende Untersuchung beschreibt grammatische Strukturen einer fiktionalen Sprache, die auf Imitationen der jiddischen Sprache fußt und zum festen Inventar jüdischer Figurendarstellung im 19. Jahrhundert gehört. Dieses sogenannte „Literaturjiddisch“ (Richter 1995) ist eine im (langen) 19. Jahrhundert weit verbreitete literarische Modeerscheinung, ohne die kaum ein literarischer Text auskommt, der mit jüdischen Figuren arbeitet.|000|linguistic imitation, fictional language, Yiddish, German, literature 2859|Jucquois1966|On voit également comment la théorie de Kuryłowicz, défendue notamment dans ses *Études indo-européennes*, pp. 1 à 26, et dans *L'apophonie en indo-européen*, pp. 356 à 366, qui propose de considérer les «gutturales» labiovélaires comme secondaires par rapport aux «gutturales» pures, trouve dans ces données statistiques une confirmation nouvelle. :translation:`One can also see how the theory of Kuryłowicz [...] 
which proposes to treat the labio-velars as secondary in comparison with the pure velars, finds a new confirmation in these statistical data.`|61|statistical analysis, Indo-European, labio-velars, root cognates 2860|Jucquois1966|Study presents a statistical account of roots in Indo-European, which are investigated for the frequency of consonants and vowels depending on their positions in the roots. The study seems to draw on @Pokorny1959, and gives many tables with interesting statistics. Whether those are reliable is a difficult question, but it is an early investigation beyond simple intuition and needs mentioning.|000|Indo-European, root structure, root cognates, statistical analysis, etymology, phonotactics 2861|Valas2010| **Background** The wealth of prokaryotic genomic data available has revealed that the histories of many genes are inconsistent, leading some to question the value of the tree of life hypothesis. It has been argued that a tree-like representation requires suppressing too much information, and that a more pluralistic approach is necessary for understanding prokaryotic evolution. We argue that trees may still be a useful representation for evolutionary histories in light of new data. **Results** Genomic data alone can be highly misleading when trying to resolve the tree of life. We present evidence from protein abundance data sets that genomic conservation greatly underestimates functional conservation. Function follows more of a tree-like structure than genetic material, even in the presence of horizontal transfer. We argue that the tree of cells must be incorporated into any new synthesis in order to place horizontal transfers into their proper selective context. We also discuss the role data sources other than primary sequence can play in resolving the tree of cells. **Conclusions** The tree of life is alive, but not well. 
Construction of the tree of cells has been viewed as the end goal of the study of evolution, where in reality we need to consider it more of a starting point. We propose a duality where we must consider variation of genetic material in terms of networks and selection of cellular function in terms of trees. Otherwise one gets lost in the woods of neutral evolution. |000|tree of life, structural data, web of life, discussion 2862|Malkiel1975|The diligent amassing of data that some early-twentieth-century etymologists practiced, as if to compensate for the element of arbitrariness or caprice at the moment of decision-making, has grated on the nerves of those younger linguists who have been spoiled or, if you prefer, immunized, as regards any excessive commitment to data-gathering, by structuralism and related movements.|104|structuralism, data collection, objectivity, subjectivity, etymology 2863|Malkiel1975|Now the majority of 'pure etymologists' -- those who aim to embark on a dictionary venture, for instance -- cultivate a certain aloofness from grammatical preoccupations; consequently, as they forge ahead, leaping nimbly from one word history beset with unknowns to another, even less transparent, they tend to sweep these uncertainties under the rug (or to be satisfied with citing isolated parallels), instead of exposing the gaps to full view and volunteering to investigate them with the equipment of a well stocked linguistic workshop.|119|etymology, problem, critics, philological tradition, word history, 2864|Malkiel1954|It may seem strange against the background of this avid search for new techniques of grouping that no systematic inquiry has, to my knowledge at least, yet been made into ways and means of extracting etymologically useful information from the configuration of word families. Modern linguistics stresses the fundamental kinship of inflection and syntax which traditional teachings used to separate quite sharply. 
Also it has become empirically known that word formation (that is, derivation and composition) which, with inflection proper, makes up morphology in the Indo-European languages, is best studied and practiced jointly with etymology. [...] In monographs and dictionaries alike [...] the consistent grouping of words by families has almost become standard practice. |265|etymology, word family, word history 2865|Malkiel1954|In combing the immense volume of etymological literature, one finds fleeting allusions and casual hints at certain varieties of derivational and compositional hierarchy, but surely no attempt at organized typology. |266|etymology, word history, word family, word formation 2866|Crist2005|Many lexical markup schemes ignore etymological information. This is often a perfectly reasonable design choice. For many applications (text-to-speech, part of speech tagging, machine translation, etc), this kind of information is of no obvious use.|000|etymology, markup, representation, modeling, formal linguistics 2867|Whitney2002|This led to the resuscitation of a doctrine originally articulated by Jacob Grimm (1785-1863), which became the rallying cry of the anti-Neogrammarian 'resistance': "every word has its history and lives its own life" ("jedes Wort hat seine Geschichte und lebt sein eigenes Leben", @Grimm<1819>:xiv).|50|nice quote, word history, history of science 2868|Koerner1990a|In referring to comparative anatomy as a model science, Grimm is obviously following Friedrich Schlegel's program of @1808 (cf. 
@Koerner1980:215-16), but Grimm, more than anyone before him, emphasizes the historical approach, even to the extent of asserting that "jedes Wort hat seine Geschichte und lebt sein eigenes Leben" (@Grimm1819:xiv).|13|word history, nice quote, Jacob Grimm, history of science 2869|Grimm1819|Diese Sprachkünstler scheinen nicht zu fühlen, daß es kaum eine Regel gibt, die sich steif überall durchführen läßt; jedes Wort hat seine Geschichte und lebt sein eigenes Leben, es gilt daher gar kein sicherer Schluß von den Biegungen und Entfaltungen des einen auf die des andern, sondern erst das, was der Gebrauch in beiden gemeinschaftlich anerkennt, darf von der Grammatik angenommen werden. |xiv|Jacob Grimm, nice quote, word history, history of science 2870|List2017c|The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. 
Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.|000|automatic cognate detection, LingPy, gold standard, dataset 2871|Sagart2014|This paper responds to all of Malcolm Ross’s criticisms, published in Language and Linguistics 13.6 (2012), of Sagart’s numeral-based model of Austronesian phylogeny (Sagart 2004). It shows that a part of these criticisms is addressed to an invented version of Sagart’s model, while another appeals to questionable principles. It points out various errors of fact and interpretation. It also criticizes Ross’s own account of the evolution of early Austronesian numerals, showing that it has little explanatory power, fails to account for phonological irregularities, and cannot explain the observed nesting pattern among numeral isoglosses. Finally, this paper shows that Tsouic, a Formosan subgroup which contradicts Ross’s phylogeny, is valid.|000|Austronesian, phylogenetic reconstruction, genetic classification, subgrouping, numerals 2872|Kemmerer2016|With the aim of exploring some connections between semantic typology and cognitive neuroscience, this paper focuses on the following simple but provocative argument: (i) Premise 1 (from semantic typology): Concrete word meanings vary greatly across languages. (ii) Premise 2 (from cognitive neuroscience): Concrete word meanings are, to a large extent, grounded in sensory and motor brain systems. (iii) Inference: The high-level representations in one’s sensory and motor brain systems are shaped, in part, by the typologically unique lexical-semantic properties of one’s language. 
I first elaborate each step of this argument, then I present some initial support for its validity, and finally I mention some future directions.|623|denotation, perception, cognition, semantics, sensory-motor 2873|Kemmerer2016|Beginning with body part terms, their most important semantic component is probably form, i.e., geometric information about the canonical contours and boundaries of the designated entities. Given that the human body is a single object with a continuous surface, the purpose of a mereological nomenclature is to allow speakers to refer to spatially delimited body segments. Current evidence suggests that, crosslinguistically, the most common principle of body segmentation is visual discontinuity (Enfield et al. 2006). For instance, most languages rely on the visual discontinuities of joints to distinguish between different parts of the limbs (e.g., upper arm, forearm, hand, finger).|624|sensory-motor concepts, body parts, cross-linguistic study, semantics, denotation 2874|Fertig2016|This evidence makes it clear that the relatively poorly understood mechanisms known to be at work in contamination and folk etymology should not be marginalized by restricting them to interactions between words that are semantically and phonetically but not morphologically related. Thorough investigation of these long-neglected mechanisms could contribute greatly to our understanding of morphological and phonological change.|452|paradigm levelling, morphological change, tendencies 2875|Fertig2016|Historical linguists have long been divided in their views about the mechanisms behind paradigm leveling, with many invoking a special mechanism related to a universal preference for paradigm uniformity while others attribute leveling to the same mechanism responsible for other types of analogical change. 
I argue that although ‘proportional’ analogical innovation plays a major role in paradigm leveling, it cannot account for all cases, and that something akin to the ‘interference’ mechanisms commonly associated with contamination and folk etymology account well for the non-proportional instances. I further show that all of the mechanisms involved in paradigm leveling are also implicated in other types of analogical change, and I argue against the need to posit any universal bias against (stem) allomorphy.|000|paradigm levelling, tendencies, morphological change 2876|Fertig2016|Paradigm leveling is a common kind of morphophonological development in many languages. A well-known example, shown in (1), involves the alternation between diphthongal and monophthongal root vowels that arose in a number of verbs in Old French – depending on whether stress fell on the root or on the following syllable in a given form – and was subsequently eliminated (‘leveled’) in favor of the reflexes of the Old French diphthongs|423|French, paradigm levelling, examples, definition 2877|Fertig2016|One fundamental issue concerns the relevant definition of ‘paradigm’. There are two basic dimensions to morphological structure. On one axis, we can group [pb] together the inflected forms of a given word (e.g., break, breaks, breaking, broke, broken) and lexemes based on the same root (break, breakable, unbroken, etc.); on the other axis, we can group forms of different words that correspond in one or more grammatical categories (e.g., broke, ate, walked, etc. or breakable, wearable, readable, etc.). Many linguists identify paradigms exclusively with the first of these dimensions (e.g., Harris 1973: 67; McCarthy 2005: 172; cf. Kenstowicz 2005 for discussion), and quite a few further restrict their notion of paradigm to inflectionally related forms (e.g., McCarthy 2005: 174; Albright 2011: 1972). 
That the tendency to level alternations is generally stronger within inflectional paradigms than among derivationally related words has long been widely recognized (Paul 1886: 169; Kiparsky 1972: 208).|427f|inflection classes, systemic processes, paradigm levelling, morphological change 2878|Fertig2016|Very interesting paper that makes a good point and seems to offer the state of the art in the theory of paradigm levelling. The implications for analogy in general and other cases of analogy in historical linguistics are important and should be taken into account when discussing this in other contexts.|-|paradigm levelling, morphological change, introduction, analogy 2879|AlonsoDeLaFuente2016|Discussions in the Altaic debate usually revolve around a couple of salient phonological features and a very restricted set of lexical items, and these are repeated over and over again, most often without advancing new arguments or ideas. Grammar-related issues are rarely tackled. In view of these circumstances, verb morphology might provide an ideal battlefield to bring into the picture innovative, original proposals. Moreover, due to some of the complexities involved, general linguists and typologists may find new areas of interest.|000|review, Altaic, morphology 2880|AlonsoDeLaFuente2016|Interesting review that also raises some interesting theoretical discussion points regarding the way we reconstruct languages. In general dismissive of the Altaic hypothesis proposed by @Robbeets2015 (which can't be judged from a non-expert's perspective), the points on reconstruction practice and comparison with practice of linguistics in Indo-European are quite interesting.|000|review, Altaic, morphology 2881|AlonsoDeLaFuente2016|More generally, MR does not address the question of whether grammaticalizations can be used as evidence in favour of genealogical relatedness (nor the possibility that some of them may be the result of chance similarity; see a concise discussion on p. 493). 
If we can recover the lexical value of a given morpheme, it means among other things that in historical terms such a morpheme may belong to a recent layer of the grammar. This idea is supported by the nature of the sound correspondences which can be deduced from the table on pp. 485–486: they are all trivial one-to-one correspondences (the only sound changes involved include palatalization before /i/ or spirantization of velar segments). The fact that these morphemes are extremely unlikely to be very archaic considerably decreases their probative value. Furthermore, all of these grammaticalizations are very common cross-linguistically, which diminishes the confidence a linguist can have in them from a comparative standpoint.|533|Altaic, grammaticalization, chance resemblance, homoplasy 2882|Gordon2015|Convergent evolutionary analogies (homoplasies) of many kinds occur in diverse phylogenetic clades/lineages on both the animal and plant branches of the Tree of Life. Living organisms whose last common ancestors lived millions to hundreds of millions of years ago have later converged morphologically, behaviorally or at other levels of functionality (from molecular genetics through biochemistry, physiology and other organismic processes) as a result of long term strong natural selection that has constrained and channeled evolutionary processes. This happens most often when organisms belonging to different clades occupy ecological niches, habitats or environments sharing major characteristics that select for a relatively narrow range of organismic properties. Systems biology, broadly defined, provides theoretical and methodological approaches that are beginning to make it possible to answer a perennial evolutionary biological question relating to convergent homoplasies: Are at least some of the apparent analogies actually unrecognized homologies? This review provides an overview of the current state of knowledge of important aspects of this topic area. 
It also provides a resource describing many homoplasies that may be fruitful subjects for systems biological research.|000|homoplasy, homology, convergent evolution, systemic processes, biological evolution 2883|Orlovaite2015|This paper introduces a new automatic cognate identification approach that can take as input orthographic word lists and produce both pairwise and group-based decisions. While the initial goal of the project was to reproduce the (@Hauer<2011> & Kondrak, 2011) cognate discovery method, several weaknesses of the model were discovered. The approach introduced here addresses these issues and follows the comparative method more closely. First, instead of choosing string similarity measures that appear to be reasonable, feature selection is employed, leading to eight best-performing attributes. Second, different aspects of cognateness are captured by also incorporating part-of-speech, letter correspondence, and language family information into the model. Third, strings are selectively preprocessed using either vowel removal or sound class substitution in order to reduce the gap between letters and phonemes, and to only penalize word differences if they are unlikely to appear in cognates. Finally, logistic regression is employed instead of SVM, using probabilities of cognateness as distance measures in hierarchical agglomerative clustering. The reproduced (Hauer & Kondrak, 2011) approach, the new method, and three rule-based baselines are tested using the 95-language 200-meaning Comparative Indo-European Database from (Dyen, Kruskal, & Black, 1992). The new approach performs significantly better than the other four models in terms of both classification and clustering.|000|cognate detection, automatic approach, orthography, historical linguistics 2884|Orlovaite2015|Generally interesting approach, also available on github (already forked it), but suffers from the uncritical use of orthography. 
No comparison with LexStat is possible, since results are not given as B-Cubed scores (but might be available on GitHub).|000|cognate detection, automatic approach, historical linguistics, orthography 2885|Bobenhausen2009|Metrical markup done manually demands an enormous effort of time. Wouldn’t it be smart to let computers do the work automatically – and would that be possible? The answer is »Yes«. The following text describes how automatic metrical markup for stressed and unstressed syllables in German verse text can be achieved on the basis of theoretical postulations and their methodical realization. |000|markup, poetry, computational philology 2886|Schorer2016|The large databases of the Sino-Tibetan Etymological Dictionary and Thesaurus (STEDT) project—which was finally made available in beta form in October 2010—and of the Trans-Himalayan Database Project in Bern, Switzerland, have now begun to fill this gap. Sadly, no comparable volume containing reconstructions of basic vocabulary is available for many of the sub-branches of Tibeto-Burman. Yet in recent years the successful reconstructions of some low-level groups have started to appear in the specialist literature (Sun 1993, Mortensen 2003, VanBik 2007, Wood 2008, Button 2009).|102|resources, reconstructions, etymological dictionary, Sino-Tibetan 2887|Bobenhausen2015|Paper presents the Metricalizer project and can be used to quote it properly. The project uses specific algorithms to automatically identify the metrics of German poems.|000|metrics, automatic approach, project, introduction 2888|Bank2017|Morphological analyses usually prefer ‘deriving’ form-identities as systematic syncretism over just stating them in terms of accidental homophony. While such anti-homophony is mostly assumed implicitly, Müller (2004) spells it out more explicitly as violable Syncretism Principle guiding both language acquisition and linguistic analysis (‘same form → same meaning’). 
However, as soon as the child or linguist decomposes word forms into smaller formatives (morpheme segmentation, subanalysis), it is unclear what instances of form-identity exactly are to be avoided (e.g. substring-identities?). This paper frames the logical space of possible Syncretism Principle interpretations, which relate to their functional motivation (ambiguity avoidance) demonstrating their concrete consequences for analysis with a paradigm learning algorithm offering segmentation and meaning assignment.|000|syncretism, grammar, morphology, colexification 2889|Simmons2016|This paper examines the evolution of the traditional Chinese linguistic analysis known as the sìhū 四呼 “four types of rime onset”. We find that the discovery of the sìhū was closely related to new developments in phonological analysis made by Míng 明 (1368–1644) scholars as they compiled innovative rime tables and rime books that departed from strict adherence to Qièyùn 切韻 and Middle Chinese phonology and focused on contemporary colloquial Mandarin dialects and the pronunciation of the prestige Mandarin koinē known as Guānhuà 官話.|000|Chinese traditional phonology, history of science, sìhū 四呼 2890|SimsWilliams1998|Many claims have been made linking ancient languages with genetically identified prehistoric and modern populations. There is much new ‘evidence’ and intense debate on the validity and appropriateness of such interdisciplinary work. Here Patrick Sims-Williams provides a timely comment on linguistics and the quest for ancient populations.|000|biological parallels, genetics, linguistics, interdisciplinary research 2891|Koerner1988|Meillet was a student of Saussure's at the École Pratique des Hautes Études in Paris during 1885-89, substituting for him during 1889-90, when Saussure took a sabbatical leave. 
Following Saussure's acceptance of a professorship at the University of Geneva in 1891, Meillet remained in touch with him; letters by the latter addressed to Meillet attest to their friendship. Meillet, for his part, never tired of acknowledging his debt to Saussure; by contrast, his influence on his former teacher with regard to general linguistic ideas is much less certain. The present paper addresses this question as well as the traditional claim that Saussure was influenced by Durkheimian sociology, most probably mediated by Meillet. Throughout most of his career Meillet made general observations about the nature of language and linguistic methodology. But these are usually expressed in book reviews and few papers; all studies of his of book length are devoted to languages or language groups of the Indo-European family, and it is evident that Meillet remained a comparativist. A close analysis of Meillet's general linguistic ideas reveals that he usually stated the obvious, at least if compared with what Saussure had to say about the foundations of linguistics, and that there is little that Saussure could have found in Meillet as a generalist. It is therefore not surprising that Meillet's reaction to the Cours de linguistique générale was much less favourable than one might have expected; for Meillet, Saussure remained first and foremost the author of the Mémoire sur le système primitif des voyelles, which appeared in Leipzig in 1878.|000|Antoine Meillet, Ferdinand de Saussure, history of science, linguistics, structuralism 2892|Stokes2017|The tide of people moving across the world, be they immigrants or refugees, has sparked concern in Australia, Europe and the United States. In particular, the ethnic, linguistic and cultural background of migrants has triggered intense debates over the benefits and the costs of growing diversity and the risk of open borders to national identity. 
Unease over the cultural, economic and security ramifications of immigration helped to fuel the Brexit vote in the United Kingdom, encourage the idea of a wall along the U.S.-Mexican border and broaden support for right-wing populist parties in France, Germany and the Netherlands.|000|language, bilingualism, nationality, language awareness, identity, 2893|Zhang2017|Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. There are two important yet not well-studied parameters in this approach: the conventional size of vocabulary list to collect potentially true cognates and the minimum matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate degree of relatedness between languages, and enable a frequency-based, dynamic threshold to detect recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared to the Swadesh 200-word list and other 100-word lists sampled randomly from the Swadesh 200-word list. All these provide mathematical support for applying lexicostatistics in historical and comparative linguistics.|000|lexicostatistics, word list, statistics, sound correspondences 2894|Zhang2017|Article discusses minimal size of lexicostatistical wordlists. 
Not convincingly, as the only empirical data they base this upon is the word lists by @Kessler2007, and they do not really try to solve the question of how many words we need to sufficiently distinguish chance from regular correspondences.|000|word list, lexicostatistics, statistics, sound correspondences 2895|Hall2002|The Parasitic Hypothesis, formulated to account for early stages of vocabulary development in second language learners, claims that on initial exposure to a word, learners automatically exploit existing lexical material in the L1 or L2 in order to establish an initial memory representation. At the level of phonological and orthographic form, it is claimed that significant overlaps with existing forms, i.e. cognates, are automatically detected and new forms are subordinately connected to them in the mental lexicon. In the study reported here, English nonwords overlapping with real words in Spanish (pseudocognates), together with noncognate nonwords, were presented to Spanish-speaking learners of English in a word familiarity task. Participants reported significantly higher levels of familiarity with the pseudocognates and showed greater consistency in providing translations for them. These results, together with measures of the degree of overlap between nonword stimuli and translations, were interpreted as evidence for the automatic use of cognates in early word learning.|000|language learning, cognacy, English, Spanish, experimental study 2896|Handel2016|Article (draft) discusses tonal development in Duoxu, a Tibeto-Burman language.|000|Duoxu, Tibeto-Burman, Sino-Tibetan, tone, introduction 2897|Ree1998|Assumptions about the costs of character change, coded in the form of a step matrix, determine most-parsimonious inferences of character evolution on phylogenies. We present a graphical approach to exploring the relationship between cost assumptions and evolutionary inferences from character data. 
The number of gains and losses of a binary trait on a phylogeny can be plotted over a range of cost assumptions, to reveal the inflection point at which there is a switch from more gains to more losses and the point at which all changes are inferred to be in one direction or the other. Phylogenetic structure in the data, the tree shape, and the relative frequency of states among the taxa influence the shape of such graphs and complicate the interpretation of possible permutation-based tests for directionality of change. The costs at which the most-parsimonious state of each internal node switches from one state to another can also be quantified by iterative ancestral-state reconstruction over a range of costs. This procedure helps identify the most robust inferences of change in each direction, which should be of use in designing comparative studies.|000|step matrix, parsimony, homoplasy, homology, ancestral state reconstruction, ancestral states 2898|Ree1998|Paper describes methods to get more out of gain-loss models by testing when character-state evolution turns, under certain weights. In this sense, it may be worthwhile giving it a closer look when further working on birth-death or gain-loss models.|000|gain-loss models, character mapping, parsimony, ancestral states 2899|Nixon2012|Homology in cladistics is reviewed. The definition of important terms is explicated in historical context. Homology is not synonymous with synapomorphy: it includes symplesiomorphy, and Hennig clearly included both plesiomorphy and synapomorphy as types of homology. Homoplasy is error, in coding, and is analogous to residual error in simple regression. If parallelism and convergence are to be distinguished, homoplasy would be evidence of the former and analogy evidence of the latter. We discuss whether there is a difference between molecular homology and morphological homology, character state homology, nested homology (additive characters), and serial homology. 
We conclude by proposing a global definition of homology.|000|introduction, homology, terminology, review, cognacy 2900|Phillips2005|Hypotheses of homology are the basis of phylogenetic analysis. All character data are considered to be equivalent regardless of the source of those characters. Putative homology statements are designated based on observations of similarity. Pairwise sequence alignment using the Needleman–Wunsch algorithm is the basis for similarity maximization between molecular sequences. Multiple sequence alignment uses this algorithm in a topologically hierarchical framework. The resulting hypotheses of homology are tested in conjunction with character congruence through parsimony. This review introduces some underlying principles of phylogenetic analysis as they pertain to homology testing and DNA sequence alignment.|000|homology, DNA sequence, evolutionary model, philosophy of science, cladistics 2901|Phillips2005|Systematics is an historical science with distinct epistemological constraints. The data is acquired through observation and not through experimental manipulation. The results of a phylogenetic analysis are only provisionally accepted pending the next new set of observations. The past can never be truly known and we can only rely on our best estimates. History has occurred only once and unique serendipitous events are pivotal during the process of evolution. If you could rewind the tape of time and replay it, then you would observe a different series of events every time you watch it. The results of a phylogenetic analysis are explicitly uncertain; accuracy is a pipe dream|18|epistemology, epistemological problem, methodological problem, philosophy of science, historical sciences 2902|Phillips2005|Our concept of homology is not simply derived from our ability to ascribe similarity. Homology is a process theory necessitated by descent with modification. 
Phylogenetic analysis via parsimony both requires and substantiates our hypotheses of homology through mutual corroboration of the characters. In the absence of a phylogenetic hypothesis, there is no test of homology. Consequently, it is important to take into account how we execute our phylogenetic analysis and our justifications for doing so.|19|philosophy of science, homology, cladistics, 2903|Yaveroglu2015|**Motivation:** Network comparison is a computationally intractable problem with important applications in systems biology and other domains. A key challenge is to properly quantify similarity between wiring patterns of two networks in an alignment-free fashion. Also, alignment-based methods exist that aim to identify an actual node mapping between networks and as such serve a different purpose. Various alignment-free methods that use different global network properties (e.g. degree distribution) have been proposed. Methods based on small local subgraphs called graphlets perform the best in the alignment-free network comparison task, due to high level of topological detail that graphlets can capture. Among different graphlet-based methods, Graphlet Correlation Distance (GCD) was shown to be the most accurate for comparing networks. Recently, a new graphlet-based method called NetDis was proposed, which was claimed to be superior. We argue against this, as the performance of NetDis was not properly evaluated to position it correctly among the other alignment-free methods. **Results:** We evaluate the performance of available alignment-free network comparison methods, including GCD and NetDis. We do this by measuring accuracy of each method (in a systematic precision-recall framework) in terms of how well the method can group (cluster) topologically similar networks. 
By testing this on both synthetic and real-world networks from different domains, we show that GCD remains the most accurate, noise-tolerant and computationally efficient alignment-free method. That is, we show that NetDis does not outperform the other methods, as originally claimed, while it is also computationally more expensive. Furthermore, since NetDis is dependent on the choice of a network null model (unlike the other graphlet-based methods), we show that its performance is highly sensitive to the choice of this parameter. Finally, we find that its performance is not independent on network sizes and densities, as originally claimed|000|alignment, networks, network comparison, introduction 2904|Yaveroglu2015|If the task comes to network comparison, this paper may be a starting point to read more about the problem, as they quote a lot of literature.|000|network comparison, methodology 2905|Yaveroglu2015|If the task comes to network comparison, this paper may be a starting point to read more about the problem, as they quote a lot of literature.|000|network comparison, methodology 2906|Wang2011d|Book presents a new analysis of Shījīng, which is highly interesting, as the book gives concrete analyses of rhyming behavior.|000|shījīng, rhyme patterns, rhyme analysis, dataset 2907|Yang2014|Very explicit collection of rhymes in Bronze inscriptions, interesting in the context of @Behr2008 for the investigation of rhyme patterns.|000|dataset, rhyme patterns, Bronze inscriptions, Old Chinese 2908|Pagel2013|The search for ever deeper relationships among the World’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some “ultraconserved” words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. 
Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time-depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7- to 10-times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.|000|Nostratic, comparative method, lexicostatistics, Bayesian inference 2909|Mahowald2013|Paper is a reply to @Pagel2013, arguing that short words have more chance similarities.|000|chance resemblance, word length, 2910|Mei2000|These are the collected writings of Měi Zǔlín.|000|historical linguistics, Old Chinese, Sino-Tibetan, linguistic reconstruction, yīnyùnxué, 2911|Mei2000|These are the collected writings of Měi Zǔlín.|000|historical linguistics, Old Chinese, Sino-Tibetan, linguistic reconstruction, yīnyùnxué, 2912|Cooney2017|The origin and expansion of biological diversity is regulated by both developmental trajectories 1,2 and limits on available ecological niches 3–7 . As lineages diversify, an early and often rapid phase of species and trait proliferation gives way to evolutionary slowdowns as new species pack into ever more densely occupied regions of ecological niche space 6,8 . 
Small clades such as Darwin’s finches demonstrate that natural selection is the driving force of adaptive radiations, but how microevolutionary processes scale up to shape the expansion of phenotypic diversity over much longer evolutionary timescales is unclear 9 . Here we address this problem on a global scale by analysing a crowdsourced dataset of three-dimensional scanned bill morphology from more than 2,000 species. We find that bill diversity expanded early in extant avian evolutionary history, before transitioning to a phase dominated by packing of morphological space. However, this early phenotypic diversification is decoupled from temporal variation in evolutionary rate: rates of bill evolution vary among lineages but are comparatively stable through time. We find that rare, but major, discontinuities in phenotype emerge from rapid increases in rate along single branches, sometimes leading to depauperate clades with unusual bill morphologies. Despite these jumps between groups, the major axes of within-group bill-shape evolution are remarkably consistent across birds. We reveal that macroevolutionary processes underlying global-scale adaptive radiations support Darwinian 9 and Simpsonian 4 ideas of microevolution within adaptive zones and accelerated evolution between distinct adaptive peaks.|000|bird evolution, adaptationism, Darwinian evolution, macro-evolution, micro-evolution, evolutionary rates 2913|Butler2000|Current debate concerning homology arises from three different research interests: phylogenetics, character evolution, and generative pathways. Phylogenetic homology focuses on descent of the character from a common ancestor. Biological homology addresses character evolution and diversification. 
Exceptions to the general case complicate these two approaches: historically and biologically homologous characters may be produced by different generative pathways, and minutely similar characters produced by the same generative pathways may have a sporadic phylogenetic distribution. We suggest that for studies of comparative developmental biology, new descriptive terms are needed to distinguish similar structures that result from the same generative pathways from those that result from different generative pathways. The terms syngeny, meaning ``same genesis'', and allogeny, meaning ``different genesis'', allow the acknowledgement of sameness at the generative level and can be used in combination with the terminology of historical homology and biological homology to describe any given character.|000|homology, philosophy of science, homoplasy, introduction, terminology 2914|Butler2000|They view the homology of any character as strictly dependent on the test of homology prescribed by cladistic methodology, i.e., exhibiting a phylogenetic distribution that is congruent with a monophyletic taxon due to descent from a common ancestor that also exhibited the character. This conceptual school of homology is referred to as taxic, historical, or phylogenetic homology. It is concerned with pattern.|847|homology, definition, terminology 2915|Butler2000|Article discusses different aspects of homology, mainly the question of hierarchy and the difference between gene homology and morphological trait homology. Mentions classical cases, like bird-wings vs. forelimbs in mammals, etc.|000|terminology, homology, discussion, 2916|Butler2000|Article discusses different aspects of homology, mainly the question of hierarchy and the difference between gene homology and morphological trait homology. Mentions classical cases, like bird-wings vs. 
forelimbs in mammals, etc.|000|terminology, homology, discussion, 2917|Burling1967|An early introduction to Lolo-Burmese with some rather detailed data on shared words (morphemes).|000|dataset, Loloish, Burmish languages, Sino-Tibetan, Lolo-Burmese 2918|Guillaume2006|It appeared recently that the classical random graph model used to represent real-world complex networks does not capture their main properties. Since then, various attempts have been made to provide accurate models. We study here a model which achieves the following challenges: it produces graphs which have the three main wanted properties (clustering, degree distribution, average distance), it is based on some real-world observations, and it is sufficiently simple to make it possible to prove its main properties. This model consists in sampling a random bipartite graph with prescribed degree distribution. Indeed, we show that any complex network may be viewed as a bipartite graph with some specific characteristics, and that its main properties may be viewed as consequences of this underlying structure. We also propose a growing model based on this observation.|000|bipartite network, graph theory, methodology, introduction 2919|Haspelmath2016|Since the 1970s, serial verb constructions (SVCs) have been discussed widely in African, Oceanic and many other languages throughout the world. This article gives an overview of the most important generalizations about SVCs that have been proposed and that do seem to hold if a sufficiently restrictive definition of the concept is adopted. The main problem with the earlier comparative literature is that the notion of an SVC has not been delimited clearly, and/or has been formulated in much too wide terms. As a result, some linguists have despaired of finding a coherent cross-linguistic concept of SVC. For example, one scholar asked ‘Are there any universal defining properties of serial verb constructions? Probably not . . .’. 
These problems can be seen as a result of the confusion between comparative concepts and natural kinds: Serial verb constructions have (most often implicitly) been regarded as natural kinds (universal categories), so that phenomena in additional languages were regarded as SVCs even when they had somewhat different properties. This procedure inevitably leads to a fuzzy and very broad understanding of the concept, with a prototype (or ‘canonical’) structure that does not allow falsifiable claims. Here I propose a narrow definition of SVC and formulate 10 universals that are apparently true of all serial verb constructions in this narrow sense. The claim that these are universally true of (narrowly defined) serial verb constructions is based on a thorough reading of the comparative and theoretical literature, not on a systematic sample of language—the latter would not have been practical, because SVCs are rarely described in sufficient detail in descriptive grammars. No attempt is made at explaining these generalizations in the present article, but I claim that we finally have a good idea of what it is that needs to be explained in a general way.|000|serial verb constructions, introduction, cross-linguistic study 2920|LaPolla1994|In many Tibeto-Burman languages we find that there are a number of forms that are clearly related though differ in one segment. In some cases these variations may be due to regular or common alternations, such as in Tibetan, where you have dental suffixes that can nominalize a verb (e.g. rkun-po 'thief', from rku 'steal'). In other cases we cannot find any morphological reason for the variation, even though the variation may involve the same segments, as in Tibetan bka, skad 'speech'. When we reconstruct the Proto-Tibeto-Burman provenience of these cognates, we sometimes have no way of knowing which form is older, so we must reconstruct two forms that are clearly related, that are what James A. Matisoff has dubbed 'allofams'. 
On the Chinese side of Sino-Tibetan we find similar alternations among cognate forms, as in *mjar.J, M; *mjag 'negative/not have'; tE *gwjaiJ. -T· *gwjag 'go'.|000|allofams, cognacy, proof of cognacy, philosophy of science, 2921|LaPolla1994|Within the whole cognacy debate, where we say that words are clearly related but not regular, this paper is a good starting point, as it explicitly deals with the cases. For further research, we need to investigate regularity and irregularity, and how to identify cases of clear cognates which are nevertheless only partially regular.|000|cognacy, proof of cognacy, sound correspondences, allofams 2922|LaPolla2012|This paper presents epistemological and methodological problems found in work on the subgrouping of Sino-Tibetan languages and the reconstruction of features of the languages. A key problem is the lack of an accepted standard for judging this work, one that can stand up to statistical evaluation. An alternative methodology that involves using fixed sets of features to give us the statistical probability of common origin is suggested.|000|epistemological problem, methodological problem, Sino-Tibetan, subgrouping, comparative method 2923|LaPolla2012|The first problem we encounter is an epistemological one, “Teeter’s Law” (@Watkins<1967> 1976:310): “The language of the family you know best always turns out to be [pb] the most archaic”.|117f|history of science, Teeter's Law, reconstruction methodology, linguistic reconstruction 2924|LaPolla2012|Lack of consistent and clear standards and principles for subgrouping. That is, no consensus on methodology. 
Though there is some excellent work done using the comparative method, and there have been arguments for more rigorous application of the comparative method (using sets of unusual shared innovations—Thurgood 1982), subgrouping within Sino-Tibetan is often based on certain features that the languages are said to share, or on a few shared lexical items, or even on the fieldworker’s intuitions, or on how remote speakers feel different languages are (the degree of mutual intelligibility), or, as we saw above, because the languages just happen to be in the same geographic area.|121|subgrouping, comparative method, criteria, problem, 2925|LaPolla2012|Article offers an interesting elaboration on methodological problems in applying the comparative method for subgrouping. By discussing the problem of finding individual-identifying evidence, the article also offers an apparently suitable way to handle this in Sino-Tibetan, although not mentioning how quantification could be done.|000|subgrouping, Sino-Tibetan, methodology, comparative method 2926|LaPolla2012|Nichols argues that the evidence that has been used in the history of Indo-European linguistics for showing relatedness is not individual word correspondences, but “whole systems or subsystems with a good deal of internal [pb] paradigmaticity, ideally multiple paradigmaticity, and involving not only categories but particular shared markers for them” (@Nichols<1996> 1996:48).|121f|individual-identifying evidence, systemic processes, systematic aspect of evolution, comparative method, subgrouping 2927|LaPolla2012|In monosyllabic morphology-poor languages, achieving this standard is more difficult, but not impossible. What is needed is for particular elements (words and/or morphological markers) to be organized into paradigm-like sets and applied rigorously to determine relatedness. 
That is, the sets are treated as if they were paradigms and used as individual-identifying evidence of relatedness, because the particular combination of independent elements as an internally-structured set would give us the level of statistical significance we need. It is paradigmaticity in particular that helps us reach the individual-identifying threshold, as the probability for the set as a whole is determined by multiplying the probabilities of the individual forms and categories by each other|122|systemic processes, comparative method, subgrouping, individual-identifying evidence, Sino-Tibetan 2928|LaPolla2012|:comment:`The master list mentioned in the appendix of sets of features is interesting in so far as it presupposes some degree of homology/cognacy assessment which should have been done preliminarily.` |126|individual-identifying evidence, examples, Sino-Tibetan 2929|LaPolla2012|:comment:`The master list mentioned in the appendix of sets of features is interesting in so far as it presupposes some degree of homology/cognacy assessment which should have been done preliminarily.` |126|individual-identifying evidence, examples, Sino-Tibetan 2930|LaPolla2013|This paper is an attempt to apply insights and methodologies from Nichols (1996) to help us resolve problems in determining genetic relatedness among Sino-Tibetan languages and in our efforts at reconstructing protolanguages of different time depths. 
The results from the application of Nichols’s methodology are explained with reference to what we know about the migrations of the Sino-Tibetan peoples.|000|subgrouping, individual-identifying evidence, Sino-Tibetan, 2931|LaPolla2013|Interesting paper to be quoted when working on the investigation of patterns in Sino-Tibetan and questions of chance resemblances, as it draws on the statistics by @Nichols1996 and reflects about their implication for Sino-Tibetan.|000|Sino-Tibetan, chance resemblance, individual-identifying evidence, subgrouping 2932|LaPolla2017|Introduction to Topic and Comment in Chinese.|000|introduction, topic and comment, information structure, Chinese 2933|LaPolla2017d|Overview of the Dulong language.|000|introduction, Sino-Tibetan, Dulong, overview, Tibeto-Burman 2935|Barret2011|My aim is to consider the simplest basis upon which we might distinguish between hunter-gatherer and agricultural systems in the hope of providing a foundation upon which to investigate the historical complexities covered by these terms, and of clarifying some of the historical conditions that were necessary for the development of agriculture.|000|hunter gatherers, cultural evolution, introduction, definition, neolithic revolution 2936|Benson2012|GenBank® (http://www.ncbi.nlm.nih.gov) is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assigns accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. 
GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.|000|gene bank, database, introduction, 2937|Benson2012|Useful to compare in the context of writing dataset papers in linguistics.|000|gene bank, dataset, introduction 2938|Blust2016|The first thing to note with regard to the discovery of the so-called ‘Liangdao man’ is that any recipe used to ‘cook up’ an interpretation of the human history of East and Southeast Asia based on this fragile foundation will require a large pinch of speculation. My purpose in entering this discussion is not to enlarge the compass of the speculations already advanced, but rather to rein them in with some reminders of things about which we have much better observational control.|000|South-East Asian languages, language history, peopling of South-East Asia 2939|Blust2016|This is a direct reply to @Sagart2016 on the same topic of early skeletons and linguistic reconstruction.|000|peopling of South-East Asia, linguistic reconstruction, review 2940|Sagart2016|The discovery of Liangdao Man, a skeleton C-14 dated to c. 6000 BCE, under a shell mound in Liang Island north of the Taiwan strait, and the sequencing by Ko et al. (2014) of its mitochondrial DNA, have brought new light on Austronesian origins. 
Liangdao man’s mtDNA belongs to a very early form of haplogroup E, which is exclusively Austronesian and well represented in Taiwan in two forms: E1 and E2.|000|Austronesian, language origin, Urheimat, 2941|Bowern2016|Here I present the background to, and a description of, a newly developed database of historical and contemporary lexical data for Australian languages (Chirila), concentrating on the Pama-Nyungan family (the largest family in the country). While the database was initially developed in order to facilitate research on cognate words and reconstructions, it has had many uses beyond its original purpose, in synchronic theoretical linguistics, language documentation, and language reclamation. Creating a multi-audience database of this type has been challenging, however. Some of the challenges stemmed from success: as the size of the database grew, the original data structure became overly unwieldy. Other challenges grew from the difficulties in anticipating future needs, in keeping track of materials, and in coping with diverse input formats for so many highly endangered languages. In this paper I document the structure of the database, provide an overview of its uses (both diachronic and synchronic), and discuss some of the issues that have arisen during the project and choices that needed to be made as the database was created, compiled, curated, and shared. I address here the major problems that arise with linguistic data, particularly databases created for diverse audiences, from diverse data, with little infrastructure support.|000|dataset, Australian languages, description 2942|Harding1988|Genetic distances among speakers of the European language families were computed by using gene-frequency data for human blood group antigens, enzymes, and proteins of 26 genetic systems. Each system was represented by a different subset of 3369 localities across Europe. By subjecting
the matrix of distances to numerical taxonomic procedures, we obtained a grouping of the language families of Europe by their genetic distances as contrasted with their linguistic relationships. The resulting classification largely reflects geographic propinquity rather than linguistic origins. This is evidence for the primary importance of short-range interdemic gene flow in shaping the modern gene pools of Europe. Yet, some language families, i.e., Basque, Finnic (including Lappish), and Semitic (Maltese), have distant genetic relationships with their geographic neighbors. These results indicate that European gene pools still reflect the remote origins of some ethnic units subsumed by these major linguistic groups.|000|genetic classification, genetic distance, European languages, 2943|Goddard2010a|In the NSM approach to semantic analysis, semantic molecules are a well-defined set of non-primitive lexical meanings in a given language that function as intermediate-level units in the structure of complex meanings in that language. After reviewing existing work on the molecules concept (including the notion of levels of nesting), the paper advances a provisional list of about 180 productive semantic molecules for English, suggesting that a small minority of these (about 25) may be universal. It then turns close attention to a set of potentially universal level-one molecules from the “environmental” domain (‘sky’, ‘ground’, ‘sun’, ‘day’, ‘night’, ‘water’ and ‘fire’), proposing a set of original semantic explications for them. Finally, the paper considers the theoretical implications of the molecule theory for our understanding of semantic complexity, cross-linguistic variation in the structure of the lexicon, and the translatability of semantic explications.|000|semantic molecules, natural semantic metalanguage, 2944|Goddard2006|This paper explains and explores the concept of “semantic molecules” in the NSM methodology of semantic analysis. 
A semantic molecule is a complex lexical meaning which functions as an intermediate unit in the structure of other, more complex concepts. The paper undertakes an overview of different kinds of semantic molecule, showing how they enter into more complex meanings and how they themselves can be explicated. It shows that four levels of “nesting” of molecules within molecules are attested, and it argues that while some molecules, such as ‘hands’ and ‘make’, may well be language-universal, many others are language-specific.|000|semantic molecules, natural semantic metalanguage 2945|Montemurro2011|Background: The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and by their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language. Methodology/Principal Findings: We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families. 
Conclusions/Significance: Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical linguistic universal.|000|word order, cross-linguistic study, entropy 2946|Shannon1948|Groundbreaking paper on entropy, information, and encoding from a mathematical perspective.|000|entropy, introduction, fundamental paper, 2947|Saville2005|Book on second language acquisition.|000|language learning, introduction, 2948|Sokal1988|Genetic and taxonomic distances were computed for 3466 samples of human populations in Europe based on 97 allele frequencies and 10 cranial variables. Since the actual samples employed differed among the genetic systems studied, the genetic distances were computed separately for each system, as were matrices of geographic distances and of linguistic distances based on membership in the same language family or phylum. Significant matrix correlations between genetics and geography were found for the majority of systems; somewhat less frequent are significant correlations between genetics and language. The effects of the two factors can be separated by means of partial matrix correlations. These show significant values for both genetics and geography, language kept constant, and genetics and language, geography kept constant, with a tendency for the former to be higher. These findings demonstrate that speakers of different language families in Europe differ genetically and that this difference remains even after geographic differentiation is allowed for. The greater effect of geography than of language may be due to the several factors that bring about spatial differentiation in human populations.|000|genetic distance, European languages, 2949|Souag2016|This talk presents a grammar-specific instance of sound change, that is, a sound change which happens only to hold for verbs. 
This is an interesting contribution in the light of the idea of the universality of sound change in the Neogrammarian paradigm.|000|sound change, comparative method, methodology 2950|Wilson2016|Indispensable resource for comparing coding practices and practices for data management.|000|coding practice, good practices, dataset, data curation, introduction 2951|Winter2016a|The sound system of a language must be able to support a perceptual contrast between different words in order to signal communicatively relevant meaning distinctions. In this paper, we use a simple agent-based exemplar model in which the evolution of sound-category systems is understood as a co-evolutionary process, where the range of variation within sound categories is constrained by functional pressure to keep different words perceptually distinct. We show that this model can reproduce several observed effects on the range of sound variation. We argue that phonological systems can be seen as finding a relative optimum of variation: Efficient communication is sustained while at the same time, hidden category variation provides pathways for future evolution.|000|simulation studies, language evolution, sound change, phonetic contrast 2952|Yang2015|This study investigates the role of phonation cues in perceiving Mandarin tones in isolated syllables. Mandarin tones have been previously reported to be sufficiently identified by F0 contour, while phonation cues are redundant. This study provides evidence to show that native Mandarin speakers are sensitive to phonation cues in identifying the four Mandarin tones in isolated syllables. Moreover, Mandarin Tone 3 more strongly relies on phonation cues in its identification than the other three tones, which probably derives from the usual accompanying non-modal voice (creaky voice) in Tone 3 production. 
These results thus indicate the need to define language tones in a finer model that incorporates detailed phonation parameters.|000|phonation type, perception, tone perception, Mandarin, Chinese 2953|Whitfield2016|Paper introduces the work of Marc Pagel and the simplifying assumptions regarding evolution of languages/cultures and species.|000|biological parallels, introduction 2954|Zhu2016|Article discusses lexical diffusion in Dongfeng Village of Linyi City in Shandong. It concludes that the sound changes of Dongfeng Chinese are the result of lexical diffusion rather than Neogrammarian sound change processes.|000|lexical diffusion, Neogrammarian sound change, Dongfeng, Chinese dialects 2955|Yanson2011|Article introduces different stages of Burmese by contrasting initials and finals (@Yanson2012).|000|Old Burmese, Burmese, language history, periodization, Sino-Tibetan 2956|Yanson2012|Article introduces different stages of Burmese by contrasting initials (@Yanson2011) and finals.|000|Burmese, Old Burmese, Sino-Tibetan, periodization, language history 2957|Smith2014|In the absence of direct evidence of the emergence of language, the explicitness of formal models which allow the exploration of interactions between multiple complex adaptive systems has proven to be an important tool. 
Computational simulations have been at the heart of the field of evolutionary linguistics for the past two decades, particularly through the language game and iterated learning paradigms, but these are now being extended and complemented in a number of directions, through formal mathematical models, language-ready robotic agents, and experimental simulations in the laboratory.|000|simulation studies, language change, introduction 2958|Smith2014|Possibly a must-read when it comes to simulation studies in language history and language evolution.|000|introduction, language evolution, language change, simulation studies 2959|Smith2004|Human language is unique among the communication systems of the natural world. The vocabulary of human language is unique in being both culturally-transmitted and symbolic. In this paper I present an investigation into the factors involved in the evolution of such vocabulary systems. I investigate both the cultural evolution of vocabulary systems and the biological evolution of learning rules for vocabulary acquisition. Firstly, vocabularies are shown to evolve on a cultural time-scale so as to fit the expectations of learners — a population’s vocabulary adapts to the biases of the learners in that population. A learning bias in favour of one-to-one mappings between meanings and words leads to the cultural evolution of communicatively-optimal vocabulary systems, even in the absence of any explicit pressure for communication. Furthermore, the pressure to conform to the biases of learners is shown to outweigh natural selection acting on cultural transmission. Human language learners appear to bring a one-to-one bias to the acquisition of vocabulary systems. The functionality of human vocabulary may therefore be a consequence of the biases of human language learners. Secondly, the evolutionary stability of genetically-transmitted vocabulary learning biases is investigated using both static and dynamic models. 
A one-to-one learning bias, which leads to the cultural evolution of optimal communication, is shown to be evolutionarily stable. However, the evolution de novo of this bias is complicated by the cumulative nature of the cultural evolution of vocabulary systems. This suggests that the biases of human language learners may not have evolved specifically and exclusively for the acquisition of communicatively-functional vocabulary.|000|lexical change, vocabulary evolution, simulation studies, 2960|Smith2012|The distribution of frequency counts of distinct words by length in a language’s vocabulary will be analyzed using two methods. The first, will look at the empirical distributions of several languages and derive a distribution that reasonably explains the number of distinct words as a function of length. We will be able to derive the frequency count, mean word length, and variance of word length based on the marginal probability of letters and spaces. The second, based on information theory, will demonstrate that the conditional entropies can also be used to estimate the frequency of distinct words of a given length in a language. In addition, it will be shown how these techniques can also be applied to estimate higher order entropies using vocabulary word length.|000|word length, word frequency, distribution, quantitative analysis 2961|Feng2015|Comments on the study by @Sampson2015|000|commentary, monosyllabicity, Chinese, language history, 2962|Duda2016|Over the past two decades numerous new trees of modern human populations have been published extensively but little attention has been paid to formal phylogenetic synthesis. We utilized the “matrix representation with parsimony” (MRP) method to infer a composite phylogeny (supertree) of modern human populations, based on 257 genetic/genomic, as well as linguistic, phylogenetic trees and 44 admixture plots from 200 published studies (1990–2014). 
The resulting supertree topology includes the most basal position of S African Khoisan followed by C African Pygmies, and the paraphyletic section of all other sub-Saharan peoples. The sub-Saharan African section is basal to the monophyletic clade consisting of the N African–W Eurasian assemblage and the consistently monophyletic Eastern superclade (Sahul–Oceanian, E Asian, and Beringian–American peoples). This topology, dominated by genetic data, is well-resolved and robust to parameter set changes, with a few unstable areas (e.g., West Eurasia, Sahul–Melanesia) reflecting the existing phylogenetic controversies. A few populations were identified as highly unstable “wildcard taxa” (e.g. Andamanese, Malagasy). The linguistic classification fits rather poorly on the supertree topology, supporting a view that direct coevolution between genes and languages is far from universal.|000|human population, human prehistory, population genetics, population history, supertree 2963|Dong2015|The so-called “right side sound system” ( 右音系統 ) of the Fanyi Laoqida Piaotongshi ( 翻譯老乞大•朴通事 ) shows that the entering tone was divided into two categories according to vowel height. Based on data from Yue, Southern Min, Jin and Jianghuai Mandarin dialects which show that entering tone and stop final changes were caused by vowel height, this paper argues that the division pattern of the entering tone in the Fanyi Laoqida Piaotongshi has a factual basis. In the 16th century, the entering tone of Mandarin which Cui Shizhen ( 崔世珍 ) recorded had been divided into two categories according to vowel height. This pattern is very similar to the modern Lu’an ( 六安 ) and Shucheng ( 舒城 ) dialects. 
The similarity may indicate that the 16th century Mandarin of Cui Shizhen ( 崔世珍 ) and modern Jianghuai Mandarin share a genealogical relationship, but it may also just be a typological similarity caused by the same motivating factors.|000|Liù'ān 六安 dialect, Shūchéng 舒城 dialect, Chinese dialects, Lǎo Qìdà 2964|Donohue2014|Paper elaborates on unifying factors for the tonal-non-tonal aspects of the Himalayan languages. It argues that breathiness is one of the main characteristics.|000|Sino-Tibetan, breathy voice, distribution, linguistic area, cross-linguistic study, tone language 2965|Everett2005|The Pirahã language challenges simplistic application of Hockett’s nearly universally accepted design features of human language by showing that some of these features (interchangeability, displacement, and productivity) may be culturally constrained. In particular, Pirahã culture constrains communication to nonabstract subjects which fall within the immediate experience of interlocutors. 
This constraint explains a number of very surprising features of Pirahã grammar and culture: the absence of numbers of any kind or a concept of counting and of any terms for quantification, the absence of color terms, the absence of embedding, the simplest pronoun inventory known, the absence of “relative tenses,” the simplest kinship system yet documented, the absence of creation myths and fiction, the absence of any individual or collective memory of more than two generations past, the absence of drawing or other art and one of the simplest material cultures documented, and the fact that the Pirahã are monolingual after more than 200 years of regular contact with Brazilians and the Tupi-Guarani-speaking Kawahiv.|000|fundamental paper, recursion, Pirahã, universals, Chomsky syntax 2966|Flack2016|This short paper, rather than providing a thorough analysis of the very broad theme entailed by its title, aims only to programmatically outline the contours of a general framework for future research on structuralism and its genealogy. In essence, I wish to argue that mainstream approaches to structuralism’s history need to be significantly broadened, not only to better account for the contributions of Eastern and Central European thinkers, but also to take into full consideration structuralism’s deep, complex and rich roots in 19th Century German thought. To make this point, I will succinctly compare three distinct historiographical models of structuralism (“French”, “East-West”, “Jakobsonian”), each of which provides a very rough and selective, yet highly contrastive map of the intellectual and personal networks that underpinned structuralism's development up to World War II. 
Thanks to this basic comparative exercise, I hope to highlight the reductionistic, limiting nature of the first two models with regards to the more complete (if not exhaustive or definitive) third one and to cast further light on Jakobson’s crucial function as a communicator, synthesiser and passer of ideas between scholars, disciplines and intellectual traditions.|000|structuralism, Roman Jakobson, history of science, introduction 2967|Gasser2014|Paper discusses and presents cross-linguistic analyses on phonological distinctions in Australian languages. Some figures are quite nice and may be easily enhanced with more developed tools such as those provided in the LingPy library. This paper may give good inspiration for a larger framework of visualizations.|000|visualization, Australian languages, phonology, phoneme inventory, phonotactics 2968|Gerard1956|Since Darwin there has been much discussion pro and con as to whether profitable analogies can be drawn between the evolution of species and the development of different kinds of primitive and advanced human societies. This question is reviewed in this article by an interdisciplinary team. It is suggested that orientations and methods which have been employed to investigate biological evolution might also be used in the study of the evolution of society or culture. Perhaps these will throw light on the theoretical problem of the similarities and differences in the two sorts of evolution. Experiments are suggested in which small groups work together on problems under special constraints of customs and languages dictated by the experimenters. 
In addition to this experimental approach to the study of different cultures, and normal and abnormal behavior in them, the historical method of comparing biological and social evolution is employed in an examination of language.|000|biological parallels, species evolution, biological evolution, language evolution, 2969|Jacques2017c|Introduction to Japhug language.|000|Japhug, introduction, 2970|Jacques2017b|The cluster of languages variously referred to as Stau, Ergong or Horpa in the literature are spoken over a large area from Ndzamthang county (in Chinese Rangtang) in Rngaba prefecture (Aba) to Rtau county (Daofu) in Dkarmdzes prefecture (Ganzi), in Sichuan province, China. At the moment of writing, it is still unclear how many unintelligible varieties belong to this group, but at least three must be distinguished: the language of Rtau county (referred to as ‘Stau’ in this paper), the Dgebshes language (Geshizha) spoken in Rongbrag county (Danba), and the Stodsde language (Shangzhai) in Ndzamthang. The people speaking these languages are all classified as Tibetans by the administration.|000|introduction, Stau, Ergong, Horpa, Sino-Tibetan, 2971|Ciancaglini2008|Interesting paper on proving language relationship which somehow represents the state of the art from the classical approaches to language comparison.|000|proof of relationship, introduction, genetic relationship, comparative method, overview, Japanese, Korean 2972|Hashimoto1992|Interesting article giving a good overview on Hakka dialects. Should be consulted if a closer investigation of Chinese dialects is carried out in any way.|000|Hakka, wave theory, Chinese dialects, dialect classification 2973|Kamneva2017|Hybridization events generate reticulate species relationships, giving rise to species networks rather than species trees. 
We report a comparative study of consensus, maximum parsimony, and maximum likelihood methods of species network reconstruction using gene trees simulated assuming a known species history. We evaluate the role of the divergence time between species involved in a hybridization event, the relative contributions of the hybridizing species, and the error in gene tree estimation. When gene tree discordance is mostly due to hybridization and not due to incomplete lineage sorting (ILS), most of the methods can detect even highly skewed hybridization events between highly divergent species. For recent divergences between hybridizing species, when the influence of ILS is sufficiently high, likelihood methods outperform parsimony and consensus methods, which erroneously identify extra hybridizations. The more sophisticated likelihood methods, however, are affected by gene tree errors to a greater extent than are consensus and parsimony.|000|simulation studies, incomplete lineage sorting, maximum parsimony, maximum likelihood, hybridization 2974|Kamneva2017|Potentially important article on simulation studies to test the effect of lateral transfer and hybridization on phylogenetic methods to detect and deal with incomplete lineage sorting. 
Result is, according to Morrison's analysis in his blog (http://phylonetworks.blogspot.com/2017/03/detecting-introgression-versus.html), that unbalanced lateral transfer is just very hard to detect.|000|incomplete lineage sorting, horizontal gene transfer, simulation studies, hybridization 2975|Hashimoto1976|Highly interesting paper that draws on typological data to compare languages in South-East Asia, thereby illustrating obvious geographical patternings which may point to language contact and areal diffusion.|000|areal diffusion, linguistic area, language contact, South-East Asian languages, wave theory 2976|Progovac2016|In making an argument for the antiquity of language, based on comparative evidence, Dediu and Levinson (2013) express hope that some combinations of structural features will prove so conservative that they will allow deep linguistic reconstruction. I propose that the earliest stages of syntax/grammar as reconstructed in Progovac (2015a), based on a theoretical and data-driven linguistic analysis, provide just such a conservative platform, which would have been commanded also by Neandertals and the common ancestor. I provide a fragment of this proto-grammar, which includes flat verb-noun compounds used for naming and insult (e.g., rattle-snake, cry-baby, scatter-brain), and paratactic (loose) combinations of such flat structures (e.g., Come one, come all; You seek, you find). This flat, binary, paratactic platform is found in all languages, and can be shown to serve as foundation for any further structure building. However, given the degree and nature of variation across languages in elaborating syntax beyond this proto-stage, I propose that hierarchical syntax did not emerge once and uniformly in all its complexity, but rather multiple times, either within Africa, or after dispersion from Africa. If so, then, under the uniregional hypothesis, our common ancestor with Neandertals, H. 
heidelbergensis, could not have commanded hierarchical syntax, but “only” the proto-grammar. Linguistic reconstructions of this kind are necessary for formulating precise and testable hypotheses regarding language evolution. In addition to the hominin timeline, this reconstruction can also engage, and negotiate between, the fields of neuroscience and genetics, as I illustrate with one specific scenario involving FOXP2 gene.|000|speculation, language evolution, language origin, Neandertal, 2977|Downer1973|Article gives general information on Chinese historical strata within a dialect of the Mien family (Hmong-Mien). This is discussed in more detail in a blog-post by G. Jacques (http://panchr.hypotheses.org/1694), and also included as evidence into the reconstruction of @Baxter<2014> and Sagart (2014).|000|Hmong-Mien, Old Chinese, linguistics reconstruction, stratification, language contact 2978|Dogan2013|As a result of the maximal clique identification procedure, several redundant cliques were produced that differed from each other by a few sequences, revolving around an underlying clique missing a few connections in the connectivity map. In order to detect and eliminate the redundant cliques, Hamming distances [21] between maximal clique pairs were computed. The Hamming distance is a measure of difference between two strings of equal length, counting the number of substitutions to change the first string into the second [21]. We defined the fractional Hamming distance between a pair of cliques as the regular Hamming distance divided by the total number of sequences in both cliques. This normalization eliminated the effects of the possible discrepancy between the clique sizes on the distance measure. Calculation of the fractional Hamming distance is given in Equation 1. 
:math:`H_{f}ab=\frac{\sum^{n}_{i=1}m_{abi}}{n_{a}+n_{b}}` In the expression above, :math:`H_{f}ab` is the fractional Hamming distance between cliques a and b, :math:`m_{abi}` is a binary variable that represents the match or the mismatch at the i-th position between cliques a and b, equalling 0 if there is a match and 1 if there is a mismatch, n is the total number of proteins in the test, :math:`n_{a}` and :math:`n_{b}` are the number of proteins in the corresponding cliques.|4|maximal clique, graph theory, homolog detection, redundant clique reduction, 2979|Kehr2014| Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. 
Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. |000|alignment graph, sequence alignment, genome alignment, 2980|Kehr2014|Interesting article illustrating how multiple alignments can be represented in different graph structures.|000|alignment graph, graph theory, representation, visualization 2981|Kehr2014|.. image:: static/img/kehr-2014-non-linear-alignment.png :width: 600px :name: alt :comment:`Illustration of non-colinear alignments.`|2|sequence alignment, examples, non-colinear alignment, introduction 2982|Kehr2014|They [colinear alignments] do not influence the order of sequence positions, and thus can be captured by colinear alignment. Structural changes involve longer genomic segments, thereby affecting the structure and order of genomic sequences. They include non-colinear changes like inversions, translocations and duplications in addition to insertions and deletions of longer segments.|1|definition, non-colinear alignment, introduction, sequence alignment 2983|Kirschner2017|Ausgerechnet die den römischen Kaisern gewidmeten goldenen Bauinschriften haben die Jahrhunderte nicht überdauert. 
Nun ist es gelungen, einige dieser »litterae aureae« von Monumenten in Kleinasien zu rekonstruieren.|000|Bronze inscriptions, Greek, decipherment, Ancient texts, quantitative analysis 2984|Kirschner2017|Interesting article about a new method that extracts the original inscriptions from the relics of bronze letters in Greek monuments (the holes by which the letters were fixed to the stone).|000|Bronze inscriptions, decipherment, quantitative analysis, 2985|Springer2017|Interesting article on swarm intelligence: apparently, when being asked what they think other people will think, people tend to give a more precise answer on a problem and as a result, this enables researchers to find out what the truth is, rather than relying only on people's judgment. This could be interesting also for the case of grammaticality judgments: Ask informants what they think others would say, and you get a better idea on the actual forms in a language.|000|swarm intelligence, informants, survey, field work 2986|Ibbotson2017|Verbirgt sich im menschlichen Gehirn tatsächlich eine vorprogrammierte mentale Schablone zum Erlernen von Grammatik? Mit dieser Idee prägte der amerikanische Linguist Noam Chomsky vom Massachusetts Institute of Technology in Cambridge fast ein halbes Jahrhundert lang die gesamte Sprachwissenschaft. Nun aber verwerfen viele Kognitionswissenschaftler und Linguisten Chomskys Theorie der Universalgrammatik, denn neue Untersuchungen der verschiedensten Sprachen sowie der Art und Weise, wie Kleinkinder in Gemeinschaft kommunizieren, schüren starke Zweifel an Chomskys Behauptungen. Vielmehr setzt sich eine radikal neue Sichtweise durch, der zufolge das Erlernen der Muttersprache kein angeborenes Grammatikmodul voraussetzt. Offenbar nutzen Kleinkinder mehrere verschiedene Denkweisen, die gar nicht sprachspezifisch sein müssen – etwa die Fähigkeit, die Welt in Kategorien (wie Mensch oder Sache) einzuteilen oder Beziehungen zwischen Dingen zu begreifen. 
Hinzu kommt die einzigartige Gabe, intuitiv zu erfassen, was uns andere mitteilen möchten; erst so kann Sprache entstehen. Somit reicht Chomskys Theorie längst nicht aus, um den menschlichen Spracherwerb zu erklären. Diese Schlussfolgerung wirkt sich nicht bloß auf die Linguistik aus, sondern auf ganz unterschiedliche Bereiche, in denen Sprache eine zentrale Rolle spielt, von der Poesie bis zur künstlichen Intelligenz. Da außerdem Menschen Sprache auf eine Weise gebrauchen, wie es kein Tier vermag, dürften wir auch die menschliche Natur ein wenig besser begreifen, wenn wir das Wesen der Sprache verstehen.|000|Noam Chomsky, Chomsky syntax, universal grammar, universals, formal language theory 2987|Ibbotson2017|Very interesting discussion of the usefulness of Chomsky's claims regarding language universals and universal grammar. Apparently, there is good evidence for a usage-based theory of grammar, which means that we do not have a central guiding principle but rather learn how to speak out of a complex network of associations.|000|formal language theory, Chomsky syntax, generative grammar, critics 2988|Croft2017|After a long period in which the study of the evolutionary origins of the human language capacity was avoided, some linguists now speculate or use computational modeling to try to infer the evolutionary process that led to modern human language. It is assumed that the evolution of language represents an increase in some sort of linguistic complexity: “pre-language” was less complex than modern human language in some ways, and – probably gradually – acquired the sort of complexity that modern human languages display. Complexity in language is frequently measured in two ways: structural complexity of communicative signals, and the social-cognitive complexity of the interactional situations in which language or language-like communication is used. 
Some linguists are now attempting to measure structural complexity of contemporary human languages, after a period in which it was assumed that all modern languages were equal in overall structural complexity. There are undoubtedly significant differences among contemporary human languages in terms of the obligatory expression of certain grammatical semantic categories, and the formal morphological complexity of that expression. Nevertheless, all attested human languages display similar degrees of structural complexity in “design features” of language (Hockett 1960). These design features include the combinability and recombinability of meaningful units; some kind of hierarchical structure; recursion (but see Everett 2005, 2009), and duality of patterning.|000|language evolution, complex systems, writing systems, linguistic complexity 2989|Duanmu2008|To many people it may seem obvious what syllables are. For example, the English word buy is a syllable, so is the Chinese word ni ‘you’. The English word city has two syllables, so does the Chinese word mayi ‘ant’. The word potato has three syllables; the word syllabification has six syllables; and so on. However, the clarity can be deceptive. While in some words it is easy to count syllables, in others the answer is not so obvious. For example, is the word hour or shower one or two syllables? In addition, while we can often count the number of syllables, it is not always clear where syllable boundaries are. For example, where is the syllable boundary in city and happy? Are all the sounds in smile in one syllable, or are [s] and [l] outside the syllable [mai]? Linguists have wrestled with various answers. In (1) we see several views of syllable count in hour, flour, flower, and shower, which only differ in the initial consonant(s). [...] If syllables are real, we want to know what their structures are and to what extent they can differ from language to language. 
For example, if [m] can be a syllable in English (as in prism), should we expect [mdok] ‘color’ in the Tibeto-Burman language Jiarong, mgła [mgwa] ‘fog’ in Polish, and [mba] ‘be’ in the African language Mende, to be two syllables each as well? More generally, is the analysis of a string of sounds, such as [mdok], the same for all languages, or can the analysis vary from language to language? In this book I explore answers to such questions. Since syllables are made of sounds and sounds are made of features, I start in Chapter 2 with a discussion of sounds and features, with a focus on the notion of complex sounds. In Chapter 3 I review theories of syllable structure and argue for a new proposal, which I call the CVX theory, according to which the maximal syllable size is CVX (CVV or CVC) and extra consonants at word edges are explained by morphology.|000|syllable structure, syllable boundaries, phonology, 2990|Duanmu2008|Interesting book that reflects on syllables and syllable structures. New ideas of analysis are exemplified for Chinese (Standard and Shanghai), English, German, and Jiarong (rGyalrong), so all in all a nice set of examples. Aspects of tone analysis are also treated in the Chinese chapter. Interesting book presenting an alternative to the CVCV theory, which makes stronger predictions on possible words in the languages of the world (at least at first sight). This can be seen as a follow-up on @Scheer2004 and the idea of CVCV theory.|000|tone language, syllable structure, introduction, phonology, English, German, Rgyalrong, Shànghǎi dialect, Mandarin 2991|LaPolla2016|She seems also to think that it is unusual for a form to grammaticalize in one construction and yet still be used as a verb in another construction, but there is nothing unusual about that, and that is [pb] not “layering”; it is simply a reflection of the fact that grammaticalization is not of words, but of constructions. 
Compare the grammaticalization of prospective aspect marking in English using a construction involving the word go, for example in I’m going to eat now, vs. the use of the verb go in other constructions, such as I’m going to the store now. It isn’t the word go that grammaticalized into prospective aspect marking, but the construction in which it appeared which came to be used for aspect marking, so there is no problem with the word go being used as a verb in other constructions.|842f|grammaticalization, grammatical change, construction grammar 2992|Kogan2016|Genetic relations among Indo-Aryan languages are still unclear. Existing classifications are often intuitive and do not rest upon rigorous criteria. In the present article an attempt is made to create a classification of New Indo-Aryan languages, based on up-to-date lexicostatistical data. The comparative analysis of the resulting genealogical tree and traditional classifications allows the author to draw conclusions about the most probable genealogy of the Indo-Aryan languages.|000|Indo-Aryan languages, Indo-European, lexicostatistics 2993|Kogan2016|The article shows a nice contribution to Indo-Aryan languages, drawing from a large database of 110-item word lists. All data is also in the paper, although it is not clear how well it could be re-used from there. In addition, the paper contains hints to interesting wave visualizations as hypergraphs. They are interesting for historical reasons.|000|Indo-Aryan languages, lexicostatistics, dataset 2994|Kluge2017|This book is a grammar of Papuan Malay. While it is not related to the SEA questions, it may be interesting as far as it contains a larger chapter in compounding and word formation in general, which seems to show some parallels to the compounding processes we encounter in Chinese dialect evolution.|000|Papuan Malay, grammar, introduction 2995|Kheng2017|This research provides a phonological description of the Mkuui variety of Daai Chin. 
Mkuui is one of seven varieties of Daai Chin and is spoken in six villages in Mindat and Paletwa Townships in the Southern Chin Hills of Chin State, Myanmar. Daai Chin belongs to the Southern branch of the Kuki-Naga-Chin group, which is under the Tibeto-Burman language family. A wordlist of 1693 lexical items was collected from three mother tongue speakers of Mkuui, and the clearest and most accurate lexical items from all the speakers were selected for the phonological analysis. Data from all speakers were transcribed and analyzed using several phonological software programs. The phonological inventory of Mkuui Daai contains a total of 23 consonants, eight vowels, and two contrastive tones. Each vowel has contrastive length. The syllable structure is [(C1)V(ː)(C2)(ʔ)]. Thus, there are only 6 types of syllables: V, VC, CV, CVC, CVːC and CVCʔ. There are three contrastive presyllables: the prenasals [m-] and [ŋ-], and the prevelar [k-]. The two contrastive tones are Mid Tone (M) and High Tone (H). The mid tone has two allotones, mid tone (M) and mid falling tone (F). The mid falling allotone only occurs at the end of the word and can be analysed as a combination of mid tone + a phonetic low tone. The tone bearing unit (TBU) in Mkuui is the mora, of which a syllable can have one or two. Moras are assigned only to the nucleus and to sonorant segments in the coda. Phonological processes in Mkuui include vowel phenomena and compound word boundary phenomena. Among these are vowel lengthening, vowel lowering, vowel raising, nasal coda manner assimilation, nasal coda place assimilation, oral coda co-articulation, glide lowering, and glide strengthening. In terms of rule ordering, the only obligatory ordering is between vowel lengthening and mid falling tone rule. 
The vowel lengthening must precedes mid falling tone rule.|000|Daai Chin, Sino-Tibetan, dataset, word list, 2996|Kheng2017|Interesting word list in the end, in phonological encoding, but should be able to retrieve phonetics directly, or one could ask the author for a spreadsheet.|000|dataset, Kuki-Chin, Daai Chin, Sino-Tibetan, word list 2997|Ravignani2017|Research on the evolution of human speech and phonology benefits from the comparative approach: structural, spectral, and temporal features can be extracted and compared across species in an attempt to reconstruct the evolutionary history of human speech. Here we focus on analytical tools to measure and compare temporal structure in human speech and animal vocalizations. We introduce the reader to a range of statistical methods usable, on the one hand, to quantify rhythmic complexity in single vocalizations, and on the other hand, to compare rhythmic structure between multiple vocalizations. These methods include: time series analysis, distributional measures, variability metrics, Fourier transform, auto- and cross-correlation, phase portraits, and circular statistics. Using computer-generated data, we apply a range of techniques, walking the reader through the necessary software and its functions. We describe which techniques are most appropriate to test particular hypotheses on rhythmic structure, and provide possible interpretations of the tests. These techniques can be equally well applied to find rhythmic structure in gesture, movement, and any other behavior developing over time, when the research focus lies on its temporal structure. 
This introduction to quantitative techniques for rhythm and timing analysis will hopefully spur additional comparative research, and will produce comparable results across all disciplines working on the evolution of speech, ultimately advancing the field.|000|rhythmic pattern, complexity, measure, introduction, tutorial 2998|Ringe1995|It is not always clear whether the similarities observed between the lexica of different languages could easily be the result of random chance or must reflect some historical relationship. Particularly difficult are cases in which the relationship posited is remote at best; such cases must be evaluated by comparison with mathematically valid models which realistically simulate chance resemblances between languages. A simple but relatively detailed method for making such comparisons has been described and exemplified in Ringe (1992, 1993), but more general comparisons are also useful.|000|Nostratic, proof of relationship, significance, xi-square test, chance resemblance 2999|Robinson2012|The non-Austronesian languages of Alor and Pantar in eastern Indonesia have been shown to be genetically related using the comparative method, but the identified phonological innovations are typologically common and do not delineate neat subgroups. We apply computational methods to recently collected lexical data and are able to identify subgroups based on the lexicon. Crucially, the lexical data are coded for cognacy based on identified phonological innovations. 
This methodology can succeed even where phonological innovations themselves fail to identify subgroups, showing that computational methods using lexical data can be a powerful tool supplementing the comparative method.|000|dataset, Alor-Pantar, Papua New Guinea, word list 3000|Progovac2016|In making an argument for the antiquity of language, based on comparative evidence, Dediu and Levinson (2013) express hope that some combinations of structural features will prove so conservative that they will allow deep linguistic reconstruction. I propose that the earliest stages of syntax/grammar as reconstructed in Progovac (2015a), based on a theoretical and data-driven linguistic analysis, provide just such a conservative platform, which would have been commanded also by Neandertals and the common ancestor. I provide a fragment of this proto-grammar, which includes flat verb-noun compounds used for naming and insult (e.g., rattle-snake, cry-baby, scatter-brain), and paratactic (loose) combinations of such flat structures (e.g., Come one, come all; You seek, you find). This flat, binary, paratactic platform is found in all languages, and can be shown to serve as foundation for any further structure building. However, given the degree and nature of variation across languages in elaborating syntax beyond this proto-stage, I propose that hierarchical syntax did not emerge once and uniformly in all its complexity, but rather multiple times, either within Africa, or after dispersion from Africa. If so, then, under the uniregional hypothesis, our common ancestor with Neandertals, H. heidelbergensis, could not have commanded hierarchical syntax, but “only” the proto-grammar. Linguistic reconstructions of this kind are necessary for formulating precise and testable hypotheses regarding language evolution. 
In addition to the hominin timeline, this reconstruction can also engage, and negotiate between, the fields of neuroscience and genetics, as I illustrate with one specific scenario involving FOXP2 gene.|000|language origin, language evolution, Neandertal, proto grammar, speculation 3001|Progovac2016|The author deduces from the existence of simple sentences and verb-noun compounds which kind of parataxis was present in the proto-grammar of our ancestors before we separated from the Neandertal homo. It may be interesting, but it seems that this paper is not well-enough fed with linguistic expertise. But giving it a closer look may eventually be worthwhile.|000|Neandertal, proto grammar, speculation 3002|Speckmann2010|Statistical data associated with geographic regions is nowadays globally available in large amounts and hence automated methods to visually display these data are in high demand. There are several well-established thematic map types for quantitative data on the ratio-scale associated with regions: choropleth maps, cartograms, and proportional symbol maps. However, all these maps suffer from limitations, especially if large data values are associated with small regions. To overcome these limitations, we propose a novel type of quantitative thematic map, the necklace map. In a necklace map, the regions of the underlying two-dimensional map are projected onto intervals on a one-dimensional curve (the necklace) that surrounds the map regions. Symbols are scaled such that their area corresponds to the data of their region and placed without overlap inside the corresponding interval on the necklace. Necklace maps appear clear and uncluttered and allow for comparatively large symbol sizes. They visualize data sets well which are not proportional to region sizes. The linear ordering of the symbols along the necklace facilitates an easy comparison of symbol sizes. One map can contain several nested or disjoint necklaces to visualize clustered data. 
The advantages of necklace maps come at a price: the association between a symbol and its region is weaker than with other types of maps. Interactivity can help to strengthen this association if necessary. We present an automated approach to generate necklace maps which allows the user to interactively control the final symbol placement. We validate our approach with experiments using various data sets and maps.|000|necklace maps, visualization, geography, 3003|Speckmann2010|Interesting visualization: you pick a region, and they draw the parts in the map by drawing a necklace-like circle around the map, so that all major points are nice to see, and we can see things like, e.g., population structures, size, pie-chart-like things, etc.|000|visualization, necklace maps, geography 3004|Lee2008|This paper investigates the articulatory characteristics of the syllable-initial coronal stop ‘d’, fricatives ‘s sh x’, affricates ‘z zh j’ and approximant ‘r’ in the Peking dialect. Palatograms and linguograms show that the syllable-initial (i) ‘d’ is apico-laminal denti-alveolar [V]; (ii) ‘s z’ are laminal alveolar or denti-alveolar [U1 VU1 ] ; (iii) ‘sh zh r’ are apical postalveolar or pre-palatal [5 ̧ V5 ̧ Š ’ ] ; and (iv) ‘x j’ are anterodorsal postalveolar or alveolo-postalveolar [5 V5]. The findings in this study constitute evidence against the claim that ‘sh zh r’ and ‘x j’ in the Peking dialect are retroflexes and palatals, respectively. The findings also show that it is necessary to include the information about lingual contact, otherwise the description of the articulatory properties of the coronal sounds in question will be incomplete.|000|Běijīng dialect, coronals, experimental phonetics, 3005|Lee2008|Short but interesting overview on the phonetic realization of the d-t-n-l-etc. 
sounds in Mandarin Chinese (Běijīng dialect).|000|Běijīng dialect, phonetic description, coronals 3006|Jacques2017TALKa|Interesting talk with a very good handout, illustrating many important aspects of Middle Chinese and reconstruction in Chinese linguistics.|000|Middle Chinese, introduction, talk, tutorial 3007|Vasilev2017|Статья представляет собой первую часть исследования, посвященного проблеме достоверности лингвистических датировок, получаемых с помощью метода глоттохронологии. В предлагаемой работе рассматривается процесс лексических замен, происходящих в базисной лексике одного языка с течением времени. В качестве исходных данных нами использовались 110-словные списки, собранные на материале 54 современных и нескольких исторических романских идиомов. При этом для измерения скорости замен списки современных языков сравнивались со списками классической и архаический латыни, а также старофранцузского и староитальянского. Временна́я дистанция между сопоставляемыми идиомами определялась с помощью трех различных глоттохронологических методов: классического уравнения М. Сводеша, модифицированной формулы С. А. Старостина, а также недавно предложенной потоковой модели. Сравнение полученных результатов позволило сделать ряд важных выводов о характере лексических изменений, адекватности существующих глоттохронологических моделей, а также численно оценить точность и надежность глоттохронологических расчетов при датировании общего процесса замен. Вторую часть исследования планируется посвятить проблеме датирования относительной дивергенции двух родственных языков. :translation:`In this paper we discuss the accuracy of glottochronology, a lexicostatistical method used in the dating of linguistic divergence. Our study provides a detailed analysis of the process of lexical replacement in the basic lexicon of one language over the course of time. 
To measure replacement rates and determine other statistic features of lexical change we use 110-item wordlists, compiled over the past two years for 54 modern and several historically attested Romance languages. Pairwise comparison of modern wordlists with those of Archaic Latin, Late Classical Latin, Old French, and Old Italian allows to obtain several control points suitable for calibration of glottochronological equations. To estimate the time distance between the compared idioms, three different methods have been applied: the classic formula of M. Swadesh, the modified glottochronology of S. Starostin and a recently proposed approach based on simulation of lexical changes of every meaning on the Swadesh list as stationary Poisson processes. Further analysis resulted in several important conclusions concerning the following questions: (a) what are the main characteristics of lexical divergence in one language; (b) which of the existing models maps these characteristics more efficiently; (c) how precise and reliable glottochronological dating can be in general. We plan to follow this research by another study in which the process of relative divergence between two or more languages with the same ancestor will be considered.`|000|Romance, dialect data, glottochronology, lexicostatistics, dataset, 3008|Vovin2011|The present paper intends to demonstrate that at least the nominative and genitive forms of the first person singular pronouns in Mongolic and both first and second pronouns in Tungusic in both nominative and oblique forms are borrowings from Bulgar Turkic, and, therefore, offer no evidence for the genetic relationship between these languages. This conclusion is reached on both phonological and paradigmatic evidence, as well as on the examination of other inter-‘Altaic’ loans. 
In addition, I present the evidence that personal pronouns in Japanese and Korean that have been also cited as a ‘proof’ of the genetic relationship of these languages to other ‘Altaic’ languages have nothing to do with them except superficial chance resemblance.|000|lexical borrowing, Altaic, Turkic, 3009|Zaidi2017|The evolutionary reasons for variation in nose shape across human populations have been subject to continuing debate. An important function of the nose and nasal cavity is to condition inspired air before it reaches the lower respiratory tract. For this reason, it is thought the observed differences in nose shape among populations are not simply the result of genetic drift, but may be adaptations to climate. To address the question of whether local adaptation to climate is responsible for nose shape divergence across populations, we use Qst–Fst comparisons to show that nares width and alar base width are more differentiated across populations than expected under genetic drift alone. To test whether this differentiation is due to climate adaptation, we compared the spatial distribution of these variables with the global distribution of temperature, absolute humidity, and relative humidity. We find that width of the nares is correlated with temperature and absolute humidity, but not with relative humidity. We conclude that some aspects of nose shape may indeed have been driven by local adaptation to climate. 
However, we think that this is a simplified explanation of a very complex evolutionary history, which possibly also involved other non-neutral forces such as sexual selection.|000|human prehistory, biological evolution, nose shape, adaptation, climate 3010|Zaidi2017|Partially interesting paper that sees correlation between human nose shape and climate, although the results remain inconclusive (according to the overview I read).|000|nose shape, climate, adaptation, human prehistory, human evolution 3011|Duanmu2017|Despite its seemingly simple structure, there are many interesting questions about the Chinese syllable, such as its size, its structure, its interaction with stress and tone, and why the syllable inventory is so sparsely populated.|000|Chinese, syllable structure, introduction, overview 3012|Duanmu2015|I first introduce the collection methods and the contents of two existing phonology inventory databases, UPSID and P-base, and then point out some shortcomings in them. Then I introduce the newly constructed Phonology Inventory Database of China (PIDOC), including its collection method, contents, and possible applications. I also use examples to illustrate in detail how phonology inventory databases can be used for theoretical research, especially research on feature theory, syllable theory, and phonemic theory.|000|database, introduction, phonology, phoneme inventory 3013|Duanmu2015|Paper introduces a large database of phoneme inventories, which is really huge, and ordered by initials and rhymes rather than complete inventories, which makes the resource much more valuable.|000|Chinese dialects, phoneme inventory, phonology, database 3014|SoHartmann2009|This is a descriptive grammar of Daai Chin. 
It is very big, but does not have any entries on lexicon, and no dictionary or word list in the end, which is a pity.|000|grammar, Daai Chin, Kuki-Chin, Sino-Tibetan, description 3015|Mielke2017|A method is demonstrated for creating density-equalizing maps of IPA consonant and vowel charts, where the size of a cell in the chart reflects information such as the crosslinguistic frequency of the consonant or vowel. Transforming the IPA charts in such a way allows the visualization of interactions between phonetic features. Density-equalizing maps are used to illustrate a range of facts about consonant and vowel inventories, including the frequency of consonants and vowels and the frequency of common diacritics, and to illustrate the frequency of deletion and epenthesis involving particular consonants and vowels. Solutions are proposed for issues involving genealogical sampling, counting pairs of very similar phones, and counting diacritics in relation to basic symbols.|000|segment inventories, database, visualization, IPA 3016|Mielke2017|Visualization by transforming according to frequency is nothing really new and not necessarily really helpful, apart from the fact that it looks nice. Anyway, since this article also presents pbase as an official database of segment inventories, it can be used to quote pbase and give a good overview on the database.|000|database, phoneme inventory, sound inventories 3017|Kauhanen2017|Language change is neutral if the probability of a language learner adopting any given linguistic variant only depends on the frequency of that variant in the learner’s environment. 
Ruling out non-neutral motivations of change, be they sociolinguistic, computational, articulatory or functional, a theory of neutral change insists that at least some instances of language change are essentially due to random drift, demographic noise and the social dynamics of finite populations; consequently, it has remained little investigated in the historical and sociolinguistics literature, which has generally been on the lookout for more substantial causes of change. Indeed, recent computational studies have argued that a neutral mechanism cannot give rise to ‘well-behaved’ time series of change which would align with historical data, for instance to generate S-curves. In this paper, I point out a methodological shortcoming of those studies and introduce a mathematical model of neutral change which represents the language community as a dynamic, evolving network of speakers. With computer simulations and a quantitative operationalization of what it means for change to be well-behaved, I show that this model exhibits well-behaved neutral change provided that the language community is suitably clusterized. Thus, neutral change is not only possible but is in fact a characteristic emergent property of a class of social networks. From a theoretical point of view, this finding implies that neutral theories of change deserve more (serious) consideration than they have traditionally received in diachronic and variationist linguistics. Methodologically, it urges that if change is to be successfully modelled, some of the traditional idealizing assumptions employed in much mathematical modelling must be done away with.|000|language change, language model, neutral evolution, explanation of language change, simulation studies 3018|Kauhanen2017|Apparently an interesting paper dealing with the question of the driving forces of language change and applying simulation studies to test this. 
|000|simulation studies, language change, reasons for language change, neutral evolution, random drift 3019|Pan2017|This study explores the glottalization of Taiwan Min checked tones 3 and 5 with a CV [p t k ʔ] syllable structure. Electroglottography (EGG) supplements acoustic data on disyllabic words with checked tones collected from 40 speakers from five dialect regions. The results indicated that a final coda can be realized as a full oral/glottal stop closure, an energy dip at vowel’s end, an aperiodic voicing at vowel’s end, or a coda deletion. Over 80% of /ʔ/ codas and less than 20% of /p t k/ codas were deleted. The undeleted /p t k/ codas were more likely to be produced with a full stop closure among tone 3 and sandhi tones. Glottal contact quotient (CQ_H) distinguished tones 3 and 5 from unchecked tones 31 and 51, respectively. In sandhi positions, the vowels of tone [5] /3/ were produced with a longer CQ_H, lower H1*-A3* and a higher Cepstral Peak Prominence (CPP), suggesting a longer close phase, a more abrupt glottal closure and more periodic voicing than tone [3] /5/. In juncture position, coda deletion and the merging of H1*-A1*, H1*-A3* and A1*-A2* of tones [3] /3/ and [5] /5/ suggest a sound change among checked tones.|000|Taiwanese, Mǐn, Chinese dialects, experimental phonetics, checked tones, 3020|Pan2017|Paper gives an interesting overview on glottalized final stops in Taiwanese Mǐn syllables, including studies on final stop deletion. These findings may be interesting in the context of studying tone change.|000|tone change, tonogenesis, Taiwanese, Mǐn, Chinese dialects, experimental phonetics 3021|Syrjanen2016|The adoption of evolutionary approaches to study language change as a type of non- biological evolution has gained increasing interest and introduced a variety of quantitative tools to linguistics. 
The focus has thus far mainly been on language families, or ‘linguistic macroevolution,’ and taken the shape of linguistic phylogenetics. Here we explore whether evolutionary methods could be applicable for studying intralingual variation (‘linguistic microevolution’) by testing a population genetic clustering method for analyzing the ‘population structure’ of Finnish dialects. We compare the results with traditional dialect divisions established in the literature and with k-medoids clustering, which is free from biological assumptions. The results are encouragingly similar to each other and agree with traditional views, suggesting that population genetic tools could be a useful addition to the dialectological toolkit. We also show how the results of the model-based clustering could serve as a basis for further study.|000|dialectology, Finnish dialects, Finnish, population genetics, biological parallels, 3022|Syrjanen2016|Study is based on abstract data from dialect atlases, so no real concrete information is underlying it. The approach may be interesting, but it is not helpful for data-driven approaches to historical linguistics, as we can only learn about some abstract alternations in very specific languages.|000|Finnish dialects, dialect atlas, population genetics, biological parallels 3023|Tamminga2016|Persistence, the tendency to repeat a recently used variant in speech, has been observed for a range of sociolinguistic variables. This paper uses quantitative data from ING and TD in Philadelphia English to show that persistence reflects morphological structure and can therefore be a useful tool for defining variables at the phonology–morphology interface. For both ING and TD, persistence arises only when prime and target belong to the same morphological category, with additional interactions between morphological category and lexical repetition. 
This pattern of results suggests that both the linguistic variables and cognitive processes at play are multifactorial.|000|persistence, linguistic variation, language change, 3024|Tamminga2016|In the variationist paradigm, the variable choices speakers make in language use are modeled as independent random events, with a complex set of social and linguistic weights affecting the outcome in each instance. But the view of language variation as a string of binomial trials has been known to be a convenient fiction at least since Sankoff and Laberge’s (1978) demonstration that one of the most powerful influences on a choice between pronominal options in Montreal French is simply which option the speaker used last. This repetition effect, here called persistence, is thought to be akin to what psycholinguists call priming: the increased ease with which a speaker remembers or recognizes a word or structure that she has recently encountered.|335|language variation, linguistic variation, persistence, definition, terminology 3025|Tamminga2016|Interesting paper about the phenomenon of persistence, which may help to explain certain aspects of language change. I do not really know how important the concrete findings are, but it seems the background is worth a proper read and should not be forgotten when discussing general causes of language change, also in the light of simulation studies.|000|persistence, introduction, linguistic variation, language variation, language change, 3026|Baxter2017|We use a mathematical model to examine three phenomena involving language change across the lifespan: the apparent time construct, the adolescent peak, and two different patterns of individual change. The apparent time construct is attributed to a decline in flexibility toward language change over one’s lifetime; this explanation is borne out in our model. 
The adolescent peak has been explained by social networks: children interact more with caregivers a generation older until later childhood and adolescence. We find that the peak also occurs with many other network structures, so the peak is not specifically due to caregiver interaction. The two patterns of individual change are one in which most individuals change gradually, following the mean of community change, and another in which most individuals have more categorical behavior and change rapidly if they change at all. Our model suggests that they represent different balances between the differential weighting of competing variants and degree of accommodation to other speakers.|000|language change, modeling, individual language change, 3027|Baxter2017|I find this paper interesting in light of the question of individual language change, that is, language change across the lifespan of one individual. This is an underinvestigated study which bears Lamarckian characteristics of evolution and should definitely be taken into account when discussing aspects of language change in general.|000|individual language change, language change, mechanism of language change 3028|DeSmet2017|This paper hypothesizes that as an expression becomes more frequent in one grammatical context, its mental retrievability improves, which in turn makes it more easily available in different yet closely related (analogous) grammatical contexts. Such a mechanism can account for the progression of gradual change. The hypothesis generates two testable predictions. First, innovative constructions should be more likely to emerge if their analogical models are better entrenched. Second, an expression’s retrievability can also be improved by priming, which in the short term should have a similar effect to entrenchment. These predictions are tested against the development of the noun key into an adjective (as in a very key argument). 
The change is gradual, starting with increased productivity of compounds with key as specifying element, leading later to debonded and clearly adjectival uses. The development of key is analyzed using data from the British Houses of Parliament. The effect of entrenchment is tested against individual variation. Next, situations are investigated where key has been primed, either by an earlier instance of key or by a collocate of key. The evidence supports the hypothesis. Innovative uses of key are favored under conditions that improve the retrievability of its more conventionalized uses.|000|productivity, compounding, grammaticalization, language change, gradual change, analogy 3029|DeSmet2017|Interesting article in the light of theory, deserving a proper read. The idea states that frequency is at the core of grammaticalization and may explain how certain words gain grammatical use or change the way they are used by a grammar via different steps involving an increase in frequency and productivity. This may be interesting in the light of analogy, but also in the light of the idea of persistence, as discussed in @Tamminga<2016> (2016).|000|persistence, analogy, language change, grammaticalization, compounding, 3030|Mortarino2009|Historical linguistics needs procedures to evaluate the similarity between languages through the comparison of specific word lists drawn from the whole vocabulary. The main issue is to evaluate a fair threshold for the number of similar items beyond which it is sensible to reject the hypothesis of chance similarity. After a short review of papers dealing with that problem, in this paper an extension of those methods is proposed which exploits available data in a more efficient way. 
In particular, the exact distribution of the new test statistics is calculated and the power of the new procedure is compared with the power of the existing method.|000|Monte-Carlo permutation, proof of relationship, methodology, statistical analysis 3031|Gabelentz1843|[...] wir haben es aber auch für zweckmässig erachtet, solche Wörter, welche als (wirklich oder angeblich) gothisch von griechischen oder römischen Schriftstellern angeführt werden, wie *Asdiggs*, *Ans*, *Gepanta*, *Bilageineis (Straxa, Sihora)*, nicht wegzulassen, sondern nur, da ihre wahre gothische Form nicht verstteht, zur Unterscheidung mit einem † zu bezeichnen, wogegen es offenbar zu weit geführt haben würde, wenn wir hierunter auch Wörter, die erst aus anderen Sprachen, z. B. dem heutigen Spanischen [...] oder der Sprache krimischer Nationen [...] hätten mitaufnehmen wollen. |VI|dagger symbol, uncertainty, methodology, history of science, linguistic annotation 3032|Gabelentz1843|Wir fanden es bedenklich, auf solche einfache, ganz imaginäre Wurzeln zurückzugehen, wie Graff nach indischen Mustern in seinem Sprachschatz aufstellt, gleichwohl konnten wir nicht umhin, in vielen Fällen zwar für uns verlorene, aber doch als bestehend denkbare Stammwörter, besonders Verba der sogenannten starken Conjugation, aufzustellen, aus welchen eine Anzahl vorhandener Wörter abgeleitet schienen, oder einfache Wörter anzunehmen, [pb] welche nur noch in Zusammensetzungen vorkommen. Solche Wörter haben wir mit einem * bezeichnet. :comment:`This quote has been mentioned by ` @Koerner<1976> :comment:`(1976: 186).`|VIIf|history of science, linguistic annotation, asterisk 3033|Proulx1984|. 
Algonquian-Wiyot-Yurok is one of the more distant genetic relationships regarded as definitely established by adherents of the "splitting" tradition of historical Amerindian linguistics (Campbell and Mithun 1979:37-40). Indeed, from 1913, when Sapir first proposed it, until 1958 there was academic controversy about whether the three branches were genetically related at all (Haas 1958:sec. 1). Since 1958, it has been a favorite example (in methodological discussions) of a proven language family whose protolanguage has largely defied reconstruction (Haas 1967, Hamp 1970, Teeter 1974, Goddard 1|000|Algic, Algonquian, linguistic reconstruction, comparative method, distant relationship 3034|Proulx1984|Interesting language family to keep in mind, since people now generally consider this proven, although they have been struggling for a long time, so from this family, we may be able to learn how they could establish the remote relationship.|000|Algic, Algonquian, linguistic reconstruction, comparative method 3035|Yang2017|We estimated the divergence time between Tibetan and Han populations using the conventional FST-based approach (17) (SI Appendix, Text S1). As described above, there were 3,008 Tibetan and 373 Han subjects collected from the TP after QC. We included in this analysis an additional set of 1,726 Han subjects collected from the Eye Hospital of Wenzhou Medical University (WZ) after QC (Materials and Methods). We used GCTA-GRM to remove cryptic relatedness in the Tibetan and Han samples (note that the Han sample was a combined set of 373 Han subjects from the TP and 1,726 Han subjects from WZ) at a relatedness threshold of 0.05 and retained 1,998 unrelated Tibetan and 2,059 unrelated Han subjects. There was no genetic difference between WZ-Han and TP-Han as shown by PCA (SI Appendix, Fig. S3), probably because most of the Han subjects, collected from either TP or WZ, were originally from diverse regions of China. 
The genome-wide mean FST between Tibetans and Han was 0.012 [using the method by Weir and Cockerham (18) implemented in GCTA], consistent with the estimate of the Han subjects from the HGDP (SI Appendix, Table S1). Given the genome-wide mean FST value (Materials and Methods), we estimated that the divergence time between Tibetan and Han populations was 189 generations. Assuming an average generation time of 25 y as in previous studies (3, 19), this estimate suggests that Tibetans and Han split about 4,725 y ago, ∼2,000 y earlier than that estimated from whole-exome sequencing data (3) but consistent with recent evidence from archeological studies (20, 21). |2|divergence, dating, Sino-Tibetan, genetic analysis, population genetics 3036|Yang2017|Indigenous Tibetan people have lived on the Tibetan Plateau for millennia. There is a long-standing question about the genetic basis of high-altitude adaptation in Tibetans. We conduct a genome-wide study of 7.3 million genotyped and imputed SNPs of 3,008 Tibetans and 7,287 non-Tibetan individuals of Eastern Asian ancestry. Using this large dataset, we detect signals of high-altitude adaptation at nine genomic loci, of which seven are unique. The alleles under natural selection at two of these loci [methylenetetrahydrofolate reductase (MTHFR) and EPAS1] are strongly associated with blood-related phenotypes, such as hemoglobin, homocysteine, and folate in Tibetans. The folate-increasing allele of rs1801133 at the MTHFR locus has an increased frequency in Tibetans more than expected under a drift model, which is probably a consequence of adaptation to high UV radiation. These findings provide important insights into understanding the genomic consequences of high-altitude adaptation in Tibetans. |000|Sino-Tibetan, Tibetan, Hàn Chinese, genetic analysis, divergence, altitude, adaptation 3037|Yang2017|Study is interesting since it seems to confirm a rather early age of the Sino-Tibetan language family. 
This will be important to be considered in all articles written on the topic, since it seems that despite the young age of Sino-Tibetan, the diversity between the languages is much, much larger.|000|Sino-Tibetan, divergence, divergence time, Hàn Chinese, Tibetan, population genetics, 3038|Holm2017|Despite dozens of hypotheses, the origin and development of the Indo-European language family are still under debate. A well-known glottochronological approach to this problem using Bayesian computation of language divergence dates claims to have provided evidence for the period of Neolithic expansion known as the “Anatolian hypothesis.” The dates have met with considerable criticism from other disciplines. I decided to investigate the evidence for these dates by replicating and analyzing the approach. During this process, a further approach located a date of origin from between 3950 – 4740 BC. One of the insights of this study was that previous results were significantly disrupted by poorly attested languages, which were consistently removed step by step. This paper supports this finding using data from the previous approaches and my own updated dataset. The resulting date is around 4800 BC. However, the topology of the trees differed considerably over the course of several hundreds of tests. This problem was avoided in previous approaches by rigorous topological forcing. Here we apply a west–east dichotomy from a previous purely lexicostatistical (i.e. without times) approach based on the best available Indo-European dataset of approx. 1,100 verbal roots, which produces dates around 4100 BC. These dates reflect the most recent state of knowledge in linguistics, archeology and genetics in favor of the Steppe hypothesis. A new synopsis of the wheel problem, a primary argument for the divergence date, shows that not one but three different Indo-European denotations coincide in different areas with different types of wheel–axle constructions. 
Archeological cultures likely to have been affected by the migrations are presented visually at the end of this paper.|000|Indo-European, dating, glottochronology, lexicostatistics, Indo-European homeland 3039|Holm2017|Two aspects of the study are interesting: * the dataset of 17 languages is small but may be interesting, as it uses Swadesh's final list * the discussion about gaps is not really clearly presented but confirms my intuition about gaps in data in general, and would be in favor of the high-coverage argument|000|Indo-European, phylogenetic reconstruction, gaps, missing data 3040|Taylor2017|Archaeological horse remains from Mongolia's late Bronze Age Deer Stone-Khirigsuur (DSK) culture present some of the oldest direct radiocarbon dates for horses in northeast Asia, hinting at an important link between late Bronze Age social developments and the adoption or innovation of horse transport in the region. However, wide error ranges and imprecision associated with calibrated radiocarbon dates obscure the chronology of early domestic horse use in Mongolia and make it difficult to evaluate the role of processes like environmental change, economic interactions, or technological development in the formation of mobile pastoral societies. Using a large sample of new and published radiocarbon dates, this study presents a Bayesian chronological model for the initiation of domestic horse sacrifice at DSK culture sites in Mongolia. Results reveal the rapid spread of horse ritual over a large portion of the Eastern Steppe circa 1200 BCE, concurrent with the first appearance of draught horses in China during the late Shang dynasty. 
These results suggest that key late Bronze Age cultural transformations, specifically the adoption of mobile pastoralism and early horseback riding, took place during a period of climate amelioration, and may be linked to the expansion of horses into other areas of East Asia.|000|archaeology, horse, domestication, dating, Mongolia, 3041|Taylor2017|Potentially interesting article given that it combines recent archaeological evidence for the spread of horse domestication in Mongolia and around.|000|Mongolia, horse, domestication, archaeology 3042|Schultz2017|The discovery of giant viruses blurred the sharp division between viruses and cellular life. Giant virus genomes encode proteins considered as signatures of cellular organisms, particularly translation system components, prompting hypotheses that these viruses derived from a fourth domain of cellular life. Here we report the discovery of a group of giant viruses (Klosneuviruses) in metagenomic data. Compared with other giant viruses, the Klosneuviruses encode an expanded translation machinery, including aminoacyl transfer RNA synthetases with specificities for all 20 amino acids. Notwithstanding the prevalence of translation system components, comprehensive phylogenomic analysis of these genes indicates that Klosneuviruses did not evolve from a cellular ancestor but rather are derived from a much smaller virus through extensive gain of host genes.|000|giant viruses, domains of life, viral evolution, virus, 3043|Schultz2017|The gist of the paper says that giant viruses are apparently not indicators of a fourth domain of life, but rather show how viruses evolve in a specific manner by incorporating large parts of their hosts' genomes.|000|viral evolution, giant viruses, virus, 3044|Jakobson1960|The set (*Einstellung*) towards the MESSAGE as such, focus on the message for its own sake, is the POETIC function of language. [...] 
This function, by promoting the palpability of signs, deepens the fundamental dichotomy of signs and objects. |356|poetic function, Roman Jakobson, nice quote, 3045|Guilherme2016|Over the history of mankind, textual records change. Sometimes due to mistakes during transcription, sometimes on purpose, as a way to rewrite facts and reinterpret history. There are several classical cases, such as the logarithmic tables, and the transmission of antique and medieval scholarship. Today, text documents are largely edited and redistributed on the Web. Articles on news portals and collaborative platforms (such as Wikipedia), source code, posts on social networks, and even scientific publications or literary works are some examples in which textual content can be subject to changes in an evolutionary process. In this scenario, given a set of near-duplicate documents, it is worthwhile to find which one is the original and the history of changes that created the whole set. Such functionality would have immediate applications on news tracking services, detection of plagiarism, textual criticism, and copyright enforcement, for instance. However, this is not an easy task, as textual features pointing to the documents’ evolutionary direction may not be evident and are often dataset dependent. Moreover, side information, such as time stamps, are neither always available nor reliable. In this paper, we propose a framework for reliably reconstructing text phylogeny trees, and seamlessly exploring new approaches on a wide range of scenarios of text reusage. We employ and evaluate distinct combinations of dissimilarity measures and reconstruction strategies within the proposed framework, and evaluate each approach with extensive experiments, including a set of artificial near-duplicate documents with known phylogeny, and from documents collected from Wikipedia, whose modifications were made by Internet users. 
We also present results from qualitative experiments in two different applications: text plagiarism and reconstruction of evolutionary trees for manuscripts (stemmatology).|000|phylogenetic reconstruction, documents, stemmatics, 3046|Guilherme2016|Apart from naive handling and poor implementation, not going beyond distance matrices, this article is worth quoting and reading more closely, since it deals with the task of making trees from data for which it is clear that the ancestral stages are known to some extent. This is clear for a situation on the web, where we may indeed have a source text which is then converted to numerous other texts afterwards. It may have some interesting applications in other contexts, but this remains to be seen and demonstrated, especially for languages.|000|stemmatics, ancestral nodes, distance-based methods, phylogenetic reconstruction, 3047|FerreriCancho2018|It has been hypothesized that the rather small number of crossings in real syntactic dependency trees is a side-effect of pressure for dependency length minimization. Here we answer a related important research question: what would be the expected number of crossings if the natural order of a sentence was lost and replaced by a random ordering? We show that this number depends only on the number of vertices of the dependency tree (the sentence length) and the second moment about zero of vertex degrees. 
The expected number of crossings is minimum for a star tree (crossings are impossible) and maximum for a linear tree (the number of crossings is of the order of the square of the sequence length)|000|dependency trees, syntax, quantitative analysis, syntactic tree, dependency, 3048|FerreriCancho2017|It seems that this may be useful in another context, although I have some problems in actually formulating why, but my intuition tells me that the question of hierarchical structures and their linear representation is some topic one should not forget too easily, and which may well turn out to be useful in other contexts, e.g., also in morphology, when dealing with compounding or derivation.|000|syntax, dependency trees, dependencies, 3049|Woodhouse2013|Interesting article defending the laryngeal theory, thereby pointing also to quantitative aspects of the theory, as it reduces the phoneme inventory of the proto-language. Interesting in the context of methodology, as it also reflects the current insufficiencies due to lack of formalization.|000|laryngeal theory, comparative method, methodology 3050|Haspelmath2009|This paper argues that three widely accepted motivating factors subsumed under the broad heading of iconicity, namely iconicity of quantity, iconicity of complexity and iconicity of cohesion, in fact have no role in explaining grammatical asymmetries and should be discarded. The iconicity accounts of the relevant phenomena have been proposed by authorities like Jakobson, Haiman and Givón, but I argue that these linguists did not sufficiently consider alternative usage-based explanations in terms of frequency of use. A closer look shows that the well-known Zipfian effects of frequency of use (leading to shortness and fusion) can be made responsible for all of the alleged iconicity effects, and initial corpus data for a range of phenomena confirm the correctness of the approach.|000|iconicity, frequency, markedness, economy, typology 3051|Haspelmath2009|1. 
Iconicity of quantity ‘‘Greater quantities in meaning are expressed by greater quantities of form.’’ [...] 2. Iconicity of complexity ‘‘More complex meanings are expressed by more complex forms.’’ [...] 3. Iconicity of cohesion ‘‘Meanings that belong together more closely semantically are expressed by more cohesive forms.’’|2|iconicity, frequency, economy, typology, definition 3052|Emlen2017|The complex, multilayered contact between the Quechuan and Aymaran languages is a central but still poorly understood issue in Andean prehistory. This paper proposes a periodization of that relationship and characterizes some aspects of the languages as they might have existed prior to their first contact. After disentangling the linguistic lineages on the basis of a large corpus of lexical data, the paper makes some observations about the phonology of Pre-Proto-Aymara: first, about aspiration and glottalization; second, about the glottal fricative *h; and third, about the phonotactic structure of lexical roots. The paper also presents lexical reconstructions of Proto-Aymara and Proto-Quechua and proposes provenances for several hundred roots. More than a third of the reconstructed Proto-Aymara lexicon may originate in Proto-Quechua. A method like the one presented here is a prerequisite for testing a hypothesis of genetic relatedness between the two families (and others in the region).|000|Quechua, Aymara, proto-form, etymological dictionary, dataset, language contact, lexical borrowing 3053|Emlen2017|Appendix lists interesting data: reconstructed proto-forms for both Aymara and Quechua. |000|Aymara, dataset, Quechua, proto-form, etymology 3054|Hothorn2006|Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. 
While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, however lacking a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node positive breast cancer survival and mammography experience are re-analyzed.|000|regression analysis, statistical analysis, conditional inference, 3055|Hothorn2006|Although barely needed for current research, regression analysis seems to be something one might want to track and look into in more detail, as it may come in handy later on.|000|regression analysis, statistical analysis, 3056|Hyland2017|Two newly discovered khipu (Andean twisted cord) epistles are presented as evidence that khipus could constitute an intelligible writing system, accessible to decipherment. 
Recent scholars have asserted that khipus were merely memory aides, recording only numbers, despite Spanish witnesses who claimed that Inka era (1400–1532 CE) khipus encoded narratives and were sent as letters. In 2015, I examined two khipus preserved by village authorities in Peru. Villagers state that these sacred khipus are narrative epistles about warfare. Analysis reveals that the khipus contain 95 different symbols, a quantity within the range of logosyllabic writing and notably more symbols than in regional accounting khipus. A shared, mutually comprehensive communication system of such complexity presupposes a writing system, possibly logosyllabic. At the end of each khipu epistle, cord sequences of distinct colors, animal fibers, and ply direction appear to represent lineage (“ayllu”) names.|000|khipu, introduction, writing systems, Inka, twisted cords, 3057|Hyland2017|I remember all the textbooks on writing in which I read that the khipus of the Inkas would not qualify as a writing system but merely as a mnemotechnic aid. It is all the more interesting for me now to see that this might well be a wrong assessment of the system. In the end this might mean that there is yet another writing system out there waiting to be deciphered.|000|khipu, writing systems, Inka, decipherment 3058|Lenneberg1967|:comment:`Nice visualization of surface and deep structure` .. image:: static/img/lenneberg-1967-289.jpg :name: lenneberg :width: 500px [visualization of surface and deep structure] |289|surface structure, deep structure, interpretation, visualization 3059|Lenneberg1967|If the fundamentals of language have evolved in response to natural selection pressures, would it be fair to assume that the present nature of language constitutes in some sense an optimal solution? Such claims have been made, particularly in connection with measurements of the redundancy and information-transmission capacity of natural lan-[pb]guages. 
But the explanations are always *post hoc*; languages are optimal, given the nature of man. But if the nature of language *is* partly the nature of man, as is suggested in the present thesis, these assertions become tautological. Is the nature of man, including his language, in any sense optimal? This becomes a question of religion rather than science. |391f|adaptationism, language evolution, natural selection, 3060|Scheer2004|Interesting book on CVCV phonology and government phonology. The major idea is that tree-like structures are not needed to generate the linguistic complexity we observe. As a result, the author needs to make some additional assumptions, which are described at length but not really exhaustively with respect to the exemplary generation of the licensed words of one language. Nevertheless, the ideas seem to be satisfying, and the awareness of sonority to rule certain principles is also nice.|000|sonority hierarchy, CVCV, government phonology, phonotactics, phonological theory, introduction 3061|Scheer2004|.. image:: static/img/scheer-2004-52.png :name: image :width: 500px [sonority hierarchy]|52|sonority hierarchy, phonotactics, CVCV 3062|Scheer2005|We argue that there is no need to split phonological representations into two worlds: one syllabic and another in which word stress is calculated. We show that both syllable- and stress-related phenomena can be accounted for with a single set of representations, if traditional syllabic analysis is modified in one central respect: what is traditionally taken to be a coda-onset cluster is interpreted as two independent onsets enclosing an empty nucleus. Accordingly, our proposal may be understood as a development of the idea that underlies classical metrical grids, i.e. that stress-relevant units project to higher levels and are therefore visible for stress. The units in the proposal made here, however, are uniformly nuclei. Contentful nuclei are always projected, while their empty counterparts (i.e. 
codas in traditional approaches) may or may not be. The weightlessness of onsets directly follows from this approach.|000|syllable structure, stress, stress patterns, phonological theory, CVCV, representation 3063|Scheer2005|This paper gives some further introduction to CVCV phonology, this time dealing with stress and the syllable. This is important in so far as both stress and syllable boundaries seem to be important for sound change processes and need to be accounted for in computational representations of phonology.|000|syllable structure, stress, stress patterns, annotation, representation, CVCV 3064|Scheer2005|Finally, we assume that long vowels are made up of two (skeletal) units which enjoy a certain degree of independence: contra Hayes (1995: 49f), we show that there are cases where it makes a difference which of the two [pb] parts of a long vowel is stressed. As a consequence, we treat long vowels as two distinct constituents associated to one single chunk of melody.|40f|long vowel, vowel length, CVCV, stress 3065|Scheer2005|One major cross-linguistic generalisation that is supported by these data concerns syllable weight: many stress systems distinguish between light and heavy syllables. Typically, the latter 'attract' stress, while the former are only stressed by default, in the absence of an eligible heavy unit. Cross-linguistically, it appears that the overwhelming majority of languages where syllable weight plays a role distinguish between light and heavy syllables in one of the two ways in (2) (see for example Halle & Vergnaud 1987, Hayes 1989a: 256ff, 1994: 61f, 1995: 50ff). .. 
image:: static/img/scheer-2005-43.png :name: bla :width: 500px [syllable weights]|43|syllable weight, CVCV 3066|Scheer2005|A third line of division between heavy and light syllables has been reported in the literature: closed syllables are heavy only if the rhymal consonant is a sonorant; if it is an obstruent, its syllable counts as light.|44|syllable weight, coda, sonority, CVCV 3067|Scheer2005|The idea of positing an empty vocalic position between any two superficially adjacent consonants in no way weakens the theory. In fact, such a theory is much more constrained than one in which syllabic constituents are allowed to branch freely, without any principled way of limiting the [pb] number of segments that can be sisters within a constituent. Our approach explains why there is a limit on consonant clusters: the intervening empty vocalic positions have to be silenced. Clearly, government cannot be the only way to silence a vocalic position, since this would, for example, predict that three-member consonant clusters could not exist.|54f|CVCV, introduction 3068|Scheer2005|Hayes (1995: 24ff) gives four criteria to distinguish stress from both tone and pitch accent. Of these, three show that the Ancient Greek accentual system involves stress: (i) it is culminative, i.e. 'each word has a single strongest syllable bearing the main stress' (enclitics provide exceptions), (ii) it is rhythmic, i.e. evenly distributed, and clashes are avoided (in cliticisation at least one unstressed mora must separate two stresses) and (iii) it fails to exhibit any kind of assimilation. We are unable to determine whether Ancient Greek stress is also hierarchical, thus meeting Hayes' fourth criterion. 
Crucially, none of the criteria shows that Ancient Greek accent should be anything but stress.|55|stress, stress patterns, definition, mora, CVCV 3069|Scheer2005|Table I summarises how the parameter regarding the weight of closed syllables is implemented in the different approaches reviewed. .. image:: static/img/scheer-2005-59.png :name: bla :width: 500px [table 1]|59|syllable weight, CVCV, stress 3070|Scheer2005|The only thing that we wish to point out here is that in no event can the two structures be confused: the nucleus preceding a branching sonorant is always empty when the sonorant is syllabic, while it is occupied by a vowel in the Kwakwala pattern. .. image:: static/img/scheer-2005-68.png :name: bla :width: 500px [example for handling of syllabic consonants]|68|syllabic consonants, CVCV, annotation 3071|Scheer2005|In this paper, we have proposed an apparently minor modification in autosegmental representations, i.e. that coda consonants are not defined arboreally as being the sister of the nucleus, but laterally as being followed by a nucleus which remains unpronounced. We hope to have shown that at least three hitherto unexplained robust generalisations regarding stress follow from this: (i) onset weightlessness, (ii) unified representations for stress and the syllable and (iii) the homogeneous character of the units that are counted by stress algorithms (only nuclei). Onsets cannot possibly contribute a mora, since they are by definition followed by a pronounced nucleus, which is in itself moraic. :comment:`[This means that the onset cannot decide on stress in languages that put the stress in a regular fashion]` Codas, on the other hand, are followed by an unpronounced nucleus, which may or may not be moraic, i.e. it is the following nucleus, not the coda, that contributes the mora in Weight-by-Position languages. 
Our system thus does not necessitate the explicit introduction of moras: their function is already fulfilled by nuclei.|69|stress, stress patterns, mora, CVCV, 3072|Duanmu2008|The CVX theory proposes that a word has the schematic structure in (5). .. image:: static/img/duanmu-2008-40.png :name: bla :width: 500px [CVX account on words (cross-linguistically)]|40|CVCV, CVX, word structure, phonotactics, sound patterns, phonology, phonological theory 3073|Duanmu2008|According to @Jespersen<1904> (1904), speech sounds can be ranked along the sonority scale in (9), where > means “has greater sonority than.” .. image:: static/img/duanmu-2008-42.png :name: bla :width: 500px [sonority scale by Jespersen]|42|sonority hierarchy, history of science, 3075|Duanmu2008|However, recent studies (e.g. Frisch et al. 2000, Myers and Tsay 2005, Zhang 2007) show that speaker judgment on possible words is not always clear-cut. In addition, many studies have noted that consistent judgment on syllable boundaries can be hard to obtain (Gimson 1970, Treiman and Danis 1988, Giegerich 1992, Hammond 1999, Steriade 1999, Blevins 2003).|53|syllable structure, syllable boundaries, native speaker judgment, CVX 3076|Jespersen1904|Wir wollen jetzt versuchen, die Sonoritätsverhältnisse bei einer Reihe von Lautverbindungen, den Worten: *sprengst*, *Tante*, *Attentat*, *keine*, graphisch zu repräsentieren: .. image:: static/img/jespersen-1904-187.png :name: bla :width: 500px [sonority visualization in words]|187|sonority hierarchy, sonority, sonority profile, 3077|Jespersen1904|In der folgenden Übersicht beginne ich mit den wenigst klangvollen; die in Klammern hinzugefügten Laute sollen bloß als typische Beispiele aufgefaßt werden; in jeder Klasse finden sich natürlich mehr als die angeführten Laute, und innerhalb jeder Klasse (namentlich vielleicht zwischen den verschiedenen r-Lauten) finden sich auch verschiedene Grade der Schallfülle, so daß das Schema nicht allzu absolut aufgefaßt werden darf. .. 
image:: static/img/jespersen-1904-186.png :name: bla :width: 500px [sonority hierarchy]|186|sonority hierarchy, Otto Jespersen, 3078|Argue2017|Although the diminutive Homo floresiensis has been known for a decade, its phylogenetic status remains highly contentious. A broad range of potential explanations for the evolution of this species has been explored. One view is that H. floresiensis is derived from Asian Homo erectus that arrived on Flores and subsequently evolved a smaller body size, perhaps to survive the constrained resources they faced in a new island environment. Fossil remains of H. erectus, well known from Java, have not yet been discovered on Flores. The second hypothesis is that H. floresiensis is directly descended from an early Homo lineage with roots in Africa, such as Homo habilis; the third is that it is Homo sapiens with pathology. We use parsimony and Bayesian phylogenetic methods to test these hypotheses. Our phylogenetic data build upon those characters previously presented in support of these hypotheses by broadening the range of traits to include the crania, mandibles, dentition, and postcrania of Homo and Australopithecus. The new data and analyses support the hypothesis that H. floresiensis is an early Homo lineage: H. floresiensis is sister either to H. habilis alone or to a clade consisting of at least H. habilis, H. erectus, Homo ergaster, and H. sapiens. A close phylogenetic relationship between H. floresiensis and H. erectus or H. sapiens can be rejected; furthermore, most of the traits separating H. floresiensis from H. sapiens are not readily attributable to pathology (e.g., Down syndrome). The results suggest H. floresiensis is a long-surviving relict of an early (>1.75 Ma) hominin lineage and a hitherto unknown migration out of Africa, and not a recent derivative of either H. erectus or H. 
sapiens.|000|homo floresiensis, homo sapiens, archaeology, phylogenetics, human prehistory, maximum parsimony 3079|Argue2017|What is interesting about this study is HOW they do it: They set up a set of characters, morphological characters, and then run parsimony software on it. What is not clear at all to me is: how do they plan on weighting characters, how do they know that those characters are really good in reflecting phylogenies, etc.? Reminds a bit of linguistics. On the other hand, we should try and see how much we can learn from this kind of analysis for our own sake.|000|maximum parsimony, homo floresiensis, homo sapiens, morphological characters, phylogenetic reconstruction 3080|Hoenigswald1963|Of course, there is such a branch of linguistics, but we call it typology. |1|comparative method, methodology 3081|Kotorova2013|The following article gives an analysis of the speech act of SAYING THANK in Russian and German from three different angles: Firstly, from a socio-cultural perspective which concerns the culture of SAYING THANK existing in both language communities. Secondly, from a pragmatic perspective which considers whether and how the situations in which Germans and Russians express SAYING THANK are different. And thirdly, from the linguistic-structural viewpoint which concerns the structure and function of the SAYING THANK expression in both languages; the Russian forms that realize this particular speech act are compared with the German ones.|000|thanking, speech act, Russian, German, contrastive linguistics 3082|Fikke2017|Interesting paper presenting a new hypothesis about the origin of Munch's vision that inspired him to create the Scream. They say it is mother-of-pearl clouds, a rare weather phenomenon, which had not been reported or investigated during Munch's time.|000|Edvard Munch, mother-of-pearl clouds, art, painting, Scream, clouds 3083|Fikke2017|Collage showing details of mother-of-pearl clouds together with The Scream (1910 version). .. 
image:: static/img/fikke-2017-1.png :name: bla :width: 500px [Figure 3]|1|Edvard Munch, Scream, mother-of-pearl clouds, painting, visualization 3084|Prochazka2017|Many of the world’s around 6,000 languages are in danger of disappearing as people give up use of a minority language in favor of the majority language in a process called language shift. Language shift can be monitored on a large scale through the use of mathematical models by way of differential equations, for example, reaction–diffusion equations. Here, we use a different approach: we propose a model for language dynamics based on the principles of cellular automata/agent-based modeling and combine it with very detailed empirical data. Our model makes it possible to follow language dynamics over space and time, whereas existing models based on differential equations average over space and consequently provide no information on local changes in language use. Additionally, cellular automata models can be used even in cases where models based on differential equations are not applicable, for example, in situations where one language has become dispersed and retreated to language islands. Using data from a bilingual region in Austria, we show that the most important factor in determining the spread and retreat of a language is the interaction with speakers of the same language. External factors like bilingual schools or parish language have only a minor influence. |000|language shift, bilingualism, simulation studies, 3085|Prochazka2017|Interesting study that, however, uses a very simplifying approach for the simulation. 
What we can learn from it is that in order to test certain questions, we do not need to have the good or best or most complex models, but rather simply some good data with which we could compare the results of the evaluation, since, in this way, we could then test which are the factors that make the model more closely mimic reality.|000|simulation studies, demography, Austrian dialects, bilingualism, 3086|Brysbaert2016|Based on an analysis of the literature and a large scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas or about one new lemma every 2 days. The knowledge of the words can be as shallow as knowing that the word exists. In addition, people learn tens of thousands of inflected forms and proper nouns (names), which account for the substantially high numbers of ‘words known’ mentioned in other publications.|000|knowledge, language acquisition, word frequency, number of words, psycholinguistics 3087|Brysbaert2016|Very nice paper finally bringing attention to the question of how many words an average person knows, which is incredibly useful when dealing with simulations or when trying to figure out how to trace the phylogenies of a language.|000|word frequency, word knowledge, number of words, psycholinguistics, 3088|Brysbaert2016|A group of lemmas that are morphologically related form a word family. The various members are nearly always derivations of a base lemma or compounds made with base lemmas. 
In our small example corpus the lemmas “the, cat, on, roof, and meow” are all base lemmas of different families, but the lemma “helplessly” can be simplified to “help.”|2|word family, definition 3089|Brysbaert2016|Uninflected word from which all inflected words are derived. In most analyses is limited to alphabetical word types that are seen by the English community as existing words (e.g., they are mentioned in a dictionary or a group of people on the web use them with a consistent meaning). In general, lemmas exclude proper nouns (names of people, places, . . .). Lemmatization also involves correcting spelling errors and standardizing spelling variants. In the small corpus example we are using, there are six lemmas (the, cat, on, roof, meow, and helplessly).|2|lemma, definition 3090|Ullah2015|Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either of these cases, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by coestimating gene trees and the species tree but this approach poses a scalability problem for larger data sets. We present MixTreEM-DLRS: A two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural expectation maximization algorithm based on a mixture model is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg O, Sennblad B, Arvestad L, Lagergren J. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A. 
106(14):5714–5719), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad L, Lagergren J, Sennblad B. 2009. The gene evolution model and computing its associated probabilities. J ACM. 56(2):1–44). We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance with a recent genome-scale species tree reconstruction method PHYLDOG (Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V. 2013. Genome-scale coestimation of species and gene trees. Genome Res. 23(2):323–330) as well as with a fast parsimony-based algorithm Duptree (Wehe A, Bansal MS, Burleigh JG, Eulenstein O. 2008. Duptree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13):1540–1541). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster and our method outperforms Duptree in accuracy. The analysis constituted by MixTreEM without DLRS may also be used for selecting the target species tree, yielding a fast and yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem/.|000|species tree, gene tree reconciliation, phylogenetic reconstruction, software, mixed models 3091|Holen2017|The earliest dispersal of humans into North America is a contentious subject, and proposed early sites are required to meet the following criteria for acceptance: (1) archaeological evidence is found in a clearly defined and undisturbed geologic context; (2) age is determined by reliable radiometric dating; (3) multiple lines of evidence from interdisciplinary studies provide consistent results; and (4) unquestionable artefacts are found in primary context 1,2 . 
Here we describe the Cerutti Mastodon (CM) site, an archaeological site from the early late Pleistocene epoch, where in situ hammerstones and stone anvils occur in spatio-temporal association with fragmentary remains of a single mastodon (Mammut americanum). The CM site contains spiral-fractured bone and molar fragments, indicating that breakage occurred while fresh. Several of these fragments also preserve evidence of percussion. The occurrence and distribution of bone, molar and stone refits suggest that breakage occurred at the site of burial. Five large cobbles (hammerstones and anvils) in the CM bone bed display use-wear and impact marks, and are hydraulically anomalous relative to the low-energy context of the enclosing sandy silt stratum. 230 Th/U radiometric analysis of multiple bone specimens using diffusion–adsorption–decay dating models indicates a burial date of 130.7 ± 9.4 thousand years ago. These findings confirm the presence of an unidentified species of Homo at the CM site during the last interglacial period (MIS 5e; early late Pleistocene), indicating that humans with manual dexterity and the experiential knowledge to use hammerstones and anvils processed mastodon limb bones for marrow extraction and/or raw material for tool production. Systematic proboscidean bone reduction, evident at the CM site, fits within a broader pattern of Palaeolithic bone percussion technology in Africa 3–6 , Eurasia 7–9 and North America 10–12 . The CM site is, to our knowledge, the oldest in situ, well-documented archaeological site in North America and, as such, substantially revises the timing of arrival of Homo into the Americas.|000|mammoth, Mammut, America, California, archaeological site, 3092|Holen2017|If it is true, this would be a sensation, but we would still not know who was there in America 130000 years ago. Anyway, their proof is shallow, based on one bone that was apparently split by a stone. 
Are there any proofs in linguistics which are shallow to a similar degree?|000|California, migration, archaeology, paleolithicum, Steinzeit 3093|Brysbaert2016|The power of morphological families has been investigated most extensively in second language education (Goulden et al., 1990), where it was observed that participants were often able to understand the meaning of non-taught members of the family on the basis of those taught. If you know what ‘diazotize’ means, you also know what ‘diazotization, diazotizable, and diazotizability’ stand for, and you have a pretty good idea of what is referred to with the words ‘misdiazotize, undiazotizable, or rediazotization.’ Similarly, if you know the adjective ‘effortless,’ you will understand the adverb ‘effortlessly’ (you can even produce it). You will also quickly discover that the adjective ‘effortless’ consists of the noun ‘effort’ and the suffix ‘-less.’ And if you know the words ‘flower’ and ‘pot,’ you understand the meaning of ‘flowerpot.’|5|morphology, word family, productivity, perception 3094|Brysbaert2016|Psycholinguistic research (Schreuder and Baayen, 1997; Bertram et al., 2000) has also pointed to the importance of word family size in word processing. Words that are part of a large family (‘graph, man, work, and fish’) are recognized faster than words from small families (‘wren, vase, tumor, and squid’).|5|word family, language processing, speech processing 3095|Brysbaert2016|Goulden et al. (1990) analyzed the 208,000 distinct lemma entries of Webster’s Third New International Dictionary (1961) described above. Of these, they defined 54,000 as base words (referring to word families), 64,000 as derived words, and 67,000 as compound words (there were also 22,000 lemmas that could not be classified). The person, who has spent most energy on defining word families, is Nation (e.g., Nation, 2006) 8 . 
His current list has pruned (and augmented) Goulden et al.’s (1990) list of [pb]54,000 base words to some 28,000 word families, of which 10,000 suffice for most language understanding.|5f|word family, English, statistics 3096|Sagart2001|Although Hakka and Nanxiong differ in their mode of devoicing, the dialect of Nanxiong city has a split treatment of the entire *zhuoshang* category remarkably similar to Hakka in its lexical incidence. This suggests that they share a recent common ancestor, from which the *zhuoshang* split was inherited. It is argued that the set of *zhuoshang* words which have tone 1 in standard Hakka and tone 1 or 2 in Nanxiong had tone 4 in the parent language. It is also argued that this common ancestor had not yet merged its *quanzhuo* initials with the voiceless aspirated initials. The ancestral tone 4 then merged with tone 1 in Hakka, and with tone 1 or 2 in Nanxiong; and devoicing occurred separately in Nanxiong and Hakka.|000|Chinese dialects, dialect classification, Nanxiong, Hakka, subgrouping 3097|Vovin2017|In this modest contribution I intend to demonstrate two major points: first, that the gradual language loss is an inevitable part of the historical development of the humanity, and second that the language revitalization is essentially doomed. In other words, the history (an objective process) cannot be stopped or reversed as far as language death is concerned, no matter how much we can lament (a subjective process) this sad state of affairs. 
In order to illustrate these two points with concrete examples, I chose several case studies for both,|000|language loss, language death, revitalization, discussion 3098|Yakhontov1991|Если так, то нет смысла спрашивать, что такое milk — слово или морфема (вопросы этого рода довольно часто ставятся и решаются в китаеведении); это все равно что спрашивать, что такое и в сочетании стол и стул — слово или фонема: и то и другое — смотря какой уровень мы имеем в виду в данном конкретном случае (подробнее см. [Реформатский 1967: 28]); утверждение, что такой-то элемент в одних случаях является морфемой, а в других — словом, обычно фактически означает, что речь идет о свободной морфеме (см. ниже). :comment:`[It is the realization of a function in a given level of speech that distinguishes morpheme from word or phoneme from morpheme, not the form itself.]`|330|morphology, phonology, functional units, 3099|Yakhontov1966|Early text on dialect classification which is rather explicit and not often mentioned in the literature.|000|dialect classification, Chinese dialects, subgrouping 3100|Beckwith2016|The earliest known texts in a Tibeto-Burman language are three songs preserved in Chinese transcription from the first century of the present era. They are odes praising the Chinese government, and have been argued by [Coblin 1974], Tung Tso-pin, and other scholars, to have been translated from Chinese into the language of the Pai-lang 白狼 (Báiláng), who presented them to the Chinese emperor in Loyang. 
It has also been noted that Pai-lang appears to belong to the Lolo-Burmese branch of Tibeto-Burman.|000|Late Old Chinese, transcription, Old Chinese, Tibeto-Burman, Báiláng, Pai-lang 3101|Beckwith2016|Text presents the Pai-lang texts by using modern Old Chinese reconstruction systems and exhaustive annotation of rhymes.|000|rhyme patterns, rhyme analysis, Pai-lang, Báiláng, transcription, Late Old Chinese, Old Chinese 3102|Beckwith2016|There are seven principal types of rhyme: 1) head-rhymes (i. e., the same initial consonant(s) shared by the first syllable of two or more lines) 1 . 2) head-syllable rhymes (i. e., the first syllable of two or more lines) 3) end-rhymes (i. e., end-syllable rhymes) 4) line-internal rhymes horizontally (i. e., within the same line) 5) internal rhymes vertically between lines (i.e., other than at the beginning or the end of the affected lines) 6) skewed rhymes (i. e., vertical rhymes in two or more rhymes, but shifted one syllable in either direction, typically in pairs — e. g., ab in one line directly over ba in the next — or longer strings) 7) a skipping pattern within consecutive text (in this pattern, which is particularly frequent, two syllables rhyme sequentially, but another non-rhyming syllable intervenes between them).|453|rhyme patterns, rhyme analysis, introduction, classification, 3103|Kortvelyessy2017|Introduction to language typology, potentially useful when giving introductory courses and discussing questions of definitions etc.|000|language typology, introduction, handbook 3104|Hu2017|The indigenous people of the Tibetan Plateau have been the subject of much recent interest because of their unique genetic adaptations to high altitude. Recent studies have demonstrated that the Tibetan EPAS1 haplotype is involved in high altitude-adaptation and originated in an archaic Denisovan-related population. 
We sequenced the whole-genomes of 27 Tibetans and conducted analyses to infer a detailed history of demography and natural selection of this population. We detected evidence of population structure between the ancestral Han and Tibetan subpopulations as early as 44 to 58 thousand years ago, but with high rates of gene flow until approximately 9 thousand years ago. The CMS test ranked EPAS1 and EGLN1 as the top two positive selection candidates, and in addition identified PTGIS, VDR, and KCTD12 as new candidate genes. The advantageous Tibetan EPAS1 haplotype shared many variants with the Denisovan genome, with an ancient gene tree divergence between the Tibetan and Denisovan haplotypes of about 1 million years ago. With the exception of EPAS1, we observed no evidence of positive selection on Denisovan-like haplotypes.|000|Sino-Tibetan, Tibetan, genetic classification, population genetics, dating, genome analysis 3105|Hu2017|Paper dates separation of Tibetans from the rest of Sino-Tibetan to 9000 years before now.|000|dating, Sino-Tibetan, Tibetan, genome analysis 3106|Harmon2008|Summary: GEIGER is a new software package, written in the R language, to describe evolutionary radiations. GEIGER can carry out simulations, parameter estimation and statistical hypothesis testing. Additionally, GEIGER’s simulation algorithms can be used to analyze the statistical power of comparative approaches. 
Availability: This open source software is written entirely in the R language and is freely available through the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/.|000|software, simulation studies, R-language, 3107|Harmon2008|Software can be used to model morphological evolution in biology, may also be useful for linguistic purposes.|000|R-language, software, simulation studies 3108|Sperber1923|Sehr häufig entdeckt man, daß einem Wort bloß deshalb ein Bedeutungswandel zugeschrieben wird, weil seine *tatsächliche* Bedeutung sich nicht oder nicht völlig mit derjenigen deckt, die der *etymologische* Wortsinn anzudeuten scheint. Z. B. liegt in dem Wort *Schneider* kein Hinweis darauf, daß das, was der so bezeichnete Handwerker schneidet, gerade zu Kleidern bestimmte Stoffe sein müssen. Es sieht vielmehr ganz so aus, als habe hier ein Wort, das ursprünglich ganz allgemein „Mensch, welcher (etwas beliebiges) schneidet” bedeutete, sekundär eine Einschränkung seines Begriffsumfanges erlitten, die zur Entwicklung seines heutigen engbegrenzten Sinnes führte. Oder: Gewehr bedeutet etymologisch nicht „Handfeuerwaffe von einem bestimmten Typus” sondern „irgendetwas, womit man sich wehrt” usw. ins Unendliche. [pb] Schon Paul hat mit Recht darauf hingewiesen, daß in solchen Fällen immer erst untersucht werden muss, ob die durch die Etymologie angedeutete allgemeinere Bedeutung jemals bestanden hat, oder ob nicht das betreffende Wort vom ersten Augenblick an nur die engere Bedeutung besessen hat, eine Einschränkung seines Begriffsumfanges also gar nicht erfolgt ist. [...] Die erste Bedingung für die Ansetzung eines Bedeutungsübergangs ist doch offenbar, daß mindestens zwei Bedeutungen eine ursprüngliche und eine abgeleitete, im Sprachbewußtsein wirklich vorhanden sind oder waren. 
Bevor man von einem Bedeutungswandel spricht, muß man sich daher immer zuerst überzeugen, daß nicht die eine dieser Bedeutungen auf einem unbefugten Rückschluß aus dem buchstäblichen Sinn des Wortes beruht.|11f|semantic change, evidence, conceptualization, literal meaning, 3109|Paul1886|Während der lautwandel durch eine widerholte unterschiebung von etwas unmerklich verschiedenem zu stande kommt, wobei also das alte untergeht, zugleich mit der entstehung des neuen, ist beim bedeutungswandel die erhaltung des alten durch die entstehung des neuen nicht ausgeschlossen. Er besteht immer in einer erweiterung oder einer verengung des umfangs der bedeutung, denen eine verarmung oder bereicherung des inhalts entspricht. Erst durch die aufeinanderfolge von erweiterung und verengung kann eine von der ursprünglichen völlig verschiedene bedeutung sich bilden.|66|transition, sound change, semantic change, cumulation, reduction, 3110|Paul1886|Zunächst gehören hierher alle die fälle, in denen die lautliche übereinstimmung bei verschiedenheit der bedeutung nur auf zufall beruht, wie bei nhd. *acht* = *diligentia* [Aufmerksamkeit] – *proscriptio* [Bann]– *octo* [acht]. [...] lautlich besteht [...] identität, und derjenige, welcher einen solchen lautcomplex ausser zusammenhang aussprechen hört, hat kein mittel zu erkennen, welche von den verschiedenen damit verknüpften bedeutungen der sprechende im sinne hat. [...] Wirkliche mehrheit von bedeutungen muss man aber auch in sehr vielen fällen anerkennen, wo nicht bloss lautliche, sondern auch etymologische identität besteht. Man vergleiche z.b. nhd. *fuchs* *vulpes* – *pferd von fuchsiger farbe* – *rothaariger mensch* – *schlauer mensch* – *goldstück* – *student im ersten semester*, *boc* *hircus* – *bock der kutsche* – *fehler*, *futter* *pabulum* – *überzug* oder *unterzug*, [...]. 
In den meisten der angeführten fälle ist es ohne geschichtliche studien überhaupt nicht möglich den ursprünglichen zusammenhang zwischen den einzelnen bedeutungen zu erkennen, und dieselben verhalten sich dann gar nicht anders zu einander, als wenn die lautliche identität nur zufällig wäre.|68|etymological relations, polysemy, homophony, semantic change, 3111|Erdmann1900|„Worte sind Zeichen für Begriffe.” Dieser Satz bedarf – wie schon dargelegt – in mehr als einer Hinsicht der Einschränkung. Aber auch wenn man den Sinn des Wortes „Begriff” so weit faßt wie nur irgend möglich, giebt der Satz keinesfalls eine erschöpfende Definition. Worte sind noch anderes und mehr als Zeichen für Begriffe. Sie enthalten Werthe, die nichts mit dem zu schaffen haben, was wir bisher an den Wortbedeutungen beachtet und untersucht haben; Werthe, auf denen gerade die feinsten Wirkungen der Sprache beruhen. Es empfielt sich, sie von dem gewöhnlichen, dem begrifflichen Wortsinne abzusondern und ihm gegenüber zu stellen.|78|connotation, literal meaning, meaning, 3112|Frege1892|Es liegt nun nahe, mit einem Zeichen (Namen, Wortverbindung, Schriftzeichen) außer dem Bezeichneten, was die Bedeutung des Zeichens heißen möge, noch das verbunden zu denken, was ich den Sinn des Zeichens nennen möchte, worin die Art des Gegebenseins enthalten ist. Es würde danach in unserm Beispiele zwar die Bedeutung der Ausdrücke „der Schnittpunkt von a und b” und „der Schnittpunkt von b und c” dieselbe sein, aber nicht ihr Sinn. 
Es würde die Bedeutung von „Abendstern” und „Morgenstern” dieselbe sein, aber nicht der Sinn.|26f|Morgenstern, Abendstern, Sinn, Bedeutung, Gottlob Frege, intension, extension, 3113|Erdmann1900|Um eine bequeme Verständigung zu erzielen, wird es sich empfehlen, wie ich schon andeutete, scharf zwischen dem begrifflichen Inhalt und der Gesammtbedeutung des Wortes zu unterscheiden; zwischen dem begrifflichen Inhalt, der alle objectiven Merkmale einschließt, und der allgemeinen Wortbedeutung, die außer dem Begriff noch alle anderen Werthe enthält, die das Wort zum Ausdruck bringt. Diese Werthe sondere ich also von der Wortbedeutung ab, stelle sie dem Begriff gegenüber und fixiere sie sprachlich als „Nebensinn” und „Gefühlswerth” (Stimmungsgehalt). Nach dieser Auffassungs- und Ausdrucksweise ist es dann eindeutig zu sagen, dass Leu und Löwe einerseits und Hose und Beinkleid andererseits Worte von verschiedener Bedeutung aber gleichem begrifflichen Inhalt seien.|80|connotation, literal meaning, meaning, semantics, 3114|Sperber1923|Die Aufgabe, die Bedeutung eines Wortes festzustellen, mag für den Logiker besagen: seinen begrifflichen Inhalt möglichst genau erfassen und abgrenzen, eine Definition liefern, die nach altbewährtem Rezept die übergeordnete Art und die unterscheidenden Merkmale angibt. Für den Sprachforscher bedeutet sie etwas anderes und in der Regel weit Schwierigeres, denn er ist schon hier, wie in allen weiteren Stadien seiner Arbeit, genötigt, neben den klar erfaßbaren logischen auch psychologische Faktoren von oft schwer greif- und wägbarer Natur in Betracht zu ziehen. :translation:`The task of determining the meaning of a word may mean for a logician to define its conceptual content with high precision, providing, as usual, its superordinate concepts and its defining characteristics. 
For a linguist, this task means something different, and usually much more difficult, since the linguist is, as in all further stages of his work, forced to include -- next to the clearly logical factors -- also those factors which are psychological and often difficult to grasp.`|1|meaning, literal meaning, connotation, 3115|Francois2008|A given language is said to colexify two functionally distinct senses if, and only if, it can associate them with the same lexical form.|170|colexification, definition, terminology 3116|Francois2008|In particular, "strict colexification" (same lexeme in synchrony) should be carefully distinguished from "loose colexification" (covering all other cases mentioned here). |171|loose colexification, colexification, partial colexification, definition, terminology 3117|Francois2008|In principle, the colexification model itself consists first and foremost in stating the facts -- that is, detecting and documenting the cases of colexification that are empirically attested across languages. The interpretation of these semantic connections, whether it takes a historical or a cognitive perspective or otherwise, arguably belongs to another phase of the study.|172|colexification, interpretation, description, descriptive terminology, descriptive term 3118|Sperber1923|Aber auch wo sich bei ein und demselben Lautkomplex zwei Bedeutungen, eine ältere und eine jüngere, klar und deutlich be[pb]legen lassen, muß noch nicht notwendig ein Bedeutungswandel vorliegen. [...] wenn der Lautkomplex *Limburger* einerseits einen Menschen aus Limburg, andererseits eine Art Käse bezeichnet. Wer statt *Limburger Käse* der Kürze halber *Limburger* sagt, der ist sich in der Regel ebensowenig bewußt, daß es ein *Limburger* ,,Mann aus Limburg" gibt, wie etwa jemand, der an den Worten *Gorgonzolakäse* oder *Goudakäse* analoge Verkürzungen vornimmt, über die ursprüngliche Bedeutung der Lautkomplexe *Gouda* oder *Gorgonzola* zu reflektieren pflegt. 
Die beiden *Limburger* sind zwei genetisch ganz verschiedene Worte, das eine gebildet durch normale Ableitung vom Namen *Limburg*, das andere durch Verkürzung von *Limburgerkäse*. Die Homonyme stehen daher zu einander nicht im Verhältnis des Bedeutungswandels [...].|12f|homonymy, homophony, semantic similarity, meaning, polysemy, semantic change, 3119|Urban2016|This article tackles a question raised by one of the founding figures of lexical typology, Stephen Ullmann: to what degree do languages differ in the extent to which they resort to morphologically analyzable lexical items? Drawing on a worldwide sample of 78 languages for which a standard set of 160 mostly nominal meanings is investigated, the article demonstrates that variability in this area is indeed profound. Correlations between the relative prevalence of analyzable items in a language with the size of its consonant inventory, the complexity of its syllable structure, and the length of its nominal roots suggest that, typologically, languages with a simple phonological structure are those in which analyzability in the lexicon is most profound. Possible explanations for this observation in terms of the avoidance of homonymy and pressure exerted by different linguistic subsystems on each other are discussed.|000|morphological complexity, semantic motivation, motivation, morphological motivation, phonetic complexity, lexical typology, 3120|Wichmann2017|This paper presents properties of a computer simulation of language migration. It takes as input a simulated phylogeny and a database of today’s populated places. At each time step, a language moves within a geographical quadrilateral defined by the minimal number, ch, of choices of populated places within the quadrilateral. The result is a constrained random walk defined by a combination of the ch parameter and the landscape, which comes into play via the restriction of the walk to populated places. 
The distribution of move distances is qualitatively similar across values of ch and resembles a Gamma distribution. Through comparisons with densities of real-world language families, the values of ch which yield the closest fits between real and simulated data are found.|000|simulation studies, language change, demography 3121|Wichmann2017|Paper simulates language evolution through space. This may be interesting to test against different models, especially alternative demography simulators.|000|simulation studies, demography, language evolution, language in space 3122|Law2017|Definitions of ‘mixed’ or ‘intertwined’ languages derive almost entirely from studies of languages that combine elements from genetically unrelated sources. The Mayan language Tojol-ab’al displays a mixture of linguistic features from two related Mayan languages, Chuj and Tseltal. The systematic similarities found in related languages not only make it methodologically difficult to identify the source of specific linguistic features but also mean that inherited similarity can alter the processes and outcomes of language mixing in ways that parallel observed patterns of code-switching between related languages. Tojol-ab’al, therefore, arguably represents a distinct type of mixed language, one that may only result from mixture involving related languages.|000|language mixture, genetic similarity, mixed languages, Mayan languages, comparative method 3123|Law2017|Paper is potentially interesting as it proposes a methodology to distinguish language mixing despite inheritance. In fact, what the author does, is listing "innovations" and "retentions", so the methodology requires a whole knowledge of the rest of the family, and who knows how well this actually holds. 
Anyway, the approach is worth being kept in mind and the paper is a careful contribution to classical methodology of subgrouping.|000|subgrouping, Mayan languages, mixed languages, methodology, 3124|Alderete2013|This article gives a detailed quantitative account of Samoan root phonotactics. In particular, count data is given in eleven tables of segment frequencies (i.e., consonants, short and long vowels, diphthongs) and frequencies of combinations of segments (i.e., syllable types, consonant-vowel combinations, V-V and C-C combinations across syllables). Systematic patterns of over- and under-representation of these structures in the lexicon are documented and related to prior research. Beyond the detailed frequency facts presented here, new empirical patterns documented include positional preferences for bilabials and non-labial sonorants, extensions of a known pattern of gradient vowel assimilation, and identification of a role for manner and segment order in consonant co-occurrence restrictions. |000|phonology, phonotactics, Samoan, Austronesian, empirical study, co-occurrence 3125|Alderete2013|Interesting article demonstrating how phonotactics can be empirically studied by checking co-occurrence patterns in root forms.|000|phonotactics, co-occurrence, empirical study, computer-aided approaches 3126|Carvalho2006|This paper aims at showing that the scope of structural phonemics transcends the limits of the ‘foundations of phonology’, contrary to what is tacitly assumed and appears from some textbooks. It will be argued that the classical concept phoneme, defined as a set of distinctive features, presents both obsolete and still relevant properties. One of these properties, linearity, should clearly be abandoned, as follows from acoustic-perceptual evidence as well as from some types of sound change. Thereby, the phoneme in its purest sense can be said to have been superseded by one major trend characterizing post-SPE phonological theory: multilinearity. 
However, a phoneme-based property of distinctive features, their locality, is still valid, and is empirically supported by cross-linguistic variation. Now, locality and non-linearity are apparently contradictory. It will be shown that this contradiction cannot be resolved, and that both feature properties cannot be captured, unless consonants and vowels are assumed to be universally segregated within phonological representations. This issue leads to several predictions on C/C and V/V interactions, converges with independent processual evidence like vowel-to-vowel assimilation, and addresses the question of the relationship between phonology and morphology.|000|CVCV, phonology, phonotactics, phonological theory 3127|Carvalho2006|Paper discusses important aspects of phonology and phonotactics, namely palatalization and the interaction of segments inside phonetic sequences.|000|phonology, phonotactics, phonological theory, palatalization, modeling, CVCV 3128|Crosswhite2003|This article presents the results of a nonce-probe experiment conducted with 13 native speakers of Russian and examines the implications of these results for the linguistic analysis of Russian stress. Experimental items were novel words that ended in a sequence of segments either homophonous with a Russian case ending or not. Carrier sentences were manipulated to either morphosyntactically support a case-marked form or not. Results show a strong morphological effect: speakers stressed the last syllable of the stem, i.e., the ultima in words without inflections, and the antepenult or penult in words with inflections (depending on length of the inflection). 
This is relevant for linguistic analysis of Russian because it uncovers a default location for stress that is not abundantly apparent in the synchronic phonology.|000|Russian, stress, stress patterns, empirical study, phonology 3129|Crosswhite2003|Interesting paper based on experimental phonology states that normally Russian stress (in pseudo-words) is placed on the last syllable of a morpheme.|000|stress, Russian, empirical study, experimental phonetics, phonology 3130|John2015|Trees are a canonical structure for representing evolutionary histories. Many popular criteria used to infer optimal trees are computationally hard, and the number of possible tree shapes grows super-exponentially in the number of taxa. The underlying structure of the spaces of trees yields rich insights that can improve the search for optimal trees, both in accuracy and in running time, and the analysis and visualization of results. We review the past work on analyzing and comparing trees by their shape as well as recent work that incorporates trees with weighted branch lengths.|000|tree space, phylogenetic tree, phylogenetic reconstruction, methodology, consensus tree 3131|John2015|Paper introduces the statistics of tree space which is probably an interesting introduction to the recent state of the art.|000|tree space, introduction, review, 3132|Gontier2017|Universal Darwinism provides a methodology to study the evolution of anatomical form and sociocultural behavior that centers on defining the units and levels of selection, and it identifies the conditions whereby natural selection operates. In previous work, I have examined how this selection-focused evolutionary epistemology may be universalized to include theories that associate with an extended synthesis. 
Applied evolutionary epistemology is a metatheoretical framework that understands any and all kinds of evolution as phenomena where units evolve by mechanisms at levels of an ontological hierarchy; and it provides three heuristics to search for these units, levels and mechanisms. The heuristics are applicable to language and sociocultural evolution, and here, we give an in-depth analysis of how the unit-heuristic can be implemented into language origin and evolution studies. The importance of developing hierarchy theories is also more fully explained.|000|origin of language, language evolution, philosophy 3133|Gontier2017|Rather boring paper mixing language evolution with the question of language origin. It contains an interesting idea regarding evolution in general, however, namely the distinction between: * unit of evolution (what) * level of evolution (where) * mechanism/process of evolution (how) This seems to be worthwhile to be investigated in the context of language history. |000|language origin, evolutionary theory, philosophy, language evolution 3134|Gontier2017|[units, levels, and mechanisms of evolution] .. image:: static/img/literature/gontier-2017-p5.png :name: bla :width: 800px [Table 2]|p5|universal evolutionary theory, evolutionary theory, biological parallels 3135|Dettmer2000|In Zeiten einschneidender politischer, ökonomischer oder gesellschaftlicher Veränderungen in einem Land kann es zu Wandlungsprozessen in bezug auf die Sprache kommen. Ein solch einschneidendes Erlebnis für den russischen Sprachraum stellen zweifellos die mit Glasnost' und Perestrojka verbundenen politischen Umwälzungen dar, die den Zusammenbruch der Sowjetunion zur Folge hatten. Solche außersprachlichen Faktoren sind natürlich nicht allein verantwortlich für den Wandel einer Sprache; wenn dies so wäre, könnte der Sprachwandel durchaus als ein leicht zu steuernder Vorgang verstanden werden (vgl. ZYBATOW 1995: 186). 
Allerdings spielen die extralinguistischen Aspekte im Hinblick auf den Wandel des Russischen eine nicht unbedeutende Rolle. Ich gehe jedoch davon aus, daß – wie R. Keller in seiner Theorie von der "unsichtbaren Hand in der Sprache" darlegt (vgl. KELLER 1990) – Sprachwandel in der Regel als ein spontaner und unbeabsichtigter, d. h. nicht geplanter Prozeß betrachtet werden muß. [...] Linguistisch betrachtet vollzieht sich der Wandel auf allen Ebenen der Sprache; im Bereich der Phonologie (z.B. in Form von Lautwandel) und Morphologie ebenso wie im Bereich der Syntax. Die meisten Veränderungen betreffen jedoch die Lexik: Hier setzen sich Neuerungen erfahrungsgemäß am ehesten durch, denn der Wortschatz ist sehr anfällig für derartige Veränderungen. Ständig dringen neue Wörter in die Sprache ein; die Gründe hierfür sind mannigfaltig und werden im Laufe der Arbeit noch mehrfach angesprochen. Wenn man jedoch in die Ebenenauffassung von Sprache auch die Textebene oder die pragmatische Ebene einbezieht, so vollzieht sich auch hier ein Wandel.|000|extra-linguistic factors, language change, Russian, case study, jargon, linguistic variation, social variation 3136|Dettmer2000|Potentially interesting case study on on-going change (mostly lexical) in Russian, based on the interaction of social varieties and political changes during the last decades.|000|Russian, social variation, linguistic variation, language change, extra-linguistic factors, 3137|Bailey2016|Forced alignment software is now widely used in contemporary sociolinguistics, and is quickly becoming a crucial methodological tool as an increasing number of studies begin to utilise ‘big data.’ This study investigates the possibility of taking forced alignment one step further towards the goal of complete automation; specifically, it expands the functionality of FAVE-align to fully automate the coding of three sociolinguistic variables in British English: (th)-fronting, (td)-deletion, and (h)-dropping. 
This involved the expansion of pronouncing dictionaries to reflect the surface output of these variable rules; FAVE then compares the fit of competing acoustic models with the speech signal to determine the surface variant. It does so with an impressive degree of accuracy, largely comparable to inter-transcriber agreement for all variables; however, the pattern of its mistakes, which are largely false positives, suggests a difficulty in identifying the voiceless segments of (td) and (th). Although it is reassuring that inter-transcriber agreement was also lowest for these tokens, it should be noted that FAVE’s accuracy decreases in faster speech rates while no comparable effect is found for agreement among human transcribers.|000|forced alignment, sound alignment, Praat, sociolinguistics, language variation 3138|Bailey2016|Potentially interesting contribution, less because of the content (which is too specific for historical linguistics) but more for the information on the idea of forced alignment.|000|forced alignment, sound alignment, Praat, sociolinguistic variation 3139|Dale2012|Human language is unparalleled in both its expressive capacity and its diversity. What accounts for the enormous diversity of human languages [13]? Recent evidence suggests that the structure of languages may be shaped by the social and demographic environment in which the languages are learned and used. In an analysis of over 2000 languages Lupyan and Dale [25] demonstrated that socio-demographic variables, such as population size, significantly predicted the complexity of inflectional morphology. Languages spoken by smaller populations tend to employ more complex inflectional systems. Languages spoken by larger populations tend to avoid complex morphological paradigms, employing lexical constructions instead. This relationship may exist because of how language learning takes place in these different social contexts [44, 45]. 
In a smaller population, a tightly-knit social group combined with exclusive or almost exclusive language acquisition by infants permits accumulation of complex inflectional forms. In larger populations, adult language learning and more extensive cross-group interactions produce pressures that lead to morphological simplification. In the current paper, we explore this learning-based hypothesis in two ways. First, we develop an agent-based simulation that serves as a simple existence proof: As adult interaction increases, languages lose inflections. Second, we carry out a correlational study showing that English-speaking adults who had more interaction with non-native speakers as children showed a relative preference for over-regularized (i.e. morphologically simpler) forms. The results of the simulation and experiment lend support to the linguistic niche hypothesis: Languages may vary in the ways they do in part due to different social environments in which they are learned and used. In short, languages adapt to the learning constraints and biases of their learners.|000|morphological complexity, linguistic niche hypothesis, adaptation, simulation studies 3140|Dale2012|Paper has a strong adaptationist view on the origins of morphological complexity and uses simulations to prove the point. To which degree this is true is difficult to say, but the paper is a valuable contribution to the debate about adaptation and linguistic evolution. To some extent, the model is self-fulfilling, because their basic assumption is that morphology is difficult to learn. But what about other aspects of language? If we only integrate morphology into the agent model, but exclude other factors, we may as well miss important aspects.|000|simulation studies, adaptation, morphological complexity 3141|Katz2016|This paper argues that processes traditionally classified as lenition fall into at least two subsets, with distinct phonetic reflexes, formal properties and characteristic contexts. 
One type, referred to as loss lenition, frequently neutralises contrasts in positions where they are perceptually indistinct. The second type, referred to as continuity lenition, can target segments in perceptually robust positions, increases the intensity and/or decreases the duration of those segments, and very rarely results in positional neutralisation of contrasts. While loss lenition behaves much like other phonological processes, analysing continuity lenition is difficult or impossible in standard phonological approaches. The paper develops a phonetically based optimality-theoretic account that explains the typology of the two types of lenition. The crucial proposal is that, unlike loss lenition, continuity lenition is driven by constraints that reference multiple prosodic positions.|000|perception, lenition, sound change, sound change types, neutralisation, 3142|Katz2016|Paper is quite valuable in so far as it lists numerous examples for sound change types which could then be practically implemented in tiers. It is also interesting as it seems to have quite a nice view on lenition or at least introduces the concept while most authors just use it without thinking too much about the content.|000|sound change, sound change types, lenition, neutralisation, perception, 3143|Katz2016| a. Degemination: a long consonant becomes short (:sampa:`t:→t`) b. Debuccalisation: oral obstruents become glottal (:sampa:`t’→?`) c. Voicing: voiceless obstruents become voiced (:sampa:`t→d`) d. Spirantisation: stops become continuants (:sampa:`t→T`) e. Flapping: stops and/or trills become flaps (:sampa:`t→4`)|46|lenition, examples, sound change types 3144|Mohammad2013|Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. 
In this paper we show how the combined strength and wisdom of the crowds can be used to generate a large, high-quality, word–emotion and word–polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion-annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher inter-annotator agreement than that obtained by asking if a term evokes an emotion.|000|crowdsourcing, word emotions, speech norms, word association, dataset 3145|Mohammad2013|The data is available online and can be retrieved by contacting the author (free for scientific use). Further usage, however, is not allowed, so one cannot publish the data officially, which is a problem, but using it inside of studies should be fine, if only subsets are taken, or one might just email the author and think of what he'd say.|000|dataset, word emotions, crowdsourcing, word association, speech norms 3146|Tummers2012|Corpus linguistic research relies on corpora which generally display an unbalanced structure. We will discuss a potential corollary of this biased structure which is rarely accounted for in (corpus) linguistics, namely confounding variables. These are variables increasing, diminishing or reversing an explanatory variable’s marginal effect compared to its conditional effect. 
Analyzing four instances of confounding in a variational case study governed by a series of categorical explanatory variables, we will argue that these latent confounders can be unveiled modeling the co-occurrence patterns of the explanatory variables by means of a multiple correspondence analysis.|000|multiple correspondence analysis, categorical variables, statistics, clustering, 3147|Tummers2012|This paper is not important for historical linguistics per se, but rather a proof of concept that MCA has its place in linguistic applications, and although it is not too common so far, it may be interesting to check its potential for the purpose of clustering or comparing categorical variables in linguistics.|000|multiple correspondence analysis, MCA, statistics, 3148|Flemming2017|This paper investigates the phonetic specification of contour tones through a case study of the Mandarin rising tone. The patterns of variation in the realisation of the rising tone as a function of speech rate indicate that its specifications include targets pertaining to both the pitch movement and its end points: the slope of the F0 rise, the magnitude of the rise, and the alignment of the onset and offset of the rise. This analysis implies that the rising tone is overspecified, in that any one of the target properties can be derived from the other three (e.g. slope is predictable from the magnitude and timing of the rise). As a result, the targets conflict, and cannot all be realised. 
The conflict between tone targets is resolved by a compromise between them, a pattern that is analysed quantitatively by formulating the targets as weighted, violable constraints.|000|Chinese, tone, tone language, Mandarin, contour tones, experimental phonetics 3149|Flemming2017|Rather strange paper, as I don't understand the gist of it, but they seem to investigate the second tone of Mandarin in an experimental setting and check how it is realized, so in some sense, this may give interesting hints regarding tone sandhi phenomena.|000|Chinese, tone sandhi, Mandarin, experimental phonetics, rising tone, 3150|Matzke2016|Tip-dating methods are becoming popular alternatives to traditional node calibration approaches for building time-scaled phylogenetic trees, but questions remain about their application to empirical datasets. We compared the performance of the most popular methods against a dated tree of fossil Canidae derived from previously published monographs. Using a canid morphology dataset, we performed tip-dating using BEAST v. 2.1.3 and MrBayes v. 3.2.5. We find that for key nodes (Canis, approx. 3.2 Ma, Caninae approx. 11.7 Ma) a non-mechanistic model using a uniform tree prior produces estimates that are unrealistically old (27.5, 38.9 Ma). Mechanistic models (incorporating lineage birth, death and sampling rates) estimate ages that are closely in line with prior research. We provide a discussion of these two families of models (mechanistic versus non-mechanistic) and their applicability to fossil datasets.|000|tip dates, priors, Bayesian approaches, dating, phylogenetic reconstruction 3151|Matzke2016|They introduce tip-dating as an alternative to tree-internal priors. 
That is, basically using old and ancient languages to set up the analysis in a Bayesian framework rather than fixing the age of some internal clades.|000|tip dates, priors, Bayesian approaches, phylogenetic reconstruction 3152|Durand1990|Despite undeniable differences in outlook between the current alternative frameworks, there are many lines of convergence between them and a fair measure of agreement on what a theory of phonology should account for. In particular, they all share the belief that phonological representations need to be much more articulated than traditionally assumed and that a number of phenomena (particularly, but not exclusively, stress and tone contours) cannot be appropriately accounted for if phonological representations are limited to string-like arrangements of segments and boundaries. Chapters 6, 7 and 8 are devoted to non-linear issues and it will be shown that the insights of various independent frameworks can be combined in 'multidimensional' representations. Whether the label 'non-linear' is always appropriate in connection with current proposals is not a question with which I will be concerned here (see J. M. Anderson 1987b). I shall use it as a cover term for a set of hypotheses concerning phonological structure which I will in turn subsume under the umbrella term 'generative phonology'.|000|phonology, phonological theory, non-linear phonology, CVCV, introduction, 3153|Durand1990|Book is an interesting comparative account on the importance of more complex models that allow to create hierarchical or pseudo-hierarchical structures in phonology. In the context of CVCV phonology, this may be interesting, even if it is a bit older.|000|phonology, phonological theory, CVCV, introduction, handbook 3154|Blust2017|Paper discusses the development of implosives in Lowland Kenyah. 
The discussion is interesting for several reasons, as it shows many examples of easily alignable words, also interesting sound changes, and it gives additional ideas on how people transcribe what phenomena in Austronesian.|000|Lowland Kenyah, Austronesian, implosives, sound change, examples, 3155|Li2017| Man’s life has always depended on animals and plants, a dependency most directly relevant to primitive societies. What animals and plants were available to the Proto-Austronesian (PAN) people of Taiwan 5,000 BP and earlier? What animals and plants had been domesticated at that stage? What animals and plants were endemic, with other alien species introduced to the island at later stages? In this paper I shall address myself to such problems, drawing upon various disciplines, including linguistics and archaeology, as well as zoology and botany. Lists of PAN cognates for animals, plants, and a few related cognates are given in the appendices. |000|animals, plants, conceptualization, Austronesian, linguistic palaeography, 3156|Tamariz2017|We investigate the emergence of iconicity, specifically a bouba-kiki effect in miniature artificial languages under different functional constraints: when the languages are reproduced and when they are used communicatively. We ran transmission chains of (a) participant dyads who played an interactive communicative game and (b) individual participants who played a matched learning game. An analysis of the languages over six generations in an iterated learning experiment revealed that in the Communication condition, but not in the Reproduction condition, words for spiky shapes tend to be rated by naive judges as more spiky than the words for round shapes. This suggests that iconicity may not only be the outcome of innovations introduced by individuals, but, crucially, the result of interlocutor negotiation of new communicative conventions. 
We interpret our results as an illustration of cultural evolution by random mutation and selection (as opposed to by guided variation).|000|iconicity, iterated learning, cultural evolution, miniature artificial language games 3157|Dipper2017|This paper investigates diatopic variation in a historical corpus of German. Based on equivalent word forms from different language areas, replacement rules and mappings are derived which describe the relations between these word forms. These rules and mappings are then interpreted as reflections of morphological, phonological or graphemic variation. Based on sample rules and mappings, we show that our approach can replicate results from historical linguistics. While previous studies were restricted to predefined word lists, or confined to single authors or texts, our approach uses a much wider range of data available in historical corpora.|000|phonetic alignment, historical corpus, corpus studies, Anselm corpus, diachronic linguistics, Early New High German, 3158|Dipper2017|They use rewrite rules to normalize between text with different spelling variants and also some version of edit distance, so it is pretty standard, with the difference, that they map variation in space. It is an interesting point, as so far most analyses did not take geography or demography into account.|000|rewrite rules, demography, geography, diatopic variation, Anselm corpus, Early New High German 3159|Rama2017|This paper presents a computational analysis of Gondi dialects spoken in central India. We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. 
We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group.|000|Gondi dialects, Dravidian, edit distance, phonetic alignment, dialectometry, dataset 3160|Rama2017|Interesting study, especially because of the data they publish along with the paper, and which is freely available on GitHub.|000|Gondi dialects, Dravidian, dataset, dialectometry 3161|Schliep2017|1. The fields of phylogenetic tree and network inference have dramatically advanced in the past decade, but independently with few attempts to bridge them. 2. Here we provide a framework, implemented in the PHANGORN library in R, to transfer information between trees and networks. 3. This includes: (i) identifying and labelling equivalent tree branches and network edges, (ii) transferring tree branch support to network edges, and (iii) mapping bipartition support from a sample of trees (e.g. from bootstrapping or Bayesian inference) onto network edges. 4. The ability to readily combine tree and network information should lead to more comprehensive evolutionary comparisons and inferences.|000|phylogenetic network, network comparison, R, software, exploratory data analysis 3162|Schliep2017|Interesting paper showing software that can be used to compare trees and their corresponding networks (or, say: networks and the trees they contain).|000|software, phylogenetic tree, phylogenetic network, 3163|Yu2017|1. We present an r package, ggtree, which provides programmable visualization and annotation of phylogenetic trees. 2. ggtree can read more tree file formats than other softwares, including newick, nexus, NHX, phylip and jplace formats, and support visualization of phylo, multiphylo, phylo4, phylo4d, obkdata and phyloseq tree objects defined in other r packages. 
It can also extract the tree/branch/node-specific and other data from the analysis outputs of beast, epa, hyphy, paml, phylodog, pplacer, r8s, raxml and revbayes software, and allows using these data to annotate the tree. 3. The package allows colouring and annotation of a tree by numerical/categorical node attributes, manipulating a tree by rotating, collapsing and zooming out clades, highlighting user selected clades or operational taxonomic units and exploration of a large tree by zooming into a selected portion. 4. A two-dimensional tree can be drawn by scaling the tree width based on an attribute of the nodes. A tree can be annotated with an associated numerical matrix (as a heat map), multiple sequence alignment, subplots or silhouette images. 5. The package ggtree is released under the artistic-2.0 license. The source code and documents are freely available through bioconductor (http://www.bioconductor.org/packages/ggtree).|000|software, tree annotation, ancestral state reconstruction, visualization, R-language 3164|Edelman2017|Similar to other complex behaviors, language is dynamic, social, multimodal, patterned, and purposive, its purpose being to promote desirable actions or thoughts in others and self (Edelman, 2017b). An analysis of the functional characteristics shared by complex sequential behaviors suggests that they all present a common overarching computational problem: dynamically controlled constrained navigation in concrete or abstract situation spaces. With this conceptual framework in mind, I compare and contrast computational models of language and evaluate their potential for explaining linguistic behavior and for elucidating the brain mechanisms that support it.|000|cognitive linguistics, language, simulation studies, double articulation 3165|Edelman2017|Sequential dependencies in utterances, such as number or gender agreement between noun and verb, can leap over one or more—sometimes, many—intervening words. 
If the intervening words are all subsumed under the same root in a tree-structured unit, the dependency could be considered short-range, skipping just one unit (this is the case, e.g., in the tree-adjoining grammar or TAG formalism; Joshi and Schabes, 1997). Given that human languages do tend to minimize actual word-level dependency length in their utterances (Futrell et al., 2015), it seems likely that long-range dependencies are indeed psychologically real. In terms of graph structure, non-local dependencies correspond to several paths that are initially distinct, then traverse the same sequence of items, then part ways again. Tasks with this structure are commonly used in studies of animal navigation (e.g., the 8-maze; Wood et al., 2000). Following a proposal by Levy (1996, fig. 2), Eichenbaum et al. showed that rats can learn to perform an analogous navigation task set in an abstract space of odor sequences (Agster et al., 2002 ; Fortin et al., 2002).|99|graph theory, language model, dependencies, 3166|Edelman2017|Paper deals with complexity of language from a computational view-point and is therefore extremely interesting, although, as a drawback, the authors do not really try hard to make it easy for the readers to understand their thoughts.|000|language model, linguistic complexity, Turing machine, Markov chain, formal grammar, 3167|Mueller2017|In the “Greater Burma Zone”, an area that includes Myanmar and adjacent regions of the neighboring countries, there are two different systems of personal pronouns that occur predominantly: a grammatical one and one that we call “hierarchical system”. The aim of this paper is to explain the two systems and their development. A sample of 42 languages shows that smaller language communities have grammatical systems and the most dominant languages today have hierarchical ones. 
Besides these two groups, there are also some languages with a “mixed system”, which means the grammatical system is retained and only a few honorific terms are added as pronouns. An important question will therefore be why the systems are distributed just in this way. Several factors seem to play a role, among them sociocultural structures, historical developments and language contact.|000|Greater Burma Zone, Burma, pronoun systems, lexical typology, dataset 3168|Mueller2017|The data underlying this study is available at zenodo: https://zenodo.org/record/582206 |000|Greater Burma Zone, Burma, lexical typology, pronoun systems 3169|Yabu2014|Whenever dealing with the old written sources, the sound expressed by a written character (sonus grammae, translated by Yabu Shiro from Zìyīn...) should always be distinguished from the phonemes of speech. Sonus grammae is a sound customarily and universally expressed by both the orthographical prototypes of the earliest stages of any language group and a number of its current written characters of the same origin. The transliteration of written characters is generally done with sonus grammae in mind; and while, in general, each sonus grammae is by no means unrelated to sounds of the language at its earliest stages, or palaeographical value (C. O. Blagden), phonemes which exist as the language's system of sound must be strictly segregated from the discussion ("Studies in the ancient Burmese language through the Myazedi inscriptions," pt 1, 1955).|187|sonus grammae, orthography, writing systems, phoneme, definition, 3170|Yabu2014|Obituary for the great Tibeto-Burman scholar Nishida.|000|obituary, Nishida Tatsuo, Tibeto-Burman, history of science 3171|Shiers2007|Not clear to me whether this is a good or a naive paper. 
I have the impression it is rather naive, although the use of acoustic features of course fascinates many people.|000|speech acoustics, Romance, functional data, canonical function analysis, phylogenetic reconstruction 3172|Shiers2007|Evolutionary models of languages are usually considered to take the form of trees. With the development of so-called tree constraints the plausibility of the tree model assumptions can be assessed by checking whether the moments of observed variables lie within regions consistent with Gaussian latent tree models. In our linguistic application, the data set comprises acoustic samples (audio recordings) from speakers of five Romance languages or dialects. The aim is to assess these functional data for compatibility with a hereditary tree model at the language level. A novel combination of canonical function analysis (CFA) with a separable covariance structure produces a representative basis for the data. The separable-CFA basis is formed of components which emphasize language differences whilst maintaining the integrity of the observational language-groupings. A previously unexploited Gaussian tree constraint is then applied to component-by-component projections of the data to investigate adherence to an evolutionary tree. The results highlight some aspects of Romance language speech that appear compatible with an evolutionary tree model but indicate that it would be inappropriate to model all features as such.|000|canonical function analysis, Romance, speech acoustics, phylogenetic reconstruction 3173|Gamallo2017| Corpus-based assessment of linguistic distance. When understanding this as a synchronic analysis (and their network suggests this), this can be considered a nice study, although one should not interpret their data diachronically.|000|language distance, n-gram model, corpus studies 3174|Gamallo2017| In this paper, we define two quantitative distances to measure how far apart two languages are. 
The distance measure that we have identified as more accurate is based on the perplexity of n-gram models extracted from text corpora. An experiment to compare forty-four European languages has been performed. For this purpose, we computed the distances for all the possible language pairs and built a network whose nodes are languages and edges are distances. The network we have built on the basis of linguistic distances represents the current map of similarities and divergences among the main languages of Europe. |000|language distance, corpus studies, n-gram model 3175|Fuss2017|The split of our own clade from the Panini is undocumented in the fossil record. To fill this gap we investigated the dentognathic morphology of Graecopithecus freybergi from Pyrgos Vassilissis (Greece) and cf. Graecopithecus sp. from Azmaka (Bulgaria), using new μCT and 3D reconstructions of the two known specimens. Pyrgos Vassilissis and Azmaka are currently dated to the early Messinian at 7.175 Ma and 7.24 Ma. Mainly based on its external preservation and the previously vague dating, Graecopithecus is often referred to as nomen dubium. The examination of its previously unknown dental root and pulp canal morphology confirms the taxonomic distinction from the significantly older northern Greek hominine Ouranopithecus. Furthermore, it shows features that point to a possible phylogenetic affinity with hominins. G. freybergi uniquely shares p4 partial root fusion and a possible canine root reduction with this tribe and therefore, provides intriguing evidence of what could be the oldest known hominin.|000|hominin evolution, Graecopithecus, archaeology 3176|Fuss2017|Paper will probably be highly disputed, as it argues in some sense against the out-of-Africa hypothesis.|000|hominin evolution, Graecopithecus, Out-of-Africa, 3177|Kreft2017|Motivation: Comparative and evolutionary studies utilise phylogenetic trees to analyse and visualise biological data. 
Recently, several web-based tools for the display, manipulation, and annotation of phylogenetic trees, such as iTOL and Evolview, have released updates to be compatible with the latest web technologies. While those web tools operate an open server access model with a multitude of registered users, a feature-rich open source solution using current web technologies is not available. Results: Here, we present an extension of the widely used PhyloXML standard with several new options to accommodate functional genomics or annotation datasets for advanced visualization. Furthermore, PhyD3 has been developed as a lightweight tool using the JavaScript library D3.js to achieve a state-of-the-art phylogenetic tree visualisation in the web browser, with support for advanced annotations. The current implementation is open source, easily adaptable and easy to implement in third parties’ web sites.|000|software, tree viewer, phylogenetic reconstruction, visualization, interactive visualization, d3 3178|Lu2016|The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×–60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ~6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ~62,000–38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ~15,000 to ~9,000 years ago, which can be largely attributed to post-LGM arrivals. 
Analysis of ~200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (~82%), Central Asia and Siberia (~11%), South Asia (~6%), and western Eurasia and Oceania (~1%). Our results support that Tibetans arose from a mixture of multiple ancestral gene pools but that their origins are much more complicated and ancient than previously suspected. We provide compelling evidence of the co-existence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating a genetic continuity between pre-historical highland-foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders’ genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (SUNDer) and maintained in high frequency by natural selection.|000|Sino-Tibetan, Tibetan, population genetics, dating, 3179|Lu2016|Based on a figure at the end, we can retrieve their preliminary dating of Tibetans and also of the split between Tibetans and Han Chinese. This is rather old, pointing to 9000 years ago.|000|Tibetan, Sino-Tibetan, population genetics, dating, 3180|Blasi2017|The study of the regularities in the structures present across languages has always been a quest in close contact with the analysis of data. Traditionally, causal dependencies between pairs of typological variables (like word order patterns or the composition of segment inventories) have been argued for on the basis of language counts, namely how many languages out of a sample exhibit certain patterns in contrast to others. Regularities of this kind have been used in virtually all theoretical camps, and researchers made them part of their discussion on functional pressures on language, cognitive schemes and the architecture of a putative common computational core underlying language, among other things. 
This popularity resides, without doubt, in the strength and simplicity of the idea: if a set of languages with no recent genealogical history nor traces of areal contact tend to share the same pair of properties again and again, then there seems to be something about the properties of probable languages in general.|000|binary dependencies, dependencies, language evolution, language structure, correlational studies 3181|Blasi2017|Potentially interesting and rather concise introduction to causal dependencies and statistical aspects. Has also nice graphics.|000|causal inference, dependencies, binary dependencies, language structure 3182|Poplack1988|This paper represents a comprehensive study of English loanword usage in five diverse francophone neighborhoods in the national capital region of Canada. [...]|000|loanword integration, corpus studies, fieldwork, statistical analysis, lexical borrowing, language contact 3183|Poplack1988|Potentially very interesting study on actual data on loanword usage and integration of borrowings in Canadian French speakers.|000|loanword integration, corpus studies, fieldwork, statistical analysis, lexical borrowing, language contact 3184|Muehlenbernd2017|Age estimation of language families is an important task in historical linguistics. In our study we present an approach that utilizes information about the diversity across sound inventories of language families for the task of age estimation. Our approach involves three steps: (1) the construction of a phoneme network, which is a bipartite network structure that represents language families and its phoneme inventories in network-theoretic terms, (2) the reconstruction of such a real-world data network in form of a preferential attachment synthetic process, and (3) the detection of the optimal preferential attachment noise parameter, for which the synthetic network is the best approximation of the real-world data network. 
Our statistical analysis reveals that the optimal noise parameter appears to be a good predictor for the age of a language family.|000|dating, phoneme inventory, networks, quantitative analysis, bipartite network 3185|Muehlenbernd2017|Apart from the use of bipartite networks, which may be interesting, it is not clear how the age aspect of languages can be inferred robustly from this study, given the incredible shakiness of phoneme inventory data.|000|dating, phoneme inventory, bipartite network, quantitative analysis 3186|Coblin2017|During much of the last century the field of Hakka linguistic studies was fairly sedate and monolithic. The majority of Sino-linguists agreed regarding what the term “Hakka” meant and what the general characteristics of the dialect family were. Today, on the other hand, there have been new developments in the area of Hakka studies, which have changed that earlier picture in significant ways, though familiarity with these changes is for the most part limited to a small number of specialists. The purpose of the present paper is to introduce certain of these new advances, especially as regards their historical and comparative phonological implications.|000|Hakka, history of science, overview, Chinese dialectology, subgrouping 3187|Coblin2017|Interesting paper which offers not only a historical overview on Hakka studies but also an overview on most recent developments.|000|Hakka, history of science, Chinese dialectology, subgrouping 3188|Latham1862|Very interesting resource showing word lists that are supposed to show close relatedness for a couple of different languages.|000|concept list, history of science, language comparison, Sino-Tibetan 3189|Wheeler1990|A data dependent weighting procedure is developed to allow the comparison of phylogenetic trees based on nucleic acid sequence data. 
The sampling error of this cladogram “cost” is then examined, permitting statistical evaluation of the cost differential.|000|step matrix, maximum parsimony, combinatorial weights, 3190|Wheeler1990|This seems to be the first paper where step matrices were officially introduced.|000|step matrix, maximum parsimony, 3191|Jacques2012|Quite the contrary, it is often the case that archaic features are only preserved in a few distant branches of the family. Even though a considerable part of Sino-Tibetan languages have no trace of verbal agreement, we have to bear in mind that losing an agreement system is a much quicker process than creating one.|112|agreement system, feature loss, grammatical feature, Sino-Tibetan, subgrouping 3192|StuddertKennedy2017|We review Berwick and Chomsky’s Why Only Us, Language and Evolution, a book premised on language as an instrument primarily of thought, only secondarily of communication. The authors conclude that a Universal Grammar can be reduced to three biologically isolated components, whose computational system for syntax was the result of a single mutation that occurred about 80,000 years ago. We question that argument because it ignores the origin of words, even though Berwick and Chomsky acknowledge that words evolved before grammar. It also fails to explain what evolutionary problem language uniquely solved (Wallace’s question). To answer that question, we review recent discoveries about the ontogeny and phylogeny of words. Ontogenetically, two modes of nonverbal relation between infant and mother begin at or within 6 months of birth that are crucial antecedents of the infant’s first words: intersubjectivity and joint attention. Intersubjectivity refers to rhythmic shared affect between infant and caretaker(s) that develop during the first 6 months. When the infant begins to crawl, they begin to attend jointly to environmental objects. 
Phylogenetically, Hrdy and Bickerton describe aspects of Homo erectus’ ecology and cognition that facilitated the evolution of words. Hrdy shows how cooperative breeding established trust between infant and caretakers, laying the groundwork for a community of mutual trust among adults. Bickerton shows how ‘confrontational scavenging’ led to displaced reference, whereby an individual communicated the nature of a dead animal and its location to members of the group that could not see it. Thus, both phylogenetically and ontogenetically, the original function of language was primarily an instrument of communication. Rejecting Berwick and Chomsky’s answer to Wallace’s question that syntax afforded better planning and inference, we endorse Bickerton’s view that language enabled speakers to refer to objects not immediately present. Thus arose context-free mental representations, unique to human language and thought.|000|Wallace's question, Noam Chomsky, language origin, universal grammar 3193|Kelly2017|Lateral transfer, a process whereby species exchange evolutionary traits through non-ancestral relationships, is a frequent source of model misspecification in phylogenetic inference. Lateral transfer obscures the phylogenetic signal in the data — the signal of the taxa ancestry — as the histories of affected traits are mosaics of the species phylogeny and may conflict with the underlying phylogeny. We control for the effect of lateral transfer in a Stochastic Dollo model and a Bayesian setting. We infer rooted phylogenetic trees. Our likelihood is highly intractable as its parameters are given by the solution of a sequence of systems of differential equations which represent the expected evolution of traits along a tree and grow exponentially in dimension with the number of taxa under consideration. We construct an accurate parameter approximation framework, and from this we derive an efficient exact-approximate inference scheme. 
We illustrate our method on data sets of lexical traits in Eastern Polynesian and Indo-European languages and obtain improved fits over the corresponding model without lateral transfer.|000|lateral transfer, lexical borrowing, stochastic Dollo model, Bayesian inference, Eastern Polynesian, Indo-European 3194|Kelly2017|Potentially interesting dissertation. It is only annoying that the data sources are not really made clear, they are not supplemented, and it is difficult to find out WHERE the author actually talks about them. Maybe it's there, but I did not find it when quickly reading the thesis.|000|Indo-European, Eastern Polynesian, stochastic Dollo model, lateral transfer, lexical borrowing, Bayesian inference 3195|Parker2002|A long-standing controversy in the interface between phonetics and phonology involves the nature of sonority. This dissertation seeks to help resolve this problem by showing that the sonority hierarchy is both physically and psychologically real. This is accomplished by reporting the results of two rigorous and in-depth experiments. The first of these involves phonetic (instrumental) measurements of five acoustic and aerodynamic correlates of sonority in English and Spanish: intensity, frequency of the first formant, total segmental duration, peak intraoral air pressure, and combined oral plus nasal air flow. Intensity values are found to consistently yield a correlation of at least .97 with typical sonority indices. Consequently, sonority is best defined in terms of a linear regression equation derived from the observed intensity results. The second major experiment — this one psycholinguistic in nature — involves a common process of playful reduplication in English. A list of 99 hypothetical rhyming pairs such as roshy-toshy was evaluated by 332 native speakers. Their task was to judge which order sounds more natural, e.g., roshy-toshy or toshy-roshy. 
The data again confirm the crucial importance of sonority in accounting for the observed results. Specifically, the unmarked (preferred) pattern is for the morpheme beginning with the more sonorous segment in each pair to occur in absolute word-initial position. A generalized version of the Syllable Contact Law is utilized in the formal analysis of this phenomenon in terms of Optimality Theory. Finally, a complete and universal sonority hierarchy is posited by building on the findings of the two experiments as a whole.|000|sonority hierarchy, quantitative analysis, experimental phonetics, 3196|Parker2002|This experimental study seems to prove that the sonority hierarchy is not a Hirngespinst, but is somehow mentally represented. This has important implications for all arguments about sonority as a driving factor of things like sound change.|000|sonority hierarchy, experimental phonetics, 3197|Lowenstamm1999|In this paper an argument is made for the existence of an initial CV site to the left of every major category. In section 1, I will first discuss a noted asymmetry in the organization of word-initial consonant sequences. In section 2, the issue of word boundaries will be briefly revisited. Section 3 contains an answer to the questions raised in section 1. 
Finally, section 4 is a case study of definite article cliticization in French, and failure thereof in Biblical Hebrew.|000|CVCV, Biblical Hebrew, phonological theory 3198|Lowenstamm1999|Paper discusses consonant clusters in words and tries to explain these with the CVCV hypothesis.|000|CVCV, phonological theory, consonant cluster 3199|Fox1995|Schleicher had followed the principles inherent in linguistic comparison through to their logical conclusion, with the development of the Stammbaum model of linguistic relationships, and the use of linguistic reconstruction as part of the historical study of languages|26|August Schleicher, history of science, family tree, linguistic reconstruction 3200|Schwink1991|Part of the process of “becoming” a competent Indo-Europeanist has always been recognized as coming to grasp “intuitively” concepts and types of changes in language so as to be able to pick and choose between alternative explanations for the history and development of specific features of the reconstructed language and its offspring.|29|intuition, methodology, 3201|Zuo2017|Phytolith remains of rice (Oryza sativa L.) recovered from the Shangshan site in the Lower Yangtze of China have previously been recognized as the earliest examples of rice cultivation. However, because of the poor preservation of macroplant fossils, many radiocarbon dates were derived from undifferentiated organic materials in pottery sherds. These materials remain a source of debate because of potential contamination by old carbon. Direct dating of the rice remains might serve to clarify their age. Here, we first validate the reliability of phytolith dating in the study region through a comparison with dates obtained from other material from the same layer or context. Our phytolith data indicate that rice remains retrieved from early stages of the Shangshan and Hehuashan sites have ages of approximately 9,400 and 9,000 calibrated years before the present, respectively. 
The morphology of rice bulliform phytoliths indicates they are closer to modern domesticated species than to wild species, suggesting that rice domestication may have begun at Shangshan during the beginning of the Holocene. **Significance** When the domestication of rice began in its homeland, China, is an enduring and important issue of debate for researchers from many different disciplines. Reliable chronological and robust identification criteria for rice domestication are keys to understanding the issue. Here, we first use phytolith dating to constrain the initial occupation of Shangshan, an important site with early rice remains located in the Lower Yangtze region of China. We then identify the rice phytoliths of Shangshan as partly domesticated based on their morphological characteristics. The results indicate that rice domestication may have begun at Shangshan in the Lower Yangtze during the beginning of the Holocene. |000|rice cultivation, China, archaeology, carbon-14 dating, 3202|Eder2017|The aim of this article is to discuss reliability issues of a few visual techniques used in stylometry, and to introduce a new method that enhances the explanatory power of visualization with a procedure of validation inspired by advanced statistical methods. A promising way of extending cluster analysis dendrograms with a self-validating procedure involves producing numerous particular ‘snapshots’, or dendrograms produced using different input parameters, and combining them all into the form of a consensus tree. Significantly better results, however, can be obtained using a new visualization technique, which combines the idea of nearest neighborhood derived from cluster analysis, the idea of hammering out a clustering consensus from bootstrap consensus trees, with the idea of mapping textual similarities onto a form of a network. 
Additionally, network analysis seems to be a good solution for large data sets.|000|network, consensus tree, consensus network, visualization, clustering, partitioning 3203|Eder2017|Potentially interesting technique that creates a network partition out of a set of family trees.|000|consensus tree, consensus network, clustering, partitioning, network 3204|Koplenig2015|The Google Ngram Corpora seem to offer a unique opportunity to study linguistic and cultural change in quantitative terms. To avoid breaking any copyright laws, the data sets are not accompanied by any metadata regarding the texts the corpora consist of. Some of the consequences of this strategy are analyzed in this article. I chose the example of measuring censorship in Nazi Germany, which received widespread attention and was published in a paper that accompanied the release of the Google Ngram data (Michel et al. (2010): Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176–82). I show that without proper metadata, it is unclear whether the results actually reflect any kind of censorship at all. Collectively, the findings imply that observed changes in this period of time can only be linked directly to World War II to a certain extent. Therefore, instead of speaking about general linguistic or cultural change, it seems to be preferable to explicitly restrict the results to linguistic or cultural change ‘as it is represented in the Google Ngram data’. On a more general level, the analysis demonstrates the importance of metadata, the availability of which is not just a nice add-on, but a powerful source of information for the digital humanities.|000|n-gram model, Google N-Grams, meta data, 3205|Koplenig2015|Paper points to a certain bias in interpreting results from the Google Ngram Corpus. 
Conclusion is: one should be careful when trying to read history out of this data, and should not forget how important metadata is for the assessment.|000|meta data, Google N-Grams, 3206|Swadesh1963|This is in principle one of the earliest articles on computer-assisted language comparison, in so far as Swadesh thinks of ways to improve cognate detection by arranging data smartly in punch cards. A very nice view, and very rare that linguists openly talk about this (@Gabelentz1891 being one exception).|000|Morris Swadesh, cognate detection, computer-assisted analysis, punch card, 3207|Schrijver2015|Article presents potentially shared innovations for Italo-Celtic. Interesting in this context, as it reflects how people think about subgrouping.|000|shared innovation, cladistics, historical linguistics, subgrouping 3208|Richter2017|The timing and location of the emergence of our species and of associated behavioural changes are crucial for our understanding of human evolution. The earliest fossil attributed to a modern form of Homo sapiens comes from eastern Africa and is approximately 195 thousand years old, therefore the emergence of modern human biology is commonly placed at around 200 thousand years ago. The earliest Middle Stone Age assemblages come from eastern and southern Africa but date much earlier. Here we report the ages, determined by thermoluminescence dating, of fire-heated flint artefacts obtained from new excavations at the Middle Stone Age site of Jebel Irhoud, Morocco, which are directly associated with newly discovered remains of H. sapiens. A weighted average age places these Middle Stone Age artefacts and fossils at 315 ± 34 thousand years ago. Support is obtained through the recalculated uranium series with electron spin resonance date of 286 ± 32 thousand years ago for a tooth from the Irhoud 3 hominin mandible. 
These ages are also consistent with the faunal and microfaunal assemblages and almost double the previous age estimates for the lower part of the deposits. The north African site of Jebel Irhoud contains one of the earliest directly dated Middle Stone Age assemblages, and its associated human remains are the oldest reported for H. sapiens. The emergence of our species and of the Middle Stone Age appear to be close in time, and these data suggest a larger scale, potentially pan-African, origin for both.|000|dating, hominin evolution, 3209|Richter2017|Article claims that hominin evolution also took place in Northern Africa, not exclusively in South Africa, etc.|000|hominin evolution, Out-of-Africa 3210|Kumar2017|Bears are iconic mammals with a complex evolutionary history. Natural bear hybrids and studies of few nuclear genes indicate that gene flow among bears may be more common than expected and not limited to polar and brown bears. Here we present a genome analysis of the bear family with representatives of all living species. Phylogenomic analyses of 869 mega base pairs divided into 18,621 genome fragments yielded a well-resolved coalescent species tree despite signals for extensive gene flow across species. However, genome analyses using different statistical methods show that gene flow is not limited to closely related species pairs. Strong ancestral gene flow between the Asiatic black bear and the ancestor to polar, brown and American black bear explains uncertainties in reconstructing the bear phylogeny. Gene flow across the bear clade may be mediated by intermediate species such as the geographically wide-spread brown bears leading to large amounts of phylogenetic conflict. Genome-scale analyses lead to a more complete understanding of complex evolutionary processes. 
Evidence for extensive inter-specific gene flow, found also in other animal species, necessitates shifting the attention from speciation processes achieving genome-wide reproductive isolation to the selective processes that maintain species divergence in the face of gene flow.|000|gene flow, bears, mammals, lateral gene transfer 3211|Kumar2017|Interesting article, also mentioned in David Morrison's blog (http://phylonetworks.blogspot.com/2017/06/bears-genomes-and-gene-flow.html), and pointing to gene flow in ancient times which cannot be explained by incomplete lineage sorting alone.|000|gene flow, bears, mammals, animal evolution, lateral gene transfer 3212|Bullock2011|This paper describes an approach used to test the expressive power of the Natural Semantic Metalanguage (NSM) and its tiny set of semantic primitives. A small dictionary was created, using NSM to paraphrase definitions for each word in the controlled defining vocabulary of the Longman Dictionary of Contemporary English (LDOCE). Student participants performed several headword-identification tasks to evaluate the quality of these definitions. The resulting 2000-word dictionary is non-circular, and by extension provides non-circular definitions for all the words in the LDOCE.|000|natural semantic metalanguage, controlled English, Longman Dictionary of Contemporary English, 3213|Weiss2014|A second example shows the value of the regularity hypothesis as a heuristic tool. :comment:`Shows data by` @Wakelin1975 :comment:`that are essentially correspondence patterns for vowels in similar environments across different locations` .. image:: static/img/Weiss2014-134.png :name: dummy :width: 800px Since the environments are not complementary there is no way to unite these six partially overlapping sets. But it is also impossible to set up six different proto-segments. For we may be fairly confident that no human language has or ever has had six distinct vowels in this narrow region of the vowel space. 
So is this perhaps a sound change that has not proceeded [pb] on a segmental basis but on a word-by-word basis, captured at different points in its progression through the lexicon? No. It is clear that the distribution is not random but shows a distinct tendency for æ to be more common in the most easterly points and a to become more frequent as one moves west. In fact, the vowel in question in all of these forms goes back to Middle English a. The native dialectal reflex is a, but the Standard English reflex was æ, as it still is across the board in General American. Since the more eastern parts of Cornwall are in contact with other English dialects having æ, it is probable that the speech of these regions has replaced the native reflex with a form imported from the standard language (@Wakelin<1975> 1975: 112–114). |134f|comparative method, sound correspondences, correspondence patterns 3214|Handel2016b|Over the last 20 years, stimulated by William G. Boltz’s influential monograph The origin and early development of the Chinese writing system, a number of scholars in the West have engaged in a debate over the historical status of the traditional huìyì 會意 (‘conjoined meaning’) category of Chinese characters. While the existence of such characters within the Chinese writing system at various points in its history is not in dispute, the role of huìyì characters in the formative stages of the script remains a matter of controversy. In this paper I draw on some recent significant publications on Chinese writing in order to advance a theoretical argument in defense of the existence of huìyì graphs during the formative stages of the script. I argue that iconic combinations of graphs are well motivated and meaningful to script users, and therefore could well have played a role in the formation of the Chinese script. 
Comparative evidence from other early logographic writing systems as well as evidence from later stages of Chinese both support this argument, and provide an explanation for some early Chinese characters that would seem to defy any interpretation that assigns a phonetic role to one of the components.|000|Huìyì, Chinese characters, Chinese character formation, Chinese writing system 3215|Nielsen2017|Advances in the sequencing and the analysis of the genomes of both modern and ancient peoples have facilitated a number of breakthroughs in our understanding of human evolutionary history. These include the discovery of interbreeding between anatomically modern humans and extinct hominins; the development of an increasingly detailed description of the complex dispersal of modern humans out of Africa and their population expansion worldwide; and the characterization of many of the genetic adaptions of humans to local environmental conditions. Our interpretation of the evolutionary history and adaptation of humans is being transformed by analyses of these new genomic data.|000|genome analysis, human prehistory, genomics, population genetics 3216|Das2011|We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable to a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in an unsupervised model (Berg-Kirkpatrick et al., 2010). 
Across eight European languages, our approach results in an average absolute improvement of 10.4% over a state-of-the-art baseline, and 16.7% over vanilla hidden Markov models induced with the Expectation Maximization algorithm.|000|part of speech, part-of-speech tagging, automatic approach 3217|Petrov2012|To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold standard part-of-speech tags.|000|part of speech, part-of-speech tagging, dataset, universals, 3218|Petrov2012|Interesting paper in so far as it proposes a universal set for part-of-speech tagging, which is worth being considered also in line with the Concepticon efforts and other attempts to arrive at a universal way of comparing characteristics of languages. The tagset had some influence on the universal-dependencies project (http://universaldependencies.org), which offers treebanks for some 50 languages, all organized in the same tagset.|000|universals, dataset, tree-bank, syntax 3219|Garbo2016|This paper proposes a set of principles and methodologies for the crosslinguistic investigation of grammatical complexity and applies them to the in-depth study of one grammatical domain, gender. 
The complexity of gender is modeled on the basis of crosslinguistically documented properties of gender systems and by taking into consideration interactions between gender and two other grammatical domains: nominal number and evaluative morphology. The study proposes a complexity metric for gender that consists of six features: “Gender values”, “Assignment rules”, “Number of indexation (agreement) domains”, “Cumulative exponence of gender and number”, “Manipulation of gender assignment triggered by number/countability”, and “Manipulation of gender assignment triggered by size”. The metric is tested on a sample of 84 African languages, organized in subsamples of genealogically related languages. The results of the investigation show that: (1) the gender systems of the sampled languages lean towards high complexity scores; (2) languages with purely semantic gender assignment tend to lack pervasive gender indexation; (3) languages with a high number of gender distinctions tend to exhibit pervasive gender indexation; (4) some of the uses of manipulable gender assignment are only attested in languages with a high number of gender distinctions and/or pervasive indexation. With respect to the distribution of the gender complexity scores, the results show that genealogically related languages tend to have the same or similar gender complexity scores. Languages that display exceedingly low or high gender complexity scores when compared with closely related languages exhibit distinctive sociolinguistic profiles (contact, bi- or multilingualism). The implications of these findings for the typology of gender systems and the crosslinguistic study of grammatical complexity and its distribution are discussed.|000|grammatical complexity, gender systems, African languages, language typology, cross-linguistic study 3220|Garbo2016|The author uses a rather naive numerical system to measure gender complexity, based on a floating point representation of originally categoric variables. 
The general endeavour, however, can be inspiring to work on better and improved measures.|000|gender systems, African languages, grammatical complexity, linguistic complexity, cross-linguistic study 3221|Phaidros|And in this instance, you who are the father of letters, from a paternal love of your own children have been led to attribute to them a quality which they cannot have; for this discovery of yours will create forgetfulness in the learners' souls, because they will not use their memories; they will trust to the external written characters and not remember of themselves. The specific which you have discovered is an aid not to memory, but to reminiscence, and you give your disciples not truth, but only the semblance of truth; they will be hearers of many things and will have learned nothing; they will appear to be omniscient and will generally know nothing; they will be tiresome company, having the show of wisdom without the reality. :translation:`Denn diese Erfindung wird der Lernenden Seelen vielmehr Vergessenheit einflößen aus Vernachlässigung des Gedächtnisses, weil sie im Vertrauen auf die Schrift sich nur von außen vermittelst fremder Zeichen, nicht aber innerlich sich selbst und unmittelbar erinnern werden. Nicht also für das Gedächtnis, sondern nur für die Erinnerung hast du ein Mittel erfunden, und von der Weisheit bringst du deinen Lehrlingen nur den Schein bei, nicht die Sache selbst. Denn indem sie nun vieles gehört haben ohne Unterricht, werden sie sich auch vielwissend zu sein dünken, da sie doch unwissend größtenteils sind, und schwer zu behandeln, nachdem sie dünkelweise geworden statt weise.` :comment:`[translation by Schleiermacher]` |275|writing systems, emergence of writing, nice quote 3222|Pinget2016|This study aims at testing whether there are regional differences in the perception of the labiodental fricative contrast in Dutch. 
Previous production studies have shown that the devoicing of initial labiodental fricatives is a change in progress in the Dutch language area. We present the results of a speeded identification task in which fricative stimuli were systematically varied for two phonetic cues, voicing and duration. Listeners (n=100) were regionally stratified, and the regions (k=5) reflect different stages of this sound change in progress. Voicing turned out to be the strongest categorization cue in all regions; duration only played a minor role. Regional differences showed up in the perception of the consonantal contrast that matched regional differences in production reported in previous studies. The addition of random slopes in the mixed model regression showed the importance of within-regional variation.|000|sound change, consonant change, sound change in progress, Dutch, Dutch dialects 3223|Calboli2017|I discussed a passage in the Rhetorica ad C. Herennium (82 BC) where two versions of the same talk are delivered ([1a] and [1b]), the former in a colloquial style, the second in a worse, almost “vulgar” Latin. The difference appears also in the two constructions (1a) coepit defricari (instead of *coeptus est defricari which is usually employed by Cicero, Caesar, Livy) and (1b) praesente multis (instead of *praesentibus multis). The former construction is correct but less used by classical authors; the second is not correct and appears as a kind of “vulgar” Latin. However, the Auctor ad Herennium knew very well the differences, because he uses coeptum est dici (4.30.41) and contra intercedentibus collegis (1.12.21). 
It seems therefore that the Auctor established a kind of barrier inside of which the language, albeit low, was correct, outside incorrect, and acknowledged that inside/outside was ruled by gradation and frequency of use and social condition.|000|sociolinguistics, sociolinguistic variation, Latin, colloquial language, 3224|Calboli2017|Potentially interesting paper which discusses colloquial use of Latin as a stylistic element in writing. This gives some interesting hints on the degree of sociolinguistic variation in early Latin times.|000|Latin, sociolinguistic variation, historical sociolinguistics 3225|Pagel2017|Human language is unique among all forms of animal communication. It is unlikely that any other species, including our close genetic cousins the Neanderthals, ever had language, and so-called sign ‘language’ in Great Apes is nothing like human language. Language evolution shares many features with biological evolution, and this has made it useful for tracing recent human history and for studying how culture evolves among groups of people with related languages. A case can be made that language has played a more important role in our species’ recent (circa last 200,000 years) evolution than have our genes.|000|biological parallels, language origin, human language, language evolution 3226|Pagel2017|Paper contains as always many naiveties, and a very bad figure of the evolution of words for "hand" in IE languages (claiming that Germanic derives from Proto-Latin). And yet another table of parallels.|000|biological parallels, language evolution, human language, language origin 3227|Pagel2017| .. image:: static/img/pagel-2017-table2.png :width: 500px :name: table :comment:`Table 2 showing biological analogies` |4|biological parallels, analogy, biological evolution, language evolution 3228|Botigue2017|Europe has played a major role in dog evolution, harbouring the oldest uncontested Palaeolithic remains and having been the centre of modern dog breed creation. 
Here we sequence the genomes of an Early and End Neolithic dog from Germany, including a sample associated with an early European farming community. Both dogs demonstrate continuity with each other and predominantly share ancestry with modern European dogs, contradicting a previously suggested Late Neolithic population replacement. We find no genetic evidence to support the recent hypothesis proposing dual origins of dog domestication. By calibrating the mutation rate using our oldest dog, we narrow the timing of dog domestication to 20,000–40,000 years ago. Interestingly, we do not observe the extreme copy number expansion of the AMY2B gene characteristic of modern dogs that has previously been proposed as an adaptation to a starch-rich diet driven by the widespread adoption of agriculture in the Neolithic.|000|domestication, dog, animal evolution, 3229|Botigue2017|Paper does not find evidence for dual origins of dog domestication.|000|dog, domestication, animal evolution 3230|Serva2017|Given a population of N elements with their geographical positions and the genetic (or lexical) distances between couples of elements (inferred, for example, from lexical differences between dialects which are spoken in different towns or from genetic differences between animal populations living in different faunal areas) a very interesting problem is to reconstruct the geographical positions of individuals using only genetic/lexical distances. From a technical point of view the program consists in extracting from the genetic/lexical distances a set of reconstructed geographical positions to be compared with the real ones. We show that geographical recovering is successful when the genetic/lexical distances are not a simple consequence of phylogenesis but also of horizontal transfers as, for example, vocabulary borrowings between different languages. Our results go well beyond the simple observation that geographical distances and genetic/lexical distances are correlated. 
The ascertainment of a correlation, in our perspective, merely is a prerequisite.|000|genetic distance, lexical distance, geographic distance, geography 3231|Serva2017|They don't really link their data or their code, but they uploaded it, so it can be accessed.|000|genetic distance, linguistic distance, geographic distance 3232|Koch2013|Paper introduces different thoughts on the quality of cognates with a specific focus on Australian languages. Nevertheless, the paper can also be considered as presenting contributions to methodology in general.|000|cognacy, oblique cognacy, transparancy, comparative method, methodology, Australian languages 3233|Szemerenyi1962|Paper addresses interesting aspects of methodology in etymological research from an Indo-European perspective. |000|methodology, comparative method, cognacy, terminology 3234|Szemerenyi1962|In the light of these developments it is clear that, for the Indo-Europeanist, etymology today embraces two essentially different questions: 1) the original form and, if attainable, the original meaning of the Indo-European prototype; 2) its subsequent history down to a point indicated by the documentation of the daughter-language concerned.|178|etymology, comparative method, methodology 3235|Szemerenyi1962|What is *needed to day is a body of principles which will so guide the researcher as to reach a solution in a methodical fashion.*|178|etymology, evidence, word equations, methodology 3236|Koch2013|In Australian linguistics there has been a tendency to rely heavily, for the establishment of cognate sets, on strict translation equivalents such as can be constructed from wordlists formed from translations of the same set of elicitation words [...]. Such comparative data is relatively accessible in comparative wordlists which can be displayed in a two-dimensional table ordered by language and gloss [...]. 
The procedures of lexicostatistics, in fact, are restricted to registering as "cognates" lexemes that manifest such semantic equivalence. We shall refer to such cognates as *s-cognates*, where s can stand for "synonymous" or "semantically equivalent".|34|cognacy, Swadesh cognacy, terminology, methodology 3237|Koch2013|Such reliance on strict synonymous cognates is useful as the first step in language comparison. It is also the most reliable base for establishing the sound correspondences from which the historical phonology can be established. But for the further goals of comparative linguistics -- describing the totality of historical changes affecting the languages, working out the genealogical relationship between the languages, and presenting the complete etymological evidence that justifies these historical relations -- it is necessary to go beyond the obvious s-cognates; one must attempt to establish a full set of cognates. Failure to move beyond the s-cognates artificially restricts the number of discoverable cognates. Any comparative study which limits the evidence to s-cognate sets is liable to present a distorted picture of the historical situation; it is therefore scientifically unsound.|34|synonymous cognates, Swadesh cognacy, cognacy, terminology, methodology, 3238|Koch2013|Proper etymological study consists of finding and listing *all* cognates, regardless of whether they are equivalent semantically or have shifted their meaning, whether they are whole words or just parts of words, and whether their relationship is transparent or obscure. Special effort may be required to find the non-transparent, obscure cognates, which may survive only in a rather disguised form. 
Compare @Szemerenyi<1962> principle D, one of his six principles of etymological research proposed for the Indo-European languages: "*If a word is guaranteed for Indo-European, its (alleged) absence in any individual language requires an explanation; careful search will often lead to its belated identification [...]* Its alleged absence is very often due to the fact that it survives in derivatives only" (@Szemerenyi<1962> 1977: 315).|35|cognacy, comparative method, etymology, 3239|Szemerenyi1962|:comment:`Lists various principles of etymology for Indo-European:` A) *If a given etymon, though fundamentally evident, involves phonological difficulties, the researcher should seek for a more "economic" solution.* |180|methodology, comparative method, etymology, Indo-European 3240|Szemerenyi1962|B) Turning now to morphology, we may state that any etymology must satisfy the known rules of word-formation in the language concerned and in Indo-European in general. *If a given etymon, though fundamentally evident, is at variance with the rules of word-formation, the researcher should seek for a more economic solution.*|182|word formation, etymology, methodology, comparative method 3241|Szemerenyi1962|C) We now come to the point, which, as Schuchardt so often complained, is very frequently taken very lightly, meaning. Our procedure should be governed by the simple rule: *If an etymon involves the assumption of an unusual semantic development, the researcher must re-examine the phonological and morphological aspects of the derivation. More often than not the result will be the discovery of an entirely different, evident, solution.* 
|187|etymology, semantic change, semantic reconstruction, methodology, 3242|Szemerenyi1962|D) *If a word is guaranteed for Indo-European, its (alleged) absence in any individual language requires an explanation; careful search will often lead to its belated identification.*|192|lexical change, etymology, word loss, lexical replacement, methodology, 3243|Szemerenyi1962|E) Details of phonology and morphology, in particular word-formation, can generally be determined by bearing in mind the following principle: *If a word, guaranteed for Indo-European, is also found in Latin, its formation must be presumed to show the greatest resemblance to the formations of the Western Indo-European languages.* It goes without saying that, *mutatis mutandis*, the principle applies to any other Indo-European language. For a Baltic word, for instance, one would expect the largest measure of agreement with Slavic and Germanic, and so on.|199|linguistic reconstruction, etymology, methodology, word formation, phonological reconstruction 3244|Szemerenyi1962|F) *If a given word is not a loanword, it is most likely to be paralleled by a similar or identical formation in one of the contiguous areas.* :comment:`Basically means: only assume a borrowing if you find the proof, but don't stop searching for the real cognates.`|205|lexical borrowing, linguistic reconstruction, methodology, 3245|Koch2013|While not insisting, with Szemerényi, on an explanation for the absence of reflexes of an established etymon in a particular language or subgroup, we recommend that comparativists should be alert to the presence of such disguised reflexes. We propose as a slightly watered-down version of @Szemerenyi<1962> principle D, and one applicable in any language family, this paraphrase: The reflex of an established proto-form may survive in a disguised form. 
In comparative grammar, special value for reconstruction is placed on such "relics" or [pb] "archaisms", that is, reflexes of an earlier form that survive in an altered and possibly disguised function (@Hock1986 610). The search for such disguised forms should be an integral component of the etymological study of vocabulary as much as it is of comparative grammar.|35f|archaic forms, methodology, linguistic reconstruction, etymology, 3246|Starostin2013a|Thus, lexicostatistics without glottochronological support can exist at best as one of several possible classification criteria, convenient because of its formalized and uniform character, but unquestionably auxiliary to the "non-trivial" method of searching for shared innovations at the phonological and morphological levels. Eliminating the idea of a "glottoclock", which many researchers find unacceptable on the basis of their theoretical views on the nature of language change rather than on the basis of empirical data, on the whole deprives the lexicostatistical method of its meaning rather than improving it. :comment:`[translated from the Russian]`|81|lexicostatistics, glottochronology, subgrouping 3247|Starostin2013a|Most of these objections can apparently be divided into three types: (a) substantive-theoretical ones, which call the basic principles of lexicostatistics into question because their existence cannot be explained on theoretical grounds; (b) substantive-empirical ones, which try to refute the universality of the postulates of Swadesh and his followers with concrete linguistic examples; (c) technical ones, which stress in every way the difficulties of a practical [pb] nature connected with compiling concrete lists and with the criteria for analyzing them. Let us examine each of these groups separately. :comment:`[translated from the Russian]`|82f|lexicostatistics, glottochronology, methodological problem, 3248|Starostin2013a|:comment:`Very interesting treatment of various theoretical arguments against lexicostatistics. 
Should be thoroughly read and included here.`|83f|theoretical problems, methodological problem, lexicostatistics, critics 3249|Starostin2013a|Thus, for example, the personal pronouns ʽIʼ, ʽyou (sg.)ʼ, ʽweʼ are usually highly stable, since neither the rise of figurative meanings nor quasi-synonymy is typical of them; the exception is languages with a category of politeness (Southeast Asia and a number of other regions), where rich synonym series are frequently found for these pronouns, noticeably lowering their overall stability index. Body parts such as ʽeyeʼ [pb] or ʽearʼ are usually characterized by an abundance of figurative meanings (mostly through metaphorical developments: the diminutive ʽlittle eyeʼ in the sense of ʽopeningʼ or ʽoutgrowth, speckʼ applied to the most varied objects, ʽlittle earʼ with roughly the same semantics, etc.), but quasi-synonymy for these words is usually rather limited, and as a rule they are replaced by strongly marked slang terms. By contrast, for color terms quasi-synonymy is extremely widespread in the most diverse (possibly all) languages of the world; this is connected with the possibility of a complex and varied subdivision of the color palette, in the course of which the frequent, "basic" color terms are constantly under threat of being replaced by one of the numerous "secondary" terms. :comment:`[translated from the Russian]`|89f|basic vocabulary, lexical change, semantic change, lexical replacement 3250|Haarmann1990|The ʽbasicnessʼ of certain sectors of the vocabulary in natural languages cannot be substantiated because the notion is itself a misconception. An effort has to be made to eradicate the idea of a so-called precultural strata of concepts which have been assumed for setting up the list of lexical items on which lexicostatistics is based... The most crucial factor in glottochronology, arguably, is the assumption that there are sectors of the vocabulary in any language which are more resistant to borrowing than other sectors. 
It has never been made explicit why certain sectors of the lexicon should be, predominantly or exclusively, comprised of cognates. Among the diffuse implications of the notion of a ʽbasicʼ vocabulary is the idea that denominations for concepts which are elements of universal human experience are not easily replaced in a given speech community. Borrowings in the assumed ʽbasicʼ sector of the lexicon are considered by glottochronologists as rare exceptions, whereas their occurrence is in fact a much more common phenomenon in processes of language contacts than is acknowledged even by very critical opponents of lexicostatistics. :comment:`[Quoted after` @Starostin2013a :comment:`96]`|150f|basic vocabulary, methodology, problem 3251|Starostin2013a|In order not to fall into a terminological trap here (as, in our view, happened in the work of H. Haarmann), it must be stressed once more that for the concrete goals pursued by the lexicostatistical method, the difference between "basic" and "cultural" vocabulary is in fact not one of principle. The concrete words in Swadesh's 100-word list are in many respects an arbitrary sample, determined not only, and not so much, by the "basicness", "universality", or "cultural independence" of the corresponding meanings as by the plain convenience of the researcher. :comment:`[translated from the Russian]`|99|basic vocabulary, lexicostatistics, methodology, 3252|Starostin2013a|1.6.3. Technical problems of lexicostatistics :comment:`Lists various technical problems of lexicostatistics, which are summarized here.` [pb] (a) The 100-word list was originally compiled not in a metasemantic language but in English; that is, its elements are represented not by meanings but by words of a natural language. If these words are polysemous, which meaning should be chosen? 
:comment:`translation problems` (b) How should one deal with the problem of synonymy, when two or more equivalents in the dictionary or text corpus of a language correspond to one and the same meaning? :comment:`problem of synonymy` (c) What should be done in cases where no equivalents are found at all, whether for technical reasons (lack of data) or substantive ones (the meaning is genuinely absent from the language)? :comment:`problem of finding translation equivalents.` (d) When assigning the marks that determine cognacy, how should one treat compound stems consisting of two or more lexical morphemes, as well as cases where words united by a common etymological root nevertheless have different morphological shapes? :comment:`treatment of compound words when doing the cognation` (e) If the etymology of a word is unknown, how can one determine whether it is borrowed (so as to exclude it from the counts) or inherited (so as to count it as an internal replacement)? :comment:`problem with uncertain etymologies` (f) Are we right in drawing no formal distinction between the complete loss of a lexical unit (its disappearance from the language) and partial loss, when it remains in the language either with a changed meaning or as an [pb] archaism? Does this not lead to a substantial loss of information and a distortion of the results? :comment:`problem of treating only semantic equivalents for computation` :comment:`[translated from the Russian]` |103-105|lexicostatistics, problem, methodology, technical problem, 3253|Starostin2013a|1.6.3.4. The problem of polymorphemic stems. One of the key prescriptions of lexicostatistics states that cognacy is postulated in those cases where the compared words share a common lexical root. Not infrequently, however, a given basic meaning is expressed by a word consisting of two (or, very rarely, even more than two) roots. 
Isolated cases of this kind occur in the Indo-European languages as well, although the problem is most [pb] "painful" for the language families of Southeast Asia and North America, where the compounding of roots is one of the main means of word formation. The languages of Africa occupy a rather intermediate position in this respect, but compound stems are not rare in them either. :comment:`[translated from the Russian]`|118f|lexicostatistics, cognation, cognate judgments, compounding 3254|Starostin2013a|Suppose that a certain "100-word-list" meaning is expressed in language X₁ by the compound stem A+B, where A and B are lexical roots; in the related language X₂ by the simple stem A; and, finally, in language X₃, related to both, by the compound stem A+C, where C is a third lexical root, etymologically distinct from B. From a historical point of view, the most probable explanation of this situation is as follows: in the proto-language X the meaning in question was expressed by the simple stem A, which in two of the three descendants was later extended, independently of each other, by an additional root. How should cognacy be marked in such a case? Two lines of reasoning are possible: :comment:`Points to problem of intransitive cognate sets. Three words AB, A, AC` (a) the original root A is preserved in all three languages, so the cognacy relation is positive in all cases; :comment:`Since A occurs in all words, they can be treated as cognate.` (b) the extension of stem A by an additional root can be regarded as a case of lexical replacement, since the structure of the signifier has changed not only through phonetic shifts (which lexicostatistics does not take into account) but also through a substantial change in morphological structure. Consequently, the cognacy relation must have a negative value in all cases. 
:comment:`Assume that each case is different, as they are clear-cut deviations from the origin.` :comment:`[translated from the Russian]` |119|cognation, methodology, lexicostatistics, cognate judgments 3255|Starostin2013a|Intuitively, strategy (b) seems to be an excessive tightening of the requirements. To justify this intuition, however, one can resort to the following objective argument: the processes of lexical replacement, where a meaning ceases to be expressed by morpheme X and comes to be expressed by morpheme Y, and of lexical extension, where a meaning comes to be expressed by the concatenation X+Y instead of the simple morpheme X, are as a rule based on fundamentally different mechanisms (exceptions are admissible and inevitable, but we are speaking of the typical, not the universal, situation). :comment:`[translated from the Russian]`|120|compounding, cognation, lexical replacement, methodology, lexicostatistics 3256|Starostin2013a|A procedure that allows for absolute (non-transitional) synonymy in languages, at least for compounding cases of this kind, requires marking the stem *waRi with the number 1, the stem *mata with the number 2, and the stem hiŋa in Fijian with the number 3. Accordingly, all forms of group (b) within the 100-word list will carry two numbers each (1, 2), and Indonesian mata hari will thus be cognate simultaneously with Isnag mata and Tanna mǝt (by the first root) and with Rukai vai (by the second), even though Isnag mata and Rukai vai will not count as cognates. :comment:`[translated from the Russian]`|121|methodology, lexicostatistics, compounding, cognation, 3257|Starostin2013a|:comment:`Interesting example from Austronesian: "sun" becomes "eye of the day" with "day" being equivalent to "sun" and even words with "eye of something". 
Starostin argues that one should try to think of the most plausible process, and, e.g., thus say that "eye of something" is not cognate with the rest.`|121-123|cognation, cognate judgments, compounding, methodology, lexicostatistics 3258|Starostin2013a|Obviously, there can be no single formal criterion for distinguishing external from internal replacements. In the 100-word lists prepared and processed within the Moscow school of comparative linguistics, borrowings are most often marked on the basis of a comparative-historical analysis of the data: the more thoroughly a given family has been studied, the more reliably the corresponding decision can be justified. :comment:`[translated from the Russian]`|125|cognation, compounding, lexicostatistics, etymology, cognate judgments, methodology 3259|Christiansen2017|It has long been assumed that grammar is a system of abstract rules, that the world's languages follow universal patterns, and that we are born with a ‘language instinct’. But an alternative paradigm that focuses on how we learn and use language is emerging, overturning these assumptions and many more.|000|language, science, methodology, popular science 3260|Christiansen2017|Seemingly interesting article that claims that language should be studied more profoundly and from more and more different angles. 
They make a strong argument for construction grammar, which is also quite interesting in the light of Chomsky-bashing.|000|construction grammar, methodology, language science, 3261|Christiansen2017|Linguistics seemed unable to assist computer-based natural language processing, as illustrated by the IBM (International Business Machines) engineer, Fred Jelinek, who famously remarked that “Every time we fire a linguist, the performance of our system goes up.” (@Moore2005)|1|nice quote, nlp, 3262|Moore2005|In 1997 the author conducted a survey at the IEEE workshop on ‘Automatic Speech Recognition and Understanding’ (ASRU) in which attendees were offered a set of twelve putative future events to which they were asked to assign a date. Six years later at ASRU’2003, the author repeated the survey with the addition of eight additional items. This paper presents the combined results from both surveys.|000|survey, linguistics, NLP 3263|Moore2005|For over 20 years, the IEEE has organised a biennial workshop covering the latest developments in automatic speech recognition and attended by the leading researchers in the field. Many of these meetings have been of high significance; for example, it was at the 1985 workshop entitled ‘Frontiers of Speech Recognition’ that Fred Jelinek uttered the now immortal phrase “Every time we fire a phonetician/linguist, the performance of our system goes up”. By 1995 the series had become known as ‘ASRU’ - the IEEE workshop on Automatic Speech Recognition and Understanding.|1|nice quote, linguistics, NLP 3264|Goldberg2006|At the heart of this emerging alternative framework are constructions (@Goldberg2006), which are learned pairings of form and meaning ranging from meaningful parts of words (such as word endings, for example, ‘-s’, ‘-ing’) and words themselves (for example, ‘penguin’) to multiword sequences (for example, ‘cup of tea’) to lexical patterns and schemas (such as, ‘the X-er, the Y-er’, for example, ‘the bigger, the better’). 
The quasi-regular nature of such construction grammars allows them to capture both the rule-like patterns as well as the myriad of exceptions that often are excluded by fiat from the old view built on abstract rules. From this point of view, learning a language is learning the skill of using constructions to understand and produce language. |2|construction grammar, exceptions, grammar, 3265|Goldberg2006|Constructions—form and meaning pairings—have been the basis of major advances in the study of grammar since the days of the ancient Stoics. Observations about particular linguistic constructions have shaped our understanding of both particular languages and the nature of language itself. But only recently has a new theoretical approach emerged that allows observations about constructions to be stated directly, providing long-standing traditions with a framework that allows both broad generalizations and more limited patterns to be analyzed and accounted for fully. Many linguists with varying backgrounds have converged on several key insights that have given rise to a family of approaches, here referred to as constructionist approaches.|3|construction grammar, construction, grammar, 3266|Goldberg2006|Constructionist approaches share certain foundational ideas with the mainstream ‘‘generative’’ approach that has held sway for the past several decades (Chomsky 1957, 1965, 1981). Both approaches agree that it is essential to consider language as a cognitive (mental) system; both approaches acknowledge that there must be a way to combine structures to create novel utterances, and both approaches recognize that a non-trivial theory of language learning is needed.|4|construction grammar, Chomsky syntax, generative grammar 3267|Hanks2000|A dictionary is an inventory of the words of a language, with explanations or translations. All major languages and many others have dictionaries. 
This chapter traces the development of dictionaries for over 2,000 years, starting with China, India, Persia, classical Greece, and Rome. Arabic and Hebrew dictionaries in the Middle Ages were of comparable cultural importance. A major impact was the invention of printing. During the Renaissance, the Latin dictionaries of Calepino and Estienne set standards for future lexicography. The prescriptive aims of European Academies during the Enlightenment are contrasted with Johnson’s descriptive principles. The historical principles of OED are contrasted with the synchronic principles of dictionaries intended as a collective cultural index and dictionaries as aids for foreign learners. In Russia (unlike America), lexicography developed harmoniously with linguistics. The relationships between dictionaries and language development in different countries are discussed. The chapter concludes with a summary of the impact of computer technology, corpora, and changing business models on lexicography.|000|history of science, linguistics, lexicography 3268|Haspelmath2006|This paper first provides an overview of the various senses in which the terms ‘marked’ and ‘unmarked’ have been used in 20th-century linguistics. Twelve different senses, related only by family resemblances, are distinguished, grouped into four larger classes: markedness as complexity, as difficulty, as abnormality, and as a multidimensional correlation. In the second part of the paper, it is argued that the term ‘markedness’ is superfluous, because some of the concepts that it denotes are not helpful, and others are better expressed by more straightforward, less ambiguous terms. 
In a great many cases, frequency asymmetries can be shown to lead to a direct explanation of observed structural asymmetries, and in other cases additional concrete, substantive factors such as phonetic difficulty and pragmatic inferences can replace reference to an abstract notion of ‘markedness’.|000|markedness, grammar, grammaticalization, methodology, terminology 3269|Haspelmath2006|Very nice paper on markedness and why we don't need it.|000|markedness, grammaticalization, methodology, terminology 3270|Stevens2017|Animal camouflage is a longstanding example of adaptation. Much research has tested how camouflage prevents detection and recognition, largely focusing on changes to an animal’s own appearance over evolution. However, animals could also substantially alter their camouflage by behaviourally choosing appropriate substrates. Recent studies suggest that individuals from several animal taxa could select backgrounds or positions to improve concealment. Here, we test whether individual wild animals choose backgrounds in complex environments, and whether this improves camouflage against predator vision. We studied nest site selection by nine species of ground-nesting birds (nightjars, plovers and coursers) in Zambia, and used image analysis and vision modelling to quantify egg and plumage camouflage to predator vision. Individual birds chose backgrounds that enhanced their camouflage, being better matched to their chosen backgrounds than to other potential backgrounds with respect to multiple aspects of camouflage. This occurred at all three spatial scales tested (a few centimetres and 5 m from the nest, and compared with other sites chosen by conspecifics) and was the case for the eggs of all bird groups studied, and for adult nightjar plumage. 
Thus, individual wild animals improve their camouflage through active background choice, with choices highly refined across multiple spatial scales.|000|animal camouflage, mimicry, animals, 3271|Stevens2017|In its essence, this paper means that some birds know how they look, as otherwise they could not choose the best way to hide when breeding. This may also suggest that knowing who one is is not self-perception, etc., or theory of mind, but just an evolutionary function.|000|bird evolution, animal camouflage, mimicry 3272|Helmstetter2017|Interesting article which presents the history or at least some rudimentary history of punctuation marks. It is true that these marks are seldom discussed in the literature on writing, although they definitely deserve to be discussed, since they are simply quite important for a successful and enhanced reading experience.|000|punctuation mark, history of science, writing systems 3273|Pechenick2017|Of basic interest is the quantification of the long term growth of a language’s lexicon as it develops to more completely cover both a culture’s communication requirements and knowledge space. Here, we explore the usage dynamics of words in the English language as reflected by the Google Books 2012 English Fiction corpus. We critique an earlier method that found decreasing birth and increasing death rates of words over the second half of the 20th Century, showing death rates to be strongly affected by the imposed time cutoff of the arbitrary present and not increasing dramatically. We provide a robust, principled approach to examining lexical evolution by tracking the volume of word flux across various relative frequency thresholds. 
We show that while the overall statistical structure of the English language remains stable over time in terms of its raw Zipf distribution, we find evidence of an enduring ‘lexical turbulence’: The flux of words across frequency thresholds from decade to decade scales superlinearly with word rank and exhibits a scaling break we connect to that of Zipf’s law. To better understand the changing lexicon, we examine the contributions to the Jensen-Shannon divergence of individual words crossing frequency thresholds. We also find indications that scholarly works about fiction are strongly represented in the 2012 English Fiction corpus, and suggest that a future revision of the corpus should attempt to separate critical works from fiction itself.|000|birth-death models, lexical replacement, lexical evolution, Google N-Grams 3274|Pechenick2017|Not clear how interesting this article is in the end, but it deals with the death and birth of words, using Google n-grams, and it might contain some interesting ideas here, so it is definitely worth a closer read, even if there's no time for this at the moment.|000|Google N-Grams, birth-death models, lexical evolution 3275|Sankoff1980|Paper deals with wave vs. family tree theory and interprets waves as diffusions across boundaries, while family tree evolution is innovation inside boundaries. The conclusion is that neither model is satisfactory alone. Instead both processes seem to be important. In fact, this means nothing else than that the usage of combined methods is required in which vertical and lateral edges are drawn on a network. If diffusion occurs inside an unresolvable unit, this is displayed in the vertical edges of the evolutionary network, if it crosses boundaries, this is displayed in the lateral edges.|000|wave theory, family tree, Stammbaumtheorie, evolutionary networks, linguistics, 3276|Posth2016|Posth et al. recover 35 new mitochondrial genomes from Late Pleistocene and early Holocene European hunter-gatherers. 
Major human mtDNA haplogroup M, absent in contemporary Europeans, is discovered in several pre-Last Glacial Maximum individuals. Demographic modeling reveals a major population turnover during the Late Glacial 14,500 years ago.|000|Out-of-Africa, population genetics, population of Europe, 3277|Posth2016|What makes this paper interesting is the fact that evidence which was absent before and thought to be "significantly" absent later turned out to be present. As a result, the common theories that stated that pre-Neolithic humans did not reach Europe turn out to be wrong. |000|missing data, evidence, philosophy of science, methodology, 3278|Dalonzo2017|And this notwithstanding the fact that the 19th century was also the epoch of the well-known official interdiction of that topic promoted by the Société de Linguistique de Paris (founded in 1866). Article 2 of its constitution states, “The Society does not admit any communication regarding language origins as well as the creation of a universal language” (cited after Auroux 1989: 123; transl. mine: JDA).|47|language origin, nice quote, Société Linguistique de Paris, history of science 3279|Dalonzo2017|Such scepticism was almost certainly reinforced by the main goal of linguistics during the 19th century. Linguistics wanted to appear as a science and to strengthen its own academic position (see Auroux 1989). Questions of a more philosophical nature, such the origins of language, were officially left out. For instance, it should be recalled that the refusal of Neogrammarians to recognize the relevance of the question of language origins.|47|history of science, language origin, 3280|Dalonzo2017|Article gives an interesting summary on the investigations and speculations about language origin in the 19th century and before in linguistics. 
While later scholars ignored the question completely, there was a vivid debate which sought explanations as to the origins of language, invoking various reasons, including rudimentary evolutionary accounts as well as a divine origin.|000|history of science, language origin, overview 3281|Goldberg2006|There is always a tension between being a ‘‘lumper’’ and being a ‘‘splitter.’’ As a biologist once put it, ‘‘splitters see very small, highly differentiated units— their critics say that if they can tell two animals apart, they place them in different genera . . . and if they cannot tell them apart, they place them in different species. Lumpers, on the other hand, see only large units—their critics say that if a carnivore is neither a dog nor a bear, they call it a cat’’ (Simpson 1945).|45|classification, grammar, categorization, lumping and splitting, 3282|Goldberg2006|There is a good deal of evidence in the field of non-linguistic categorization that information about specific exemplars is stored. :comment:`Goes on by showing many examples for this.`|45|categorization, exemplar theory, cognition 3283|Goldberg2006|In recognition of data such as these, there are a growing number of psychological models of categorization that combine exemplar-based knowledge with some type of generalizations. :comment:`many examples follow`|48|categorization, exemplar theory, grammar, cognition 3284|Goldberg2006|Language learning must involve memories of individual examples because the end state of grammar is only partially general (Bybee 1985; Bybee and McClelland 2005; Culicover 1999; Daugherty and Seidenberg 1995; Lakoff 1970; Plunkett and Marchman 1993). 
:comment:`Many examples follow for this.`|49|exemplar theory, categorization, cognition, grammar, 3285|Goldberg2006|Gahl and Garnsey (2004) demonstrate that phonological reductions (/t,d/ deletions) are more likely in high-probability constructional contexts than in low-probability contexts.|49|exemplar theory, frequency, usage, construction grammar, categorization 3286|Kirby2015|Language exhibits striking systematic structure. Words are composed of combinations of reusable sounds, and those words in turn are combined to form complex sentences. These properties make language unique among natural communication systems and enable our species to convey an open-ended set of messages. We provide a cultural evolutionary account of the origins of this structure. We show, using simulations of rational learners and laboratory experiments, that structure arises from a trade-off between pressures for compressibility (imposed during learning) and expressivity (imposed during communication). We further demonstrate that the relative strength of these two pressures can be varied in different social contexts, leading to novel predictions about the emergence of structured behaviour in the wild.|000|compressibility, expressivity, language evolution, structure, review, 3287|Kirby2015|Paper talks about the pressure on language structure evolving from the need to be concise (compressibility) in messages, as well as being explicit (expressivity), which drives language change. This is reminiscent of the speaker-hearer problem in linguistics, advocated by @Ohala1993 and others.|000|expressivity, compressibility, language evolution, language change, review, 3288|Kirby2015|The idea that key features of language arise from the trade-off between competing pressures has a long history. Competing motivations of speaker and hearer, for instance, have been a rich explanatory tool for cognitive scientists (e.g. 
Zipf, 1949; Ferrer i Cancho & Solé, 2003; Piantadosi, Tily, & Gibson, 2012) and linguists seeking explanations for typological universals of language (e.g. Givón, 1979; DuBois, 1987; Kirby, 1997; Jäger, 2007): for example, utterances in a language will tend to minimise effort for the speaker as long as distinctiveness for the hearer is not compromised (Zipf, 1949). This kind of observation can be couched in terms of compression, i.e., optimisation of a repertoire of signals such that the energetic cost of unambiguously conveying any meaning is minimised. This leads naturally to the inverse relationship between frequency and length of words identified by Zipf (1936); more generally, it has been suggested that such optimally-compressible signal inventories are a universal feature of natural communication systems across all species (Ferrer i Cancho et al., 2013).|88|language change, speaker-listener-model, reasons for language change, 3289|LaBar2016|A major aim of evolutionary biology is to explain the respective roles of adaptive versus non-adaptive changes in the evolution of complexity. While selection is certainly responsible for the spread and maintenance of complex phenotypes, this does not automatically imply that strong selection enhances the chance for the emergence of novel traits, that is, the origination of complexity. Population size is one parameter that alters the relative importance of adaptive and non-adaptive processes: as population size decreases, selection weakens and genetic drift grows in importance. Because of this relationship, many theories invoke a role for population size in the evolution of complexity. Such theories are difficult to test empirically because of the time required for the evolution of complexity in biological populations. Here, we used digital experimental evolution to test whether large or small asexual populations tend to evolve greater complexity. 
We find that both small and large—but not intermediate-sized—populations are favored to evolve larger genomes, which provides the opportunity for subsequent increases in phenotypic complexity. However, small and large populations followed different evolutionary paths towards these novel traits. Small populations evolved larger genomes by fixing slightly deleterious insertions, while large populations fixed rare beneficial insertions that increased genome size. These results demonstrate that genetic drift can lead to the evolution of complexity in small populations and that purifying selection is not powerful enough to prevent the evolution of complexity in large populations.|000|complexity, evolution of complexity, digital experimental evolution, simulation studies, 3290|LaBar2016|Main point of the paper is that drift can explain how complexity evolves, while heavy selection pressure may not be sufficient.|000|selection, natural selection, complexity, evolution of complexity, simulation studies 3291|Pigliucci2008|In recent years, biologists have increasingly been asking whether the ability to evolve — the evolvability — of biological systems, itself evolves, and whether this phenomenon is the result of natural selection or a by-product of other evolutionary processes. The concept of evolvability, and the increasing theoretical and empirical literature that refers to it, may constitute one of several pillars on which an extended evolutionary synthesis will take shape during the next few years, although much work remains to be done on how evolvability comes about.|000|evolvability, evolution, introduction, review, 3292|Pigliucci2008|Nice review paper that explains a lot of concepts in evolution, including evolvability, heritability, and innovation.|000|evolvability, heritability, innovation, terminology, review, 3294|Pigliucci2008|A philosophical approach, tracing back at least to Aristotle, that seeks explanations in terms of final causes. 
In evolutionary biology, teleology has often taken the form of some sort of vitalistic force that pushes evolution in a particular direction, for example, increased complexity.|79|teleology, biology, definition, 3295|Pigliucci2008|A concept first introduced by C.H. Waddington in the 1940s to refer to the fact that development is often resistant to perturbation and seems to proceed along certain preferential directions (to be ‘canalized’ along certain channels).|78|canalization, definition, biology, terminology 3296|Pigliucci2008|Neutral spaces are broad genotypic regions in which mutations do not change the phenotype or fitness. Neutral spaces mean that developmental systems are ‘robust’, and can accumulate genetic variants that might be non-neutral in a different environmental context, thereby augmenting evolvability.|77|definition, neutral spaces, biology, robustness, terminology 3297|Pigliucci2008|The second keystone of evolvability is modularity; that is, the degree to which groups of phenotypic characteristics are independent.|77|modularity, construction, evolvability, definition, terminology, biology, 3298|Pigliucci2008|Table 1:

============================================ ================== ============================================================ ================================================================
Suggested term                               Scale              Description                                                  Effects
============================================ ================== ============================================================ ================================================================
Heritability (sensu Houle)                   Within populations Standing pool of genetic variation and covariation           Determines the response to natural selection within populations
Evolvability (sensu Wagner & Altenberg)      Within species     Includes variability (sensu Wagner & Altenberg), depends on  Affects long-term adaptation, channels evolution along
                                                                genetic architecture and developmental constraints           non-random trajectories, allows mid-term exploration of
                                                                                                                             phenotypic space
Innovation (sensu Maynard-Smith & Szathmary) Within clades      As for within species, but includes the capacity to          Generates major phenotypic (morphological, behavioural or
                                                                overcome standing genetic and developmental constraints,     physiological) breakthroughs (novelties)
                                                                opening new areas of phenotypic space for further evolution
============================================ ================== ============================================================ ================================================================

.. image:: static/img/Piggliucci-2008-76.png
   :name: table
   :width: 1000px

:comment:`Image taken from the article.`|76|heritability, innovation, evolvability, biology, definition 3299|Wagner2008|Neutralism and selectionism are extremes of an explanatory spectrum for understanding patterns of molecular evolution and the emergence of evolutionary innovation. Although recent genome-scale data from protein-coding genes argue against neutralism, molecular engineering and protein evolution data argue that neutral mutations and mutational robustness are important for evolutionary innovation. Here I propose a reconciliation in which neutral mutations prepare the ground for later evolutionary adaptation. 
Key to this perspective is an explicit understanding of molecular phenotypes that has only become accessible in recent years.|000|neutral evolution, network, modularity, selection, drift, biology, 3300|Wagner2008|Paper argues that neutral evolution can actually help selection in evolution, since it may give the organism time to prepare enough variation that could then help to react on pressure if the environment requires it.|000|natural selection, neutral evolution, network, evolvability, overview, 3301|Wagner2008|Robustness and network size Because even genotypes of moderate length n can have a huge number of phenotypes (for example, ~1.8^n RNA secondary structures), a sequence space is filled by a myriad of neutral networks, each one associated with a different phenotype. These phenotypes have very different neutral network sizes. Some phenotypes are adopted by many sequences. The neutral network of such phenotypes is large, and the average sequence on such a large network has many immediate neighbours. Other phenotypes are adopted by fewer sequences. Their neutral networks are smaller, and sequences on them have fewer immediate neighbours. So the larger the phenotype’s neutral network, the greater the phenotype’s robustness to mutations that change single amino acids or nucleotides. The existence of neutral networks has many implications for the evolutionary dynamics of populations of sequences.|967|robustness, neutral evolution, neutral network, phenotype, 3302|Wagner2008|Figure 2

.. image:: Wagner-2008-969.png
   :name: wagner
   :width: 1000px

:comment:`Figure showing differences in the networks in situations of robustness and less robust systems.`|-|robustness, neutral network, neutral evolution, evolvability, 3303|Wagner1996|The problem of complex adaptations is studied in two largely disconnected research traditions: evolutionary biology and evolutionary computer science. This paper summarizes the results from both areas and compares their implications. 
In evolutionary computer science it was found that the Darwinian process of mutation, recombination and selection is not universally effective in improving complex systems like computer programs or chip designs. For adaptation to occur, these systems must possess "evolvability," i.e., the ability of random variations to sometimes produce improvement. It was found that evolvability critically depends on the way genetic variation maps onto phenotypic variation, an issue known as the representation problem. The genotype-phenotype map determines the variability of characters, which is the propensity to vary. Variability needs to be distinguished from variations, which are the actually realized differences between individuals. The genotype-phenotype map is the common theme underlying such varied biological phenomena as genetic canalization, developmental constraints, biological versatility, developmental dissociability, and morphological integration. For evolutionary biology the representation problem has important implications: how is it that extant species acquired a genotype-phenotype map which allows improvement by mutation and selection? Is the genotype-phenotype map able to change in evolution? What are the selective forces, if any, that shape the genotype-phenotype map? We propose that the genotype-phenotype map can evolve by two main routes: epistatic mutations, or the creation of new genes. A common result for organismic design is modularity. By modularity we mean a genotype-phenotype map in which there are few pleiotropic effects among characters serving different functions, with pleiotropic effects falling mainly among characters that are part of a single functional complex. Such a design is expected to improve evolvability by limiting the interference between the adaptation of different functions. Several population genetic models are reviewed that are intended to explain the evolutionary origin of a modular design. 
While our current knowledge is insufficient to assess the plausibility of these models, they form the beginning of a framework for understanding the evolution of the genotype-phenotype map.|000|adaptation, natural selection, neutral evolution, 3304|Wagner1996|Paper gives an overview on evolvability by also comparing across sciences.|000|evolvability, evolution, complexity, neutral evolution, natural selection 3305|Karlin1990|An unusual pattern in a nucleic acid or protein sequence or a region of strong similarity shared by two or more sequences may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they can reflect nucleotide or amino acid similarity measured in a wide variety of ways. Using an appropriate random model, we present a theory that provides precise numerical formulas for assessing the statistical significance of any region with high aggregate score. A second class of results describes the composition of high-scoring segments. In certain contexts, these permit the choice of scoring systems which are "optimal" for distinguishing biologically relevant patterns. Examples are given of applications of the theory to a variety of protein sequences, highlighting segments with unusual biological features. 
These include distinctive charge regions in transcription factors and protooncogene products, pronounced hydrophobic segments in various receptor and transport proteins, and statistically significant subalignments involving the recently characterized cystic fibrosis gene.|000|similarity score, scoring matrix, sequence alignment, significance, 3306|Maurits2017|We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.|000|software, BEAST, Bayesian approaches, phylogenetic reconstruction, Python 3307|Barron2017|Paper introduces basic approaches to word frequency statistics. 
Seems to offer a good overview on this topic.|000|statistics, word frequency, significance, introduction, tutorial 3308|Murawaki2015|Phylogenetic models, originally developed to demonstrate evolutionary biology, have been applied to a wide range of cultural data including natural language lexicons, manuscripts, folktales, material cultures, and religions. A fundamental question regarding the application of phylogenetic inference is whether trees are an appropriate approximation of cultural evolutionary history. Their validity in cultural applications has been scrutinized, particularly with respect to the lexicons of dialects in contact. Phylogenetic models organize evolutionary data into a series of branching events through time. However, branching events are typically not included in dialectological studies to interpret the distributions of lexical terms. Instead, dialectologists have offered spatial interpretations to represent lexical data. For example, new lexical items that emerge in a politico-cultural center are likely to spread to peripheries, but not vice versa. To explore the question of the tree model’s validity, we present a simple simulation model in which dialects form a spatial network and share lexical items through contact rather than through common ancestors. We input several network topologies to the model to generate synthetic data. We then analyze the synthesized data using conventional phylogenetic techniques. We found that a group of dialects can be considered tree-like even if it has not evolved in a temporally tree-like manner but has a temporally invariant, spatially tree-like structure. In addition, the simulation experiments appear to reproduce unnatural results observed in reconstructed trees for real data. 
These results motivate further investigation into the spatial structure of the evolutionary history of dialect lexicons as well as other cultural characteristics.|000|Japanese, Japonic, dialectology, Japanese dialects, family tree, network, wave theory, simulation studies 3309|Murawaki2015|Paper naively simulates language change in a network fashion as an exchange of words between neighboring dialects which are connected via links in a network. It shows that phylogenetic analyses can nevertheless infer a tree with low reticulation rates (using the distance-based approach with delta-scores in Neighbor-Net), if the network shows a strong division with bottlenecks in the data. This is on the one hand clear, on the other hand also interesting, but it seems like there are certain shortcomings in the findings: (a) the model for simulation is really simple and not convincingly explained, (b) it is not clear to what extent the approach assumes that dialects change by themselves without even being in contact with other varieties, (c) it is questionable how appropriate it is to derive these findings from using NeighborNet: distance-based methods are forcing things heavily into a tree, so maybe this is not that big a thing to wonder about. On the other hand, the very idea to investigate data which has not been created by a tree-process and testing what phylogenetic algorithms do with it, is an interesting one, and it should be pursued further.|000|phylogenetic reconstruction, dialect network, simulation studies, Japonic, Japanese dialects, family tree, tree-likeness 3310|Karlin1989|An unusual pattern in a nucleic acid or protein sequence or a region of strong similarity shared by two or more sequences may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. 
To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they can reflect nucleotide or amino acid similarity measured in a wide variety of ways. Using an appropriate random model, we present a theory that provides precise numerical formulas for assessing the statistical significance of any region with high aggregate score. A second class of results describes the composition of high-scoring segments. In certain contexts, these permit the choice of scoring systems which are "optimal" for distinguishing biologically relevant patterns. Examples are given of applications of the theory to a variety of protein sequences, highlighting segments with unusual biological features. These include distinctive charge regions in transcription factors and protooncogene products, pronounced hydrophobic segments in various receptor and transport proteins, and statistically significant subalignments involving the recently characterized cystic fibrosis gene.|000|significance, sequence comparison, scoring function, scoring matrix, sequence alignment 3311|Karlin1989|Have not yet thoroughly read this article, but what I would hope to find in there is a way to use a scoring schema to predict how well an alignment scores on average. 
I wonder whether this is even possible, but if it is possible, this would be extremely helpful.|000|sequence alignment, significance, BLAST, E-Value, introduction 3312|Gontier2017a|Modern evolutionary biology is currently characterized by epistemological divergence because, beyond organisms and genes, scholars nowadays investigate a plurality of units of evolution, they recognize multilevel selection, and especially from within the Extended Synthesis, scholars have identified a plurality of evolutionary mechanisms that besides natural selection can explain how the evolution of anatomical form and functional behavior occur. Evolutionary linguists have also implicated a multitude of units, levels and mechanisms involved in (aspects of) language evolution, which has also brought forth epistemological divergence on how language possibly evolved. Here, we examine how a general evolutionary methodology can become abstracted from how biologists study evolution, and how this methodology can become implemented into the field of Evolutionary Linguistics. Applied Evolutionary Epistemology (AEE) involves a systematic search and analysis of the units (that what evolves), levels (loci where evolution takes place), and mechanisms (means whereby evolution occurs) of language evolution, allocating them into ontological hierarchies, and distinguishing them from other kinds of evolution. In this paper in particular, we give an in-depth analysis of how AEE enables an identification, examination, and evaluation of levels and mechanisms of language evolution, and we hone in on how hierarchies and mechanisms of language (evolution) can and have been defined differentially. For an in-depth analysis of units of language evolution, we refer the reader to Gontier (2017) for which this paper functions as a follow-up. 
Thus, rather than present a specific theory of how language evolved, we present a methodology that enables us to unite existing research programs as well as to develop theories on the subject at hand.|000|language evolution, biological parallels, evolutionary theory, universal evolutionary theory, selection, methodology 3313|Gontier2017a|Paper contains some interesting ideas about how to compare evolution across domains, especially by distinguishing *units*, *levels*, and *mechanisms* of evolution (in general and language-specific).|000|language evolution, evolutionary theory, methodology, philosophy of science 3314|Gontier2017a|:comment:`Units, levels, and mechanisms of evolution` .. image:: static/img/Gontier2017a-4.png :width: 1000px :name: table [Table 2]|4|terminology, units of evolution, levels of evolution, evolutionary mechanisms, definition 3315|Yao2017|The origin and diversification of Sino-Tibetan speaking populations have been long-standing hot debates. However, the limited genetic information of Tibetan populations keeps this topic far from clear. In the present study, we genotyped 15 forensic autosomal short tandem repeats (STRs) from 803 unrelated Tibetan individuals from Gansu Province (635 from Gannan and 168 from Tianzhu) in northwest China. We combined these data with published dataset to infer a detailed population affinities and genetic substructure of Sino-Tibetan populations. Our results revealed Tibetan populations in Gannan and Tianzhu are genetically very similar with Tibetans from other regions. The Tibetans in Tianzhu have received more genetic influence from surrounding lowland populations. The genetic structure of Sino-Tibetan populations was strongly correlated with linguistic affiliations. Although the among-population variances are relatively small, the genetic components for Tibetan, Lolo-Burmese, and Han Chinese were quite distinctive, especially for the Deng, Nu, and Derung of Lolo-Burmese. 
Han Chinese but not Tibetans are suggested to share substantial genetic component with southern natives, such as Tai-Kadai and Hmong-Mien speaking populations, and with other lowland East Asian populations, which implies there might be extensive gene flow between those lowland groups and Han Chinese after Han Chinese were separated from Tibetans. The dataset generated in present study is also valuable for forensic identification and paternity tests in China.|000|Sino-Tibetan, population genetics, dating, 3316|Yao2017|Paper discusses Sino-Tibetan origins and is quite problematic regarding the linguistic perspective of Sino-Tibetan. Otherwise, however, it shows that more and more data is available in genetics, allowing researchers to carry out ever larger analyses.|000|Sino-Tibetan, population genetics, date estimation 3317|Wichmann2017a|Different methods exist for classifying languages, depending on whether the task is to work out the relations among languages already known to be related—internal language classification—or whether the task is to establish that certain languages are related—external language classification. The comparative method in historical linguistics, developed during the latter part of the 19th century, represents one method for internal language classification; lexicostatistics, developed during the 1950s, represents another. Elements of lexicostatistics have been transformed and carried over into modern computational linguistic phylogenetics, and currently efforts are also being made to automate the comparative method. Recent years have seen rapid progress in the development of methods, tools, and resources for language classification. For instance, computational phylogenetic algorithms and software have made it possible to handle the classification of many languages using explicit models of language change, and data have been gathered for two thirds of the world’s language, allowing for rapid, exploratory classifications. 
There are also many open questions and venues for future research, for instance: What are the real-world counterparts to the nodes in a family tree structure? How can shortcomings in the traditional method of comparative historical linguistics be overcome? How can the understanding of the results that computational linguistic phylogenetics have to offer be improved? External language classification, a notoriously difficult task, has also benefitted from the advent of computational power. While, in the past, the simultaneous comparison of many languages for the purpose of discovering deep genealogical links was carried out in a haphazard fashion, leaving too much room for the effect of chance similarities to kick in, this sort of activity can now be done in a systematic, objective way on an unprecedented scale. The ways of producing final, convincing evidence for a deep genealogical relation, however, have not changed much. There is some room for improvement in this area, but even more room for improvement in the way that proposals for long-distance relations are evaluated. |000|comparative method, subgrouping, genetic classification, methodology 3318|Wichmann2017a|Very interesting paper, that introduces the problem of shared innovations resulting from missing data. Gist is: you can never really tell whether similar traits are really uniquely shared innovations, unless you prove that they were indeed missing in ancestral stages, which is difficult, as we are dealing with missing data.|000|comparative method, shared innovation, cladistics, methodology, subgrouping, missing data 3319|Ziem2017|A multimodal construction is said to be a conventional pairing of a complex form, comprising at least a verbal and a kinetic element, with a specific meaning or a specific function. Do we need a new constructional approach to account for such multimodal constructions? What are the challenges to account for multimodality? 
The aim of this contribution is to provide a precise notion ‘multimodal construction’ and, on this basis, to indicate possible pathways for future investigations. The paper opts for cautiously extending the scope of existing constructional approaches in order to include non-linguistic meaningful behavior. In particular, it is argued that even though Construction Grammar invites for treating multimodal on a par with linguistic constructions, there is a huge lack of substantial empirical support to arrive at a more detailed and data-based understanding of the nature of multimodal constructions.|000|construction grammar, gesture, face-to-face communication 3320|Ziem2017|Not clear yet how relevant this paper could be, but it seems to be interesting as an additional read in the context of construction grammar, arguing that certain constructions are not only language-internal, but properties of other levels (gesture, communication, etc.).|000|construction grammar, multi-modal communication, gesture 3321|Collingridge2012| **Background** The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column. **Results** Using conventional benchmark tests we demonstrate that on average MergeAlign MSAs are more accurate than MSAs generated using any single matrix of sequence substitution. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. 
Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses. **Conclusion** Using multiple tests of alignment performance we demonstrate that this novel method has broad general application in biological research. |000|multiple sequence alignment, consensus alignment, algorithms, network 3322|Collingridge2012|Describes an interesting algorithm that combines several different multiple alignments into a consensus alignment with the help of a graph-based approach.|000|consensus alignment, multiple sequence alignment, algorithms 3323|Iacaponi2012|We will present a complete syllabifier for Italian (Sylli), that is based on phonological principles, flexible and easy to adapt for other uses, alphabets and languages. Crucial concepts regarding syllabification principles in modern phonological theory will be discussed (§1.1); specific issues concerning Italian syllabification will then be summarised (§1.2) and an overview of the available automatic syllabification models will be provided (§1.3). We will then move on to describe the program structure, the syllabification algorithm and two particular issues concerning syllabification in Italian (§2). Finally, we will illustrate the results of a manual syllabification test carried out by linguists to verify the accuracy of the algorithm (§3).|000|syllabification, Italian, algorithms, syllable segmentation 3324|Iacaponi2012|When discussing automatic syllabification, it may be worthwhile checking this algorithm, as it seems to be easily expandable for other languages as well.|000|syllabification, automatic syllable segmentation, algorithms, Italian, 3325|Baker2017|Article discusses problems of reproducibility in chemistry, resulting from contaminated samples. 
It is yet another example of the problem of replicability and reproducibility in the sciences.|000|reproducibility, chemistry, contamination, examples 3326|Poornima2010|Though the canonical case for the counterpart relation is that there will be one counterpart for a given concept, this is often not the case in languages and in our data. To take an example from a familiar language, the English counterpart for MOVIE could reasonably be film or movie, and it is quite easy to imagine a wordlist for English containing both words. The entry in (4) from the dataset we are working with gives an example of this from a wordlist of North Asmat, a language spoken in Indonesia. The concept GRANDFATHER has two counterparts, whose relationship to each other has not been specified in our source.|4|word list, concept list, counterpart, sub-counterpart, terminology 3327|Poornima2010|This paper describes work being done on the modeling and encoding of a legacy resource, the traditional descriptive wordlist, in ways that make its data accessible to NLP applications. We describe an abstract model for traditional wordlist entries and then provide an instantiation of the model in RDF/XML which makes clear the relationship between our wordlist database and interlingua approaches aimed towards machine translation, and which also allows for straightforward interoperation with data from full lexicons.|000|concepticon, introduction, concept list, semantics, overview 3328|Poornima2010|This paper introduces the idea of a concepticon and has been quoted in our concepticon articles as a major inspiration source. 
Furthermore, it also introduces some interesting terminology, such as counterpart and sub-counterpart, which are worth revisiting when writing on the topic of lexical typology and similar things.|000|sub-counterpart, counterpart, terminology, concepticon, concept list, introduction 3329|Levinson2014|Hauser, Chomsky, and Fitch (2002) speculate that perhaps the sole feature of language that may be domain specific is the recursive nature of syntax. The implication is that it was the evolution of this syntactic ability that accounts for the species-unique character of human language. This chapter sets out a rival possibility, namely, that the focal type of recursion—understood here as centre embedding— has its natural home in principles of language use, not language structure.|000|recursion, pragmatics, syntax, language origin 3330|Levinson2014|It has long been noted that there are comprehension problems associated with repeated centre embeddings. Chomsky and Miller (1963) said of the sentence The rat [the cat [the dog chased ] killed] ate the malt that it is ‘... surely confusing and improbable but it is perfectly grammatical and has a clear and unambiguous meaning’.|6|Chomsky syntax, center embedding, definition, introduction 3331|Levinson2014|The psycholinguistic findings and the corpus findings converge: after degree 2 embedding, performance rapidly degrades to a point where degree 3 embeddings hardly occur.|7|written language, center embedding, Chomsky syntax 3332|Levinson2014|Paper on recursion potentially interesting when discussing certain general properties of language.|000|recursion, center embedding, Chomsky syntax, language origin, 3333|Moro2013|One of the major discoveries in the history of 20th century linguistics is that the linear sequence of words constituting a sentence is organized in a hierarchical and recursive fashion. Is this hierarchical structure similar to action and motor planning, as recent proposals suggest? 
Some crucial differences are highlighted on both theoretical and empirical grounds that make this parallel unsuitable, with far-reaching consequences for evolutionary perspectives.|000|motor-action, hierarchies, syntax, Chomsky syntax, 3334|Moro2013|Paper argues against the claim that there is a crucial similarity between language and actions. This is important in the context of tool-fabrication in animals.|000|motor-action, syntax, Chomsky syntax, 3335|Moro2014|Let us now turn to actions, as in the case observed by Pulvermueller: [open the door [open the bottle close the bottle] close the door]. What we see here is analogous to the sequence of letters in the alphabet rather than the sequence of words in the sentence: the action [open the bottle close the bottle] does not have any special distribution and there is no dependency between [open the door] and [close the door], which is the structure where [open the bottle close the bottle] is ‘embedded’.|221|motor-action, embedding, syntax, Chomsky syntax, 3336|Moro2014|Paper argues against the analogy between actions as being similar to syntactic structures by claiming that embedding does not occur there. It is a reply to the paper by @Pulvermuller2014.|000|motor-action, embedding, Chomsky syntax, 3337|Pulvermuller2014|This complex action sequence can be broken down into several local complementary sequences (indexed by different colors), which are hierarchically embedded into each other, and one may indeed suggest that there is no strict upper limit for the number of such iterations and embeddings. 
This point, about a possibly limitless human ability to embed structures into other similar structures, was first made with regard to syntax [1], despite the fact that the hierarchical-structural complexity of syntactic structures (defined in terms of levels of center embedding) hardly parallels complex human action sequences, such as the one illustrated in Figure 1 (ten levels of embedding).|219|embedding, motor-action, Chomsky syntax, syntax, 3338|Pulvermuller2014|Paper compares motor-actions, like brushing one's teeth, with syntax in language, claiming that it shows the same structure of embedding one action into another. A reply that argues against this paper is provided in @Moro2014|000|embedding, syntax, motor-action, Chomsky syntax, 3339|Boeckx2014|In a recent exchange, Moro (@2014) and Pulvermüller (@2014) re-open a long-standing debate in the language sciences. Ever since the 1975 Royaumont encounter that set the agenda for linguistics and the classical cognitive sciences, generative linguists, whose ultimate goal we take to be to shed light on the biological basis for language, have rejected any attempt of a rapprochement between natural language syntax and action grammar (motor planning), even if the hierarchical structure of plans is well established in the literature (see already Miller et al., 1960). 
The parallelism between syntax and action grammar has enjoyed a new lease of life recently (Jackendoff, 2007; Fujita, 2009; Pulvermüller, 2010; Stout, 2010; Arbib, 2012; Knott, 2012), with neuroscientists like Pulvermüller ready to reap the fruits, but Moro reiterates the standard generative stance that the parallelism is at best a metaphor.|000|motor-action, syntax, Chomsky syntax, embedding 3340|Boeckx2014|Moro usefully lays out two classic arguments against a deep relation between syntax and action: (i) the atomic units these systems manipulate are very different, and (ii) the locality conditions or constraints characteristic of syntax are not found in the domain of action. We think Moro’s arguments fail, for reasons that are worth highlighting, for both sides of the debate: First, the very same arguments Moro puts forth could be raised to argue against a relationship between syntax and other human capacities such as mathematics or music, although here generative linguists have in fact been known to promote such parallelisms, suggesting that they see some underlying similarity despite the differences in terms of lexical units or locality constraints. Because we think that the pursuit of these parallelisms have led to substantial progress (see Patel, 2008 in the domain of music), we don’t see why the domain of action should be treated differently. 
|1|embedding, syntax, motor-action, Chomsky syntax, 3341|Boeckx2014|Paper answers quickly on the discussion between @Pulvermuller2014 and @Moro2014 in favor of the point by Pulvermuller, namely that language and motor-action are indeed similar, regarding the usage of embeddings and hierarchies.|000|motor-action, embedding, syntax, Chomsky syntax 3342|Fortemps2014|The problem situations used by Cohors-Fresenborg (1978) for his Dynamical Mazes were basically the following: he described verbally a situation and asked the subject to create a finite automaton which would “give the solution.” This author’s word problems were similar to this one: “Create a machine which will give a stamp for four coins.” This can be viewed as follows: a list of exits was given (e.g., “more coins 1,” “more coins 2,” “more coins 3,” “your stamp is available”) and the subject had to create the finite automaton which would yield that list as output. The use of this device in an NVCD-type approach (Lowenthal, 1984; Lowenthal & Lefebvre, 2014) is based on the reverse problem: given a finite automaton, the subject has to find the list of exits produced by this machine. This implies that the researcher must first construct an adequate network. This task can be arduous and the result can be biased. The creation of an ad hoc network has been done previously by trial and error. It is not obvious to create in such a way a network with exactly the required complexity, a network which can serve as reference and which is not biased. The purpose of this chapter is to describe an algorithmic method enabling the researcher to create networks in order to test the influence of the possible “recursive” or “embedded” dimensions on the cognitive development and more specifically on language development. 
The essential aim is to have a method such that each network constructed in this way can be used as reference since it has been constructed according to an algorithm and is thus detached of the experimenter’s biases.|000|maze network, graph theory, pathways, recursion, Chomsky syntax, finite automaton 3343|Fortemps2014|Not yet clear how to understand this paper, but it seems to be able to generate a network of pathways out of a given number of input sequences (?). Since it is a finite-state automaton, it should not result in very complex structures. It also seems that the paper gives some hints regarding the power of finite-state automatons, etc.|000|finite automaton, regular language, regular expression, maze network, pathways, 3344|Fong2014|Paper describes computational and efficiency aspects of syntactic theories. It is not yet clear, how well it describes this, but probably a good candidate to start with when dealing with grammar and Chomsky syntax.|000|Chomsky syntax, complexity, computational approaches, 3345|Badir2017|Arbitrariness is commonly seen as a major concept in Saussure’s thought, and it even receives the status of a “principle” in his theory. It is not only the characteristic feature of the relation between the signifier and the signified (the semiological relation is arbitrary), but moreover it is constitutive of this very relation (the relation is semiological because it is arbitrary; there is a so-called “semiological relation” established between a signifier and a signified because of the principle of arbitrariness). And when linguists and other Saussurean interpreters comment on the concept of arbitrariness, they usually imply a binary relation: the semiological relation is arbitrary due to the fact that the signifier is arbitrarily chosen vis-à-vis the signified, and vice versa. I will be questioning the latter assertion in this paper. In my opinion, the symmetry of the semiological relation has not been properly demonstrated. 
The signifier can be seen as arbitrary with regard to the signified, but no reason has been provided to recognize the converse. Instead a number of arguments can be put forth to see arbitrariness as a concept that implies a non-symmetrical relation.|000|Ferdinand de Saussure, arbitrariness, signifier, signified, terminology 3346|Badir2017|Apart from the philologists, who aim at some final truths about their master, we should admit that what we are looking for in Saussure’s work may no longer make part of Saussure’s own theoretical interests. As a semiotician, I suggest that a non-symmetrical concept of arbitrariness can have great significance. It throws into question the isomorphism between the expression and the content we have inherited from Hjelmslev’s theory. Besides, it allows to reconsider the discrepancy between arbitrary signs, like words, and “motivated signs” as pictures are supposed to be, especially if the Saussurean notion of arbitrariness is integrated into Peirce’s triadic theory of sign. I would suggest that pictures become meaningful when we see how arbitrary they are. A small square on a screen can be the manifestation of a ball, or a patch of red in a painting by Picasso the manifestation of a grey coat. This is all a matter of negative values in a differential system. The struggle against the nomenclaturist – childish but powerful – study of signs in social life still goes on.|18|arbitrariness, form, meaning, signifier, signified, Ferdinand de Saussure, terminology 3347|Joseph2017|In his three courses of lectures on general linguistics given between 1907 and 1911, Ferdinand de Saussure (1857–1913) sometimes drew diagrams and figures on the chalkboard to add a visual dimension to the novel and challenging theoretical concepts he was laying out. Those which the editors redrew for inclusion in the Cours de linguistique générale (1916) – in some cases with significant changes – have had a surprisingly strong impact on readers ever since. 
If Saussure hoped that the drawings would clear up ambiguities in his verbal text, he might have been disappointed; for while they extend a hand to students and readers to guide them into his conceptual world via the stepping-stones of the semi-familiar, accessible and concrete, they have opened up whole new realms of ambiguity, and strengthened the ones already present in the verbal text. This article examines seven of the illustrations or sets of illustrations in the CLG and the various interpretations to which they have given or could give rise, treating these not as erroneous but as contrapuntal to the text when the two appear to be in contradiction.|000|visualization, Ferdinand de Saussure, history of science 3348|Joseph2017|Text is potentially interesting in so far as it discusses the visualizations given in Saussure's published book, along with the visualizations he drew on the blackboard.|000|Ferdinand de Saussure, visualization, history of science 3349|Haspelmath2017|The general distinction between morphology and syntax is widely taken for granted, but it crucially depends on a cross-linguistically valid concept of ‘(morphosyntactic) word’. I show that there are no good criteria for defining such a concept. I examine ten criteria in some detail (potential pauses, free occurrence, mobility, uninterruptibility, non-selectivity, non-coordinatability, anaphoric islandhood, nonextractability, morphophonological idiosyncrasies, and deviations from bi-uniqueness), and I show that none of them is necessary and sufficient on its own, and no combination of them gives a definition of ‘word’ that accords with linguists’ orthographic practice. ‘Word’ can be defined as a language-specific concept, but this is not relevant to the general question pursued here. 
‘Word’ can be defined as a fuzzy concept, but this is theoretically meaningful only if the continuum between affixes and words, or words and phrases, shows some clustering, for which there is no systematic evidence at present. Thus, I conclude that we do not currently have a good basis for dividing the domain of morphosyntax into morphology and syntax, and that linguists should be very careful with general claims that make crucial reference to a cross-linguistic ‘word’ notion.|000|morpheme, word, syntax, word segmentation, typology, 3350|Haspelmath2017|It is now very widely recognized that many complex words are semantically compositional in exactly the same way as phrases and clauses, and that conversely many phrases are idiomatic and thus not semantically compositional. Phrases like spill the beans or fat cat must be learned and stored as wholes and are lexical entries, but not morphosyntactic words.|36|morphology, syntax, word formation, 3351|Haspelmath2017|:comment:`Discusses the free occurrence criterion of giving a one-word answer to a question to determine what is a word.` On the one hand, it is too strict, because by this definition compounds such as firewater or blood-red would not be words, but phrases, because they have constituents that are themselves free forms. On the other hand, [pb] it is much too loose, because many phrases such as a flower, to Lagos, or put it away would count as words, because the elements a, to, put, and even put it cannot occur on their own without something following them.|39f|wordhood, free occurrence, criteria, compositionality 3352|Haspelmath2017|On this view, there is thus no general word concept, and the term word potentially has as many different meanings as there are languages. 
The same should apply to the terms morphology and syntax, which cannot be defined in any other way than in terms of some word concept.|61|universals, wordhood, syntax, morphology, 3353|Haspelmath2017|If ‘word’ is a fuzzy concept, it might still be universal, i.e. the conclusions drawn by the authors cited earlier in (38) might be premature. However, if ‘word’ is a fuzzy concept, the consequence is that the difference between words and phrases cannot be modelled by positing two separate components of grammar, morphology and syntax. One could continue using the notions of morphology and syntax, but as fuzzy concepts.|63|wordhood, word formation, morphology, syntax, fuzzy concept, 3354|Haspelmath2017|Linguists have no good basis for identifying words across languages, and hence no good basis for a general distinction between syntax and morphology as parts of the language system.|65|syntax, morphology, wordhood, 3355|Haspelmath2017|Another way of stating the notion of lexical integrity that avoids the danger of circularity is the claim that “the principles that regulate the internal structure of words are quite different from those that govern sentence structure” (Katamba 1993: 217), or that “words are built out of different structural elements and by different principles of composition than syntactic phrases” (Bresnan & Mchombo 1995: 181). But such claims are very weak: While it may be that words and phrases are ‘different’ in some [pb] ways, they are also very similar in many ways, and there are also striking differences between different kinds of words, as well as differences between different kinds of phrases. 
That the distinction between words and phrases should have a special status is not warranted by the evidence that has been presented in favour of lexical integrity.|68f|wordhood, morphology, syntax, 3356|Haspelmath2017|I propose that for the purposes of cross-linguistic comparison, we limit ourselves to more primitive concepts that are readily definable in cross-linguistic terms (i.e. to comparative concepts, cf. Haspelmath 2010), such as those in (46). [pb] a. formative: a minimal coherent set of phonological features that plays a role in the language system (= a minimal sign) b. morph: a formative that biuniquely expresses a meaning c. root: a morph with a concrete meaning d. construct: a set of formatives that together play a role in the language system e. bound construct: a construct that cannot occur on its own as a complete utterance f. free construct: a construct that may occur on its own as a complete utterance |69f|definition, formative, morph, root, construct, bound construct, free construct, terminology, 3357|Haspelmath2017|Very good and very interesting article which argues that there are no good reasons to set up a strict universal distinction between morphology and syntax in linguistics, given that wordhood is defined idiosyncratically across languages.|000|wordhood, morphology, syntax, word formation 3358|Haspelmath2017|The conclusion that we do not know what words are also means that we have no good basis for a morphology–syntax distinction. The part of (the study of) language structure that deals with sign combinations can be called morphosyntax, and for theoretical purposes this is currently best viewed as a unitary domain.|72|morphology, syntax, morphosyntax, conclusion, 3359|Yao2017a|**Objectives** The Tibetan-Yi Corridor located on the eastern edge of Tibetan Plateau is suggested to be the key region for the origin and diversification of Tibeto-Burman speaking populations and the main route of the peopling of the Plateau. 
However, the genetic history of the populations in the Corridor is far from clear due to limited sampling in the northern part of the Corridor. **Materials and methods** We collected blood samples from 10 Tibetan and 10 Han Chinese individuals from Gansu province and genotyped about 600,000 genome-wide single nucleotide polymorphisms (SNPs). **Results** Our data revealed that the populations in the Corridor are all admixed on a genetic cline of deriving ancestry from Tibetans on the Plateau and surrounding lowland East Asians. The Tibetan and Han Chinese groups in the north of the Plateau show significant evidence of low-level West Eurasian admixture that could be probably traced back to 600∼900 years ago. **Discussion** We conclude that there have been huge population migrations from surrounding lowland onto the Tibetan Plateau via the Tibetan-Yi Corridor since the initial formation of Tibetans probably in Neolithic Time, which leads to the current genetic structure of Tibeto-Burman speaking populations.|000|Sino-Tibetan, population genetics, Tibetan, 3360|Tschopp2015| Diplodocidae are among the best known sauropod dinosaurs. Several species were described in the late 1800s or early 1900s from the Morrison Formation of North America. Since then, numerous additional specimens were recovered in the USA, Tanzania, Portugal, and Argentina, as well as possibly Spain, England, Georgia, Zimbabwe, and Asia. To date, the clade includes about 12 to 15 nominal species, some of them with questionable taxonomic status (e.g., ‘Diplodocus’ hayi or Dyslocosaurus polyonychius), and ranging in age from Late Jurassic to Early Cretaceous. However, intrageneric relationships of the iconic, multi-species genera Apatosaurus and Diplodocus are still poorly known. The way to resolve this issue is a specimen-based phylogenetic analysis, which has been previously implemented for Apatosaurus, but is here performed for the first time for the entire clade of Diplodocidae. 
The analysis includes 81 operational taxonomic units, 49 of which belong to Diplodocidae. The set of OTUs includes all name-bearing type specimens previously proposed to belong to Diplodocidae, alongside a set of relatively complete referred specimens, which increase the amount of anatomically overlapping material. Non-diplodocid outgroups were selected to test the affinities of potential diplodocid specimens that have subsequently been suggested to belong outside the clade. The specimens were scored for 477 morphological characters, representing one of the most extensive phylogenetic analyses of sauropod dinosaurs. Character states were figured and tables given in the case of numerical characters. The resulting cladogram recovers the classical arrangement of diplodocid relationships. Two numerical approaches were used to increase reproducibility in our taxonomic delimitation of species and genera. This resulted in the proposal that some species previously included in well-known genera like Apatosaurus and Diplodocus are generically distinct. Of particular note is that the famous genus Brontosaurus is considered valid by our quantitative approach. Furthermore, “Diplodocus” hayi represents a unique genus, which will herein be called Galeamopus gen. nov. On the other hand, these numerical approaches imply synonymization of “Dinheirosaurus” from the Late Jurassic of Portugal with the Morrison Formation genus Supersaurus. Our use of a specimen-, rather than species-based approach increases knowledge of intraspecific and intrageneric variation in diplodocids, and the study demonstrates how specimen-based phylogenetic analysis is a valuable tool in sauropod taxonomy, and potentially in paleontology and taxonomy as a whole. 
|000|dinosaurs, cladistics, phylogenetic reconstruction, morphological characters 3361|Tschopp2015|Paper has a very good review in the phylogenetic networks blog: http://phylonetworks.blogspot.fi/2017/08/more-non-treelike-data-forced-into.html The gist of this review is that the authors force non-tree-like data into a tree. The bad resolution of the parsimony analysis results mainly from missing data, it seems.|000|missing data, cladistics, dinosaurs, morphological characters 3362|Lewis2016|Are the forms of words systematically related to their meaning? The arbitrariness of the sign has long been a foundational part of our understanding of human language. Theories of communication predict a relationship between length and meaning, however: Longer descriptions should be more conceptually complex. Here we show that both the lexicons of human languages and individual speakers encode the relationship between linguistic and conceptual complexity. Experimentally, participants mapped longer words to more complex objects in comprehension and production tasks and across a range of stimuli. Explicit judgments of conceptual complexity were also highly correlated with implicit measures of study time in a memory task, suggesting that complexity is directly related to basic cognitive processes. Observationally, judgments of conceptual complexity for a sample of real words correlate highly with their length across 80 languages, even controlling for frequency, familiarity, imageability, and concreteness. While word lengths are systematically related to usage – both frequency and contextual predictability – our results reveal a systematic relationship with meaning as well.
They point to a general regularity in the design of lexicons and suggest that pragmatic pressures may influence the structure of the lexicon.|000|speech norms, complexity, psycholinguistics, word length, 3363|Lewis2016|Norm data set on user-provided complexity ratings, provides 500 words of the English language, without semantic specification.|000|concept list, speech norms, word length, complexity 3364|Evans1992|The study of semantic change in Australian Aboriginal languages is hindered by two major factors not encountered in Indo-European linguistics: the lack of written records of any historical depth, and the extremely different nature of the Aboriginal conceptual and ideological system with respect to more familiar European systems. Confronted by putative cognates such as Tiwi taka 'tree' and Lardil laka 'custom', or Kayardild kathirr 'digging stick' and Dyirbal bala gajin 'girl', how does one support or falsify claims of semantic shift?|475|cross-semantic cognates, Australian languages, semantic shift 3365|Evans1992|In this paper I discuss an approach, being developed by myself and David Wilkins, that is flexible enough to handle semantically bizarre changes, while constrained enough to rule out ad hoc explanations. In seeking to uncover the synchronic linguistic manifestations of culturally unfamiliar conceptualizations, we draw heavily on the study of synchronic polysemy: cultural "explanations" are only appropriate, or necessary, when the proposed semantic change has been demonstrated as plausible from purely linguistic evidence. Between O'Grady's "traditional Australian Weltanschauungen" and his "implausible semantic changes" we interpose the constraining step of demonstrating traditional Australian polysemy.|476|Australian languages, semantic change, plausibility, 3366|Evans1992|Our basic premise, then, is that putative semantic change *explicatum* and attested synchronic polysemy are *explicans*.
Given a hypothesized semantic change from A to Z, our problem is to find a chain of attested synchronic polysemies A-B, B-C . . . Y-Z that connect A to Z. Since these chains may involve a large number of links, it is not usually possible to find them all in the language under study, so evidence from other related languages or semiotic systems must be brought in.|476|semantic change, plausibility, methodology, 3367|Evans1992|In addition to the parallels *between* different languages, there exist a number of special secondary semiotic systems which reveal parallels *within* the same language: the "avoidance" or "mother-in-law" registers used between taboo relatives, "initiation languages" learnt on admission to ritual manhood, signed languages employed in various social settings, and systems of iconography. [...] I shall refer to the lavish polysemy found in such systems as *hyperpolysemy*. |476|plausibility, semantic change, language-internal comparison, 3368|Evans1992|Derivation may show recurrent semantic patterns across languages in the same way as regular polysemy. :comment:`[gives example in Australian languages]`|477|semantic change, word derivation, 3369|Evans1992|Evidence from sign language bears on postulated semantic changes in two ways. Firstly, there are cases in which the vocabulary of sign language is less elaborated than that of speech, with a concomitant increase in polysemy or abstractness. Secondly, the form of the sign itself is often significant.|485|sign language, semantic change, semantic change patterns 3370|Evans1992|We cannot simply assume that the patterns of polysemy found in alternative semiotic systems will be parallel.|488|polysemy, semantic change, cross-linguistic study 3371|Evans1992|In this section I briefly sketch the type of data-base system (using Hypercard) that we are using to study semantic change in Australian languages.
It is basically an elaboration of the method employed by Matisoff (@1978) to represent networks of semantic connections in Tibeto-Burman languages: the 'points' represent meanings and the 'links' or paths between them represent attested cases of polysemy, formal connection etc. between two meanings. :comment:`[Shows a nice graphic on the next page]`|489|semantic change, pathways, semantic change patterns, network 3372|Evans1992|A methodological limitation of this network approach stems from the questionable theoretical assumption it makes that meanings can be represented as unanalyzable points, and that a given point represents precisely the same meaning across different languages.|492|polysemy network, network, problem, 3373|Evans1992|A third limitation of this approach is that it fails to show explicitly certain parallels in polysemy. For example, the polysemies 'tooth - dog', 'tooth - snake' and 'tooth - rat' are all instantiations of a more general synecdochic link between 'tooth' and 'animal saliently possessing a tooth'. It may be possible to represent this by 'bundling' groups of links, or it may turn out to be better to leave this for a later phase of analysis.|492|semantic change patterns, polysemy network, problem, 3374|Evans1992|Very interesting paper that outlines an early methodology, based on @Matisoff<1980>, to handle semantic change consistently. It inspires in multiple ways: first, to seek ways to make cross-linguistic pathways of semantic change comparable across language families; second, to improve graph representations of colexifications.|000|Australian languages, semantic change patterns, semantic change, polysemy network, colexification, methodology, 3375|Trips2017|The word shrimpburger is a primary example of reanalysis within the word. Reanalysis is a development that alters the structure of a form because this structure is or becomes ambiguous.
In our case, the basis of the reanalysis was the word hamburger, which originally denotes a person from the city of Hamburg. But at the beginning of the 20th century in the United States a type of food called Hamburger steak became popular, which was often shortened to hamburger. According to the Oxford English Dictionary (OED), burger is a terminal element denoting a roll or a sandwich that contains the foodstuff specified in the first element. So evidently at some point in time (or more precisely in the 1940s in the United States as stated by the OED), speakers reanalyzed the structure of [hamburger] as the new structure [ham][burger], which then made way for many new formations, for example, krautburger, kosherburger, soyburger, veggieburger, beefburger, (bacon-)cheeseburger (all found in the Corpus of Contemporary American English, COCA), and shrimpburger from the example above. What the reanalysis of hamburger shows is that this change on the level of word structure was triggered by a new semantic analysis of innovative speakers, so morphology and semantics interacted and brought about the change.|12|reanalysis, morphology, hamburger, 3376|Trips2017|Introductory article on morphology with a nice section on the reanalysis of non-compound words as compounds, including *hamburger* and *watergate*.|000|morphology, morphological change, reanalysis, introduction, overview 3377|Trips2017|This article provides an overview of morphological change and takes, as will become evident shortly, a theoretical stance. Under the assumption that morphology is the study of the structure of words, it seems quite clear what kind of changes we are dealing with here.|000|morphological change, introduction, overview 3378|Haspelmath2017a|This paper reasserts the fundamental conceptual distinction between language-particular categories of individual languages, defined within particular systems, and comparative concepts at the cross-linguistic level, defined in substantive terms.
The paper argues that comparative concepts are also widely used in other sciences, and that they are always distinct from social categories, of which linguistic categories are special instances. Some linguists (especially in the generative tradition) assume that linguistic categories are natural kinds (like biological species, or chemical elements) and thus need not be defined, but can be recognized by their symptoms, which may be different in different languages. I also note that category-like comparative concepts are sometimes very similar to categories, and that different languages may sometimes be described in a unitary commensurable mode, thus blurring (but not questioning) the distinction. Finally, I note that cross-linguistic claims must be interpreted as being about the facts of languages, not about the incommensurable systems of languages.|000|comparative concept, descriptive linguistic categories, cross-linguistic study, language typology, discussion 3379|Haspelmath2017a|Important article (draft) discussing the distinction between comparative categories and descriptive categories.|000|comparative concept, terminology, cross-linguistic study, language typology, universals, 3380|Bastkowski2017|Split-networks are a generalization of phylogenetic trees that have proven to be a powerful tool in phylogenetics. Various ways have been developed for computing such networks, including NeighborNet, QNet and FlatNJ. Many of these approaches are implemented in the user-friendly SplitsTree software package. However, to give the user the option to adjust and extend these approaches and to facilitate their integration into analysis pipelines, there is a need for robust, open-source implementations of associated data structures and algorithms.
Here we present SPECTRE, a readily available, open-source library of data structures written in Java, that comes complete with new implementations of several pre-published algorithms and an interactive graphical interface for visualizing planar split networks. SPECTRE also supports the use of longer running algorithms by providing command line interfaces, which can be executed on servers or in High Performance Computing (HPC) environments.|000|reticulate evolution, software, tools, split network, phylogenetic network 3382|Hoellmann2015|:comment:`[Quote from the Yanshi Jiaxun, chapter 5, translated by the author]` .. pull-quote:: Falls sich auf [einem Blatt] Papier Ausführungen aus den fünf konfuzianischen Klassikern oder die Namen weiser Männer finden, verwende ich es nicht beim Toilettengang. (*Yanshi jiaxun*, um 590, Kap. 5) |52|paper, nice quote, toilet, defecation, history, 3383|Muysken2016|This note argues that the link between Language Typology and Language Contact Studies should become much tighter, since the nature of typology has changed in the last decade due to the growing realization that patterns of language may be reflections of long-term genealogical inheritance as well as of contact-induced language change, in addition to functional and cognitive pressures.|000|language contact, language typology, inheritance, genetic inheritance, cross-linguistic study 3384|Varasdi2017|The semantics of progressive sentences presents a challenge to linguists and philosophers alike. According to a widely accepted view, the truth-conditions of progressive sentences rely essentially on a notion of inertia. Dowty (Word meaning and Montague grammar: the semantics of verbs and times in generative grammar and in Montague’s PTQ, D. Reidel Publishing Company, Dordrecht, 1979) suggested inertia worlds to implement this “inertia idea” in a formal semantic theory of the progressive. 
The main thesis of the paper is that the notion of inertia went through a subtle, but crucial change when worlds were replaced by events in Landman (Nat Lang Semant 1:1–32, 1992) and Portner (Language 74(4):760–787, 1998), and that this new, event-related concept of inertia results in a possibility-based theory of the progressive. An important case in point in the paper is a proof that, despite its surface structure, the theory presented in Portner (1998) does not implement the notion of inertia in Dowty (1979); rather, it belongs together with Dowty’s earlier, 1977 theory according to which the progressive is a possibility operator.|000|inertia, semantics, logic, truth value 3385|Varasdi2017|Natural languages often have grammatical constructions that express the fact that an activity or event is in progress. In English, for example, the progressive aspect is encoded through the use of the auxiliary verb be together with the -ing form of the main verb [...].|303f|progressive, English, inertia, semantics, 3386|Varasdi2017|As we will see below, according to a widely accepted view, the truth-conditions of progressive sentences rely essentially on a notion of “inertia.” Dowty (1979) proposed inertia worlds to implement this “inertia idea” in a formal semantic theory of the progressive.|304|progressive, inertia, truth value, semantics, logic 3387|Varasdi2017|The formal semantic theories of the progressive aspect aim at modeling the truth-conditions of progressive sentences. While natural language sentences rarely have clear-cut truth-conditions, progressive sentences are infamous to have remarkably elastic truth-conditions over and above the usual vagueness pervading language. The reason for this appears to be connected with the fact that progressive sentences classify a spatiotemporal segment of the world in terms of the complete events into which that segment might develop under certain conditions.
This gives rise to a kind of double uncertainty: uncertainty with respect to what to include in the spatiotemporal segment being described, and uncertainty with respect to the event properties used in classifying it. Since these two parameters are clearly not independent of each other, identifying the truth-conditions of a progressive sentence is often very difficult.|305|English, progressive, semantics, logic, truth value 3388|Ponterotto2017|This study reports a methodological itinerary aimed at developing a statistically supported investigative procedure useful for the empirical verification of hypotheses in Cognitive Linguistics research. It targets motion–emotion construals and explores the possible conceptual link between upward-oriented movements encoded in some motion verbs and the emotional state of happiness. The results emerging from the observation of two typologically different languages (English and Italian) lend empirically verified evidence for basic hypotheses in cognition and language research regarding the conceptualization of emotions and also for findings in cross-linguistic research on emotion representation.|000|word emotions, Italian, English, cross-linguistic study, emotion concepts, 3389|Kenner2017|The National Institutes of Health has deemed illiteracy a national health crisis based on reading proficiency rates among American children. In 2002, the National Early Literacy Panel identified six pre-reading skills that are most crucial precursors to reading mastery and predict future reading outcomes. Of those skills, phonological awareness, and in particular phonemic awareness, is the strongest independent predictor of early reading outcomes. 
However, limited research has addressed the development of these component skills due in part to the fact that many of the measures used to assess sub-skills such as phonemic awareness are oral production measures that cannot easily be administered with children under the age of five, and are not designed to detect implicit or emerging knowledge. To address this limitation, we developed and administered two receptive measures of phonemic awareness to 2.5- and 3.5-year-old children. We found evidence for the emergence of this component skill earlier in ontogeny than is currently acknowledged in the literature. Overall, children performed at above chance rates on measures of receptive phonemic awareness at the level of the individual phoneme as early as 2.5-years-old. Results are discussed in terms of the need for a paradigm shift in prevailing models of how phonological awareness develops, as well as the potential to identify children at-risk for reading failure at an earlier point in ontogeny than is currently feasible.|000|phonemic awareness, language acquisition, child language, 3390|Abdaoui2017|Sentiment analysis allows the semantic evaluation of pieces of text according to the expressed sentiments and opinions. While considerable attention has been given to the polarity (positive, negative) of English words, only few studies were interested in the conveyed emotions (joy, anger, surprise, sadness, etc.) especially in other languages. In this paper, we present the elaboration and the evaluation of a new French lexicon considering both polarity and emotion. The elaboration method is based on the semi-automatic translation and expansion to synonyms of the English NRC Word Emotion Association Lexicon (NRC-EmoLex). First, online translators have been automatically queried in order to create a first version of our new French Expanded Emotion Lexicon (FEEL).
Then, a human professional translator manually validated the automatically obtained entries and the associated emotions. She agreed with more than 94 % of the pre-validated entries (those found by a majority of translators) and less than 18 % of the remaining entries (those found by very few translators). This result highlights that online tools can be used to get high quality resources with low cost. Annotating a subset of terms by three different annotators shows that the associated sentiments and emotions are consistent. Finally, extensive experiments have been conducted to compare the final version of FEEL with other existing French lexicons. Various French benchmarks for polarity and emotion classifications have been used in these evaluations. Experiments have shown that FEEL obtains competitive results for polarity, and significantly better results for basic emotions.|000|word emotions, emotion concepts, corpus, concept list, dataset, 3391|Abdaoui2017|Dataset is available online at: * http://www.lirmm.fr/~abdaoui/FEEL|000|word emotions, emotion concepts, French, dataset, 3392|Regier2015|Why do languages have the semantic categories they do? Each language partitions human experience into a system of semantic categories, labeled by words or morphemes, which are used to communicate about experience. These categories often differ widely across languages. Thus, languages do not merely provide different labels for the same universally shared set of categories – instead, both the labels and the categories themselves may be to some extent language-specific. However, this cross-language variation is constrained. Words with similar or identical meanings often appear in unrelated languages, and most logically possible meanings are unattested – suggesting that there are universal forces constraining the cross-language diversity.
Accounting for this pattern of wide but constrained variation is a central theoretical challenge in understanding why languages have the particular forms they do.|000|expressivity, expression, colexification, semantic space, semantics, 3393|Wilson2004|A common problem for linguists, philosophers and psychologists is that linguistically specified ('literal') word meanings are often modified in use. The literal meaning may be narrowed (e.g. drink used to mean 'alcoholic drink'), approximated (e.g. square used to mean 'squarish') or undergo metaphorical extension (e.g. rose or diamond applied to a person). Typically, narrowing, approximation and metaphorical extension are seen as distinct pragmatic processes with no common explanation: thus, narrowing has been analysed in terms of default rules (e.g. Levinson 2000), approximation has been treated as a type of pragmatic vagueness (e.g. Lewis 1983; Lasersohn 1999) and metaphorical extension is standardly seen as a case of maxim violation with resulting implicature (Grice 1989). I will propose a unified account on which narrowing, approximation and metaphorical extension are special cases of a more general pragmatic adjustment process which applies spontaneously, automatically and unconsciously to fine-tune the interpretation of virtually every word, and show how the notion of an ad hoc concept introduced by Barsalou (1987, 1992) might interact with a pragmatic comprehension procedure developed in relevance theory (Sperber & Wilson 1998; Carston 2002; Wilson & Sperber 2002) to account for a wide range of examples.|000|semantics, pragmatics, communication, language-use 3394|Wilson2003|The goal of lexical pragmatics is to explain how linguistically specified (‘literal’) word meanings are modified in use. 
While lexical-pragmatic processes such as narrowing, broadening and metaphorical extension are generally studied in isolation from each other, relevance theorists (Carston 2002, Wilson & Sperber 2002) have been arguing for a unified approach. I will continue this work by underlining some of the problems with more standard treatments, and show how a variety of lexical-pragmatic processes may be analysed as special cases of a general pragmatic adjustment process which applies spontaneously, automatically and unconsciously to fine-tune the interpretation of virtually every word.|000|language use, pragmatics, lexicon, lexical pragmatics 3395|Wilson2003|This is apparently the same as @Wilson2004, but the abstracts differ drastically.|000|lexical pragmatics, pragmatics, semantics, 3396|Clarke1991|With the passage of time, words change their meanings. What is more, and different, meanings change their words. How this comes about, however, is still quite unclear. In the short term it has been suggested that speakers use metaphor and metonymy when they want to create novel expressions that will be readily understood, and that these devices continually renew the diversity and the vitality of language. Nerlich and Clarke (1988), and Nerlich (1989), put forward a theory of semantic change focusing on such strategies for meaning-making, and on the linguistic, semantic and meta-semantic knowledge-bases of speakers and hearers on which the strategies operate.|000|simulation studies, lexical change, semantic change, 3397|Clarke1991|By coincidence, while this result was emerging from the computer, a book on polylectal grammar caught our attention, in the last part of which Berrendonner put forward a hypothesis for the pattern of ‘lects’ competing for survival, involving a succession of frequency waves very like the results we were getting. It seemed that he had found a general pattern, for which our simulation could provide the process, at least as it occurs in semantic change.
Our result fitted nicely with Berrendonner’s view that ‘le changement diachronique est une oscillation régulée de la variation’ (Berrendonner et al., 1983, p. 63).|232|lexical replacement, word frequency, lexical change, semantic change 3398|Clarke1991|The first substantive question was: why should changes in usage occur? The answer seems to involve the ‘mental lexicon’, or people’s knowledge of words. It requires no psycholinguistic sophistication to suppose that speakers will tend to use most often the word they can call to mind most easily to convey a given meaning. This means that differences between words in their accessibility, or ease of memory retrieval, will determine their frequency of use,|228|correlational studies, word frequency, accessibility, lexical replacement, semantic change 3399|Clarke1991|Interesting paper which derives some general conclusions about word frequency and word accessibility as being constitutive for how words are used and how easily they change.|000|word frequency, word accessibility, accessibility, usage, lexical change, semantic change 3400|Georg2017|I cannot resist mentioning @Pereltsvaig<2015>/Lewis 2015 here, who show that, in some circles, attempts are being made these days to switch out the light which has illuminated Indo-European linguistics for generations (by switching on some computers) and to reduce this discipline to the pre-modern guesswork stage in which large areas of the Altaic debate still are in – the belief that all that processing power can replace the available knowledge about these languages (and spare the toil to familiarize oneself with them) and will produce ‘results’ which are worth the paper they are printed on, is impossible to comprehend. But it helps to get into the newspapers, of course.|372-footnote|computational approaches, historical linguistics, methodology, critics 3401|Georg2017|Paradigmatic morphology is a central and crucial concept for several branches of comparative linguistics.
The observation of shared paradigms in languages which were not suspected of having a common ancestry stands at the cradle of modern genealogical linguistics and dominates the discussion(s) about not firmly established or merely putative language families or phyla to this day, the very different morphological techniques different languages use for the formation of paradigms mark the beginning of language typology, now a major pillar of the language sciences, and the question, to which degree languages – closely, distantly, or not at all related with each other – may borrow morphological paradigms (part or whole) from each other or might have done so in the past (which, if true and not properly detected, might lead to superficially persuasive, but factually erroneous, claims and hypotheses of genealogical relatedness) continues to be an important theoretical and practical issue in comparative linguistics.|000|Trans-Eurasian, morphology, word paradigm morphology, historical linguistics, comparative method, methodology, review, 3402|Georg2017|Review of the collection by @Robbeets2014 on morphology in Altaic languages.|000|morphology, Altaic, Trans-Eurasian, methodology, word paradigm morphology 3403|Leavens2017|In his classic analysis, Gould (The mismeasure of man, WW Norton, New York, 1981) demolished the idea that intelligence was an inherent, genetic trait of different human groups by emphasizing, among other things, (a) its sensitivity to environmental input, (b) the incommensurate pre-test preparation of different human groups, and (c) the inadequacy of the testing contexts, in many cases. According to Gould, the root cause of these oversights was confirmation bias by psychometricians, an unwarranted commitment to the idea that intelligence was a fixed, immutable quality of people.
By virtue of a similar, systemic interpretive bias, in the last two decades, numerous contemporary researchers in comparative psychology have claimed human superiority over apes in social intelligence, based on two-group comparisons between postindustrial, Western Europeans and captive apes, where the apes have been isolated from European styles of social interaction, and tested with radically different procedures. Moreover, direct comparisons of humans with apes suffer from pervasive lapses in argumentation: Research designs in wide contemporary use are inherently mute about the underlying psychological causes of overt behavior. Here we analyze these problems and offer a more fruitful approach to the comparative study of social intelligence, which focuses on specific individual learning histories in specific ecological circumstances.|000|ape cognition, animal cognition, evolution of complexity, animal evolution, 3404|Leavens2017|The gist of this paper is that humans insufficiently study cognition of apes and therefore may well come to the wrong conclusion that they are much less intelligent than us. This seems to have a discrimination bias that puts humans above the other animals.|000|ape cognition, animal evolution, animal cognition, critics, 3405|Hamans2017|Morphological change is not a result of mechanical, predictable processes, but of the behavior of language users. Speakers reinterpret opaque data in order to assign a more transparent structure to them. Subsequently successful reinterpretation may form the basis of new derivations. The moment such a derivative word formation process becomes productive a language change has taken place. In addition, this paper shows how language change obscures the distinction between separate morphological processes such as compounding and derivation and thus between morphological categories. Moreover, the data under discussion show that there is not a preferred natural direction of language change.
Most of the examples are taken from English and Dutch, but also a few French, Frisian, German and Afrikaans data are discussed.|000|productivity, morphological change, word derivation, reinterpretation, 3406|Hamans2017|This paper deals primarily with language change. It discusses how reinterpretation of opaque data may cause morphological change. In addition, it also demonstrates how language change blurs the distinction between separate morphological processes such as compounding and derivation and therefore also obscures the distinction between morphological categories. Finally, it shows that the arguments for a natural or preferred direction in language change are extremely weak.|1|morphological change, directionality, methodology, 3407|Hamans2017|This paper seems worth reading more thoroughly, as it makes the strong claim that morphological processes do not usually follow preferred directions (unlike what we would expect for sound change). Interestingly, the processes described are similar to fusion and fission accounts in biology.|000|directionality, morphological change, reanalysis, fusion, fission, 3408|Chen2017|Paper introduces personalities like Wáng Lì and Chao Yuenren and the role they played in the history of linguistics in China.|000|biography, Wáng Lì, Chao Yuenren, introduction, history of science, linguistics, 3409|Mailhammer2010|Article treats the early investigation of the etymology of Germanic languages at the beginning of the 21st century.|000|history of science, etymology, introduction, Germanic, 3410|Moore2017|Background Phylogenetic trees are an important analytical tool for examining species and community diversity, and the evolutionary history of species. In the case of microorganisms, decreasing sequencing costs have enabled researchers to generate ever-larger sequence datasets, which in turn have begun to fill gaps in the evolutionary history of microbial groups. 
However, phylogenetic analyses of large sequence datasets present challenges to extracting meaningful trends from complex trees. Scientific inferences made by visual inspection of phylogenetic trees can be simplified and enhanced by customizing various parts of the tree, including label color, branch color, and other features. Yet, manual customization is time-consuming and error prone, and programs designed to assist in batch tree customization often require programming experience. To address these limitations, we developed Iroki, a program for fast, automatic customization of phylogenetic trees. Iroki allows the user to incorporate information on a broad range of metadata for each experimental unit represented in the tree. Results Iroki was applied to four existing microbial sequence datasets to demonstrate its utility in data exploration and presentation. Specifically, we used Iroki to highlight connections between viral phylogeny and host taxonomy, explore the abundance of microbial groups associated with Shiga toxin-producing Escherichia coli (STEC) in cattle, examine short-term temporal dynamics of virioplankton communities, and to search for trends in the biogeography of Zetaproteobacteria. Conclusions Iroki is an easy-to-use application having both command line and web-browser implementations for fast, automatic customization of phylogenetic trees based on user-provided categorical or continuous metadata. Iroki enables hypothesis testing through improved visualization of phylogenetic trees, streamlining the process of biological sequence data exploration and presentation. Availability Iroki can be accessed through a web browser application or via installation through RubyGems, from source, or through the Iroki Docker image. All source code and documentation is available under the GPLv3 license at https://github.com/mooreryan/iroki. 
The Iroki web-app is accessible at www.iroki.net or through the VIROME portal (http://virome.dbi.udel.edu), and its source code is released under GPLv3 license at https://github.com/mooreryan/iroki_web. The Docker image can be found here: https://hub.docker.com/r/mooreryan/iroki.|000|software, family tree, visualization, 3411|Baker2017|Genetic and archaeological studies have established a sub-Saharan African origin for anatomically modern humans with subsequent migrations out of Africa. Using the largest multi-locus data set known to date, we investigated genetic differentiation of early modern humans, human admixture and migration events, and relationships among ancestries and language groups. We compiled publicly available genome-wide genotype data on 5,966 individuals from 282 global samples, representing 30 primary language families. The best evidence supports 21 ancestries that delineate genetic structure of present-day human populations. Independent of self-identified ethno-linguistic labels, the vast majority (97.3%) of individuals have mixed ancestry, with evidence of multiple ancestries in 96.8% of samples and on all continents. The data indicate that continents, ethno-linguistic groups, races, ethnicities, and individuals all show substantial ancestral heterogeneity. We estimated correlation coefficients ranging from 0.522 to 0.962 between ancestries and language families or branches. Ancestry data support the grouping of Kwadi-Khoe, Kx’a, and Tuu languages, support the exclusion of Omotic languages from the Afroasiatic language family, and do not support the proposed Dené-Yeniseian language family as a genetically valid grouping. Ancestry data yield insight into a deeper past than linguistic data can, while linguistic data provide clarity to ancestry data.|000|population genetics, co-evolution, archaeogenetics, African languages 3412|Baker2017|We were able to annotate 249 samples with language (Table S1). 
Our data set covers an estimated 21.3% of the 141 primary language families but 97.8% of people. By focusing on ancestries rather than samples, confounding due to recent admixture is removed. We therefore evaluated correlations among ancestries and languages (Table S6). Southern African ancestry correlates with Kwadi-Khoe, Kx’a, and Tuu languages (r = 0.960, p = 4.78 × 10^-138, Fig. 3A). Central African ancestry corresponds to Pygmies, both Eastern and Western (Table S3). Pygmies are thought to have lost their original language and now speak Niger-Congo or Nilo-Saharan languages, presumably adopted from neighboring tribes. Consequently, Central African ancestry does not meaningfully correlate with extant language families.|3|Africa, African languages, human prehistory, population genetics, 3413|Li2014|Over the past 100 years, the paradigm under which Chinese diglossia operates has undergone significant change, morphing from a system using literary Chinese for writing and regional vernaculars for speech to a setup in which Mandarin and its new offshoots replace both the literary language and spoken dialects in both written and oral modes. This paper traces the transition from writing in the literary language to the use of Mandarin for all manner of communication and shows how higher literacy and education in Mandarin and English are sounding the death knell for the regional dialects. Many of these dialects are going from mainstream to obsolete in the course of a generation, especially in the younger segment of the population in urban centers traditionally regarded as bastions of regional speech, sparking backlash (e.g., pro-Cantonese demonstrations in Guangzhou) and attempts to revive dying vernaculars (e.g., Taiwan's indigenous language education movement). 
By examining the balance of power between Mandarin and dialect in Guangzhou, Shanghai, Taiwan, Singapore, and Malaysia, it will be shown that unless there is a counterbalance of prestige or economic utility, attempts to reverse the proliferation of Mandarin will prove futile, although the speech varieties being replaced will ultimately resurface in the phonology, lexicon, and syntax of the standard language, giving rise to new regional varieties of Mandarin.|000|dialect death, language death, Chinese dialects, diglossia, 3414|Ciuffo2017|The silent reading fluency is not an observable behaviour and, therefore, its evaluation is perceived as more challenging and less reliable than oral reading fluency. The present research is aimed to measure the silent reading speed in a sample of proficient students, assessed by an original silent reading fluency task, based on behavioural indicators of the silent reading speed. A total of 325 high school and university skilled students (age range 14–23 years) have been assessed using 3 tasks aimed to evaluate the oral reading speed (lists of words, lists of pseudowords and narrative text) and one task aimed to measure the silent reading speed. The average silent reading speed in our sample was around 12.5 syll/sec, almost double than the oral reading speed rate. The silent reading speed had an increase from 9.13 to 12.38 syll/sec from the first year of high school (ninth grade) to the fifth year of University. Conversely, the oral reading speed remained substantially unchanged for the entire academic course. Our results showed that the reading fluency in silent mode tends to increase up to the last years of University and it may be considered the most rapid and efficient reading mode. This study highlights the importance of including both silent and oral reading modes in the assessment of the older students and young adults, since silent reading is the main reading mode for proficient readers. 
|000|silent reading, oral reading, reading, fluency, psycholinguistics 3415|Ciuffo2017|Silent reading is commonly used in everyday activities and it is the more dominant and faster style of reading for older students and proficient readers. Mature students primarily read silently rather than orally and proficient readers typically read faster when reading silently. |1668|silent reading, psycholinguistics, definition, introduction 3416|Guo2004|Status planning essentially represents both social concerns and social implementation of language planning (Kaplan & Baldauf, 1997, p. 37). Whatever social concerns there are, status planning always selects among competing languages/dialects a language/dialect for a higher status and designates it for dominant and broader functions (Cooper, 1989, pp. 99–121; Kaplan & Baldauf, 1997, p. 31; Spolsky & Shohamy, 1999, p. 39). Depending on the motivating social concerns, status planning may or may not make any overt policy statement about the fate or status of the competing languages/dialects when the selection is made. For example, while selecting Hebrew as the official language, status planning in Israel treats Arabic as a statutory language, recognizing the linguistic and ethnic distinctiveness of the Arabic (see Cooper, 1989, pp. 100–101; Spolsky & Shohamy, 1999). On the other hand, designating English as the only official language, California's 1986 and 1998 propositions denied the legitimacy of the use of non-English languages in governmental and educational domains, attempting to force minority groups to assimilate into the English-speaking majority (cf. Crawford, 2000, pp. 104-127; Cooper, 1989, pp. 101-102). Thus, it is enlightening for students of language policy to study the relationship between the selected language/dialect and the once-competing ones. |000|Chinese dialects, pǔtōnghuà, diasystem, overview, roof language, 3417|He2007|Tone analysis is the first step in investigations of Chinese dialects. 
Traditionally, this is done by well-trained linguists. In this paper, a SOM-based automatic tone analysis approach is proposed. SOM is facilitated to map the F0 contour to 2-dimensional space, and then the data are clustered to indicate tones of the dialect. We also introduce a phonology factor to the DB-index, which is used to validate the clustering, in order to incorporate phonology knowledge. Our version of DB-Index assumes that characters that have similar tones in the past tend to have similar tones today. Experiments show that the approach attains satisfactory results. |000|tone language, speech acoustics, tone analysis, Chinese 3418|Jiang2012|The field-work of Chinese dialects, also called field survey, is a direct research method. The field sampling of computer-aiding Chinese dialects is a platform of field work, where we apply the computer and computer technology in aiding dialect investigation and research system of CADI (Computer-Aiding Dialects Investigation). Chinese dialects field-work can be traced back to about Zhou Dynasty and Qin Dynasty (about B.C.1100- B.C.206), when there had already formed the tradition of observing and collecting local dialects and sayings everywhere. This can be regarded as the beginning of traditional Chinese dialects field-work. Modern Chinese dialects field-work was originated in the early 20th century and further developed by the founder of Chinese Dialectology--Yuenren Chao. The requirement of the investigators (field-workers) is that they must investigate and record the dialects of the informants face to face, strictly according to the facts. This method is generally classified into three steps: Step one: Preparation stage. Determine the specific purpose and task of the dialect investigation, formulate the outline, make survey forms, and prepare the necessary survey tools, etc. 
|000|field work, computer-assisted analysis, computer-aided approaches, Chinese dialects, introduction 3419|Kays2011|With the human population in many areas of the world already above the carrying capacity of the land, food availability and advances in productivity are becoming increasingly critical. It is predicted that the world population will reach 9 billion people by 2050, which represents a dramatic increase in the number of people to feed, with increasingly scarce crop land, water, fertilizer, and fuel. Vegetables make up a major portion of the diet of humans, providing not only calories but essential vitamins, fiber, and minerals and will play an increasingly important role in food availability. Of the over 400 vegetable crops that are grown commercially, the genetic diversity in critical attributes is exceptional. The edible portion of vegetables is comprised of an array of morphological parts [e.g. leaves, stems, lateral buds, young plants, shoots, petioles, flowers, flower buds, fruit (immature to mature), seeds, seedlings, roots, tubers, corms, bulbs, rhizomes]. Vegetables vary widely in the length of time required from planting to harvest, postharvest longevity, and day length, temperature, water, fertility, and pesticide requirements. They provide a diverse range of tastes, aromas, textures, colors, and nutritional attributes, greatly increasing the variety in the foods we eat and satisfying a myriad of personal preferences. The length of time required to produce a crop (i.e. planting to harvest) varies from days to several years and the length of optimum harvest maturity from hours (gerkins – Cucumis sativus L.) to years (giant taro – Alocasia macrorrhiza (L.) Schott). 
The fact that short production cycle vegetables allow multiple cropping and a significant volume of the vegetables grown worldwide are produced on small plots, militates against accurate production statistics, preventing a clear understanding and appreciation of the value of these crops to the world food supply. |000|vegetables, dataset, lexical database, concept list, 3420|Kays2011|Book offers words for a large number of different vegetables as they are named across the world. This is a very interesting resource that could surely be used for some kind of quantitative studies, given the large number of terms and languages.|000|vegetables, dataset, concept list, lexical database 3421|Kirby2007|Human language arises from biological evolution, individual learning, and cultural transmission, but the interaction of these three processes has not been widely studied. We set out a formal framework for analyzing cultural transmission, which allows us to investigate how innate learning biases are related to universal properties of language. We show that cultural transmission can magnify weak biases into strong linguistic universals, undermining one of the arguments for strong innate constraints on language learning. As a consequence, the strength of innate biases can be shielded from natural selection, allowing these genes to drift. Furthermore, even when there is no natural selection, cultural transmission can produce apparent adaptations. Cultural transmission thus provides an alternative to traditional nativist and adaptationist explanations for the properties of human languages. |000|innateness, culture, cultural evolution, language evolution, language origin 3422|Liang2015|The perception and definition of a linguistic variety as a "language" or a "dialect" can be an ideological issue at the societal level and an attitudinal decision for each individual. 
If there is any consensus among linguists on the distinction between "language" and "dialect", it would be that a clear-cut distinction is largely unattainable. In a descriptive, synchronic sense, a language may refer to a group of related linguistic norms or a single norm, and a dialect is presumably one of the norms (Haugen 1966). Hence every dialect is a language, but not every language is a dialect. Language as a generalised notion means that every speaker of a language is a speaker of at least one dialect (Chambers and Trudgill 2004)--standard English is, for example "just as much a dialect as any other form of English" (p. 3). The generalised and specific dual sense of the term "language" in its synchronic dimension adds to the muddle of the issue, and thus scholars have proposed the term "variety" to be used instead as a descriptive label for a single linguistic entity in a "neutral" and ad hoc manner. While this is methodologically convenient, the superordinate–subordinate model of the language–dialect relationship cannot avoid the question of genetic relationship. How closely related would a group of linguistic norms need to be considered as dialects of the same language? When would these norms be considered separate languages? To what extent does mutual intelligibility play a role in determining answers to these two questions? |000|chinese dialects, sociolinguistics, introduction, diasystem 3423|Li2015|The height of 26,940 Chinese Han adults (16,503 rural and 10,437 urban adults) from 11 Han ethnic groups was measured and analyzed in the current survey. The top three highest dialect groups in rural Han populations are Jianghuai (male 167.3 ± 6.4 cm, female 156.5 ± 5.6 cm), North China (male 167.3 ± 6.4 cm, female 155.7 ± 5.7 cm), and Wu (male 166.7 ± 6.9 cm, female 155.6 ± 5.9 cm) groups. 
In urban Han populations, the top three groups are as follows: the Northeast China (male 169.5 ± 6.7 cm, female 158.0 ± 6.1 cm), North China (male 168.5 ± 6.2 cm, female 157.3 ± 5.8 cm), and Jianghuai (male 169.2 ± 6.2 cm, female 157.1 ± 5.6 cm) dialect groups. The Gan dialect group (male 164.0 ± 6.3 cm, female 153.9 ± 5.0 cm) was the shortest in both rural and urban groups. The different stature of Han dialect groups may be a result of interaction between genetic background and different environmental factors, labor intensity, diet composition and nutrition intake in different areas in China. |000|height, Han Chinese, Chinese dialects, stature, 3424|Muthukrishna2016|Innovation is often assumed to be the work of a talented few, whose products are passed on to the masses. Here, we argue that innovations are instead an emergent property of our species' cultural learning abilities, applied within our societies and social networks. Our societies and social networks act as collective brains. We outline how many human brains, which evolved primarily for the acquisition of culture, together beget a collective brain. Within these collective brains, the three main sources of innovation are serendipity, recombination and incremental improvement. We argue that rates of innovation are heavily influenced by (i) sociality, (ii) transmission fidelity, and (iii) cultural variance. We discuss some of the forces that affect these factors. These factors can also shape each other. For example, we provide preliminary evidence that transmission efficiency is affected by sociality--languages with more speakers are more efficient. We argue that collective brains can make each of their constituent cultural brains more innovative. This perspective sheds light on traits, such as IQ, that have been implicated in innovation. 
A collective brain perspective can help us understand otherwise puzzling findings in the IQ literature, including group differences, heritability differences and the dramatic increase in IQ test scores over time. |000|innovation, collective brain, cultural evolution, cognition, language origin, 3425|Lampe2017|Cognition is one of the most flexible tools enabling adaptation to environmental variation. Living close to humans is thought to influence social as well as physical cognition of animals throughout domestication and ontogeny. Here, we investigated to what extent physical cognition and two domains of social cognition of dogs have been affected by domestication and ontogeny. To address the effects of domestication, we compared captive wolves (n=12) and dogs (n=14) living in packs under the same conditions. To explore developmental effects, we compared these dogs to pet dogs (n=12) living in human families. The animals were faced with a series of object-choice tasks, in which their response to communicative, behavioural and causal cues was tested. We observed that wolves outperformed dogs in their ability to follow causal cues, suggesting that domestication altered specific skills relating to this domain, whereas developmental effects had surprisingly no influence. All three groups performed similarly in the communicative and behavioural conditions, suggesting higher ontogenetic flexibility in the two social domains. These differences across cognitive domains need to be further investigated, by comparing domestic and non-domesticated animals living in varying conditions. |000|wolves, domestication, dog, cognition, animal cognition 3426|Steele2010|Evolutionary approaches to cultural change are increasingly influential, and many scientists believe that a `grand synthesis' is now in sight. 
The papers in this Theme Issue, which derives from a symposium held by the AHRC Centre for the Evolution of Cultural Diversity (University College London) in December 2008, focus on how the phylogenetic tree-building and network-based techniques used to estimate descent relationships in biology can be adapted to reconstruct cultural histories, where some degree of inter-societal diffusion will almost inevitably be superimposed on any deeper signal of a historical branching process. The disciplines represented include the three most purely `cultural' fields from the four-field model of anthropology (cultural anthropology, archaeology and linguistic anthropology). In this short introduction, some context is provided from the history of anthropology, and key issues raised by the papers are highlighted. |000|evolutionary approach, linguistic diversity, cultural diversity, cultural transmission, evolutionary theory 3427|Meillet1908|Les rapprochements qui ne s'étendent pas à plus de deux dialectes doivent être tenus pour plus ou moins suspects, sauf raisons particulières: car la ressemblance de deux mots exprimant le même sens dans deux langues différentes peut être due à une rencontre fortuite: c'est ainsi que l'anglais *bad* «mauvais» n'est pas apparenté, même de loin, avec le persan *bad* signifiant aussi «mauvais» : mais ce serait un hasard étrange que *bad* signifiât «mauvais» dans une troisième langue. La coïncidence de trois langues non contiguës suffit donc pratiquement à garantir le caractère «indo-européen» d'un mot, sous le bénéfice des réserves indiquées ci-dessus. 
:comment:`Later quoted by` @Hoenigswald1950 :comment:`but with wrong page mark` |345|Antoine Meillet, Indo-European, lexical reconstruction, linguistic reconstruction, methodology 3428|Meillet1908|Quand on a une fois éliminé les mots dont la ressemblance s'explique par des emprunts, il en reste un grand nombre qui, en tenant compte de l'action des lois phonétiques, se laissent identifier les uns aux autres [...]. de ces concordances, la plupart proviennent sans doute de ce que les mots correspondants existaient déjà en indo-européen, mais d'autres peuvent s'expliquer par l'extension plus ou moins tardive de certains mots sur tout ou partie du domaine indo-européen. [...] Ces deux cas, celui de l'identité originelle et celui de l'extension postérieure à la division dialectale (c'est-à-dire de l'emprunt), sont au fond absolument différents, mais il est impossible la plupart du temps de faire le départ de ce qui appartient à l'un et à l'autre : et l'on en est réduit à entendre par mots indo-européens les mots communs à plusieurs dialectes indo-européens, à la seule condition qu'ils présentent toutes les altérations phonétiques et morphologiques caractéristiques des dialectes auxquels ils appartiennent, et que des témoignages historiques n'en attestent pas le caractère récent. |344|diffusion, Indo-European, lexical reconstruction, lexical borrowing, dialect diffusion 3429|Meillet1908|Toutefois, il importe de ne jamais l'oublier, le terme des *mots indo-européens* recouvre deux choses hétérogènes et qui [pb] ne restent confondues que par suite de l'absence d'un critère donnant le moyen de les distinguer; et la part des emprunts préhistoriques d'un dialecte indo-européen à un autre ou de plusieurs dialectes indo-européens à des langues d'autres familles n'est certainement pas négligeable.|344f|Indo-European, dialect diffusion, linguistic diffusion, lexical reconstruction, methodology 3430|Dorren2017|Ah, for the days of fact-free linguistics! 
The pre-scientific era might have produced a lot of codswallop and hogwash, but how entertaining it is to look back upon. Scholars erred in ways that few modern linguists ever would. Today, their field of study is a respectable social science, exacting in its methods, broad in its scope and generous in its harvest. Without phoneticians, computers wouldn’t be able to process spoken English. Without sociolinguists, prejudice against dialects and non-Western languages would still be rife – or rather, rifer still. Forensic linguists help to solve crimes, clinical linguists treat people with language impairments, historical linguists shed light on language change and even on prehistoric culture and migration – the list goes on and on. As in other disciplines, pertinent questions and rigorous methods to answer them have been at the root of success.|000|linguistics, popular science, essay, history of science 3431|Dorren2017|Very entertaining essay on the historical attitudes towards linguistics and the search for the "best" language.|000|essay, history of science, linguistics, popular science, 3432|Atkinson1875|Interesting article that is an early example of somebody starting to think about the "methodology" of comparative grammar.|000|methodology, comparative method, historical linguistics, historical language comparison 3433|Atkinson1875|Thus, then, we need at least three things before any satisfactory results can even be hoped for, viz., full methodic comparative tables, showing all the forms in their mutual connexion; 2, a methodic phonetic; and 3, the natural favour of some key-language, within the sphere of the languages compared : and of these, the first and second become still more important and indispensable, when the third is absent.|64|workflow, comparative method, methodology, history of science 3434|Carling2017|This is the paper that describes the DIACL database, which is a rich resource on different language families in some idiosyncratic format which 
is shared in XML online. They also have a couple of new concept lists, which makes the paper quite interesting as well.|000|database, cross-linguistic study, dataset, concept list 3435|CarstairsMcCarthy2016|My argument, and plea, here is that typological linguists could do more to assist research on language evolution in three areas: (i) the possibility that there are languages with no distinction between clausal and nominal constructs; (ii) factors influencing the variation between languages in their preference for morphological and syntactic structure; (iii) the extent of pure juxtapositional compounding and its possible status as a linguistic “coelacanth” (as Jackendoff has termed it).|000|language typology, language origin, language evolution, nominal construct, clausal construct, compounding 3436|CarstairsMcCarthy2016|Author argues that we can get hints about how language first evolved by looking at certain compound structures for which we cannot determine whether the structure is morphological or syntactic. Since these fit neither into syntax nor into morphology, the author speculates that they reflect a primordial stage of language.|000|language origin, language typology, compounding, syntax, morphology 3437|Gibson2017|What determines how languages categorize colors? We analyzed results of the World Color Survey (WCS) of 110 languages to show that despite gross differences across languages, communication of chromatic chips is always better for warm colors (yellows/reds) than cool colors (blues/greens). We present an analysis of color statistics in a large databank of natural images curated by human observers for salient objects and show that objects tend to have warm rather than cool colors. These results suggest that the cross-linguistic similarity in color-naming efficiency reflects colors of universal usefulness and provide an account of a principle (color use) that governs how color categories come about. 
We show that potential methodological issues with the WCS do not corrupt information-theoretic analyses, by collecting original data using two extreme versions of the color-naming task, in three groups: the Tsimane’, a remote Amazonian hunter-gatherer isolate; Bolivian-Spanish speakers; and English speakers. These data also enabled us to test another prediction of the color-usefulness hypothesis: that differences in color categorization between languages are caused by differences in overall usefulness of color to a culture. In support, we found that color naming among Tsimane’ had relatively low communicative efficiency, and the Tsimane’ were less likely to use color terms when describing familiar objects. Color-naming among Tsimane’ was boosted when naming artificially colored objects compared with natural objects, suggesting that industrialization promotes color usefulness.|000|color terms, warm colors, cool colors, psycholinguistics, 3438|Gibson2017|The authors argue that color naming terms are developed first mentally on the basis of trying to distinguish cool and warm colors, since warm colors are apparently more salient for objects, and in a second step, languages individually start to develop their color space.|000|color terms, warm colors, color categories, psycholinguistics 3439|Hamans2017|The difference between an affixoid and an affix is that an affixoid is an affix-like bound form, which corresponds to a free lexeme, whereas there is no [pb] corresponding lexeme next to an affix. However, the difference between a free lexeme and the corresponding affixoid is that affixoids necessarily imply semantic change. An affixoid is a transitional in between noun and affix. Affixes and affixoids can both be used productively.|3f|definition, affixoid, affix, free lexeme, vocabulary item, 3440|Hamans2017|Whatever label is assigned to this process; it is clear a process of language change in which a free lexeme becomes a bound morpheme. 
The examples also show that the change is a gradual process, in which one cannot clearly distinguish between compound and derived form and so between the different morphological categories of noun and affixoid.|7|free lexeme, bound lexeme, lexical change, 3441|Hamans2017|The processes of language change that are discussed here are gradual, as has been shown. Since these processes go from compounding to derivation and vice versa the distinction between these morphological processes becomes less clear and absolute than the handbooks suggest. As a consequence, the same applies to the distinction between morphological categories.|20|lexical change, gradual change, fusion, fission 3442|Hamans2017|Finally, the morphological changes described and analyzed here show that there is not a natural or naturally preferred direction of change in the languages under discussion. Maybe the number of changes in one direction (from free to bound morphemes) are higher than in the other, as often claimed, but frequency is an extremely weak argument for a supposed natural direction. The results of the language changes discussed here may refute suggestions of unidirectionality, as defended within grammaticalization theory (see for instance @Haspelmath2004).|20|directionality, lexical change, grammaticalization, reanalysis, free lexeme, bound lexeme 3443|Hamans2017|Another similar example offers forms derived from the English forms Watergate and hamburger. * 17 *watergate* * 17a *closetgate*, *nipplegate*, *donutgate* * 18 *hamburger* * 18a *fishburger*, *cheeseburger*, *weedburger* |12|lexical change, fission, splinter, 3444|Hamans2017|Traditionally one uses the term splinter for such non-morphemic portions of a word that has been split off [...]. 
Since the term splinter focuses on the result of the process of splitting and not on the potential of the remaining portion, we prefer the new term libfix, in which its allusion on affix emphasizes the possible productive word formation aspect, that will be shown hereafter. In addition, the term splinter quite often is defined as “parts of words in blends which are intended to be recognized as belonging to a target word, but which are not independent formatives” (Lehrer 1996: 361). As we will see also splinters of opaque, non-blend, forms may become a recurrent element in a series of new formations, which is another reason to prefer the new term libfix.|11|definition, terminology, blend splinters, libfix, splinter 3445|Hammond2017|In this paper, we examine morphological complexity through the lens of Input Optimization. We take as our starting point the dimensions of complexity proposed in Anderson (2015). Input Optimization is a proposal to account for the statistical distribution of phonological properties in a constraint-based framework. Here we develop a framework for extending Input Optimization to the morphological domain and then test the morphological dimensions Anderson proposes with that framework. The dimensions we consider and the framework we develop are both supported by empirical tests in English and in Welsh.|000|input optimization, morphological complexity, measure, morphology 3446|Hammond2017|Provides a measure for morphological complexity, based on comparing mismatches in input and output forms: the more different the output for a given input, the more complex a morphology is assumed to be. The problem of the account is that it depends on how transparent the processes are for the speakers. We need to include the weight that we give to a certain operation, but it seems to me that the author does not really elaborate on this. 
Thinking of Umlaut-spreading in German, it is clear that something that increases complexity is being used for the simple reason that it seems transparent to the speakers when they coin it and apply it to more forms. The theory falls short of explaining how complexity can arise and still be tolerated.|000|morphological complexity, morphology, measure, 3447|Hammond2017|1 Words should have fewer morphemes. 2 Haplology should be avoided. 3 More marked morphological operations (per the hierarchy above) should be avoided. 4 Morphophonology should be avoided. 5 Ablaut, umlaut, truncation, etc. should be avoided. 6 Zero-marking should be avoided. |168|morphological complexity, input optimization, 3448|Hammond2017|**Phonological Complexity** (PC) The phonological complexity of some set of forms is defined as the vector sum of the constraint violation vectors for surface forms paired with their respective optimal inputs. To produce a *relative* measure of PC given some set of *n* surface forms, divide the PC score for those forms by *n*. |159|phonological complexity, linguistic complexity, measure, 3449|Hock2017|The common Indo-Europeanist view is that reconstructed Proto-Indo-European (PIE) cannot go back farther than the beginning of the fourth millennium BC and that its speakers must have been located somewhere in the Eurasian steppes. This perspective is based on the evidence of linguistic palaeontology – the linguistic evidence for reconstructing PIE words for ‘horse’ (*h₁(e)ḱwos) and ‘wheel’ (*kʷekʷlos, *rotHo), combined with the archaeological evidence for the earliest horse domestication in the Eurasian steppes in the early fourth millennium BC, and the first archaeological evidence for wheels and wheeled vehicles in the mid-fourth millennium BC (see Anthony (2007) for a comprehensive discussion and Outram et al. (2009) for additional evidence of horse domestication). 
Significantly, conditions in other areas of Eurasia were not conducive for horse domestication, except for the Iberian peninsula (Bendrey 2012). Outside the steppes (or Iberia), therefore, words for ‘horse’ can only refer to already domesticated horses.|000|Indo-European, methodology, linguistic palaeography, horse, wheel, 3450|Hock2017|Author argues against the assumption that one could reconstruct fake words back if the sound correspondences hold (like *Aluminium* in Tibetan languages). By showing how the example by Atkinson and Gray can be shown to still reflect its borrowing status, he tries to argue against the whole theory, claiming that the words for *horse* and *wheel* are enough to argue that IE speakers come from the Steppe. |000|linguistic palaeography, Steppe Hypothesis, Indo-European, lexical borrowing, sound correspondences 3451|Hock2017|Now, there are indeed cases where linguistic forms showing regular phonological correspondences should not be reconstructed to the proto-language even though they do contain inherited elements (see e.g. the discussion in Hock 1991: §18.8). These cases involve morphologically complex forms, such as English there-by : German da-bei ‘with that’, where the component elements can be reconstructed, but the combinations of these elements could be (and in fact, were) created independently in the two languages.|64|proto-form, lexical reconstruction, methodology, parallel development 3452|Hock2017|The fact that Atkinson et al. failed to inquire more deeply into the fuller range of Micronesian and Palauan evidence may be considered to be significant, and not just a minor slip. It is reminiscent of the problem pointed out by Donohue et al. (2012a: 519) about Atkinson et al.’s approach, namely that it is not able to distinguish between “social and spacial proximity” and “inheritance” in the lexicon-based classification of Polynesian. Ultimately, the difficulty appears to be attributable to the fact that Atkinson et al. 
draw on corpora produced by other scholars, without themselves examining the reliability of these corpora by employing standard comparative-historical methodology to critically scrutinize the evidence that they are based on.|65|lexical borrowing, inheritance, phylogenetic reconstruction, language contact, bad data 3453|Sampson2016|For the study of writing systems (still very much a minority branch of linguistics) the whole concept of typology is controversial, because there are influential scholars who believe that all the world’s scripts are essentially of the same type. There has been a long though not particularly honourable tradition of discussing all writing systems as if they were more successful or less successful attempts to approximate the Roman alphabet, seen as the only possible ideal from which any other kind of script could only be viewed as a falling-off. In 1960, for instance, the distinguished anthropologist Sir Jack Goody and the literary critic Ian Watt co-authored a widely-read paper (Goody & Watt 1963) which used the term “literate societies” explicitly to mean societies using an alphabetic script, as opposed to societies like China which for over three millennia has (as Goody and Watt saw it) been struggling with a system of writing too crude to confer the benefits of literacy on the society which uses it. Michael & Jennifer Cole (2006: 305) note that Goody and Watt’s paper, and subsequent related writings of Goody’s, “have had an especially influential and continuing impact on a wide range of different disciplines [...] 
Goody’s work on this topic continues to be used by anthropologists and historians, psychologists and sociologists”.|000|writing systems, typology, monophyly, 3454|Sampson2016|Author argues nicely that writing systems (especially alphabets) have a monophyletic origin which makes it difficult to get any insights into human universals, since we never know whether things evolved due to inheritance or due to independent innovation.|000|inheritance, parallel development, writing systems, monophyly, typology 3455|Sampson2016|In this case, any serious hypothesis that /ks/ is written with a single letter because it is a natural phonological unit could be refuted by the fact that not all alphabetic orthographies treat it as such. The Cyrillic alphabet has no X equivalent, and some versions of the Roman alphabet do not use X even in “international” words (the Welsh for ‘taxi’ is tacsi). But, with so few unrelated scripts extant, there is no guarantee that in other, comparable cases we could find orthographies which make it clear that some apparent generalization about writing systems was spurious. A generalization which turned out to hold for each one of the thousands of spoken languages, on the other hand, could hardly be a mere coincidence. Hence it is easier in the case of spoken languages than in the case of scripts to establish that some group of properties genuinely belong together and define a natural type.|566|monophyly, generalization, typology, universals 3456|Sampson2016|To summarize: relative to other branches of linguistics, for the study of writing systems issues of typology are unusually contentious, unusually significant, and also unusually difficult to research. That combination is perhaps unfortunate. 
But it is the way things are.|566|typology, writing systems, monophyly 3457|Sampson2016|It is as if we aimed to establish general theories about spoken [pb] language in a world where the only languages spoken were the Romance languages plus Finnish and Hungarian. The result is that it becomes difficult to distinguish between facts which stem from the nature of human reading and writing behaviour, and facts which are mere matters of historical accident.|565f|human nature, perception, natural similarities, historical similarities, similarity 3458|Hoenigswald1950|[Reconstruction by the comparative method (as distinct from internal reconstruction based on alternations between phonemes in a paradigm) is essentially a problem in phonemics, in which the place of allophones is taken by sets of sound correspondences that are partially alike (share one component) and in complementary distribution. The principle is illustrated by the IE dental and labial stops as reconstructed from Sanskrit and Germanic, by IE *s in Greek and Latin, and by the IE aspirates in Italic|000|methodology, correspondence patterns, sound correspondences, linguistic reconstruction, phonological reconstruction 3459|Hoenigswald1950|Important article that uses only language pairs to dive into the task of correspondence patterns. It shows that distribution matters, but also that a first establishment of sound correspondences is needed. This gives much room for substantiation and should be commented on when writing the paper on correspondence patterns.|000|sound correspondences, correspondence patterns, methodology, phonological reconstruction 3460|Avolio1988|The hypothesis that people have differential access (as measured by decision-processing time) to descriptive categories of what is applicable to male and female managers, who were effective or ineffective, was tested. 
A list of adjectives was presented sequentially on a CRT screen to 96 participants (48 men and 48 women, students and university employees), who evaluated each item as to "how characteristic" or "how uncharacteristic" the adjective was in describing a male or female effective (ineffective) manager. "How characteristic," or rated prototypicality and decision-time were dependent measures. Analysis indicated that sex of target had little influence on either rated prototypicality or decision times when performance information was presented. Differences in correlations between decision times and prototypicality ratings varied primarily with the manipulation of effectiveness.|000|speech norms, dataset, concept list, 3461|Avolio1988|This is a norm-data set, unfortunately not available digitally, in which proto-typicality and accessibility norms are presented.|000|speech norms, prototypicality, word accessibility 3462|DeDeyne2008|Features are at the core of many empirical and modeling endeavors in the study of semantic concepts. This article is concerned with the delineation of features that are important in natural language concepts and the use of these features in the study of semantic concept representation. The results of a feature generation task in which the exemplars and labels of 15 semantic categories served as cues are described. The importance of the generated features was assessed by tallying the frequency with which they were generated and by obtaining judgments of their relevance. The generated attributes also featured in extensive exemplar by feature applicability matrices covering the 15 different categories, as well as two large semantic domains (that of animals and artifacts). 
For all exemplars of the 15 semantic categories, typicality ratings, goodness ratings, goodness rank order, genera- tion frequency, exemplar associative strength, category associative strength, estimated age of acquisition, word frequency, familiarity ratings, imageability ratings, and pairwise similarity ratings are described as well. By making these data easily available to other researchers in the field, we hope to provide ample opportunities for continued investigations into the nature of semantic concept representation. These data may be downloaded from the Psychonomic Society’s Archive of Norms, Stimuli, and Data, www.psychonomic.org/archive|000|speech norms, word accessibility, concept accessibility, 3463|DeDeyne2008|Potentially important, freely available norm dataset showing how concepts are judged to be linked with certain features.|000|speech norms, feature applicability, concept list, dataset, 3464|Barabas2017|Building on a classical theoretical model in which species possess a suite of traits and the degree of competition between them is governed by trait similarity, they show that the way competition should vary with trait distance depends sensitively on whether the different traits introduce new ways of being regulated. If not — and the authors argue that this case is typical — then the number of trait combinations that can coexist is much smaller than the full set of available permutations, leaving many potential ecological possibilities unfilled. :comment:`This is important to note, as it means, we could also ask for languages, instead of asking why there are so many languages, why there are so few, or how many could be there.`|000|biological diversity, species evolution, diversity, Hilbert question 3465|Nascimento2017|Bayesian methods have become very popular in molecular phylogenetics due to the availability of user-friendly software for running sophisticated models of evolution. 
However, Bayesian phylogenetic models are complex, and analyses are often carried out using default settings, which may not be appropriate. Here we summarize the major features of Bayesian phylogenetic inference and discuss Bayesian computation using Markov chain Monte Carlo (MCMC) sampling, the diagnosis of an MCMC run, and ways of summarizing the MCMC sample. We discuss the specification of the prior, the choice of the substitution model and partitioning of the data. Finally, we provide a list of common Bayesian phylogenetic software packages and recommend appropriate applications.|000|Bayesian approaches, introduction, tutorial, review 3466|Staffanson2017|Taste is a complex blend of several factors and is highly dependent on many of our other senses. The study of taste is more limited in comparison with the study of the senses of sight and hearing and the typological study of taste terms has not played a particularly important role in lexical typology so far. This study is based on a triangulation where taste terms and their use is investigated in a fairly representative sample of the world's languages as well as in more detail in Swedish. The purpose of the study is to explore how universal words for sweet are positively and words for bitter are negatively connotated in individual languages such as Swedish and more generally in the world's languages. This is done by looking into polysemic relations, colexification patterns and metaphors across languages. The result shows that general trends are the same from the triangulation's two studies, but that there are differences at the level of details. 
The taste term sweet is used predominantly positively and bitter is used predominantly negatively.|000|CLICS, colexification, semantic change, 3467|LaPolla2012|This paper is an attempt to apply insights and methodologies from Nichols 1996 to help us resolve problems in determining genetic relatedness among Sino-Tibetan languages, and in our efforts at reconstruction of protolanguages of different time depths. The results from the application of Nichols’ methodology are explained with reference to what we know about the migrations of the Sino-Tibetan peoples.|000|individual-identifying evidence, Sino-Tibetan, subgrouping 3468|Soderqvist2017|Colour terms is a highly interesting field when investigating linguistic universals and how language vary cross-linguistically. Colour semantics, the investigation of the meaning of colour, consists in largely of two opposing sides: the universalists, proposing that colour terms are universal (Berlin & Kay 1969) and the relativists claiming a variation in meaning cross-linguistically (Wierzbicka 2008).|000|CLICS, Sino-Tibetan, colexification, color terms 3469|Hoff2002|Network models are widely used to represent relational information among interacting units. In studies of social networks, recent emphasis has been placed on random graph models where the nodes usually represent individual social actors and the edges represent the presence of a specified relation between actors. We develop a class of models where the probability of a relation between actors depends on the positions of individuals in an unobserved "social space." We make inference for the social space within maximum likelihood and Bayesian frameworks, and propose Markov chain Monte Carlo procedures for making inference on latent positions and the effects of observed covariates. We present analyses of three standard datasets from the social networks literature, and compare the method to an alternative stochastic blockmodeling approach. 
In addition to improving on model fit for these datasets, our method provides a visual and interpretable model-based spatial representation of social relationships and improves on existing methods by allowing the statistical uncertainty in the social space to be quantified and graphically represented|000|network analysis, graph theory, latent space, methodology, 3470|Hoff2002|This analysis apparently puts nodes in a network into a space and tries to reflect the network characteristics in this space as closely as possible. There is an implementation in Python, with a working example here: http://nbviewer.jupyter.org/github/blei-lab/edward/blob/master/notebooks/latent_space_models.ipynb In parts, it seems also possible to infer directed networks, but it is not entirely clear to me yet, how this works. |000|latent space, network analysis, graph theory, 3471|DeLancey2017|The provenance and antiquity of verb agreement in the Trans-Himalayan family has been a matter of controversy since the phenomenon was first reported in the mid-19th century. With the tremendous increase in the available documentation over the last generation, we now have enough information to reconstruct an ancestral paradigm in some detail. But controversy remains over whether the descendants of that ancestor include all the Trans-Himalayan (or Tibeto-Burman) languages, or only those where an archaic paradigm can still be found. The reconstruction of verb agreement to PTH (or PTB) follows the principles of the Comparative Method, the foundation of historical linguistics since the Neogrammarian revolution. The argument that the ancestral paradigm should be reconstructed only to some hypothetical lower-level branch is presented within an idiosyncratic alternative framework. 
In this paper I will review the arguments for agreement in PTH in terms of the traditional Comparative Method, and show that the alternative proposal is not only idiosyncratic, but logically incoherent.|000|comparative method, Sino-Tibetan, Antoine Meillet 3472|DeLancey2017|Paper contains some problems, since @Meillet<1908> is misunderstood as pointing to "three witnesses" in general reconstruction, although Meillet was only talking about word reconstruction. This is also made explicit in @Hoenigswald1950.|000|Antoine Meillet, three witness theory, comparative method, Sino-Tibetan 3473|Serzant2015|The paper is primarily devoted to a methodological discussion. There are two different types of inquiries into diachronic syntax and, more generally, grammar: stage reconstruction and etymological reconstruction. The aim of the first type is to reconstruct and compare diachronic stages within a particular functional domain, while the second type focuses on the etymology or the origin of a particular grammatical category. It is the second type of inquiry that is the topic of this paper. I argue for a methodology based on the Historical-Comparative Method that should ensure a higher degree of reconstructional probability and exclude factors other than inheritance that might also be potentially responsible for correlations across related languages. On this approach, the construction under investigation must be individualized against its respective typological background: creating lists of morphological, lexical (input), syntactic and semantic properties – a procedure that I refer to as profiling (notion borrowed from Cognitive Linguistics). The general principle here is that correlations of typologically quirky properties increase the degree of probability of any reconstruction. 
An obvious typological quirk is the morphological profile of a construction, since the phonetic realization of morphological markers and their combinations is purely accidental and is not subject to typological universals. The morphological inventory of the construction must be reconstructible in the proto-language on the basis of the Historical-Comparative Method. The ability to reconstruct the morphological inventory also excludes language contact as a potential source for correlations. Other typologically idiosyncratic properties – if reconstructible – may also increase the degree of reconstruction probability. To illustrate how this method may be applied, I focus on the development of the independent partitive genitive from Proto-Indo-European into Baltic and Russian and, finally, into North Russian dialects. On the basis of this method I show that this category is inherited from Proto-Indo-European. I examine the syntactic profiles of this category at different stages and account for changes.|000|syntactic reconstruction, comparative method, onomasiological approach, onomasiological reconstruction, 3474|Serzant2015|Potentially interesting paper as it may give some additional thoughts on differences between the reconstruction of the onomasiological aspects and the semasiological aspects of grammatical categories (etymologies and their functions).|000|syntactic reconstruction, onomasiological reconstruction, comparative method 3475|Priestly1973|Over the last quarter-century, most of the procedural problems which arise in the application of the Comparative Method (CM) have been more or less adequately described. Some scholars dismiss efforts at precision, with e.g. the excuse that the CM is 'an art and not a science'. It is of course true that some of the problems faced by the comparativist may be impervious to rigorous analysis; but it does not follow that he must always put his trust in experience and intuition alone. 
Pedagogical purposes apart, it will surely be accepted that the controversial areas within reconstruction may be better understood when those procedural matters which can be clearly described, are clearly described.|000|subgrouping, comparative method, methodology, art 3476|Ding2017|This paper argues that a linguistic or any other type of sign should not be viewed as a static unit in a synchronic system, but rather as a dynamic entity that appears in real communicative events. As discrete entries in a dictionary, words are given definitions as their meanings, but these definitions may lead to interpretations that have not been sanctioned by a language community and yet are totally reasonable and understandable. In this way, “old” words can be used to stand for “new” segments of our life experience through which some meanings are born and others modified. It is through our inferential use of this living sign that we keep adapting to an ever expanding world.|000|sign model, linguistic sign, Ferdinand de Saussure, semiotics 3477|Ding2017|In actual language use, however, the relationship between the signifier and the signified of a linguistic sign is far more complicated than is presented in Saussure’s model. This is so because most words in a natural language contain several and sometimes dozens of meanings or senses that correspond to the same linguistic form.|139|signifier, signified, linguistic sign, Ferdinand de Saussure, 3478|Neumann2017|The idea that abstract words are grounded in our sensorimotor experience is gaining support and popularity, as observed in the increasing number of studies dealing with “neurosemantics.” Therefore, it is important to form models that explain how to bridge the gap between basic bodily experiences and abstract language. This paper focuses on the embodiment of connotations, such as “sweet” in “sweet baby,” where the adjective has been abstracted from its concrete and embodied sense. 
We summarize several findings from recent studies in neuroscience and the cognitive sciences suggesting that emotion, body, and language are three factors required for understanding the emergence of abstract words, and (1) propose a model explaining how these factors contribute to the emergence of connotations, (2) formulate a computational model instantiating our theoretical model, and (3) test our model in a task involving the automatic identification of connotations. The results support our model pointing to the role of embodiment in the formation of connotations.|000|word emotions, emotion concepts, neurology, semantics, embodiment 3479|Walworth2017|Old Rapa, the indigenous Eastern Polynesian language of the island of Rapa Iti, is no longer spoken regularly in any cultural domains and has been replaced in most institutional domains by Tahitian. The remaining speakers are elders who maintain it only through linguistic memory, where elements of the language are remembered and can be elicited but they are not actively used in regular conversation. Reo Rapa, a contact language that fuses Tahitian and Old Rapa, which has developed from the prolonged and dominant influence of the Tahitian language in Rapa Iti since the mid nineteenth century, has replaced the indigenous Old Rapa language at home and between most people in regular social interaction. This article analyzes Reo Rapa through an examination of its genesis and its structure. This article furthermore defines Reo Rapa as a unique contact variety, a shift-break language: a language that resulted from stalled shift due to a collective anti-convergence sentiment in the speech community. 
This article further discusses a variety of Reo Rapa speech, New Rapa, which presents important questions for the naturalness of language change and the visibility of actuation.|000|Reo Rapa, Austronesian, language contact, purification, 3480|Diamond2011|Tracing a common ancestry between languages becomes harder as the connection goes further back in time. A new test has revealed a surprisingly ancient relationship between a central Siberian and a North American language family.|000|proof of relationship, genetic relationship, Na-Dene, Yeniseian 3481|Diamond2011|Such tests have been extended by Kessler and Lehtonen [@Kessler2006] to multilateral comparisons, after first confirming statistical significance among 11 languages already known to belong to the Indo-European family. This was true even for Albanian, a language whose Indo-European affinities had proved difficult to establish by bilateral comparisons. This success vindicates Greenberg’s view that multilateral comparisons can uncover evidence of a relationship that is obscure in bilateral comparisons (because any single Indo-European language alone happens to lack certain Indo-European roots retained in Albanian).|292|multilateral comparison, significance, proof of relationship, genetic relationship 3482|Vialou2017|The earliest peopling of South America remains a contentious issue. Despite the growing amount of new evidence becoming available, and improved excavation and dating techniques, few sites have yet to be securely assigned to a period earlier than 12 000 BP. The Santa Elina shelter in Brazil, located at the convergence of two major river basins, is one of them. 
The excavations at the site, including the results of various dating programmes, are described here along with reflections on the unique insights offered by Santa Elina into early migration routes into the Southern Cone.|000|peopling of South America, archaeology, 3483|Vialou2017|Paper suggests a much earlier peopling of South America, which may have happened even 20 000 years ago.|000|peopling of South America, archaeology, dating, 3484|Bergstroem2017|New Guinea shows human occupation since ~50 thousand years ago (ka), independent adoption of plant cultivation ~10 ka, and great cultural and linguistic diversity today. We performed genome-wide single-nucleotide polymorphism genotyping on 381 individuals from 85 language groups in Papua New Guinea and find a sharp divide originating 10 to 20 ka between lowland and highland groups and a lack of non–New Guinean admixture in the latter. All highlanders share ancestry within the last 10 thousand years, with major population growth in the same period, suggesting population structure was reshaped following the Neolithic lifestyle transition. However, genetic differentiation between groups in Papua New Guinea is much stronger than in comparable regions in Eurasia, demonstrating that such a transition does not necessarily limit the genetic and linguistic diversity of human societies.|000|peopling of New Guinea, archaeogenetics, population genetics 3485|Tang2017|Speakers of all human languages regularly use intonational pitch to convey linguistic meaning, such as to emphasize a particular word. Listeners extract pitch movements from speech and evaluate the shape of intonation contours independent of each speaker’s pitch range. We used high-density electrocorticography to record neural population activity directly from the brain surface while participants listened to sentences that varied in intonational pitch contour, phonetic content, and speaker. 
Cortical activity at single electrodes over the human superior temporal gyrus selectively represented intonation contours. These electrodes were intermixed with, yet functionally distinct from, sites that encoded different information about phonetic features or speaker identity. Furthermore, the representation of intonation contours directly reflected the encoding of speaker-normalized relative pitch but not absolute pitch.|000|neurology, intonation, tone, 3486|Tang2017|Paper seems to show that pitch and intonation play an important role in decoding speech signals. This article is quickly discussed in @Zeibig2017|000|pitch, intonation, neurology, neurolinguistics, speech processing 3487|Zeibig2017|Article provides short summary on the research by @Tang2017.|000|speech signal, popular science, summary 3488|Kassian2017|This paper deals with the genealogical structure of the Tsezic language group with a special focus on the position of the Hinukh language within the group, which has remained controversial in previous research. Most specialists have treated Hinukh as the closest relative of the Tsez (Dido) language, and some previously proposed formal classifications based on lexical data appear to confirm this. However, the new and advanced lexicostatistical classification used in this paper suggests that Hinukh and Tsez do not in fact form a distinct clade. We examine Hinukh and Tsez historical phonology as well as morphological and syntactic features, upon which the traditional expert classifications have rested, and find that all exclusive traits seemingly shared by Hinukh and Tsez are either illusory or due to contact-driven parallel development or may simply represent Proto-Tsezic retentions. In other words, there is no phonological and grammatical evidence for a hypothetical Tsez-Hinukh protolanguage. 
On the contrary, there are some exclusive grammatical features of the Tsez and Khwarshi languages that may be regarded as innovations of a Tsez-Khwarshi protolanguage, conforming with a distinct Tsez-Khwarshi node in the new lexicostatistical classification of Tsezic. Therefore the traditional criterion of shared innovations supports the validity of the modern version of the lexicostatistical method employed in this paper.|000|shared innovation, methodology, parallel development, subgrouping 3489|Kassian2017|Interesting article that deals with the general problem of identifying shared innovations for the purpose of subgrouping.|000|shared innovation, subgrouping, methodology 3490|Arendt2007|Biologists often distinguish ‘convergent’ from ‘parallel’ evolution. This distinction usually assumes that when a given phenotype evolves, the underlying genetic mechanisms are different in distantly related species (convergent) but similar in closely related species (parallel). However, several examples show that the same phenotype might evolve among populations within a species by changes in different genes. Conversely, similar phenotypes might evolve in distantly related species by changes in the same gene. We thus argue that the distinction between ‘convergent’ and ‘parallel’ evolution is a false dichotomy, at best representing ends of a continuum. We can simplify our vocabulary; all instances of the independent evolution of a given phenotype can be described with a single term – convergent.|000|convergent evolution, parallel evolution, methodology 3491|Arendt2007|Interesting article discusses that the distinction between parallel and convergent evolution may not be adequate in biology. 
This may have interesting implications in the light of the debate on subgrouping and shared innovations in linguistics.|000|subgrouping, parallel evolution, parallel development, convergent evolution 3492|Boswell1950|Article very briefly discusses the Burmese wild dog species (probably the Dhole, or the "Mountain Dog").|000|species, biology, animals, introduction, Burma 3493|Burton1950|Short article on the "wild dog" in Burma.|000|animals, dog, summary, biology, 3494|Kleijn2017|Two co-authored articles in Nature (Haak et al., 2015; Allentoft et al., 2015) caused a sensation. They revealed genetically the mass migration of steppe Yamnaya culture people in the Early Bronze Age to central and northern Europe. The authors considered this event as the basis of the spread of Indo-European languages. In response, the Russian archaeologist, Leo S. Klejn, expresses critical remarks on the genetic inference, and in particular its implications for the problem of the origins of Indo-European languages. These remarks were shown to the authors and they present their objections. Klejn, however, has come to the conclusion that the authors’ objections do not assuage his doubts. He analyses these objections in a further response.|000|Yamnaya culture, Indo-European, population genetics, archaeology, migration, Urheimat 3495|KoptjevskajaTamm2017|The aim of this chapter is to define, exemplify and problematize the lexico-semantic aspects of areality / language convergence. Just as with areal linguistics in general, this subfield is an intersection of a number of more general approaches: the study of language contact, language change and typological research. 
Particularly prominent in our treatment is the cross-fertilization of meticulous documentation of linguistic features in specific geographical regions or areas, often grounded in fieldwork, and large-scale cross-linguistic findings and generalizations.|000|colexification, areal diffusion, linguistic area, methodology, lexical change, lexical typology 3496|Meylan2017|The inverse relationship between the length of a word and the frequency of its use, first identified by G.K. Zipf in 1935, is a classic empirical law that holds across a wide range of human languages. We demonstrate that length is one aspect of a much more general property of words: how distinctive they are with respect to other words in a language. Distinctiveness plays a critical role in recognizing words in fluent speech, in that it reflects the strength of potential competitors when selecting the best candidate for an ambiguous signal. Phonological information content, a measure of a word’s string probability under a statistical model of a language’s sound or character sequences, concisely captures distinctiveness. Examining large-scale corpora from 13 languages, we find that distinctiveness significantly outperforms word length as a predictor of frequency. This finding provides evidence that listeners’ processing constraints shape fine-grained aspects of word forms across languages.|000|word length, information content, distinctivity, distinctiveness, phonological information content, 3497|Meylan2017|Interesting paper that deals with the information content of words and shows that it, rather than word length, is the driving factor behind Zipf's law.|000|Zipf's law, word length, phonological information content, information content, 3498|Haeussler2017|Even though teams have become the dominant mode of knowledge production, little is known regarding how they divide work among their members. 
Conceptualizing knowledge production as a process involving a number of functional activities, we first develop a conceptual framework to study the division of labor in teams. This framework highlights three complementary perspectives: (1) individual level (the degree to which team members specialize vs. work as generalists), (2) activity level (the degree to which activities are concentrated among few team members vs. distributed among many) and (3) the intersection between the two (e.g., which activities are performed jointly by the same individual). We then employ this framework to explore team-based knowledge production using a newly available type of data – the disclosures of author contributions on scientific papers. Using data from over 12,000 articles, we provide unique descriptive insights into patterns of division of labor, demonstrating the value of the three complementary perspectives. We also apply the framework to uncover differences in the division of labor in teams of different size, working in novel vs. established fields, and on single vs. interdisciplinary projects. Finally, we show how division of labor is related to the quality of teams’ research output. We discuss opportunities for extending and applying our framework as well as implications for scientists and policy makers.|000|division of labor, team work, social science, 3499|Haendler1989|This article discusses computer-assisted approaches in dialectology.|000|computer-assisted analysis, computer-aided approaches, dialectology, historical linguistics, summary, history of science 3500|Lowe1995|Dissertation discusses computer-assisted approaches in studying Sino-Tibetan languages. |000|Sino-Tibetan, computer-assisted analysis, computer-aided approaches, comparative method, database, history of science 3501|Wilkinson2016|There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. 
A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.|000|data-sharing, dataset, scientific data, data management, 3502|Smokotin2015|The article examines the age-old search of humankind for a universal language, which would make it possible to overcome the linguistic and cultural barriers. The authors a) state the importance of the problem, b) analyze how the perspective on the language of common communication changed over time, and c) discuss the reasons for the failures in constructing a global lingua franca on the basis of an artificial language. A conclusion is made that artificial languages have failed as means of overcoming language and cultural barriers because they find it very difficult to answer the challenges of the changing world in all spheres of life due to the absence of any ethnocultural ties. The authors point out the topicality of this area as a subject for academic study.|000|universal language, history of science, 3503|Gehr2015|The translation of medically relevant academic inventions that could transform public health has been notoriously difficult, stemming largely from cultural differences between academia and industry. 
New initiatives to kindle academic entrepreneurship and establish stronger public/private partnerships are helping to align these differences and accelerating the translation of promising new therapies.|000|translation, academia, medicine, industry, 3504|Janssen2017|Popular methods for exploring the space of rooted phylogenetic trees use rearrangement moves such as rNNI (rooted Nearest Neighbour Interchange) and rSPR (rooted Subtree Prune and Regraft). Recently, these moves were generalized to rooted phylogenetic networks, which are a more suitable representation of reticulate evolutionary histories, and it was shown that any two rooted phylogenetic networks of the same complexity are connected by a sequence of either rSPR or rNNI moves. Here, we show that this is possible using only tail moves, which are a restricted version of rSPR moves on networks that are more closely related to rSPR moves on trees. The connectedness still holds even when we restrict to distance-1 tail moves (a localized version of tail-moves). Moreover, we give bounds on the number of (distance-1) tail moves necessary to turn one network into another, which in turn yield new bounds for rSPR, rNNI and SPR (i.e. the equivalent of rSPR on unrooted networks). The upper bounds are constructive, meaning that we can actually find a sequence with at most this length for any pair of networks. Finally, we show that finding a shortest sequence of tail or rSPR moves is NP-hard.|000|phylogenetic network, tree space, network space, rooted phylogenetic network, NP-hard 3505|Uria2017|After defining grammatical (as opposed to lexical) homonymy as concerning either inflection or the conflict between different parts of speech, attention is paid to those contexts in which Varro and Quintilian dealt with processes falling under that concept. 
The paper remarks on the acute distinction Quintilian seems to make between lexical and grammatical homonymy by dealing with the former in relation to rhetoric and the latter within the grammatical chapters of book I. The similarity of Quintilian’s approach to homonymy is then shown with the use Apollonius Dyscolus would later make of the term synemptosis as a morphological coincidence of word forms. The parallel doctrine and terminology in later Latin traditions is also considered.|000|homonymy, colexification, history of science, grammar, Quintilian, 3506|Heine2017|The linguistic history of the Ghana-Togo Mountain (GTM) languages, spoken in southeastern Ghana and the southern half of Togo, has been the subject of detailed research for more than a century. Nevertheless, there are still problems both with the external and the internal classification of the group. The present paper provides a state of the art overview of this research field. It is argued that linguistic reconstructions that can claim to provide credible hypotheses on genetic relationship patterns among languages are best based on the application of the comparative method.|000|African languages, comparative method, proof of relationship, Niger-Congo languages, methodology, 3507|Heine2017|As Heine (1969) shows, it is possible to establish entirely regular sound correspondences for all Buem languages, and to proceed further to the reconstruction of a hypothetically set up ancestor language, called Proto-Buem.|278|Buem languages, sound correspondences, African languages, 3508|Heine2017|Blench (2009) is one of the examples of the method of resemblances. 
Unlike Bennett and Sterk (1977) and Stewart (1989), he is concerned mainly with GTM data, presenting a catalogue of lexical correspondences which, as he maintains, are diagnostic of genetic relationship, the relevant items being ‘six’, ‘meat’, ‘water’, ‘give’, ‘animal’, ‘oil/fat’, ‘bird’, ‘hill/mountain’, ‘blood’.|279|lookalikes, methodology, comparative method, mass comparison 3509|Heine2017|To conclude, with few exceptions, outlined in Section 2, historical linguistics has not made much headway in the study of GTM languages, and future work in this area is well advised to focus on reconstructions based on the comparative method in language groups where this is possible, rather than engaging in large-scale comparisons which are beyond the scope of this method.|280|mass comparison, regular sound change, sound correspondences, African languages, methodology 3510|Griffiths2017|This article studies the modern development of the comparative method in the humanities and social sciences within Europe and the United States, and specifically addresses comparative subfields of philology, linguistics, anthropology, sociology, political science, literature, history, and folklore studies. A juxtapositional study of these disciplinary histories demonstrates the historical relation between their methods and relation to other fields, like comparative anatomy. It elucidates several recurrent features of the different applications of comparativism, particularly a consistent tension between genetic (or historical) versus functionalist (or contextual) explanations for common patterns, and suggests that comparatists would benefit from closer study both of the history of the method and its development within other fields. 
Ultimately this study casts fresh light on the modern history of the humanities, their incomplete differentiation from social-scientific fields like sociology and political science, and the interdisciplinary exchanges that have often shaped entire fields of study.|000|comparative method, humanities, overview, history of science 3511|Wang2017|We show that faces contain much more information about sexual orientation than can be perceived and interpreted by the human brain. We used deep neural networks to extract features from 35,326 facial images. These features were entered into a logistic regression aimed at classifying sexual orientation. Given a single facial image, a classifier could correctly distinguish between gay and heterosexual men in 81% of cases, and in 74% of cases for women. Human judges achieved much lower accuracy: 61% for men and 54% for women. The accuracy of the algorithm increased to 91% and 83%, respectively, given five facial images per person. Facial features employed by the classifier included both fixed (e.g., nose shape) and transient facial features (e.g., grooming style). Consistent with the prenatal hormone theory of sexual orientation, gay men and women tended to have gender-atypical facial morphology, expression, and grooming styles. Prediction models aimed at gender alone allowed for detecting gay males with 57% accuracy and gay females with 58% accuracy. Those findings advance our understanding of the origins of sexual orientation and the limits of human perception. 
Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people’s intimate traits, our findings expose a threat to the privacy and safety of gay men and women.|000|machine learning, neural network, sexual orientation, 3512|Smith2017|The frequency with which new mutations occur (known as the mutation rate) influences the ability of a virus to adapt and evade the host’s immune system, and researchers have long been interested in accurately measuring these mutation rates (Parvin et al., 1986; Nobusawa and Sato, 2006). However, existing approaches to measuring mutation rates may have potential biases and shortcomings that have not been fully explored or corrected for. Now, in eLife, Matthew Pauly, Megan Procario and Adam Lauring of the University of Michigan report that using a new twist on an old method can overcome the major flaws of a current approach (Pauly et al., 2017).|000|mutation rate, virus evolution, measure, 3513|Smith2017|What is interesting for linguistic purposes is how biologists try to infer the mutation rate of different organisms or species. It seems that we might need similar measures, even if we are far away from being able to even think about it.|000|mutation rate, virus evolution 3514|KruzelDavila2017|Two gene variants provide different levels of protection against sleeping sickness, but this comes with an increased risk of developing chronic kidney disease.|000|African trypanosomiasis, sleeping sickness, resistance, mutation, 3515|KruzelDavila2017|As far as I understand this report, the scholars describe research that identified different mechanisms in African people to resist the pathogens that lead to sleeping sickness, but the resistance strategy seems to increase the risk of kidney disease.|000|African trypanosomiasis, sleeping sickness, 3516|Hoellmann2015|:comment:`[Quote from Zhuzi yulei, 1270, Chap 10]` .. 
pull-quote:: Ever since books have existed in printed editions, people no longer read attentively and with concentration. [...] They even shy away from copying out the texts. (*Zhuzi yulei*, 1270, chap. 10)|71|Chinese writing system, book printing, unlearning, language decay, 3517|Hoellmann2015|:comment:`[Quote from Bo Juyi, 834, after the author's German translation]` .. pull-quote:: «Only the ignorant are talkative. The wise man keeps silent.» So it has been handed down from Laozi. But if that is so, why did the Master himself compose a work of five thousand characters? (Bo Juyi, 834)|72|Lǎo Zǐ, Daoism, critics, nice quote 3518|Ciancaglini2008|Chiefly, the method can be applied only when the compared languages have a long and uninterrupted written tradition in phonographic writing systems; otherwise it is impossible to reconstruct the proto-language phonology. From this point of view Japanese and Korean do not display the necessary conditions, attested too late (Japanese is attested from the seventh-eighth century AD; the earliest records of Korean, the Kirim wordlist, are from around the tenth century AD) and written in Chinese characters, which are logographic. The Indo-European languages, on the other hand, are attested from the second millennium BC (Hittite, Mycenaean Greek) until today and the Semitic languages from the third millennium BC (Akkadian etc.) onwards, and both families are attested in syllabic or alphabetic scripts.|292|comparative method, methodology, limitations, 3519|Ciancaglini2008|As often remembered by Meillet, if for exam[pb]le we had only French, Bulgarian and Modern Armenian, or French and Modern English, as representatives of the Indo-European family, it would be almost impossible to detect the genetic relatedness among these languages and to reconstruct Proto-Indo-European. 
:comment:`[Not many scholars would really agree with this nowadays.]`|292f|Antoine Meillet, limitations, time depth, comparative method 3520|Dediu2016|Recently, there has been an explosion in the availability of large, good-quality cross-linguistic databases such as WALS (@Dryer<2013> & Haspelmath, 2013), Glottolog (@Hammarstroem<2015> et al., 2015) and Phoible (@Moran<2014> & McCloy, 2014). Databases such as Phoible contain the actual segments used by various languages as they are given in the primary language descriptions. However, this segment-level representation cannot be used directly for analyses that require generalizations over classes of segments that share theoretically interesting features. Here we present a method and the associated R (R Core Team, 2014) code that allows the flexible definition of such meaningful classes and that can identify the sets of segments falling into such a class for any language inventory. The method and its results are important for those interested in exploring cross-linguistic patterns of phonetic and phonological diversity and their relationship to extra-linguistic factors and processes such as climate, economics, history or human genetics.|000|dataset, distinctive features, phonology, cross-linguistic study, 3521|Dediu2016|Essentially does the following: * set up a new feature system, called Fonetikode (but has some strange features, like "stroke up", which are not phonetic) * link this to Phoible (@Moran2014) * link this to @Ruhlen2008 (but only partially, as they have problems in interpreting Ruhlen's characters)|000|distinctive features, feature system, dataset, cross-linguistic study 3522|Frost2012|In the last decade, reading research has seen a paradigmatic shift. A new wave of computational models of orthographic processing that offer various forms of noisy position or context-sensitive coding have revolutionized the field of visual word recognition. 
The influx of such models stems mainly from consistent findings, coming mostly from European languages, regarding an apparent insensitivity of skilled readers to letter order. Underlying the current revolution is the theoretical assumption that the insensitivity of readers to letter order reflects the special way in which the human brain encodes the position of letters in printed words. The present article discusses the theoretical shortcomings and misconceptions of this approach to visual word recognition. A systematic review of data obtained from a variety of languages demonstrates that letter-order insensitivity is neither a general property of the cognitive system nor a property of the brain in encoding letters. Rather, it is a variant and idiosyncratic characteristic of some languages, mostly European, reflecting a strategy of optimizing encoding resources, given the specific structure of words. Since the main goal of reading research is to develop theories that describe the fundamental and invariant phenomena of reading across orthographies, an alternative approach to model visual word recognition is offered. The dimensions of a possible universal model of reading, which outlines the common cognitive operations involved in orthographic processing in all writing systems, are discussed.|000|reading, writing systems, Chinese writing system, modelling, cross-linguistic study 3523|Frost2012|Paper compares different writing systems and concludes that a universal model of reading needs to acknowledge that writing systems are optimized with respect to the language they encode. This is essentially wrong, as we know from history that writing systems are used inconsistently and have many idiosyncratic aspects. Writers do not necessarily modernize their writing systems, but rather follow them out of conservatism. 
The paper has a lot of comments attached to it, and also some nice summary on a couple of writing systems as well as modern literature on reading models for machine learning or similar things, so it is a nice contribution, even if it is a bit misleading, as I think.|000|writing systems, reading, reading model, Chinese writing system, Hebrew script, 3524|Maddieson2005|It is often suggested that languages are likely to ‘compensate’ complexity in one subsystem by simplicity elsewhere. In this paper evidence against this idea is presented by examining several subsystems of the basic phonology in a set of over 600 languages selected to represent genetic and areal diversity. The relationships between elaboration of the syllable canon, the size of segment inventories and the complexity of tone systems are studied. The languages are grouped into three syllable complexity classes. The ‘simple’ class permits only (C)V patterns. ‘Moderate’ languages allow CC onsets with common structures (those with an approximant or ‘liquid’ as C2) and/or permit a single coda consonant. ‘Complex’ languages allow more elaborate clusters. Languages are also divided into three tonal groupings, those with no tone contrasts, those with simple tone systems (two levels) and those with more elaborate tone systems. Consonant, total vowel and vowel quality inventories are numeric values. There is a significant positive correlation between the complexity of syllable type and size of consonant inventory, but no correlation between syllable complexity and size of total vowel or vowel quality inventories. Consonant and vowel inventory sizes show no correlation with each other. 
Complex syllable structure shows some association with absence of tone, but none of the other comparisons show that complexity is systematically compensated for by simplicity elsewhere.|000|tone language, cross-linguistic study, phoneme inventory, 3525|Maddieson2005|Paper presents a correlation study that seems to confirm that languages with less complex syllables tend to have less complex if any tone systems. The problem is again, that the data is not entirely empirical, as they bin it into categories before doing their analysis, while a more elaborate study would not do so, but take the raw material.|000|phonological complexity, onsets, cross-linguistic study, phoneme inventory, correlational studies 3526|Anraud2017|We present a system for identifying cognate sets across dictionaries of related languages. The likelihood of a cognate relationship is calculated on the basis of a rich set of features that capture both phonetic and semantic similarity, as well as the presence of regular sound correspondences. The similarity scores are used to cluster words from different languages that may originate from a common proto-word. When tested on the Algonquian language family, our system detects 63% of cognate sets while maintaining cluster purity of 70%.|000|cognate detection, cross-semantic cognates, source code, algorithms, 3527|Anraud2017|Method offers source code online at GitHub, as indicated in paper. Essentially, they train SVMs using different approaches. The basic idea is to put the burden of cross-semantic search on wordnet links, which are not very suitable for this purpose, but seem to work well enough for their case. 
To which degree sound correspondence detection and language-specific similarities play a role is not entirely clear, since the source code needs to be tested first.|000|cognate detection, cross-semantic cognates, algorithms, Python 3528|Haslam2017|Since its inception, archaeology has traditionally focused exclusively on humans and our direct ancestors. However, recent years have seen archaeological techniques applied to material evidence left behind by non-human animals. Here, we review advances made by the most prominent field investigating past non-human tool use: primate archaeology. This field combines survey of wild primate activity areas with ethological observations, excavations and analyses that allow the reconstruction of past primate behaviour. Because the order Primates includes humans, new insights into the behavioural evolution of apes and monkeys also can be used to better interrogate the record of early tool use in our own, hominin, lineage. This work has recently doubled the set of primate lineages with an excavated archaeological record, adding Old World macaques and New World capuchin monkeys to chimpanzees and humans, and it has shown that tool selection and transport, and discrete site formation, are universal among wild stone-tool-using primates. It has also revealed that wild capuchins regularly break stone tools in a way that can make them difficult to distinguish from simple early hominin tools. Ultimately, this research opens up opportunities for the development of a broader animal archaeology, marking the end of archaeology’s anthropocentric era.|000|primates, ape cognition, archaeology, tool making, 3529|Prendergast2017|Human-mediated biological exchange has had global social and ecological impacts. In sub-Saharan Africa, several domestic and commensal animals were introduced from Asia in the pre-modern period; however, the timing and nature of these introductions remain contentious. 
One model supports introduction to the eastern African coast after the mid-first millennium CE, while another posits introduction dating back to 3000 BCE. These distinct scenarios have implications for understanding the emergence of long-distance maritime connectivity, and the ecological and economic impacts of introduced species. Resolution of this longstanding debate requires new efforts, given the lack of well-dated fauna from high-precision excavations, and ambiguous osteomorphological identifications. We analysed faunal remains from 22 eastern African sites spanning a wide geographic and chronological range, and applied biomolecular techniques to confirm identifications of two Asian taxa: domestic chicken (Gallus gallus) and black rat (Rattus rattus). Our approach included|000|flora and fauna, exchange, domestication, human prehistory, East Africa, South-East Asia, domestic chicken, black rat, 3530|Prendergast2017|Paper is an interesting example regarding the usefulness of linguistic data (where it is available), given that domesticated animals are still reflected and reconstructible in some proto-languages (with some certainty). As a result, one might think of combining archaeological and linguistic data more efficiently.|000|domestic chicken, black rat, flora and fauna, exchange, domestication, East Africa, South-East Asia 3531|Brown2011a|This paper uses the comparative method of historical linguistics to investigate the hypothesis that languages of two well-established families of Mesoamerica, Totonacan and Mixe-Zoquean, are related in a larger genetic grouping dubbed Totozoquean. Proposed cognate sets comparing words reconstructed for Proto-Totonacan (PTn) and Proto-Mixe-Zoquean (PMZ) show regular sound correspondences attesting to the descent of these two languages from Proto-Totozoquean (PTz). Identification of sound correspondences facilitates reconstruction of PTz’s phonological inventory and vocabulary. 
The PMZ words used in the comparison are from Wichmann (1995). The PTn words are reconstructed by the authors, who provide the Totonacan cognate sets on which these reconstructions are based, as well as discussion of the classification and phonological history of Totonacan languages. Evidence is cited indicating that Totozoquean is comparable to Indo-European in chronological depth.|000|comparative method, methodology, South American languages, Mixe-Zoquean, Totonacan 3532|Schrader2017|German researchers are releasing earthworms in Canada. They want to observe, under controlled conditions, what happens when these enter as an invasive species a region in which they have no enemies.|000|earth worm, Regenwurm, invasive species, Canada, 3533|Emeny2017|Epigenetic regulation in anxiety is suggested, but evidence from large studies is needed. We conducted an epigenome-wide association study (EWAS) on anxiety in a population-based cohort and validated our finding in a clinical cohort as well as a murine model. In the KORA cohort, participants (n=1522, age 32–72 years) were administered the Generalized Anxiety Disorder (GAD-7) instrument, whole blood DNA methylation was measured (Illumina 450K BeadChip), and circulating levels of hs-CRP and IL-18 were assessed in the association between anxiety and methylation. DNA methylation was measured using the same instrument in a study of patients with anxiety disorders recruited at the Max Planck Institute of Psychiatry (MPIP, 131 non-medicated cases and 169 controls). To expand our mechanistic understanding, these findings were reverse translated in a mouse model of acute social defeat stress. In the KORA study, participants were classified according to mild, moderate, or severe levels of anxiety (29.4%/6.0%/1.5%, respectively). 
Severe anxiety was associated with 48.5% increased methylation at a single CpG site (cg12701571) located in the promoter of the gene encoding Asb1 (β-coefficient=0.56 standard error (SE)=0.10, p (Bonferroni)=0.005), a protein hypothetically involved in regulation of cytokine signaling. An interaction between IL-18 and severe anxiety with methylation of this CpG site showed a tendency towards significance in the total population (p=0.083) and a significant interaction among women (p=0.014). Methylation of the same CpG was positively associated with Panic and Agoraphobia scale (PAS) scores (β=0.005, SE=0.002, p=0.021, n=131) among cases in the MPIP study. In a murine model of acute social defeat stress, Asb1 gene expression was significantly upregulated in a tissue-specific manner (p=0.006), which correlated with upregulation of the neuroimmunomodulating cytokine interleukin 1 beta. Our findings suggest epigenetic regulation of the stress-responsive Asb1 gene in anxiety-related phenotypes. Further studies are necessary to elucidate the causal direction of this association and the potential role of Asb1-mediated immune dysregulation in anxiety disorders.|000|anxiety, epigenetics, inheritance, biology, biological evolution 3534|Fristoe2017|The cognitive buffer hypothesis posits that environmental variability can be a major driver of the evolution of cognition because an enhanced ability to produce flexible behavioural responses facilitates coping with the unexpected. Although comparative evidence supports different aspects of this hypothesis, a direct connection between cognition and the ability to survive a variable and unpredictable environment has yet to be demonstrated. Here, we use complementary demographic and evolutionary analyses to show that among birds, the mechanistic premise of this hypothesis is well supported but the implied direction of causality is not. 
Specifically, we show that although population dynamics are more stable and less affected by environmental variation in birds with larger relative brain sizes, the evolution of larger brains often pre-dated and facilitated the colonization of variable habitats rather than the other way around. Our findings highlight the importance of investigating the timeline of evolutionary events when interpreting patterns of phylogenetic correlation.|000|bird evolution, brain size, habitat, colonization, 3535|Fristoe2017|The basic finding is that it is not the environment that shapes brain size, but brain size that allows species to enter different environments. This relates to the hen-egg problem, or general problems of causes in just-so stories: it is often difficult to tell what was causing what if one finds a correlation between two variables.|000|just-so stories, hen-egg problem, causal inference, correlational studies, brain size, bird evolution, 3536|Anikin2017|Recent research on human nonverbal vocalizations has led to considerable progress in our understanding of vocal communication of emotion. However, in contrast to studies of animal vocalizations, this research has focused mainly on the emotional interpretation of such signals. The repertoire of human nonverbal vocalizations as acoustic types, and the mapping between acoustic and emotional categories, thus remain underexplored. In a cross-linguistic naming task (Experiment 1), verbal categorization of 132 authentic (non-acted) human vocalizations by English-, Swedish- and Russian-speaking participants revealed the same major acoustic types: laugh, cry, scream, moan, and possibly roar and sigh. The association between call type and perceived emotion was systematic but non-redundant: listeners associated every call type with a limited, but in some cases relatively wide, range of emotions. 
The speed and consistency of naming the call type predicted the speed and consistency of inferring the caller’s emotion, suggesting that acoustic and emotional categorizations are closely related. However, participants preferred to name the call type before naming the emotion. Furthermore, nonverbal categorization of the same stimuli in a triad classification task (Experiment 2) was more compatible with classification by call type than by emotion, indicating the former’s greater perceptual salience. These results suggest that acoustic categorization may precede attribution of emotion, highlighting the need to distinguish between the overt form of nonverbal signals and their interpretation by the perceiver. Both within- and between-call acoustic variation can then be modeled explicitly, bringing research on human nonverbal vocalizations more in line with the work on animal communication.|000|concept list, word emotions, vocalization, cross-linguistic study, speech norms, dataset, 3537|Hill2017b|It is disappointing that so many among the authors of newly commissioned articles did not cite their data; this failing is particularly perplexing in the case of those authors who benefited from the generosity of agencies that explicitly require archiving in public repositories. The move toward open data is still in its early days.|306|data-sharing, problem, data problems, historical linguistics, 3538|Hill2017b|This m- is a ghost morpheme. The English translation of Tibetan verbs beginning with m- are transitive as often as intransitive. Snellgrove draws attention to mthoṅ “see,” mchod “honor,” mdzad “do.” The Paradebeispiel for this would-be prefix is the pair mnam “smell, stink” versus snom, bsnams “sniff, take a smell of,” but Jacques demonstrates that the m- belongs to the verbal root; it is missing from the transitive verb due to a sound change *smn- > sn-. 
|309|derivational morphology, grammatical function, morphology, word derivation, problem, methodology 3539|Wilkins2017|**Ancient Weaponry** Hafting, which allowed projectile points to be attached to a staff, was an important technological advance that greatly increased the functionality of weapons of early humans. This technology was used by both Neandertals and early Homo sapiens and is readily seen after about 200,000 to 300,000 years ago, but whether it was used by a common ancestor or was separately acquired by each species is unclear. Supporting use by a common ancestor, Wilkins et al. (p. 942) report that stone points in a site in central South Africa were hafted to form spears around 500,000 years ago. The evidence includes damaged edges consistent with this use and marks at the base that are suggestive of hafting. **Abstract** Hafting stone points to spears was an important advance in weaponry for early humans. Multiple lines of evidence indicate that ~500,000-year-old stone points from the archaeological site of Kathu Pan 1 (KP1), South Africa, functioned as spear tips. KP1 points exhibit fracture types diagnostic of impact. Modification near the base of some points is consistent with hafting. Experimental and metric data indicate that the points could function well as spear tips. Shape analysis demonstrates that the smaller retouched points are as symmetrical as larger retouched points, which fits expectations for spear tips. The distribution of edge damage is similar to that in an experimental sample of spear tips and is inconsistent with expectations for cutting or scraping tools. 
Thus, early humans were manufacturing hafted multicomponent tools ~200,000 years earlier than previously thought.|000|spear technology, early humans, hominid evolution, language evolution, origin of language 3540|Blank1997|Expansion und Attraktion sind komplementäre Prozesse: Bei der Expansion handelt es sich um ein semasiologisches Verfahren: Hier erhält ein Wort neue Bedeutungen. Die Expansion kann somit auch bei der Erstbenennung von Neuerungen eine wichtige Rolle spielen. Bei der Attraktion hingegen handelt es sich primär um ein onomasiologisches Verfahren: Für ein und denselben Sachverhalt werden neue Bezeichnungen geschaffen. Sekundär führt natürlich auch die Attraktion zu Bedeutungswandel, nämlich bei den Wörtern, die zur Bezeichnung des «attraktiven» Sachverhaltes herangezogen wurden. Man kann das Funktionieren dieser beiden Prinzipien folgendermaßen darstellen: .. image:: static/img/Blank1997-21.png :width: 800px :name: Blank-1997 :comment:`[illustration figure from Blank, showing expansion and attraction in semantic shift]` |21|expansion, attraction, semantic change, methodology, 3541|Frey2017|Die Wettervorhersage ist eine der größten Erfolgsgeschichten der Wissenschaft. Aber viele Wetter-Apps scheinen unzuverlässig. Was ist da los? Womöglich sind auch Nutzer verantwortlich, die exakte Langfristprognosen wollen, ohne zu überlegen, ob das überhaupt möglich ist.|000|weather forecast, predictability, environmental science 3542|Frey2017|Article explains common problems and misunderstandings regarding weather forecast. This is probably interesting in the context of linguistics where people often ask for some prediction as to how the future languages might sound, although it is clear that prediction cannot follow the model of prediction in the natural sciences.|000|weather forecast, predictability, environmental science 3543|Lashley2017|Mass mortality events are characterized by rapid die-offs of many individuals within a population at a specific location. 
These events produce a high concentration of remains within a given locale and the frequency and magnitude of these events may be increasing (Fey et al. 2015). Mass mortality events may be caused by physical (e.g., lightning strikes, fire), chemical (e.g., pollutants, hypoxia), or biological processes (e.g., disease, phenological mismatch with food source).|000|mass mortality, biology, ecology, experimental study 3544|Lashley2017|The fun aspect of the article is that they conducted a rather crude experiment in which they let tons of wild pigs rot in the same place in order to see how animals hunt for the remainders.|000|mass mortality, ecology, experimental study, 3545|Doenges2017|Article discusses the study by @Lashley2017 about wild pigs which were left to rot in one place in order to see how the environment reacts.|000|ecology, experimental study, mass mortality 3546|Viering2017|Wo der Mensch die großen Raubtiere ausgerottet hat, steigen oft kleinere Jäger auf. Das kann zu massiven ökologischen Problemen führen, die man nur schwer in den Griff bekommt.|000|wolf, ecology, ecosystem, invasive species 3547|Viering2017|Article describes the importance of species like wolves and sharks in order to keep the environment in balance. If wolves are hunted down and killed, other species will take their place and those may often be even more harmful than people would think that wolves could be.|000|invasive species, wolves, ecosystem 3548|Bomfleur2017|The Osmundales (Royal Fern order) originated in the late Paleozoic and is the most ancient surviving lineage of leptosporangiate ferns. In contrast to its low diversity today (less than 20 species in six genera), it has the richest fossil record of any extant group of ferns. The structurally preserved trunks and rhizomes alone are referable to more than 100 fossil species that are classified in up to 20 genera, four subfamilies, and two families. 
This diverse fossil record constitutes an exceptional source of information on the evolutionary history of the group from the Permian to the present. However, inconsistent terminology, varying formats of description, and the general lack of a uniform taxonomic concept renders this wealth of information poorly accessible. To this end, we provide a comprehensive review of the diversity of structural features of osmundalean axes under a standardized, descriptive terminology. A novel morphological character matrix with 45 anatomical characters scored for 15 extant species and for 114 fossil operational units (species or specimens) is analysed using networks in order to establish systematic relationships among fossil and extant Osmundales rooted in axis anatomy. The results lead us to propose an evolutionary classification for fossil Osmundales and a revised, standardized taxonomy for all taxa down to the rank of (sub)genus. We introduce several nomenclatural novelties: (1) a new subfamily Itopsidemoideae (Guaireaceae) is established to contain Itopsidema, Donwelliacaulis, and Tiania; (2) the thamnopteroid genera Zalesskya, Iegosigopteris, and Petcheropteris are all considered synonymous with Thamnopteris; (3) 12 species of Millerocaulis and Ashicaulis are assigned to modern genera (tribe Osmundeae); (4) the hitherto enigmatic Aurealcaulis is identified as an extinct subgenus of Plenasium; and (5) the poorly known Osmundites tuhajkulensis is assigned to Millerocaulis. In addition, we consider Millerocaulis stipabonettiorum a possible member of Palaeosmunda and Millerocaulis estipularis as probably constituting the earliest representative of the (Todea-)Leptopteris lineage (subtribe Todeinae) of modern Osmundoideae.|000|morphological characters, maximum parsimony, phylogenetic network, 3549|Bomfleur2017|What makes this article so valuable for linguists are the very explicit descriptions of character coding for multi-state morphological characters. 
It seems that by adopting similar schemas in similar situations in linguistics, we could force even binarized characters to yield a linguistically valid analysis in a parsimony framework.|000|maximum parsimony, multi-state models, morphological characters 3550|Bomfleur2017|Figure 2: Diagram showing different tissue compositions of selected types of osmundalean stem cores as seen in cross-section through the stem core, together with the respective character scoring used in the matrix (for definition of characters and of character states see text). .. image:: static/img/Bomfleur2017-figure-2.jpg :width: 800px :name: figure2 :comment:`[Figure 2, see description above]`|3|ordered character states, morphological characters, character coding, maximum parsimony 3551|Rask1818|Men ikke blot i Endelserne, ogsaa i Ordene selv foregas mangfoldige Forandringer, det vil maaskje ikke være af Vejen her at mærke sig de hyppigste af disse Overgange fra Græsk og Latin til Islandsk. :translation:`Aber nicht nur in den Endungen, auch im Wort selbst gehen vielfältige Veränderungen vor sich. Es ist vielleicht nicht unangebracht, auf die häufigsten von diesen Übergängen vom Griechischen und Lateinischen zum Isländischen hinzuweisen.` :translation:`But not only in the endings, in the word itself we find multiple changes. It is probably useful to mention those transitions from Greek and Latin to Icelandic which can be encountered as the most frequent ones.`|169|sound change, Rasmus Rask, nice quote 3552|Grimm1822|Noch merkwürdiger als die einstimmung der liq. und spir. ist die abweichung der lippen- zungen- und kehllaute nicht allein von der gothischen, sondern auch der alth. einrichtung. Nämlich genau wie das alth. in allen drei graden von der goth. ordnung eine stufe abwärts gesunken ist, war bereits das goth. selbst eine stufe von der lateinischen (griech. indischen) herabgewichen. Das goth. verhält sich zum lat. gerade wie das alth. zum goth. 
|584|sound change, sound law, nice quote, methodology, 3553|Greenhill2017|Understanding how and why language subsystems differ in their evolutionary dynamics is a fundamental question for historical and comparative linguistics. One key dynamic is the rate of language change. While it is commonly thought that the rapid rate of change hampers the reconstruction of deep language relationships beyond 6,000–10,000 y, there are suggestions that grammatical structures might retain more signal over time than other subsystems, such as basic vocabulary. In this study, we use a Dirichlet process mixture model to infer the rates of change in lexical and grammatical data from 81 Austronesian languages. We show that, on average, most grammatical features actually change faster than items of basic vocabulary. The grammatical data show less schismogenesis, higher rates of homoplasy, and more bursts of contact-induced change than the basic vocabulary data. However, there is a core of grammatical and lexical features that are highly stable. These findings suggest that different subsystems of language have differing dynamics and that careful, nuanced models of language change will be needed to extract deeper signal from the noise of parallel evolution, areal readaptation, and contact.|000|grammatical feature, lexical change, rate of change, grammatical change, Austronesian, 3554|Creemers2017|A recent debate in the morphological literature concerns the status of derivational affixes. While some linguists (Marantz 1997, 2001; Marvin 2003) consider derivational affixes a type of functional morpheme that realizes a categorial head, others (Lowenstamm 2015; De Belder 2011) argue that derivational affixes are roots. Our proposal, which finds its empirical basis in a study of Dutch derivational affixes, takes a middle position. We argue that there are two types of derivational affixes: some that are roots (i.e. lexical morphemes) and others that are categorial heads (i.e. 
functional morphemes). Affixes that are roots show ‘flexible’ categorial behavior, are subject to ‘lexical’ phonological rules, and may trigger idiosyncratic meanings. Affixes that realize categorial heads, on the other hand, are categorially rigid, do not trigger ‘lexical’ phonological rules nor allow for idiosyncrasies in their interpretation.|000|affix, root, head, syntax, derivational morphology, word derivation 3555|Creemers2017|Paper is an interesting example for synchronic linguistics creating questions that stem from diachronic processes. The examples for affixes which are distinguished are far away from being "real", given that the authors name affix examples which would never be recognized by any speaker of the languages. The paper is a nice example for the "islands of order in an ocean of chaos" claim that should be advanced.|000|ocean of chaos, island of order, word derivation, derivational morphology, affix, Dutch 3556|Greenhill2015a|The introductory article is useful for quoting what phylogenetic trees can be used for on the long run. |000|#summary|introduction, phylogenetic reconstruction, language evolution, family tree 
3557|Greenhill2015a|Language phylogenies are a potentially powerful way to answer questions about how languages and cultures evolve. Recently, phylogenetic methods have been applied to a range of questions about the evolution of human languages and cultures. This article reviews the historical background of these approaches and provides a detailed methodological overview. Three different applications of phylogenetic methods are discussed: how language phylogenies can be used to test population dispersal hypotheses, to investigate processes in language evolution, and to infer patterns in cultural evolution. The article discusses briefly some controversies over the use of these methods before closing with some future prospects.|000|language evolution, phylogenetic reconstruction, family tree, examples 3558|Browne2017|This short note proposes the study of ancient games as a new frontier for game AI research. This aspect of games research has been largely neglected so far from an AI perspective, but could benefit significantly from the application of modern computational techniques.|000|ancient games, phylogenetic reconstruction, artificial intelligence 3559|Bromham2017|One of the major benefits of interdisciplinary research is the chance to swap tools between fields, to save having to reinvent the wheel. The fields of language evolution and evolutionary biology have been swapping tools for centuries to the enrichment of both. 
Here I will discuss three categories of tool swapping: (1) conceptual tools, where analogies are drawn between hypotheses, patterns or processes, so that one field can take advantage of the path cut through the intellectual jungle by the other; (2) theoretical tools, where the machinery developed to process the data in one field is adapted to be applied to the data of the other; and (3) analytical tools, where common problems encountered in both fields can be solved using useful tricks developed by one or the other. I will argue that conceptual tools borrowed from linguistics contributed to the Darwinian revolution in biology; that theoretical tools of evolutionary change can in some cases be applied to both genetic and linguistic data without having to assume the underlying evolutionary processes are exactly the same; and that there are practical problems that have long been recognised in historical linguistics that may be solved by borrowing some useful analytical tools from evolutionary biology.|000|analogy, biological parallels, Charles Lyell, history of science, Charles Darwin, 3560|Brown2011a|:comment:`Lists multiple correspondence patterns for the Totonacan languages along with proto-form, which is a very nice example for a correspondence table as we find them so often in historical linguistics.`|336f|sound correspondences, correspondence patterns, examples, Totonacan 3561|Haspelmath2017b|Linguists are sometimes confronted with choices concerning language names. For example, one and the same language may be referred to as Persian or Farsi. This short paper discusses some principles that one might use for making decisions when there are variant forms in use, or when one feels that none of the existing names is appropriate. 
The principles discussed here arose from work on Glottolog, an English-language database of the world’s languages (Glottolog.org), where each language has a single primary English name (though variant forms are of course included), and where the goal is to choose the best variant form as the primary name of the language. Whenever the question arises which variant name form to choose, the Glottolog editors are guided by these principles, so they are formulated in a prescriptive way, but with explanation and justification for each principle. It seems that the general issue is also quite important for language documenters, because the names of non-major languages are often not fully established yet, and naming decisions have to be made.|000|language names, Glottolog, guidelines, 3562|Haspelmath2017b|Thus, a language name in English is often different from the autoglottonym (or endonym), and sometimes very different, e.g. German vs. Deutsch, Finnish vs. Suomen kieli, Japanese vs. nihongo, Navajo vs. Diné bizaad. This is not a problem, and there is no general expectation that language names should be similar across languages.|82|language names, translation, 3563|Finland2002|This paper proposes 133 characters for addition to the BMP of the UCS. This document contains a discussion of the Uralic Phonetic Alphabet proposal, and annexes containing proposed character names, glyphs, etc. The proposal summary form is added at the end of the document; this paper replaces one with the same number sent out 2002-03-10 apart from updates to the summary form.|000|Uralic Phonetic Alphabet, Uralic languages, phonetic transcription 3564|Evans2016|Linguistic typology has much to be proud of and has arguably done more than any other subfield of linguistics to chart and systematise the full dimensions of variability across the world’s languages. 
Its quest to balance the study of variability with the need for comparability has done much to hold together those parts of the field in danger of unintegrable particularism (always a danger for descriptivists from the structuralist tradition, treating each language solely in terms of its own “genius”) with the grand questions of what general principles underlie human language in all its spectacular diversity. One only has to look at other fields studying human culture where the prevailing ethos has rejected the possibility of comparability – anthropology and ethnomusicology are two examples – to imagine how different the field of linguistics would now be without typology to hold together these opposing forces over the last half a century or more.|000|linguistic typology, historical linguistics, 3565|Evans2016|Important paper that argues to start some kind of a re-thinking in linguistic typology, away from the hunt for universals, towards a more integrated explanation of linguistic phenomena in a diachronic framework. At least parts of the paper can be read like this.|000|linguistic typology, historical linguistics, universals, diachrony and synchrony 3566|Evans2016|I would argue that the explanation is simple: that the really interesting questions about variability arise once one adopts a framework closer to SYSTEMATICS in biology (Michener et al. 1970) which is equally interested in the description, taxonomy, evolutionary history, and adaptations of organisms. 
Systematics thus integrates the study of the “what” and the “where” – long staples of typology – with the “how” (diachronic typology) and, less familiarly for typologists, the “why”, in the form of explanations that should ultimately account not only for how each point in the design space has arisen, but also for what is common, what is rare, what typological traits go together or not, and – more controversially – what types of extralinguistic setting they are found in.|506|extra-linguistic factors, universals, diachronic explanations 3567|Evans2016|More extensive knowledge of what are common and what are rare changes will be of great help in weighting the likelihoods of alternative phylogenies – historical linguists already do this implicitly and intuitively as part of their unquantified art and craft – but explicit measures need to be developed and tested.|507|language change, tendencies, measure 3568|Evans2016|It is clear that until feature sets and grouping methods are developed which perform comparably to the comparative method on known language families, historical linguists will not accept the validity of applying them to previously undemonstrated groupings. (Compare this to the trial phase that thermoluminescence dating had to go through, showing it could deliver dates of comparable accuracy to radiocarbon dating, before archaeologists would accept its use for more remote time depths than radiocarbon dating could be used for).|508|comparative method, phylogenetic reconstruction, testing, 3569|Evans2016|Couldn’t at least some of crosslinguistic variability reflect “external” factors? 
Opening up typologists’ research agenda [pb] to this possibility, in my view, is a hugely promising enterprise, re-engaging the field with the study of human cultural, demographic, epidemiological, and environmental diversity in a way that re-engages with fundamental questions about human history and the diversity of cultural responses to the myriad of circumstances in which speaking humans have evolved their linguistic instruments.|509f|extra-linguistic factors, linguistic variation, language change 3570|Dediu2007|The correlations between interpopulation genetic and linguistic diversities are mostly noncausal (spurious), being due to historical processes and geographical factors that shape them in similar ways. Studies of such correlations usually consider allele frequencies and linguistic groupings (dialects, languages, linguistic families or phyla), sometimes controlling for geographic, topographic, or ecological factors. Here, we consider the relation between allele frequencies and linguistic typological features. Specifically, we focus on the derived haplogroups of the brain growth and development-related genes ASPM and Microcephalin, which show signs of natural selection and a marked geographic structure, and on linguistic tone, the use of voice pitch to convey lexical or grammatical distinctions. We hypothesize that there is a relationship between the population frequency of these two alleles and the presence of linguistic tone and test this hypothesis relative to a large database (983 alleles and 26 linguistic features in 49 populations), showing that it is not due to the usual explanatory factors represented by geography and history. The relationship between genetic and linguistic diversity in this case may be causal: certain alleles can bias language acquisition or processing and thereby influence the trajectory of language change through iterated cultural transmission. 
|000|tone language, linguistic typology, population size, natural selection 3571|Evans2016|@Dediu<2007> & Ladd’s (2007) earlier study on the correlations between the population frequency of the genes ASPM and Microcephalin, and the incidence of tone as shown in the WALS sample exemplifies external-selectionalist logic applied to the interaction between genes and the phonological system [...].|511|natural selection, tone language, language change, extra-linguistic factors 3572|Evans2016|I will call this the macro-from-micro hypothesis: the hypothesis that any pair of values for some typological feature, known to differ across languages, will also be found as variants inside some individual speech communities.|512|macro-from-micro hypothesis, language change, synchronic variation, diachrony and synchrony, diachronic explanations 3573|Evans2016|What this formulation omits – and what makes unresolved debates like this possible – is that the transition from non-tone language to tone language is not a quantum jump. More often, it takes in intermediate steps such as the following: (i) no tones, contrastive codal consonants: ta vs. taʔ vs tah; (ii) incidental pitch patterns conditioned by following consonants: ta vs. táʔ vs. tàh; (iii) incidental coda consonant with main contrast coding by pitch pattern: ta vs. tá ʔ vs. tà h ; (iv) final coda consonants disappear, leaving a tone language: ta vs. tá vs. tà.|514|tone language, tonogenesis, transition, language change, tone change 3574|Evans2016|This makes coding decisions into difficult judgment calls prone to interanalyst disagreement. At what point did, or will, each West Germanic language switch the position of its verb? When does a language go from having a case system to no longer having one, or from four cases to three, or three genders to two? Fine-grained studies – of the type whose value has long been pioneered by dialectologists with typological interests (Kortmann (ed.) 
2004) – are needed if we are to have a richer set of categorisations more compatible with the variation that can be observed at a given moment in time.|515|data coding, grammatical feature, language typology 3575|Wiesinger1964|This article presents a summary on the phonetic transcription system Teuthonista, which was used mainly to describe German dialects.|000|phonetic transcription, Teuthonista, introduction 3576|Moehn1964|Paper gives an account of the usage of the phonetic transcription system Teuthonista.|000|Teuthonista, phonetic transcription, introduction, overview 3577|Moehn1964|Im Jahre 1924 erschien das erste Heft der Zeitschrift Teuthonista als ein bewußter Neubeginn linguistischer Publikationstätigkeit. In dem programmatischen „Zum Geleit" führte Hermann Teuchert gegenüber der „Zeitschrift für deutsche Mundarten" als Vorgängerin aus:1 „Sowohl räumlich wie sachlich umfaßt der Teuthonista ein Gebiet mit wesentlich weiter gezogenen Grenzen. Der gesamte Bereich der deutschen Zunge soll in ihm zu Worte kommen, außer dem Deutschen Reich alle Länder, die sprachlich zu ihm in Beziehung stehn, also Elsaß-Lothringen, die Schweiz, Deutsch-Österreich, die deutschen Bezirke in den Reichen des Ostens, ferner aber auch soll die zukunftsreiche Arbeit an der Volks- sprache in dem niederländischen und dem gesamten nordischen Sprach- gebiet ihren Niederschlag in dem neuen Organ finden".|21|Teuthonista, Germanic, phonetic transcription 3578|Anttila1972|:comment:`Longer discussion of internal reconstruction in the classical handbook.`|264-273|internal reconstruction, comparative method, linguistic reconstruction, methodology 3579|Anttila1972|:comment:`Longer introduction to the comparative method.`|229-263|comparative method, methodology, introduction 3580|Gyarmathy2017|In this paper, I present a modification of the scale-based framework of Sauerland & Stateva (2007, 2011) for modelling granularity—more precisely, of the so called granularity functions. 
Briefly, a granularity function maps a degree of a scale to an interval containing it such that the points within that interval are mutually indistinguishable. The proposed modification is a generalisation of the original framework suited for application in a wider range of domains, and it also fares better in overcoming some formal problems in modelling granularity-sensitive phenomena and vagueness.|000|semantics, granularity, modelling 3581|Hill2014a|In the first half of the 20th century following the Neogrammarian tradition, most researchers believed that sound change was always conditioned by phonetic phenomena and never by grammar. Beginning in the 1960s, proponents of the generative school put forward cases of grammatically conditioned sound change. From then until now, new cases have continued to come to light. A close look at the development of intervocalic -s- in Greek, reveals the divergent approach of the two schools of thought. All examples of grammatical conditioning are amenable to explanation as some combination of regular sound change, analogy, or borrowing. Neither the Neogrammarian belief in exceptionless phonetically conditioned sound change nor the generative inspired belief in grammatical conditioning is a falsifiable hypothesis. Because its assumptions are more parsimonious and its descriptive power more subtle, the Neogrammarian position is the more appealing of these two equally unprovable doctrines.|000|sound change, conditioned sound change, Neogrammarian sound change, methodology 3582|Hill2014a|Potentially interesting paper in which it is proposed that neither unconditioned (phonologically conditioned) nor grammatically conditioned sound change can be really proven. 
This falls in line with the idea of multi-tiers which basically ignore conditioning, or more properly, seek to find a potential condition rather than superimposing ones grounded in theoretical linguistics.|000|conditioned sound change, sound change, regular sound change, Neogrammarian sound change, methodology 3583|Lee2017|Can the language we speak determine how we represent the world around us? To those familiar with the theory of linguistic relativity, this may seem like an age-old question about which everyone has their own answer. Although the evidence supporting linguistic relativity remains controversial, the long reach of language into our perception and behavior is nevertheless an intriguing possibility that deserves further investigation. Here I take a closer look at a case of linguistic relativity that had a particularly strong impact on cross-cultural research: the pronoun-drop effect. The theory of pronoun-drop effect posits that languages that allow their speakers to drop subject pronouns in verbal communication would lead their speakers to create collectivistic culture. It was argued that the absence of pronouns necessitates the speakers to embed their self-identities in the context of social interaction, so the linguistic practice of omitting pronouns would reduce the sense of individuality in the minds of speakers. After conducting a series of Bayesian multilevel analyses on the original dataset, however, the current study concludes that the pronoun-drop effect is unlikely to be a robust, universal phenomenon. The analyses revealed that the majority of statistical signal supporting the phenomenon comes from the Indo-European language family, and other families provided little or inconsistent evidence. It was also observed that the Indo-European languages alone made up 61 per cent of the original dataset, and dropping them from analysis completely nullified the pronoun-drop effect. 
These observations suggest that the pronoun-drop effect is a consequence of failing to account for (i) varying effects among language families and (ii) overrepresentation of the Indo-European languages. With these results, this article suggests that the theory of pronoun-drop effect should be thoroughly revised. Additionally, the article provides several suggestions for many similar cross-cultural studies that suffer from the same problems as the pronoun-drop effect study.|000|pronoun drop, pro-drop, correlational studies, extra-linguistic factors, economy 3584|Lee2017|If it is true that this study reveals inconsistencies in the pronoun-drop hypothesis that states that societies behave differently depending on whether they drop the pronoun or not, this is a very nice illustration of how statistics should be done, but also how they should not be done. This paper deserves more proper reading.|000|pronoun drop, pro-drop, correlational studies, extra-linguistic factors 3585|Strasser2014|Scientific datasets have immeasurable value, but they lose their value over time without proper documentation, long-term storage, and easy discovery and access. Across disciplines as diverse as astronomy, demography, archeology, and ecology, large numbers of small heterogeneous datasets (i.e., the long tail of data) are especially at risk unless they are properly documented, saved, and shared. One unifying factor for many of these at-risk datasets is that they reside in spreadsheets. In response to this need, the California Digital Library (CDL) partnered with Microsoft Research Connections and the Gordon and Betty Moore Foundation to create the DataUp data management tool for Microsoft Excel. Many researchers creating these small, heterogeneous datasets use Excel at some point in their data collection and analysis workflow, so we were interested in developing a data management tool that fits easily into those work flows and minimizes the learning curve for researchers. 
The DataUp project began in August 2011. We first formally assessed the needs of researchers by conducting surveys and interviews of our target research groups: earth, environmental, and ecological scientists. We found that, on average, researchers had very poor data management practices, were not aware of data centers or metadata standards, and did not understand the benefits of data management or sharing. Based on our survey results, we composed a list of desirable components and requirements and solicited feedback from the community to prioritize potential features of the DataUp tool. These requirements were then relayed to the software developers, and DataUp was successfully launched in October 2012.|000|tabular data, data modeling, Excel, spreadsheet, tools, interfaces, metadata 3586|Strasser2014|Authors propose a way to regularize Excel spreadsheets in science. They also conduct a study where they show that many scholars use Excel or spreadsheet software on a daily basis and for almost all of their data. Their conclusion is wrong though: putting a complicated tool on top of Excel is not the right way. Instead, one should tell people to make their data comparable by converting it to textfiles with metadata that CAN be easily used by other people.|000|Excel, spreadsheet, data modeling, data science, data sharing 3587|Konnikova2017|Nice summary on Sherlock Holmes and the influential character on science, reaching from the "magical number seven" by @Miller1956 up to different ways of thinking as summarized in @Kahnemann2011.|000|Sherlock Holmes, abduction, philosophy of science, 3588|Konnikova2017|It was while working as Bell’s ward assistant that Conan Doyle witnessed the doctor correctly identifying the former profession of a retired army officer, as well as where he had served — Barbados.|333|abduction, deduction, modes of reasoning, Sherlock Holmes 3589|Konnikova2017|It is perhaps as psychologist that Holmes’s contribution to popular science is most evident. 
Take George Miller’s @1956 paper ‘The magical number seven, plus or minus two’, which posits that humans can cognitively process only around seven pieces of information at any time.|333|processing power, human brain 3590|Konnikova2017|Emotion is responsible for many of the biases in decision-making explored by psychologists such as Daniel @Kahneman<2011>, in his Thinking, Fast and Slow (Farrar, Straus and Giroux, 2011).|333|decision making, emotion, 3591|Miller1956|My problem is that I have been persecuted by an integer. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. This number assumes a variety of disguises, being sometimes a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecognizable. The persistence with which this number plagues me is far more than a random accident. There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution.|000|human brain, processing capacity, 3592|Kahnemann2011|Book treats the influence of emotions on human thinking, claiming that we basically have two systems which we use to reason: one system that reacts quickly and emotionally, and a slow system that requires a lot of effort to be used. |000|emotion, thinking, human brain, reasoning, 3593|Gagne2017|Understanding how properties are extended to combined concepts is critical to theories of concepts. In human judgments, properties true of a noun (ducks have webbed feet) become less true when that noun is modified (baby ducks have webbed feet), while properties false of a noun (candles have teeth) become less false when that noun is modified (purple candles have teeth). These modification and inverse modification effects have been shown to be extremely robust. 
Gagné and Spalding (2011, 2014b; Spalding and Gagné 2015) have argued that these effects are driven by expectation of contrast. The current experiment shows that, as expected, the modification and inverse modification effects are unaffected by the normative force with which a property is predicated of the head noun, supporting the expected contrast explanation. The results are discussed with respect to an Aristotelian-Thomistic approach to concepts (Spalding and Gagné 2013).|000|concept combination, compositionality, semantics, cognition 3594|Gagne2017|The human conceptual and language systems are compositional and highly productive. One important mechanism underlying this compositional and productive character is nominal compounding. Nominal compounding involves the combination of two or more free morphemes to form a noun. This process can lead to the construction of new concepts in the conceptual system and new words in the mental lexicon. Consequently, the process provides insight into both word formation and conceptual combination. Combined concepts are typically expressed as either [pb] modifier-noun phrases (e.g., mountain magazine) or compound words (e.g., snowball) and these linguistic expressions are among the most basic examples of linguistic productivity in that they consist of the combination of two morphological constituents. |223f|compositionality, compounding, concept combination, cognition 3595|Gewirtz2017|Popularity may not be a single vector answer, but students and professionals still want to know if they're guiding their careers and companies in the right direction.|000|programming languages, Python, ranking, 3596|Gewirtz2017|Article provides a recent ranking of programming languages, with Python ranking on position 3.|000|Python, programming languages, ranking, 3597|Goldbert2017|Neural networks are a family of powerful machine learning models. This book focuses on the application of neural network models to natural language data. 
The first half of the book (Parts I and II) covers the basics of supervised machine learning and feed-forward neural networks, the basics of working with machine learning over language data, and the use of vector-based rather than symbolic representations for words. It also covers the computation-graph abstraction, which allows to easily define and train arbitrary neural networks, and is the basis behind the design of contemporary neural network software libraries. The second part of the book (Parts III and IV) introduces more specialized neural network architectures, including 1D convolutional neural networks, recurrent neural networks, conditioned-generation models, and attention-based models. These architectures and techniques are the driving force behind state-of-the-art algorithms for machine translation, syntactic parsing, and many other applications. Finally, we also discuss tree-shaped networks, structured prediction, and the prospects of multi-task learning.|000|neural network, introduction, tutorial, overview, handbook 3598|Hampton2017|By highlighting relations between experimental and theoretical work, this volume explores new ways of addressing the problem of concept composition, which is one of the central challenges in the study of language and cognition. An introductory chapter lays out the background to the problem. The subsequent chapters by leading scholars and younger researchers in psychology, linguistics and philosophy, aim to explain how meanings of different complex expressions are derived from simple lexical concepts, and to analyze how these meanings connect to concept representations. 
This work demonstrates an important advance in the interdisciplinary study of concept composition, where points of convergence between cognitive psychology, linguistics and philosophy emerge and lead to new findings and theoretical insights.|000|concept combination, compositionality, handbook, semantics, cognition 3599|Hampton2017|Given that we can study compoundhood in CLICS, this book with the articles may serve as an important starting point to derive interesting theories which could then be tested.|000|compounding, compoundhood, compositionality, CLICS, semantics, concept combination 3600|Hampton2017a|In this chapter I aim to explain how psychology understands concepts, and why there is a need for semantic theory to take on the challenge of psychological data. All of the contributors to this volume are (presumably) in the business of trying to understand and explain how language has meaning, and the primary source of evidence for this has to be our intuitions of what things mean. Furthermore, if my semantic intuitions (as a theorist) are out of kilter with those of the common language user, then it is my theory which should be called into question and not the lay intuition. This chapter describes a range of results from my research program over the last 30 years, some old and some new, with the aim of giving a general account of using Prototype Theory as a way to explain semantic intuitions.|000|compositionality, concepts, psychology, concept combination, semantics, cognition, 3601|Hampton2017a|In The Compositionality Papers, @Fodor<2002> and Lepore (2002) return frequently to a “knock-down” argument against the suggestion that concepts might be prototypes. Concepts, they argue, must be compositional. It must be possible, if one accepts the representational theory of mind, to explain how the meaning of a complex phrase is based solely on the meaning of the elements from which it is constructed, plus the syntactic structure into which they are placed. 
To account for our ability to understand the meaning of sentences such as (1) and (2) and countless other similar sentences the semantic system needs a set of fixed symbols to represent the conceptual atoms in the sentences (John, Mary, Bill, loves, hates) which can then be inserted into suitable syntactically structured sentence frames to yield the appropriate meaning for the sentence as a whole. [pb] (1) John loves Mary. (2) John loves Mary, but Mary loves Bill and Mary hates John. They claim that without this type of compositionality it is not possible to provide an account of how thoughts (and indeed utterances) can express ideas. |95f|compositionality, cognition 3602|Hampton2017a|Given the claim that concepts must be compositional, the argument continues that prototypes are in fact anything but compositional. For example, the prototypical pet fish is not simply the prototype pet conjoined with the prototype fish. Indeed something like a goldfish or guppy, while being a good match for the pet fish prototype, has little in common with the cats and dogs that are typical pets, or the cod and trout that are typical fish (Osherson and Smith 1981).|96|prototype, compositionality 3603|Hampton2017a|As an empirical researcher into word meanings, I find myself strongly drawn to the conclusion that lexical meanings do not compose according to the rules of set logic applied to extensions. That is to say that the meaning of a complex phrase will not always be determined simply by the meanings of its components and their mode of combination. I will argue that the construction of complex concepts proceeds (most naturally) through the interactive combination of the intensional meanings of the individual concepts. Wherever two concepts are combined whose intensional contents overlap or interact semantically, then extensional compositionality of meaning will tend to fail. 
Moreover, if we consider thought rather than language, our capacities to combine concepts greatly exceed the simple combinatorial rules provided by extensional logic, as will become apparent in the last part of this chapter.|96|extensional logic, compositionality, concept combination, compoundhood, 3604|Hampton2017a|Paper may be an interesting starting point to look into compoundhood in CLICS, as it provides some stronger theories on what constitutes a compound and whether meaning is simply compositional in the sense of extensional logic, or concept combination follows more complex pathways.|000|compositionality, introduction, extensional logic, compoundhood 3605|Bondebjerg2015|Since the 1980s the study of the brain has developed from being a primarily biological field to a significant interdisciplinary area with a strong influence on the humanities and social sciences. In this article I describe fundamental elements in what I call the embodied mind paradigm, and new understanding of the relation between mind, body and emotions. The new paradigm challenges certain notions of constructivism in the humanities and social sciences, but also opens up fruitful venues for new interdisciplinary research. Here I outline such possibilities in the particular areas of linguistics, philosophy, sociology and film studies. 
This article is published as part of an ongoing collection dedicated to interdisciplinary research.|000|embodiment, language and mind, theory of mind 3606|Bondebjerg2015|The self, who we are and what we feel and think as a “me”, is, according to Damasio (2010: 22–23), a very complicated process in which the self is constituted by three different kinds of self: the “protoself” with the primordial feelings, mainly in the cerebral cortex and brain stem; the “core self” or what he calls a “material me” where interactions between the organism and objects take place, and “the autobiographical self”, which is our aggregated knowledge [pb] and memory of both the past and projections of the future. Finally we have what Damasio calls “a knower”, where the core and autobiographical self give our minds a “subjectivity”.|2f|definition, theory, embodied mind, 3607|Renwick2015|The relationship between biological and social science is a long-standing area of interest for researchers on both sides of the divide, as well as in the humanities, where historians, among others, have been fascinated by its wider social, political and cultural implications. Yet interdisciplinary work in this area has always been problematic, not least because researchers are understandably concerned about interdisciplinarity being a cover for importing ideas and methods wholesale from other fields. This article explores the lessons, both positive and negative, that can be drawn from an ongoing project focused on building links between biology, social science and history. The article argues that dialogue between different disciplines is a difficult process to get going but ultimately rewarding. However, the article also argues that interdisciplinary practice is a much more elusive goal. The key to developing such practices lies in identifying new spaces for cooperative work rather than areas that are already occupied by researchers. 
This article is published as part of a thematic collection on the concept of interdisciplinarity.|000|interdisciplinary research, biology, social science, biological parallels 3608|Renwick2015|Sometimes we do need a spur to consider questions beyond the ones we focus on as lone scholars—the preferred research model in the humanities. But that spur is only valuable on two conditions. The first is that it leaves space for researchers to pursue enough of their own discipline-specific work; otherwise the obvious question for many will be whether the enterprise is worthwhile. The second is that, while preserving that space, there needs to be a meaningful understanding of interdisciplinary work; one rooted in the idea that participants should be able to achieve different goals together.|3|interdisciplinary research, nice quote, 3609|Barsalou2017|If a theory of concept composition aspires to psychological plausibility, it may first need to address several preliminary issues associated with naturally occurring human concepts: content variability, multiple representational forms, and pragmatic constraints. Not only do these issues constitute a significant challenge for explaining individual concepts, they pose an even more formidable challenge for explaining concept compositions. How do concepts combine as their content changes, as different representational forms become active, and as pragmatic constraints shape processing? Arguably, concepts are most ubiquitous and important in compositions, relative to when they occur in isolation. Furthermore, entering into compositions may play central roles in producing the changes in content, form, and pragmatic relevance observed for individual concepts. 
Developing a theory of concept composition that embraces and illuminates these issues would not only constitute a significant contribution to the study of concepts, it would provide insight into the nature of human cognition.|000|concept combination, compositionality, cognition, 3610|Barsalou2017|From the perspective of human cognition, principles of conceptual processing have potential implications for theories of concept composition. First, the content active for a concept on a given occasion appears to vary considerably. Rather than having a stable core, a concept exhibits dynamically varying content across situations, perhaps resulting from Bayesian sampling. Second, multiple representational forms may constitute the dynamically constructed representation of a concept on a given occasion. Although these representations could include abstractions, they may also include exemplars, and are at least likely to include exemplar-level information. Additionally, multimodal simulations and distributed linguistic representations may become active and play central roles in concept composition, along with amodal symbols. On a given occasion, the specific mix of representational forms may vary widely. Third, the dynamic representation of a concept on a given occasion is likely to reflect a variety of pragmatic constraints. On the one hand, these representations could contain conceptual content established through extensional feedback on earlier occasions. On the other, they could contain information associated with the current background situation.|23|compositionality, concept combination, 3611|Zeng2017|Language classification is a matter of scale and scaling. Most basically, it assigns languages into mutually exclusive categories. The scale underpins the categorization but does not come from nowhere. It displays an historical configuration and is invariably centered on the authoritative voice of science (Gal 2016:95). 
Both linguistics and anthropology are essentially Western disciplines that scale themselves up from provincial to universal sciences (Chakrabarty 2008). For instance, comparative linguistics depends on the expandability of the colonial project, which is best exemplified by the British colonialization of India and the discovery of Indo-European linguistics (Trautmann 1997). To understand the dialectical relationship between language classification and political agenda, it is useful to follow the recent conceptualization of scaling as pragmatics (Carr and Lempert 2016). Language classification is necessarily political in terms of both pragmatic presupposition (colonial exploration) and entailment (governing colonized territory).|000|language classification, Edward Sapir, Li Fang-Kuei, history of science, Na-Dene, Sino-Tibetan 3612|Zeng2017|The reception of Li’s classification of Chinese languages is also divided by different scales. Although still well respected in China, there has been quite a harsh attack in the United States. Using the term “The ‘Indo-Chinese’ Pseudo-Stock,” Matisoff was especially dissatisfied that “Li lumps Chinese and Tibeto-Burman together with Kam-Tai and Miao-Yao, largely based on monosyllabicity and tone” (1973:471). What Matisoff attempted to back was the massive lexical comparison done by his colleague Paul Benedict, who aligns “Tai not with Chinese, but rather with Austronesian (=Malayo-Polynesian)” (1973:472). 
Herein, Tai is inter-scaly rescaled via replacing tone with lexicon as the measuring unit.|8/15|Li Fang-Kuei, Sino-Tibetan, language classification, reaction, 3613|Barsalou2017|Here I assume that a concept is a dynamical distributed network in the brain coupled with a category in the environment or experience, with this network guiding situated interactions with the category’s instances (for further detail, see Barsalou 2003b, 2009, 2012, 2016a, 2016b).|10|concept, definition, cognition 3614|Barsalou2017|Once the conceptual system is in place, it supports virtually all other forms of cognitive activity, both online in the current situation and offline when representing the world in language, memory, and thought (e.g., Barsalou 2012, 2016a, 2016b).|10|concept, online, offline, 3615|Barsalou2017|From the grounded perspective, concepts are typically situated, that is, they become active to process some aspect of the current physical situation (or a physical situation in the past or future). Although concepts may sometimes become active independently of a situation, they typically become active to support effective action in a specific situation. As a result, concepts become coupled with their physical referents. Although grounded theories of concepts have a long way to go in developing satisfactory accounts of this coupling, they naturally anticipate it, and provide many mechanisms for understanding and explaining it. 
Thus, grounded theories offer an approach for unifying formalist and cognitivist accounts of concepts, and further assume that neither approach alone is sufficient.|11|grounded theory of concepts 3616|Barsalou2017|Researchers often assume that a concept has a core, namely, important information about the respective category that is consistently activated rapidly and automatically, independently of context.|12|conceptual core theory, introduction 3617|Barsalou2017|When a given concept is combined with other concepts, its content is likely to vary considerably across concept compositions. Even when the same two concepts are combined on different occasions, their individual content is likely to vary, especially when combined in different background situations.|12|concept combination, conceptual core theory 3618|Barsalou2017|A related phenomenon is that the varying content of a concept often appears to contain information relevant in the current situation. In some of these cases, the information appears to be stored in the concept, with the current situation selecting it as relevant (e.g., Barsalou 1982; Conrad 1978; Greenspan 1986; Whitney et al. 1985).[...] In other cases, varying content appears to originate in background situations (e.g., Barsalou and Wiemer-Hastings, 2005; Papies 2013; Wu and Barsalou 2009).|12|situation-specific content, concepts, context 3619|Barsalou2017|As we have just seen, the properties active for a concept vary with context. As we also saw earlier, concepts may not be associated with stable cores that are activated automatically. How, then, should we think about the varying content of a concept? One possibility is that conceptual content is sampled in a Bayesian manner; in other words, sampling reflects both frequency of use and contextual relevance (Barsalou 2011). 
In a given situation, information that has been processed frequently across situations for the concept has a higher probability of becoming active than information processed infrequently (e.g., round vs. floats for basketball). Importantly, however, high probability information need not be core information, becoming active on all occasions when the concept is processed. Rather than being active obligatorily across situations, this information simply has a higher probability of being active than less frequent information. Depending on the context, high probability information may or may not be active (Lebois et al. 2015).|13|Bayesian sampling, concepts, 3620|Barsalou2017|If the conceptual content represented for a concept varies dynamically across situations, how can a concept be represented in a general manner (cf. Pelletier, 2017)? The grounded approach to concepts offers several potential solutions to this issue.|13|grounded theory of concepts, concept representation 3621|Barsalou2017|First, the distributed network in the brain that aggregates multimodal information for a concept across its exemplars offers a general representation of the respective category. Because the network aggregates a tremendous amount of information across experiences with the category, it doesn’t simply represent a single exemplar, but represents all exemplars together (see the distinction between simulations and simulators in Barsalou 1999, 2009, also 2016b).|13|exemplar theory, grounded theory of concepts, 3622|Barsalou2017|Second, the most accessible simulation that can be easily constructed from the network offers a default representation of the concept (cf. McNally and Boleda, 2017).|14|grounded theory of concepts, concept accessibility 3623|Barsalou2017|Third, people may learn to explicitly construct specific simulations of a concept that offer a generic representation of it (e.g., one that omits situational detail). 
To the extent that people understand meta-cognitively that a concept can be general, they may be able to construct a simulation that captures this generic character.|14|grounded theory of concepts, generalization 3624|Barsalou2017|If the conceptual content represented for a concept varies dynamically, how can individuals ever come to effectively share a common conceptual representation in a specific situation (cf. Pelletier, 2017)? One likely solution to this problem follows from the facts that different individuals have similar bodies, brains, and cognitive systems; they live in similar physical environments; they operate in highly-coordinated social contexts. As a result, different individuals acquire similar distributed networks for a given concept over the course of development. Within a particular social group or culture, different individuals’ networks are likely to be highly similar, given similar coordinated experiences with many shared exemplars. Even across different cultures, these networks are likely to be highly similar, given that all humans have similar bodies, brains, and cognitive systems, operating in similar physical and social environments.|15|conceptualization, motivation, conceptual representation, grounded theory of concepts 3625|Barsalou2017|Together these three properties suggest that when a given concept is combined with others across situations, it is likely to exhibit no core stability and considerable situation-specific variability. Although representations of the concept across situations probably exhibit statistical regularities associated with Bayesian priors and situational relevance, they nevertheless take diverse forms, similar to how a given phoneme constantly adapts to its phonemic and articulatory contexts (e.g., Repp 1982).|16|concept combination, summary 3626|Buchfink2014|Reduced alphabet. To increase speed without losing sensitivity, one approach is to use a reduced alphabet when comparing seeds. 
Using this, RAPSearch2 (ref. 7) is 40–100 times faster than BLASTX with minimal loss of sensitivity. For DIAMOND, we investigated the use of published reductions to four, eight and ten letters (ref. 12). By analyzing a large number of BLASTX alignments, we developed a new reduction to an alphabet of size 11 that achieves slightly better sensitivity (brackets indicate one letter): [KREDQN] [C] [G] [H] [ILV] [M] [F] [Y] [W] [P] [STA].|M3|biology, sound classes, sequence alignment, software 3627|Buchfink2014|The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.|000|sequence alignment, BLAST, methodology, 3628|DomazetLoso2011|Motivation: Bacterial and viral genomes are often affected by horizontal gene transfer observable as abrupt switching in local homology. In addition to the resulting mosaic genome structure, they frequently contain regions not found in close relatives, which may play a role in virulence mechanisms. Due to this connection to medical microbiology, there are numerous methods available to detect horizontal gene transfer. However, these are usually aimed at individual genes and viral genomes rather than the much larger bacterial genomes. Here, we propose an efficient alignment-free approach to describe the mosaic structure of viral and bacterial genomes, including their unique regions. Results: Our method is based on the lengths of exact matches between pairs of sequences. Long matches indicate close homology, short matches more distant homology or none at all. These exact match lengths can be looked up efficiently using an enhanced suffix array.
Our program implementing this approach, alfy (ALignment-Free local homologY), efficiently and accurately detects the recombination break points in simulated DNA sequences and among recombinant HIV-1 strains. We also apply alfy to Escherichia coli genomes where we detect new evidence for the hypothesis that strains pathogenic in poultry can infect humans. Availability: alfy is written in standard C and its source code is available under the GNU General Public License from http://guanine.evolbio.mpg.de/alfy/. The software package also includes documentation and example data.|000|alignment-free algorithm, sequence comparison, biology, database search 3629|Hentschel2009|Der Lokativ findet sich in den russischen Singularparadigmen bei einigen hundert Substantiven nach den Präpositionen v ‚in‘ oder na ‚auf‘ in lokaler Funktion: v lesu ‚im Wald‘, na nosu ‚auf der Nase‘, v teni ‚im Schatten‘. Nach Präpositionen in nicht-lokaler Funktion stehen diese Substantive im Präpositiv (Lokativ i. w. S.): o lese ‚über den Wald‘, o nose ‚über die Nase‘, o teni ‚über den Schatten‘. Der Partitiv ist erkennbar bei maskulinen Substantiven wie čaj ‚Tee‘, wenn man Kontexte mit dem „gewöhnlichen“ Genitiv wie vkus čaja ‚der Geschmack des Tees‘ vergleicht mit Kontexten wie stakan čaju ‚ein Glas Tee‘ (aber mit attributiver Erweiterung eher stakan kitajskogo čaja ‚ein Glas chinesischen Tees‘). Diese Formen erfüllen Mel’čuks (1986, 53 ff.) „principle of external autonomy of case forms“: Zwei unterschiedlich kasusmarkierte Wortformen eines Lexems können nur dann als Kodierungen eines Kasusgrammems betrachtet werden, wenn sie nicht in freier Variation stehen und die Auswahl des Kasusmarkers auch nicht von weiteren syntaktisch abhängigen Wortformen abhängt|163|locative case, partitive case, Russian, grammar, 3630|Zufferey2013|Discourse connectives are lexical items indicating coherence relations between discourse segments.
Even though many languages possess a whole range of connectives, important divergences exist cross-linguistically in the number of connectives that are used to express a given relation. For this reason, connectives are not easily paired with a univocal translation equivalent across languages. This paper is a first attempt to design a reliable method to annotate the meaning of discourse connectives cross-linguistically using corpus data. We present the methodological choices made to reach this aim and report three annotation experiments using the framework of the Penn Discourse Tree Bank.|000|discourse connectives, cross-linguistic study, English, German, Italian, Dutch, 3631|Elsen2017|This article deals with sound symbolism and the ways to interpret sound symbolic phenomena. Sound symbolism appears to be a universal phenomenon but linguists tend to neglect it or offer heterogeneous approaches and definitions. This paper is concerned with the role of motivation, as assumed in cases like cuckoo, and the question whether some sound symbolic effects might be the result of acquired statistical knowledge about the language system. The author argues that several aspects of sound symbolism such as natural/iconic or habitual relationships between sound and (facets of) referents interact but should be considered separately to gain a more realistic insight into the working of sound symbolism.|000|sound symbolism, methodology, terminology 3632|Suter2017|Aekyom, Pa and Kamula are spoken in the Western Province of Papua New Guinea. We demonstrate that these languages form a language family by reconstructing the sound system, some vocabulary, pronouns and grammatical suffixes. We also trace loans from neighboring languages. It is shown that Aekyom and Pa form a subfamily called Elevala.
We suggest that the nearest relatives of the Kamula-Elevala languages are the Awyu-Dumut languages spoken across the border in Indonesian Papua.|000|Papua New Guinea, Trans-New-Guinean languages, Aekyom, Pa, Kamula, linguistic reconstruction 3633|Nuebling2017|The aim of this article is twofold. Firstly, it shows that the history of German proper name inflection is a story of profound change. Proper names started out being inflected like common nouns; later, the reduction of their inflectional endings eventually resulted in a distinct declension class of proper names. Furthermore, gender assignment in proper names is different from that of common nouns, and today proper names may be accompanied by classifiers that have evolved from the definite article. Additionally, proper names show particularities concerning their syntactical behavior, word-formation processes, and orthography. While (most of) these developments provide evidence for change, they can, at the same time, be functionally interpreted as strategies to preserve the name shape for reasons of recognition. A second aim of this article is therefore to show that, as proper names are specific linguistic units, they deserve specific treatment. Most of the changes serve to stabilize the “name body” (schema consistency) and to mark morphological boundaries.|000|proper name, inflection, German, 3634|Nuebling2017|Proper names (PN) are specific linguistic units that differ in several aspects from common nouns (CN), their closest linguistic neighbors. They therefore deserve specific grammatical treatment. First of all, PNs lack lexical meaning and thus they refer directly to one specific entity: Berlin and Düsseldorf uniquely refer to towns and Iris Bauer denotes a person. While many PNs are opaque in that they do not contain lexical material (Berlin, Köln), others are semi-transparent [pb] (Düsseldorf) in that they partially consist of existing CNs (here: -dorf ‘village’).
They differ, however, in that the lexical meaning of these CNs is not activated: Düsseldorf is not a village but actually a town.|341f|opacity, proper name, definition, 3635|Bowern2018|Although there have long been links between research in historical linguistics and research in biological evolution, the last few years have witnessed growth in historical linguistic research that treats languages as evolutionary systems that can be investigated using tools from computational phylogenetics. In this review, I explore some of the advantages and disadvantages of using computational tools for historical linguistics. I describe the theory that underlies treating languages as evolutionary systems (in general terms), present the results of classifying languages lexically, and review some of the implications of this research.|000|phylogenetic reconstruction, historical linguistics, overview, review 3636|Bowern2018|Useful summary on phylogenetic methods in historical linguistics. Can be used to have a single reference to the topic in introductory texts.|000|introduction, review, phylogenetic reconstruction, historical linguistics 3637|Creanza2016|Both genetic variation and certain culturally transmitted phenotypes show geographic signatures of human demographic history. As a result of the human cultural predisposition to migrate to new areas, humans have adapted to a large number of different environments. Migration to new environments alters genetic selection pressures, and comparative genetic studies have pinpointed numerous likely targets of this selection. However, humans also exhibit many cultural adaptations to new environments, such as practices related to clothing, shelter, and food.
Human culture interacts with genes and the environment in complex ways, and studying genes and culture together can deepen our understanding of human evolution.|000|gene-language co-evolution, cultural evolution, biological evolution, overview 3638|Creanza2016|Very shallow article which does not necessarily deserve to be quoted much; the linguistic parts especially are largely disappointing.|000|gene-language co-evolution, cultural evolution, biological evolution, review 3639|Frantz2016|The geographic and temporal origins of dogs remain controversial. We generated genetic sequences from 59 ancient dogs and a complete (28x) genome of a late Neolithic dog (dated to ~4800 calendar years before the present) from Ireland. Our analyses revealed a deep split separating modern East Asian and Western Eurasian dogs. Surprisingly, the date of this divergence (~14,000 to 6400 years ago) occurs commensurate with, or several millennia after, the first appearance of dogs in Europe and East Asia. Additional analyses of ancient and modern mitochondrial DNA revealed a sharp discontinuity in haplotype frequencies in Europe. Combined, these results suggest that dogs may have been domesticated independently in Eastern and Western Eurasia from distinct wolf populations. East Eurasian dogs were then possibly transported to Europe with people, where they partially replaced European Paleolithic dogs.|000|domestication, dog, population genetics, 3640|Frantz2016|Article suggests a dual origin of dogs, quite contrary to the article by @Botigue2017.|000|dog, domestication, population genetics, 3641|Bowern2017|Article discusses the famous critique of Gray et al.'s methods (@Gray2003, @Atkinson2006, etc.) by @Pereltsvaig2015.|000|review, Indo-European, Indo-European homeland, phylogenetic reconstruction, dating 3642|Arias2017|**Objectives** Northwestern Amazonia (NWA) is a center of high linguistic and cultural diversity.
Several language families and linguistic isolates occur in this region, as well as different subsistence patterns, with some groups being foragers and others agriculturalists. In addition, speakers of Eastern Tukanoan languages are known for practicing linguistic exogamy, a marriage system in which partners are taken from different language groups. In this study, we use high-resolution mitochondrial DNA sequencing to investigate the impact of this linguistic and cultural diversity on the genetic relationships and population structure of NWA groups. **Methods** We collected saliva samples from individuals representing 40 different NWA ethnolinguistic groups and sequenced 439 complete mitochondrial genomes to an average coverage of 1,030×. **Results** The mtDNA data revealed that NWA populations have high genetic diversity with extensive sharing of haplotypes among groups. Moreover, groups who practice linguistic exogamy have higher genetic diversity, while the foraging Nukak have lower genetic diversity. We also find that rivers play a more important role than either geography or language affiliation in structuring the genetic relationships of populations. **Discussion** Contrary to the view of NWA as a pristine area inhabited by small human populations living in isolation, our data support a view of high diversity and contact among different ethnolinguistic groups, with movement along rivers probably facilitating this contact. Additionally, we provide evidence for the impact of cultural practices, such as linguistic exogamy, on patterns of genetic variation. Overall, this study provides new data and insights into a remote and little-studied region of the world.|000|population genetics, genetic diversity, Amazonia, 3643|Arias2017|What is interesting about this article is that they find that rivers enhance exchange between people, which is also observed for linguistic data. 
Rivers serve as some sort of highway for the exchange between people.|000|Amazonia, population genetics, genetic diversity, 3644|Bastide2017|The goal of Phylogenetic Comparative Methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) running along a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer, can substantially affect a species' traits, but are not modeled by a tree. Phylogenetic networks have been designed to represent reticulate evolution. As they become available for downstream analyses, new models of trait evolution are needed, applicable to networks. We develop here an efficient recursive algorithm to compute the phylogenetic variance matrix of a trait on a network, in only one preorder traversal of the network. We then extend the standard PCM tools to this new framework, including phylogenetic regression with covariates (or phylogenetic ANOVA), ancestral trait reconstruction, and Pagel's λ test of phylogenetic signal. The trait of a hybrid is sometimes outside of the range of its two parents'. Hybrid vigor and hybrid depression is indeed a rather common phenomenon observed in present-day hybrids. Transgressive evolution can be modeled as a shift in the trait value following a reticulation point. We develop a general framework to handle such shifts, and take advantage of the phylogenetic regression view of the problem to design statistical tests for ancestral transgressive evolution in the evolutionary history of a group of species. We study the power of these tests in several scenarios, and show that recent events have indeed the strongest impact on the trait distribution of present-day taxa. We apply those methods to the Xiphophorus fishes dataset, to confirm and complete previous analysis on this group. 
All the methods developed here are available in the user-friendly julia package PhyloNetworks.|000|phylogenetic network, correlational studies, algorithms, methodology 3645|Bastide2017|This algorithm essentially seems to allow correlational studies on evolution (different traits, etc.) for phylogenetic networks, which so far had only been done for trees.|000|phylogenetic network, trait evolution, correlational studies, phylogenetic comparative method, 3646|DenzerKing2010|A nice blog post about words for cats in South-American languages.|000|South American languages, cat, domestication, object naming 3647|Freedman2014|To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse.
We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.|000|domestication, dog, population genetics, 3648|Freedman2014|This article is a nice account of the dynamics of dog domestication. Apparently, there was a lot of interbreeding involved, also between dogs and wolves.|000|domestication, dog, population genetics, overview 3649|Luniewska2016|We present a new set of subjective age-of-acquisition (AoA) ratings for 299 words (158 nouns, 141 verbs) in 25 languages from five language families (Afro-Asiatic: Semitic languages; Altaic: one Turkic language; Indo-European: Baltic, Celtic, Germanic, Hellenic, Slavic, and Romance languages; Niger-Congo: one Bantu language; Uralic: Finnic and Ugric languages). Adult native speakers reported the age at which they had learned each word. We present a comparison of the AoA ratings across all languages by contrasting them in pairs. This comparison shows a consistency in the orders of ratings across the 25 languages. The data were then analyzed (1) to ascertain how the demographic characteristics of the participants influenced AoA estimations and (2) to assess differences caused by the exact form of the target question (when did you learn vs.
when do children learn this word); (3) to compare the ratings obtained in our study to those of previous studies; and (4) to assess the validity of our study by comparison with quasi-objective AoA norms derived from the MacArthur–Bates Communicative Development Inventories (MB-CDI). All 299 words were judged as being acquired early (mostly before the age of 6 years). AoA ratings were associated with the raters’ social or language status, but not with the raters’ age or education. Parents reported words as being learned earlier, and bilinguals reported learning them later. Estimations of the age at which children learn the words revealed significantly lower ratings of AoA. Finally, comparisons with previous AoA and MB-CDI norms support the validity of the present estimations. Our AoA ratings are available for research or other purposes.|000|speech norms, age of acquisition, cross-linguistic study 3650|Luniewska2016|Cross-linguistic account of age of acquisition data, very interesting!|000|age of acquisition, cross-linguistic study, speech norms 3651|Pelletier2017|It’s no secret that different of the subfields in cognitive science dispute what the correct solution is to various problems that they each investigate in their separate ways. Sometimes this is due to differing antecedent ideas about what is the appropriate way to investigate the phenomenon, other times it is due to differing antecedent ideas about what principles an adequate solution should embody, and still other times it is due to differing antecedent ideas concerning what the dispute is about. . . as for example when they use the same terminology for different phenomena. This paper is an investigation into these differing antecedent ideas in the realm of meaning and compositionality as they play out in linguistics, cognitive psychology, and philosophy of language.
The focus is on the notions of subjective meaning and objective meaning, and a main conclusion is that there needs be a “two-factor semantic theory” to accommodate the overall goals of both sides. Some steps are made towards that end, but various previous attempts are argued to have missed the point. In the end it will be shown that, clearly, more work needs to be done along a number of specified dimensions, especially on the subjective side of the dispute.|000|compositionality, conceptualization, cognition, 3652|Downey1999|This book deals with complexity and proposes a new look beyond the NP-hardness or the NP-completeness of problems.|000|NP-complete, NP-hard, complexity 3653|Mcnally2017|One of the defining traits of language is its capacity to mediate between concepts in our mind, which encapsulate generalizations, and the things they refer to in a given communicative act, with all their idiosyncratic properties. This article examines precisely this interplay between conceptual and referential aspects of meaning, and proposes that concept composition (or concept combination, a term more commonly used in Psychology) exploits both: Conceptually afforded composition is at play when a modifier and its head fit as could be expected given the properties of the two concepts involved, whereas in referentially afforded composition the result of the composition depends on specific, independently available properties of the referent. For instance, red box tends to be applied to boxes whose surface is red, but, given the appropriate context, it can also be applied to e.g. a brown box that contains red objects. We support our proposal with data from nominal modification, and explore a way to formally distinguish the two kinds of composition and integrate them into a more general framework for semantic analysis.
Along the way, we recover the classically Fregean notion of sense as including conceptual information, and show the potential of distributional semantics, a framework that has become very influential in Cognitive Science and Computational Linguistics, to address research questions from a theoretical linguistic perspective.|000|denotation, conceptualization, compositionality, affordance, concept combination 3654|Mcnally2017|The problem is that, in the absence of context, sometimes the default interpretation for the modifier-noun combination is so strong as to make other possible interpretations seem impossible, whereas in context any interpretation—even the seemingly impossible—is possible.|246|affordance, concept combination, context, 3655|Mcnally2017|Confronted with the contrast between the strong default interpretation and the possibility of any interpretation in context, linguists have tended to follow one of two routes, both of which we will discuss and exemplify below. The first involves taking the default interpretation as the crucial fact to account for, leaving the non-default interpretations in context unexplained. The second involves providing an analysis that is weak enough to capture all interpretations, and saying little or nothing about the strength of the default interpretation. In this paper, we argue that, in effect, both routes must be taken because two fundamentally different interpretative processes can be appealed to in the composition of modified noun phrases or, more generally, in concept composition. Specifically, we take default interpretations to be the result of what we will call conceptually afforded concept composition, and non-default interpretations to be the result of referentially afforded concept composition.|246|definition, affordance, compositionality, concept combination 3656|Reitz2008|Naming organisms is a fundamental characteristic of our linguistic past. Folk taxonomy examines the way people name organisms.
Such taxonomies reveal people’s concepts about animals, associations between different animals, and the source of introduced animals when they and their names are adopted together (Berlin 1992; Conklin 1972; Prummel 2001; Serjeantson 2000). Some folk taxonomies are the same as Linnaean classifications, others are more finely subdivided, and some combine organisms with quite different biological histories. Knowing which distinctions or combinations were made is essential to understanding economic and social systems.|32|denotation, folk taxonomy, animal naming, 3657|Poortman2017|This chapter studies the interpretations of plural sentences with conjoined predicates, e.g. The boys are sitting and cooking and The boys are waving and smiling. Such sentences are sometimes interpreted intersectively, sometimes non-intersectively (or ‘split’), and sometimes both interpretations appear to be allowed. This is surprising, since the logical structure of these sentences is identical, i.e. they differ only with respect to content words (e.g. sitting, cooking vs. waving, smiling). I propose that the logical interpretation of these sentences is systematically affected by lexical information tied to the complex predicate in the sentences, specifically their so-called typicality effects. With a set of experiments, I show that (a) the acceptability of a sentence in a non-intersective situation can be expressed in terms of a continuum and (b) each acceptability proportion is predicted by the typicality of the two conjoined predicates applying simultaneously. This way, I specify at least one of the relevant pragmatic considerations that determine the interpretation of a plural sentence with conjunctive predicates.
More generally, these results stress the importance of conceptual structure of predicates in semantic theories of language.|000|concept combination, plural, semantics, pragmatics 3658|Poortman2017|The insight that context in general, and lexical information in particular, systematically predicts the logical interpretation of a sentence is relatively new. A turning point came with an influential paper on reciprocity by @Dalrymple<1998> et al. (1998). Dalrymple et al. were the first to start to incorporate a notion of context within logical meaning. They put forward a principle called the Strongest Meaning Hypothesis (SMH), which aims to resolve ambiguity that is caused by contextual information, specifically in the area of reciprocals.|141|context, strongest meaning hypothesis, definition, introduction, conceptualization 3659|Dalrymple1998|The English reciprocal expressions each other and one another vary in meaning according to the meaning of their scope and antecedent, as well as the context in which they are uttered. Variation in the reciprocal's meaning is not just pragmatically determined alteration in speaker's meaning but semantically determined change of literal conditions for strict truth. We will show how to parameterize the dramatic range of observed variation, and we will formulate a principle which predicts the reciprocal's literal meaning in any context of utterance: a reciprocal statement expresses the strongest candidate meaning that is consistent with certain contextually given information. This analysis explains a large collection of examples, including those with quantified antecedents.|000|reciprocity, English, semantics, context, 3660|Scheinfeldt2010|Although Africa is the origin of modern humans, the pattern and distribution of genetic variation and correlations with cultural and linguistic diversity in Africa have been understudied. Recent advances in genomic technology, however, have led to genomewide studies of African samples.
In this article, we discuss genetic variation in African populations contextualized with what is known about archaeological and linguistic variation. What emerges from this review is the importance of using independent lines of evidence in the interpretation of genetic and genomic data in the reconstruction of past population histories.|000|interdisciplinary research, archaeology, archaeogenetics, linguistics, African languages, Africa, 3661|Scheinfeldt2010|A promise made in 2010, almost 10 years ago, yet nothing happened, as we know. Linguistics is still ignored when it comes to archaeology and archaeogenetics.|000|African languages, Africa, human prehistory, interdisciplinary research, 3662|Salisbury1999|Strongest evidence (SE) is an approach to evaluating the support provided by characters for alternative phylogenetic hypotheses (i.e., trees). Although first demonstrated in the context of parsimony, SE is equally applicable to compatibility analysis. In the logic of strongest evidence, a character that is compatible with a phylogenetic hypothesis supports the tree only to the degree at which this compatibility would be improbable under a model of cladistic dissociation between character state distribution and the tree. The support measure derived from this consideration is called the "apparent phylogenetic signal" (APS). The total support for a tree is the sum of the individual character APS values. Tree topology and character structure both affect the chance of "random" compatibility and thus the APS scores. Because a clique of compatible characters implies a specific tree (possibly more than one if the character states are unordered or there are missing data), the evidential strength of a clique may be measured as the SE support for the clique's tree by the clique's characters. Use of this measure is demonstrated on a morphological data set for North American species of Chloris (Poaceae).
Cliques with the same number of characters vary tremendously in their apparent evidential quality. The strongest clique is nearly 100,000 times less likely to be compatible by chance with the tree it implies than is the weakest clique of the same size. This approach to clique evaluation is compared with character counting and Meacham's clique improbability measure.|000|clique, strongest evidence, maximum parsimony, phylogenetic reconstruction 3663|Salisbury1999|Paper describes a method to search for patterns in phylogenetic data. The method is not very well described, as it is not clear what they mean by "cliques".|000|pattern analysis, phylogenetic reconstruction, maximum parsimony, strongest evidence, methodology 3664|Schluecker2017|The aim of this paper is to provide an overview of the topics and recent developments in the research on the morphosyntax of proper names. The article reflects on the ways in which and the reasons why proper names may be morphosyntactically different from common nouns. It argues that the distinction between proper names and proper nouns is essential for the discussion of the topic, and it shows that there are considerable differences regarding morphological and syntactic properties both among the various name classes as well as cross-linguistically. In the second part of the paper, selected aspects dealt with in the recent literature are discussed in more detail, including those on the morphological and syntactic properties of proper names and proper nouns, and the specific morphosyntactic constructions proper names and proper nouns can occur in.|000|morphosyntax, proper name, grammar of names, 3665|Schluecker2017|From a semantic point of view, proper names are nominal expressions that refer uniquely to one specific extra-linguistic entity. For this reason, proper names (or simply, names) are rigid designators in the sense of Kripke (1980).
Unlike common nouns, which have descriptive content and denote concepts, proper names do not have lexical meaning (or at most to a very limited extent); accordingly, the relation between name and referent is direct and not mediated via the lexical content as in the case of common nouns. Semantically, unique and direct reference can therefore be regarded as the defining properties of proper names.|310|proper name, definition, common nouns, 3666|Sassoon2017|Existing formal theories represent the interpretations of gradable predicates in terms of single scalar dimensions. This paper presents a new approach, which aims to cover morphological gradability in multidimensional adjectives and nouns. Following psychological theories, nouns are assumed to be associated with dimension sets, like adjectives. Degree constructions are proposed to involve quantification on dimensions. This approach correlates the acceptability of a given noun or adjective in comparison constructions with its type of characteristic categorization criterion (i.e., whether, as a default, its dimensions combine into a single criterion via quantifiers or other operations). A preliminary study confirms the predicted correlation. Directions for future research are proposed.|000|gradability, morphology, dimension accessibility, 3667|Sassoon2017|In model-theoretic referential semantics, the interpretation of a predicate (a word like dog, big, or card game) is modeled through its intension, namely, a function from contexts (such as worlds, times, or information states) into classes of entities, contextually given extensions. By contrast, in cognitive psychology, concepts such as ‘dog’, ‘big’, or ‘card game’ are modeled through their dimensions, prototypes, and similarity structures. Gradually, it has been understood that both formal and conceptual representations play a role in natural language semantics. 
This is evident in the study of morphological gradability, which is the topic of the present study.|291|semantic similarity, interpretation, cognition, conceptualization 3668|Sassoon2017|Natural language predicates can be divided into various syntactic categories, two of which are adjectives and nouns. Most adjectives, including, tall, expensive and healthy among others, are morphologically gradable. In other words, they felicitously [pb] combine with degree morphemes, as in taller, tallest, too tall, tall enough and very tall. However, some adjectives exist that are not morphologically gradable, including, for instance, geological, prime, and even. Thus, a map can be said to be more expensive than another map, but not more geological.|191f|morphological gradability, gradability, dimension accessibility, cognition, adjectives, common nouns 3669|Skoglund2015|The origin of domestic dogs is poorly understood [1–15], with suggested evidence of dog-like features in fossils that predate the Last Glacial Maximum [6, 9, 10, 14, 16] conflicting with genetic estimates of a more recent divergence between dogs and worldwide wolf populations [13, 15, 17–19]. Here, we present a draft genome sequence from a 35,000-year-old wolf from the Taimyr Peninsula in northern Siberia. We find that this individual belonged to a population that diverged from the common ancestor of present-day wolves and dogs very close in time to the appearance of the domestic dog lineage. We use the directly dated ancient wolf genome to recalibrate the molecular timescale of wolves and dogs and find that the mutation rate is substantially slower than assumed by most previous studies, suggesting that the ancestors of dogs were separated from present-day wolves before the Last Glacial Maximum. We also find evidence of introgression from the archaic Taimyr wolf lineage into present-day dog breeds from northeast Siberia and Greenland, contributing between 1.4% and 27.3% of their ancestry. 
This demonstrates that the ancestry of present-day dogs is derived from multiple regional wolf populations.|000|wolf, dog, domestication, archaeogenetics, population genetics, admixture 3670|Skoglund2015|Paper discusses and finds evidence for admixture between wolves and dogs during the domestication of the dog.|000|domestication, dog, admixture, wolf, 3671|Viering2017|Elephants in need of salt collaborate in order to mine it from under the earth. This is a very interesting example of animal collaboration and culture.|000|cultural evolution, elefants, salt, collaboration, animal cognition 3672|Welsh1967|This paper points out the connection between the basic scheduling or timetabling problem and the well known problem of colouring the vertices of a graph in such a way that (i) no two adjacent vertices are the same colour and (ii) the number of colours used is a minimum. We give an algorithm for colouring a graph subject to (i) and give a new easily determined bound for the number of colours needed. This same bound is also a new upper bound for the chromatic number of a graph in terms of the degrees of its vertices.|000|algorithms, Welsh-Powel algorithm, graph-coloring problem, clique-partitioning problem, approximation 3673|Welsh1967|Algorithm provides a rather straightforward approximation for the graph-coloring problem which can be applied directly to the clique coverage problem (or clique partitioning problem).|000|Welsh-Powel algorithm, algorithms, approximation, graph coloring, clique-partitioning problem 3674|Winter2017|Compositional semantic frameworks often compute the extension of a complex expression directly from the extensions of its parts. However, much work in cognitive psychology has shown important challenges for compositional methods. 
For instance, Hampton (J Exp Psychol Learn Mem Cognit 14(1):12–32, 1988b) showed that speakers may let the complex nominal sports that are games include chess as one of its instances, without admitting chess in the extension of sports. Similarly, Lee (2017) experimentally supports the common intuition that instances of red hair are not necessarily categorized as red. This paper reviews further results about plural quantifiers, showing similar challenges for compositionality. It is proposed that typicality effects play a systematic role in compositional interpretation and the determination of truth-values. For instance, the “overextension” effect in the red hair example is predicted by the fact that focal red is an atypical hair color. Similarly, in the plural sentence the men are walking and writing, the availability of the split reading (“some men are walking and some men are writing”) increases due to the atypicality of doing both activities at the same time (Poortman, 2017). Further, in reciprocal sentences like the three men are pinching each other, the number of pinching acts may be three. This is related to the atypicality of situations where every man pinches two other men at the same time, as required by a strong interpretation of each other. The paper gives a uniform account of truth-value judgements on these different constructions, based on the identification of conflicts between typical preferences.|000|compositionality, concept combination, truth value, gradability, typicality, interpretation 3675|Westerlund2017|Within the cognitive neuroscience of language, the left anterior temporal lobe (LATL) is one of the most consistent loci for semantic effects; yet its precise role in semantic processing remains unclear. Here we focus on a literature showing that the LATL is modulated by semantic composition even in the simplest of cases, suggesting that it has a central role in the construction of complex meaning. 
We show that while the LATL’s combinatory contribution generalizes across several linguistic factors, such as composition type and word order, it is also robustly modulated by conceptual factors such as the specificity of the composing words. These findings suggest that the LATL’s role in composition is at the conceptual as opposed to the syntactic or logico-semantic level, making formal semantic theories of composition less obviously useful for guiding future research on the LATL. For an alternative theoretical foundation, this chapter seeks to connect LATL research to psychological models of conceptual combination, which potentially offer a more fruitful space of hypotheses to constrain our understanding of the computations housed in the LATL. We conclude that, though currently available data on the LATL do not rule out relation-based models, they are most consistent with schema-based models of conceptual combination, with the LATL potentially representing the site of concept schema activation and modification.|000|neurology, neurolinguistics, left anterior temporal lobe, concept combination, 3676|Morin2016a|The presence of emotional words and content in stories has been shown to enhance a story’s memorability, and its cultural success. Yet, recent cultural trends run in the opposite direction. Using the Google Books corpus, coupled with two metadata-rich corpora of Anglophone fiction books, we show a decrease in emotionality in English-speaking literature starting plausibly in the nineteenth century. We show that this decrease cannot be explained by changes unrelated to emotionality (such as demographic dynamics concerning age or gender balance, changes in vocabulary richness, or changes in the prevalence of literary genres), and that, in our three corpora, the decrease is driven almost entirely by a decline in the proportion of positive emotion-related words, while the frequency of negative emotion-related words shows little if any decline. 
Consistently with previous studies, we also find a link between ageing and negative emotionality at the individual level.|000|big data, small data, Google N-Grams, word emotions 3677|Morin2016a|Essentially can be quoted as an example that big data may be misleading if not checked against small and well-understood datasets.|000|big data, small data, Google N-Grams, word emotions 3678|Viegas2004|The Internet has fostered an unconventional and powerful style of collaboration: “wiki” web sites, where every visitor has the power to become an editor. In this paper we investigate the dynamics of Wikipedia, a prominent, thriving wiki. We make three contributions. First, we introduce a new exploratory data analysis tool, the history flow visualization, which is effective in revealing patterns within the wiki context and which we believe will be useful in other collaborative situations as well. Second, we discuss several collaboration patterns highlighted by this visualization tool and corroborate them with statistical analysis. Third, we discuss the implications of these patterns for the design and governance of online collaborative social spaces. We focus on the relevance of authorship, the value of community surveillance in ameliorating antisocial behavior, and how authors with competing perspectives negotiate their differences.|000|history flow visualization, visualization, IBM, editing history, Wikipedia 3679|Viegas2004|Very interesting visualization technique which shows how text-editing in versionization software leads to the development of a given document. 
In a similar way, we can imagine how an etymology develops, given that linguists exchange among themselves for years until they confirm or reject a given etymology, and etymological dictionaries are full of this kind of knowledge, which is based on copying and refining.|000|visualization, history flow visualization, Wikipedia 3680|Lass2017|All disciplines that deal with (apparent) recovery of objects from the past are faced by a fundamental question: what is the metaphysical status of these objects? Are they realia of some kind, or are they merely epistemic objects with no substance? This could be summed up from a debate still going on in quantum physics: do quantum systems have a real existence, or are they merely devices for calculation? In this paper I sum up the advantages of having an ontology, and the disadvantages of assuming that reconstructed linguistic objects are not real. I also discuss the uniformitarian position that makes this an unproblematic claim. I also deal with the neo-Saussurean claim that reconstructed items have no reality in themselves, but solely in terms of the systems they are in; and I suggest that this position (held by Meillet and Kuryłowicz among others), is fundamentally perverse.|000|linguistic reconstruction, realism, abstractionist-realist debate, phonological reconstruction, methodology 3681|Ringe2015|The authors' summary of 19th-century developments is reasonable, but their discussion of more recent work is inadequate and full of gaps. Embleton 1986, a major contribution to linguistic cladistics, is not even mentioned; the line of work which culminated in Ringe et al. 2002 and Nakhleh et al. 2005 is ignored, as is the independent but partly parallel work of Rexová et al. 2003. The authors note (p. 
111) that until recently cladistic work was based on small samples whose features they claim were "intuitively" weighted, but the claim is false: to the extent that any weighting was attempted, characters that best reflect NLA (see above) were weighted more heavily and lexical characters less so.|318|review, history of science, historical linguistics, critics 3682|Ringe2015|The authors suggest (p. 120) that the "word" i heredity, comparable to the gene in biology; but from what clear that grammatical features, not lexemes, are the closest linguistic analog of genes. |319|biological parallels, analogy, grammatical feature, typological features, cognates, 3683|Borchsenius2017|The main goal of this chapter is to introduce the reader to the parallels and commonalities that exist between the fields of biology and linguistics. Researchers from both fields faced similar problems when seeking to account for the descent and diversification of related entities (species, languages). Therefore, they often sought mutual inspiration and opted for similar solutions. This has resulted in a convergence of models and methods in both fields. This chapter is divided into two parts. Firstly, we review some of the methodological and conceptual developments that have occurred in biology since the emergence of the field of evolutionary biology. There will be an emphasis on the last decade, where a variety of computer-based analyses have been developed. 
To illustrate the benefits of these tools, phylogenetic methods are applied in the second part of the chapter to a group of high-contact languages (creoles), which have long defied attempts at classification due to their multiple ancestry.|000|biological parallels, biology, linguistics, phylogenetic reconstruction, introduction 3684|Borchsenius2017|A rather shallow introduction to the topic, but the part on creole languages in there is probably interesting.|000|phylogenetic reconstruction, biology, linguistics, biological parallels 3685|SolisLemus2016|Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. 
We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which is characterized by widespread hybridizations.|000|incomplete lineage sorting, phylogenetic network, pseudo-likelihood, methodology, phylogenetic reconstruction 3686|Diamond2002|Domestication interests us as the most momentous change in Holocene human history. Why did it operate on so few wild species, in so few geographic areas? Why did people adopt it at all, why did they adopt it when they did, and how did it spread? The answers to these questions determined the remaking of the modern world, as farmers spread at the expense of hunter–gatherers and of other farmers.|000|domestication, plant domestication, animal domestication, introduction, overview 3687|Badr2012|Knowledge of the origin and domestication history of crop plants is important for studies aiming at avoiding the erosion of genetic resources due to the loss of ecotypes and landraces and habitats and increased urbanization. Such knowledge also strengthens the capacity of modern farming system to develop and scale-up the domestication of high value potential crops that can be achieved by improving the knowledge that help to identify and select high value plant species within their locality, identify and apply the most appropriate propagation techniques for improving crops and integrate improved crop species into the farming systems. The study of domestication history and ancestry provide means for germplasm preservation through establishment of gene banks, largely as seed collections, and preservation of natural habitats. Information about crop evolution and specifically on patterns of genetic change generated by evolution prior, during, and after domestication, is important to develop sound genetic conservation programs of genetic resources of crop plants and also increases the efficiency of breeding programs. 
In recent years, molecular approaches have contributed to our understanding of the aspects of plant evolution and crops domestication. In this article, aspects of crops domestication are outlined and the role of molecular data in elucidating the ancestry and domestication of crop plants are outlined. Particular emphasis is given to the contribution of molecular approaches to the origin and domestication history of barley and the origin and ancestry of the Egyptian clover.|000|domestication, plant domestication, farming, 3688|Fuentes2017|It is well known that humans are creating new variants of organisms, ecosystems and landscapes. Here I argue that the degree of biological novelty generated by humans goes deeper than that. We use property rules to create exclusivity in cooperation among humans, and between humans and other biological entities, thus overcoming social dilemmas and breaking barriers to cooperation. This is leading to novel forms of cooperation. One of them is the human control, modification and replication of whole ecosystems. For the first time, there exist ecosystems with functional design, division of labor and unlimited heredity. We use mental representation and language as new mechanisms of inheritance and US modification that apply to an increasing variety of biological and non-biological entities. As a result, the speed, depth and scale of biological innovation is unprecedented in the history of life.|000|anthropocene, biological parallels, language evolution, cultural evolution, biological evolution, 3689|Fuentes2017|Rather boring article discussing the role of humans, which is not based on real facts but rather on a shallow knowledge of some linguistics and maybe a bit more of biology.|000|anthropocene, biological evolution, language evolution 3690|Evans2016a|There is something unique about human culture. 
Its complex technologies, customs, institutions, symbolisms and norms, which are shared and maintained and improved across countless generations, are what sets it apart from the ‘cultures’ of other animals. The fundamental question that researchers are only just beginning to unravel is: How do we account for the gap between their ‘cultures’ and ours? The answer lies in a deeper understanding of culture’s complex constituent components: from the micro-level psychological mechanisms that guide and facilitate accurate social learning, to the macro-level cultural processes that unfold within large-scale cooperative groups. This thesis attempts to contribute to two broad themes that are of relevance to this question. The first theme involves the evolution of accurate and high-fidelity cultural transmission. In Chapter 2, a meta-analysis conducted across primate social learning studies finds support for the common assumption that imitative and/or emulative learning mechanisms are required for the high-fidelity transmission of complex instrumental cultural goals. Chapter 3, adopting an experimental study with young children, then questions the claim that mechanisms of high-fidelity copying have reached such heights in our own species that they will even lead us to blindly copy irrelevant, and potentially costly, information. The second theme involves investigations of the mutually reinforcing relationship predicted between cultural complexity and ultra-cooperativeness in humans, employing a series of laboratory-based experimental investigations with adults. 
Chapter 4 finds only dynamic might help to scaffold the evolution of increased cultural complexity and cooperation in a learning environment where cultural information carries high value.|000|cultural evolution, social learning, imitation, emulation, 3691|Dautriche2016|Although the mapping between form and meaning is often regarded as arbitrary, there are in fact well-known constraints on words which are the result of functional pressures associated with language use and its acquisition. In particular, languages have been shown to encode meaning distinctions in their sound properties, which may be important for language learning. Here, we investigate the relationship between semantic distance and phonological distance in the large-scale structure of the lexicon. We show evidence in 100 languages from a diverse array of language families that more semantically similar word pairs are also more phonologically similar. This suggests that there is an important statistical trend for lexicons to have semantically similar words be phonologically similar as well, possibly for functional reasons associated with language learning.|000|speech norms, dataset, Wikipedia, semantic similarity, phonetic similarity 3692|Dautriche2016|Very interesting study in which people created a corpus from Wikipedia which can serve as some crude form of norm data for various languages. Their data contains frequency of words for 100 languages, as well as semantic vectors, which is extremely interesting for future studies, if more data in form of the Concepticon can be aggregated for more languages.|000|word embeddings, semantic similarity, phonetic similarity, dataset, speech norms 3693|Kraig2014|The study of crop origins has traditionally involved identifying geographic areas of high morphological diversity, sampling populations of wild progenitor species, and the archaeological retrieval of macroremains. 
Recent investigations have added identification of plant microremains (phytoliths, pollen, and starch grains), biochemical and molecular genetic approaches, and dating through 14C accelerator mass spectrometry. We investigate the origin of domesticated chili pepper, Capsicum annuum, by combining two approaches, species distribution modeling and paleobiolinguistics, with microsatellite genetic data and archaeobotanical data. The combination of these four lines of evidence yields consensus models indicating that domestication of C. annuum could have occurred in one or both of two areas of Mexico: northeastern Mexico and central-east Mexico. Genetic evidence shows more support for the more northern location, but jointly all four lines of evidence support central-east Mexico, where preceramic macroremains of chili pepper have been recovered in the Valley of Tehuacán. Located just to the east of this valley is the center of phylogenetic diversity of Proto-Otomanguean, a language spoken in mid-Holocene times and the oldest protolanguage for which a word for chili pepper reconstructs based on historical linguistics. For many crops, especially those that do not have a strong archaeobotanical record or phylogeographic pattern, it is difficult to precisely identify the time and place of their origin. Our results for chili pepper show that expressing all data in similar distance terms allows for combining contrasting lines of evidence and locating the region(s) where cultivation and domestication of a crop began. **Significance** The novelty of the information of this manuscript resides in the addition of species distribution modeling and paleobiolinguistics data, combined with genetic and existing archaeobotanical data, to trace back the geographic origin of a crop, namely domesticated pepper, Capsicum annuum. 
Furthermore, the utilization of a geographic framework of reference for the four types of data has allowed us to combine these independent data types into a single hypothesis about the origin of this crop. Our results suggest that food crops in Mexico had a multiregional origin with chili pepper originating in central-east Mexico, maize in the Balsas River Basin and common bean in the Lerma–Santiago River Basin, resembling similar finds for the Fertile Crescent and China. |000|paleolinguistics, paleobiolinguistics, domestication, chilli pepper, plant domestication 3694|Leglise2017|This chapter focusses on various methodologies we can rely on to study heterogeneity and dialectal variation among multilingual speakers. We first focus on linguistic variation in contact settings and present a methodology to describe heterogeneous and multilingual corpora and show how languages sometimes overlap. A second part focusses on (dialectal) language boundaries and how speakers may sometimes use unmarked elements showing fuzziness or reorganization of language boundaries. The role of ideology is highlighted in discourse but also at play in language practices and in doing-being ‘multilingual’, ‘urban and modern’ or performing authenticity. The third part focusses on how speakers use dialectal and linguistic resources from their linguistic repertoire in their everyday life interactions as stances and acts of identity.|000|multilingualism, language mixture, language boundaries, dialectal variation, 3695|Lass2017|Very interesting discussion of the realist-abstractionalist debate.|000|abstractionist-realist debate, methodology, linguistic reconstruction, 3696|Lass2017|When Grimm (1822) invented the formal notion of Lautverschiebung, he assumed that ‘things’ of some sort like tenues (stops) and aspirata (aspirates, actually his name for fricatives) had taken part in the shift, that these things had locations as well as manners of articulation (labiales, gutturales). 
Further, they moved from one stricture degree (Stufe ‘grade’) to another within a real space. For him the past had an ontology.|155|realism, linguistic reconstruction, phonological reconstruction, abstractionist-realist debate 3697|Lass2017|Meillet, under the influence of his mentor Saussure, posited an influential view of this kind, later developed and made more explicit by others. He denied that the major types of reconstructed object that would seem to be suitable for ontic interpretation (segments, tones, words) were real. In the larger project of history their potential material values were important only insofar as they defined relations within systems, and their historical value was apparently solely instrumental. What was real for him would in the commonsense view be even less real than a reconstructed phone(me): it was an abstraction with no possible material existence.|155|Ferdinand de Saussure, Antoine Meillet, abstractionist-realist debate, 3698|Lass2017|In a methodological exposition of comparative method (1964 [1937], 41–2), Meillet gives examples like skr. dh = gr. θ = arm. d = germ. d and describes the meaning of his praxis this way (p. 47): “On pourra convenir de designer” such a correspondence set as *dh, and if we wished call it an “occlusive sonore”. Then comes what appears to be a complete reversal: “les ‘restitutions’ ne sont que les signes par lesquels on exprime en abrégé les correspondences”. :translation:`[‘reconstructions’ are nothing but signs by which we summarise correspondences]`|155|correspondence patterns, abstractionist-realist debate, linguistic reconstruction 3699|Lass2017|Rather, reconstructed Indo-European is only “un système défini de correspondances entre des langues historiquement attestées” (p. 
47, italics Meillet’s).|156|Antoine Meillet, nice quote, correspondence patterns, Indo-European, realism, abstractionist-realist debate 3700|Lass2017|The first question to ask is precisely what it is that becomes phonemicised, and how if not by phonetic changes that make it contrastive. Another issue that does not seem to be problematic in this kind of structuralism is this: how is it that given a set like {Sanskrit pitar-, Latin pater, Old English fæder, German Vater}, supposedly reconstructed on the basis of pure ‘relations’, we always [pb] seem to come out with a labial initial? This is profoundly disingenuous (the term would apply both to Meillet and Kuryłowicz); there is really no way of avoiding phonetic operations, even in something as apparently basic, and to the structuralist imagination abstract, as constructing phoneme systems.|156f|abstractionist-realist debate, realism, linguistic reconstruction, 3701|Lentz2008|Mexico has long been recognized as one of the world’s cradles of domestication with evidence for squash (Cucurbita pepo) cultivation appearing as early as 8,000 cal B.C. followed by many other plants, such as maize (Zea mays), peppers (Capsicum annuum), common beans (Phaseolus vulgaris), and cotton (Gossypium hirsutum). We present archaeological, linguistic, ethnographic, and ethnohistoric data demonstrating that sunflower (Helianthus annuus) had entered the repertoire of Mexican domesticates by ca. 2600 cal B.C., that its cultivation was widespread in Mexico and extended as far south as El Salvador by the first millennium B.C., that it was well known to the Aztecs, and that it is still in use by traditional Mesoamerican cultures today. The sunflower’s association with indigenous solar religion and warfare in Mexico may have led to its suppression after the Spanish Conquest. 
The discovery of ancient sunflower in Mexico refines our knowledge of domesticated Mesoamerican plants and adds complexity to our understanding of cultural evolution.|000|sunflower, domestication, plant domestication, Mexico, 3702|Brown2008|@Lentz<2008> et al. cite linguistic evidence supporting their hypothesis that Helianthus annuus was domesticated in Mesoamerica in pre-Columbian times. This consists, in part, of a list of names for sunflower collected from speakers of 14 contemporary native languages. They conclude that the fact that terms for sunflower in 11 of these languages do not phonologically resemble Spanish words for sunflower provides support for their understanding that the cultivation of sunflower in Mesoamerica today is not explained by its introduction from outside the region. However, the terms they cite are all descriptive in nature, e.g., Popoluca ‘‘big sun’’ and Nahuatl ‘‘shield flower.’’ Such semantically transparent names are typically used as alternatives to loanwords to designate newly encountered introduced things (2), e.g., ‘‘cacao bean of the earth’’ for peanut in both Popoluca and Nahuatl (peanut is a late introduction to Mesoamerica from South America).|p1|linguistic palaeography, evidence, sunflower 3703|Vigne2004|It is generally accepted that cats were first domesticated in ancient Egypt (1–3), at the latest by the 20th to 19th century B.C. (Middle Kingdom, 12th dynasty) (4). However, several finds from Cyprus suggest that the origins of cat taming were earlier. A cat mandible at the early Neolithic site of Khirokitia, Cyprus (2), and, more recently, from several other sites (5–8), show that cats were present on the island starting from ~9500 years ago.|000|domestication, animal domestication, cat domestication, cats, 3704|Newberry2017|Both language and genes evolve by transmission over generations with opportunity for differential replication of forms [1]. 
The understanding that gene frequencies change at random by genetic drift, even in the absence of natural selection, was a seminal advance in evolutionary biology 2 . Stochastic drift must also occur in language as a result of randomness in how linguistic forms are copied between speakers 3,4 . Here we quantify the strength of selection relative to stochastic drift in language evolution. We use time series derived from large corpora of annotated texts dating from the 12th to 21st centuries to analyse three well-known grammatical changes in English: the regularization of past-tense verbs 5–9 , the introduction of the periphrastic ‘do’ 10 , and variation in verbal negation 11 . We reject stochastic drift in favour of selection in some cases but not in others. In particular, we infer selection towards the irregular forms of some past-tense verbs, which is likely driven by changing frequencies of rhyming patterns over time. We show that stochastic drift is stronger for rare words, which may explain why rare forms are more prone to replacement than common ones 6,9,12 . This work provides a method for testing selective theories of language change against a null model and reveals an underappreciated role for stochasticity in language evolution.|000|drift, language change, bigram analysis, corpus studies 3705|Newberry2017|Paper is less interesting than it seems from its title: they use word embeddings and other techniques on large corpora of English. In general nothing really specific or surprising.|000|word embeddings, language change, semantic change, English, corpus studies 3706|Bloomfield1928|This supposition was necessary (or, in fact, justifiable) only on the assumption that phonemes change, i.e., that sound change goes on regardless of meaning and is therefore subject to phonetic conditions only (and is not affected by frequency, euphony, meaning, etc. 
of words and other forms).|99|systemic processes, sound change, nice quote, Leonard Bloomfield, 3707|Round2017|Review of a book on typology of phonology by @Gordon2016. In the critical assessment part, the author gives some interesting ideas on frequencies in phonology and typology and how they should be handled.|000|frequency, typology, phonology, review, 3708|Round2017|For consonants there is broad correlation: the higher a phoneme’s cross-linguistic frequency, the higher its language internal frequency (r = 0.6, p = 0.001). Naturally, there are some outliers, one of which I return to below. Vowels, however, are simply bimodal. Cross-linguistically, /ieaou/ vastly outnumber all others.|p3|correlational studies, frequency, phoneme inventory 3709|Round2017|As datasets grow and absolute universals remain elusive, the context for this discussion is that typology has increasingly turned to statistics, the branch of science that provides us tools to quantify our uncertainty and characterize variation. Statistical analysis works best when true uncertainty and variation are factored into the analysis, not out of it. If we are uncertain if item x is of category A or B, it is preferable to explicitly reflect that uncertainty, rather than arbitrarily choosing just B; or if x varies between A and B, it is preferable to explicitly reflect that variation, rather than arbitrarily choosing just A. Likewise, data is ideally coded to transparently compare like with like.|p7|normalization, comparability, phonology, methodology 3710|Round2017|That uncertainty can be factored into our statistical analyses, or, if the acoustic analysis has been done, then we would do well to represent our more-certain descriptions of those vowels somehow differently to the default, less-certain cases. 
Of course, the task of representing a dimension of (un)certainty is not one that the IPA was originally designed for, but it is one for which [pb] we eventually will need to find the right tools, so we can represent uncertainty not only in our discussion of data, as Gordon does, but also directly in our datasets, models, and statistical calculations.|p7f|cross-linguistic study, frequency, uncertainty, statistics, phonology 3711|Round2017|On pp. 71–74, Gordon compares the language-internal frequency of consonants to their cross-linguistic frequency and finds considerable correlation (r = 0.6). An outlier is /r/, whose rate of language-internal use is surprisingly high, around double the expected value. A little digging shows that this boils down to comparing like with like. The comparison of like with like is one of the most challenging tasks of typology, since rarely are the “correct” categories given in advance. In some cases though, it may be that cross-linguistic statistical analysis furnishes evidence in favour of one option over another. In this instance, Gordon’s source for cross-linguistic frequency is seminal work by Maddieson (1984), which carefully distinguishes trills /r/ from taps /ɾ/. One result of that analytical choice is that trills /r/ are cross-linguistically less than half as frequent as /l/, a finding which might seem counter-intuitive. Alternatively, if we opt to collapse all instances of non-contrastive /r/ and /ɾ/ into a single category, then two things happen. The cross-linguistic frequency of combined /r ~ ɾ/ approximately matches that of /l/. Furthermore, the cross-linguistic frequency of /r ~ ɾ/ is almost exactly the rate predicted from language-internal frequency, according to the general relationship that Gordon finds for most consonants. 
What is interesting here is the subtle inversion of classic methodology: rather than running statistics to characterize distributions among pre-established categories, we are using statistical results to compare two different options for categorizing the data. These first two exampl|p8|rhotic sounds, frequency, phonology, phoneme inventory, examples, 3712|Round2017|These first two examples involve the IPA, whose pitfalls are perhaps already well known (Chao 1934; Lass 1984; Simpson 1999; Vaux 2009). But these issues are not about the IPA. Rather, they concern what might be called the space of multiple, possible phonological analyses. :comment:`This may be captured by the TIER approach, as multi-tiers will allow to define two different levels of analysis.`|p8|phonological analysis, phonological theory, data-driven research, 3713|Round2017|The first paradox is that analysis inherently involves the removal of certain dimensions of variation from the dataset; it does so not randomly but systematically, and it does so prior to statistical analysis, which in an ideal world ought not to happen. The second paradox is that when, having weighed up the inevitably somewhat conflicting evidence that any language will present us with, we decide definitively in favour of analysis A and not analysis B, our definitive act removes aspects of real uncertainty, and does so prior to statistical analysis. The third paradox is that overall analyses are the product of applying many small analytic procedures, in which decisions are made according to criteria over which linguists differ with one another (Chao 1934; Hockett 1963; Hyman 2008; Dresher 2009). Consequently, any dataset that collates the work of many linguists will not compare like with like.|p9|nice quote, problem, phonological theory, comparability, 3714|Round2017|A common strategy in typology is to “normalize” datasets (Lass 1984; Hyman 2008; Van Der Hulst 2017). 
In essence, this involves taking existing language analyses, distributed through various parts of analysis space, and corralling them into a more limited subspace. The method provides a workable response to the third paradox, and enables the comparison, more or less, of like with like. However, it also privileges just one corner of analysis space, and it does not solve paradoxes one or two.|p9|phonology, normalization, comparability 3715|Gibbon2017|Multilinear Grammar provides a framework for integrating the many different syntagmatic structures of language into a coherent semiotically based Rank Interpretation Architecture, with default linear grammars at each rank. The architecture defines a Sui Generis Condition on ranks, from discourse through utterance and phrasal structures to the word, with its sub-ranks of morphology and phonology. Each rank has unique structures and its own semantic-pragmatic and prosodic-phonetic interpretation models. Default computational models for each rank are proposed, based on a Procedural Plausibility Condition: incremental processing in linear time with finite working memory. We suggest that the Rank Interpretation Architecture and its multilinear properties provide systematic design features of human languages, contrasting with unordered lists of key properties or single structural properties at one rank, such as recursion, which have previously been put forward as language design features. The framework provides a realistic background for the gradual development of complexity in the phylogeny and ontogeny of language, and clarifies a range of challenges for the evaluation of realistic linguistic theories and applications. 
The empirical objective of the paper is to demonstrate unique multilinear properties at each rank and thereby motivate the Multilinear Grammar and Rank Interpretation Architecture framework as a coherent approach to capturing the complexity of human languages in the simplest possible way.|000|multilinear grammar, syntax, Chomsky syntax, introduction, grammatical theory 3716|Gibbon2017|Very interesting and enlightening paper which states that grammatical complexity is often exaggerated, and that due to the limited capacity of humans, regular grammars with a few extensions should be sufficient. It also states that recursion in grammar, as often provided by context-free grammars, can only be applied with limitations. The paper is a very valuable introduction to grammatical theory and a nice counterpart to naive syntax papers that assume that one needs all kinds of complexity to handle human language.|000|regular grammar, grammatical theory, multilinear grammar, introduction, 3717|Gibbon2017|Frameworks for language description in linguistics and the human language technologies tend to concentrate on single components, such as a syntactic model, a morphological or phonological model, models for narration or argumentation in texts, or models of discourse sequencing. Each model tends to have a different formal basis. The result of this specialisation is that language modelling as a whole turns out to be a hybrid system with interfaces between component subsystems. It is not impossible that this concept provides an adequate approach to understanding human language abilities and to language development in acquisition or evolution, subsystem by subsystem.[pb] economy of descriptive and explanatory means, it is a wasteful strategy. There are recent – or revived – developments in language description which are based on formally integrated and semiotically motivated approaches. 
In order to combine formal integration and semiotic motivation we propose a rank hierarchy of domains from discourse to word, composed of default linear structures and their semantic-pragmatic and prosodic-phonetic interpretations at each rank. The overriding aim is to formulate a foundation for combinatorial communication (@ScottPhillips<2013> and Blythe 2013).|265f|grammatical theory, combinatorial communication, 3718|Gibbon2017|But the principled incorporation of the ‘larger’ domains of language structure is coming more into focus, with properties such as longer time windows (Christensen and Chater 2016), in prosodic-phonetic interpretation (Tillmann and Mansell 1980; Gibbon 1992; Carson-Berndsen 1998) as well as in semantic-pragmatic interpretation of the higher ranks in cultural contexts and in multimodal discourse (Gibbon 2011).|266|language structure, grammatical theory, pragmatics, intonation, 3719|Gibbon2017|This network of ranks and interpretations with default linear grammars at each rank characterises human speech and its heterogeneity more realistically than unstructured lists of design features (Hockett 1958) or selected formal properties at a single rank such as phrase structure recursion (Hauser et al. 2002). Realistic approaches depend, among other criteria, on a *Procedural Plausibility Condition* of linear processing time and finite working memory.|267|procedural plausibility criterion, grammatical theory, definition 3720|Gibbon2017|The resulting more complex conditions on processing are made possible, at least to a limited extent, by enhanced but still restricted time and memory resources such as those available in writing and rehearsed speech. 
From the perspective of acquisition (ontogeny), evolution (phylogeny) or simply language history, centre-embedded constructions thus represent a bold step, for example from right branching rhematic object relative clauses in an SVO language to centre-embedded thematic relative clauses in subject position, forcing a construction for which human speech processing facilities are inadequate.|267|writing systems, centre-embedded constructions, grammatical theory, origin of language, language evolution 3721|Gibbon2017|Over the past few decades there have been occasional moves towards ‘flat grammars’ based on the plausibility criteria of linear processing time and finite working memory, particularly in connection with the readjustment of locutionary structures to the linearity of prosody, and in computational morphology and phonology.|267|flat grammar, procedural plausibility criterion, phonology, morphology 3722|Gibbon2017|The flat grammar approach is justified empirically, but also arises from applying Occam’s Razor, the simplicity principle, in two ways. First, powerful centre-embedding recursion and cross-serial formalisms such as context-free and indexed grammars have unbounded working memory requirements, which is an empirically implausible assumption in view of the limited working memory and linear real-time processing of speech, and even of the more relaxed constraints on writing. Overly powerful formalisms are being abandoned in many contexts, either in favour of flat linear grammars, or in favour of more complex grammars which are nevertheless constrained to have only finite working memory requirements.|268|grammatical theory, flat grammar, centre-embedded constructions 3723|Gibbon2017|Second, it remains uncontested that semantic interpretation and disambiguation may sometimes require arbitrarily complex structures in order to express the full complexity of general cognition and cultural context. 
But this does not necessarily apply to grammar in the narrow sense, which basically needs to account for the combinatorial facts of language varieties. A thought experiment may help to clarify. Imagine someone attending a lecture in a subject in which they have no training: the grammar is recognisable, but it is largely devoid of content for that listener, except for structurally local contributions of inflectional morphology, articles, auxiliary verbs, prepositions and conjunctions.|268|flat grammar, semantics, 3724|Gibbon2017|The relevant procedural properties of algorithms for processing these, and more complex, grammars are detailed by @Kay<1980> (1980). In computational contexts these results are well-known, but in linguistic discussions they tend to be sidelined by @Chomsky<1957>’s early theorem (1957:21, example 9): *English is not a finite-state language*. Chomsky’s theorem applies – with certain restrictions – to the complexities of written language and rehearsed speech, because these varieties have at their disposal additional memory resources and more leisurely time windows which to some extent support the additional temporal and storage requirements of centre-embedded constructions. 
But Chomsky’s theorem does not apply to non-practised spontaneous speech which is produced and understood on the spur of the moment, is restricted to a working memory with a rather small finite bound (Church 1980) and is error-prone on more complex tasks.|266|nice quote, Noam Chomsky, Chomsky syntax, finite state grammar, English, 3725|Gibbon2017|A principled description of the heterogeneity of language varieties on the basis of both structural and procedural plausibility criteria is required for realistic grammars, and there is no reason to maintain a distinction between ‘internal’ or competence grammars with unlimited working memory and ‘external’ or performance grammars with limited working memory if the goal is empirical realism.|268|performance, competence, Chomsky syntax, critics 3726|Gibbon2017|We claim that the centre-embedding emerges as a generalisation of non-centre-embedding recursion on the different time scales of spontaneous speech improvisation, of language acquisition, of the history of language change and of language evolution, a claim which is speculative but not implausible, and far simpler and potentially easier to falsify than claims about the triggering of recursion by genetic mutation (though not excluding this possibility).|269|centre-embedded constructions, center embedding, emergence, language evolution 3727|Gibbon2017|The upshot of these arguments is that if distributional, combinatorial syntactic properties are separated from strictly semantic constraints, then linearity is the appropriate default for structure and process in speech, and hierarchical structuring is just needed for rather rare instances of centre-embedded recursion in speech, which only occurs with severe depth constraints (Section 6). 
The Procedural Plausibility Condition which justifies default flat grammars is valid at all ranks: 1 at discourse rank, for example for adjacency pairs, question-answer-confirmation triples and their iterations; 2 at utterance rank, to describe narration and argument patterns; 3 at phrasal rank, modelled by left branching or right branching structures; 4 at word rank and the morphology and phonology sub-ranks, for the linear combinatorics of inflectional and derivational affixation, compounding, and phonotactics. |269|phonology, procedural plausibility criterion, morphology, grammatical theory, flat grammar 3728|Gibbon2017|One key argument is the traditional Saussurean postulate of the primacy of spoken language versus writing, to which the primacy of gesture versus spoken language may be added.|270|Ferdinand de Saussure, spoken language, procedural plausibility criterion 3729|Gibbon2017|Another argument for procedural plausibility derives from psycholinguistic studies of the lexicon. In older studies, decision-tree-like lexical access patterns define cohorts of lexical items which are disambiguated incrementally as input continues (@MarslenWilson<1987> 1987). 
The cohort approach has been replaced by formal paradigms such as deep artificial neural network classifiers or other statistical models, but the later approaches still capture key insights of incrementality and linearity.|270|lexical processing, cohort theory, flat grammar 3730|Gibbon2017|A more circumstantial argument for the Procedural Plausibility Condition is based on external evidence: the success on theoretical and operational grounds of stochastic flat grammars in speech technology and of statistical finite state approximations to more complex context-free grammars (Pereira and Wright 1997) in natural language processing applications such as machine translation.|270|NLP, natural language processing, flat grammar, 3731|Gibbon2017|Following one common terminological convention, a grammar G for a language L is a theory consisting of 1 a set of general premises (the grammar rules), 2 a specific premise, i.e. a minimal string consisting of a single symbol, such as A. 3 an inference procedure which uses the specific premise as an initial theorem, to infer a set of derived theorems from the general premises by modus ponens until no more general premises apply, finally yielding sets of theorems such as {big, small, very big, very small, very very big, very very small, ...}. The inference procedure is very familiar to linguists.|271|grammar, definition, 3732|Gibbon2017|A theory such as the grammar G is understood here as a description of a model: an interpretation function maps theorems of the grammar to a domain of structures, either abstract structures or structures representing observed reality. The model is thus an interpretation of the theory. Prosodic-phonetic interpretations in terms of externally observable physical events at each rank are one kind of model for the part of the theory which pertains to that rank. The semantic-pragmatic interpretations are a second kind of model. 
A third kind of model for the theory is a set of rooted ordered tree graphs: the phrase structures of linguistic analysis, reflecting the theorem inference process, with the initial theorem and the elements of the right-hand sides of the ‘→’ premises as nonterminal nodes, and leaf nodes corresponding to elements of the sets in the ‘=’ premises. A fourth type of model for a theory is a procedural model, expressed in terms of specific algorithms and data structures, and possibly implemented as a computer program. Depending on the type of premise, different types of procedural model may be defined. For instance, a purely right branching or purely left branching grammar (but not a mixture of the two types) can be mapped to a finite state automaton model and straightforwardly implemented in a programming language. |271|grammatical theory, grammar model, 3733|Gibbon2017|The conventions used in the semi-formal definition of grammar G in the preceding subsection are taken from the Chomsky-Schützenberger hierarchy of formal languages and formal grammars (@Chomsky<1963> and Schützenberger 1963; Hopcroft et al. 
2007).|272|Chomsky-Schützenberger hierarchy, regular grammar, 3734|Gibbon2017|One well-known case of a structure which can be described with an acyclic grammar or finite state automaton is the strictly layered hierarchy, that is, a non-recursive hierarchy, with well-defined left-hand side and right-hand side category types, which has finite depth, such as the prosodic hierarchy of @Selkirk<1984> (1984) and its later variants.|273|prosodic hierarchy, strictly layered hierarchy, flat grammar 3735|Gibbon2017|The next less restricted languages in the Chomsky-Schützenberger formal language hierarchy after the regular languages are the context-free languages, or Type 2 languages, described by context-free (phrase structure, constituent structure, Type 2) grammars.|273|description, introduction, context-free grammar 3736|Gibbon2017|The main linguistically relevant property of context-free languages is that they capture unrestricted centre-embedding recursion, illustrated by arbitrarily complex sentences of the following type: the man who the boy who the dog bit called fetched the police. Centre-embedding beyond a depth of two is not too easy to process under the real-time conditions of speech, as it relies on unlimited working memory to link left and right contexts of arbitrarily deep centre-embeddings: the man...fetched the police, who the boy...called, who the dog bit. This problem does not occur with unidirectionally branching grammars. If a centre-embedded sentence like the zebra whose skin, which a man from Orlando bought illegally, got lost was the last of its species is analysable, it is because the semantic-pragmatic interpretations with their cultural associations [pb] at each level of embedding are very different, and because writing provides additional working memory on the page, and additional processing time. It is not because the syntax itself has intrinsically tractable combinatorial properties. 
A sentence like the man whose car the man who the other man saw saw saw the man is puzzling, to say the least; there is no support for centre-embedding from phrasal syntax alone.|273f|semantics, centre-embedded constructions, center embedding, examples 3737|Gibbon2017|If centre-embedding is intrinsically hard to process, what could be its origin in the restricted centre-embeddings found in speech? Three possible assumptions about the genesis of centre-embedding are: 1 Centre-embedding in speech and writing has its origin in the structural generalisation of right (or left) branching constructions (such as an object relative clause with rhematic or comment function) to non-initial or non-final positions (such as a subject relative clause with thematic or topic function). 2 The centre-embedding generalisation is enabled by the registers of writing and rehearsed speech, which have enhanced paper or screen working memory capacity and allow increased processing time. 3 Failures of centre-embedding in speech are expected, because of lack of adequate working memory and processing time. 
|274|center embedding, centre-embedded constructions, emergence, language evolution 3738|Gibbon2017|The suggestion fits well with suggestions made by Wittenberg and Jackendoff (@Wittenberg<2014> and Jackendoff 2014; @Jackendoff<2016> and Wittenberg 2016) on the complexification of simpler clauses during the phylogeny and ontogeny of language.|274|centre-embedded constructions, center embedding, emergence, 3739|Gibbon2017|Context-sensitive grammars are typically used at the phrasal rank in order to account for cases which require more complex recursive constructions than tree graphs can represent, such as cross-serial dependencies in sentences such as *Jack and Jill fetched water and wine, respectively*.|275|context-sensitive grammar, examples 3740|Gibbon2017|The most complex language type in the Chomsky-Schützenberger hierarchy is the Type 0 or unrestricted language, described by an unrestricted grammar, modelled by an automaton known as a universal Turing machine, which can execute arbitrary string operations. The transformations of the first twenty years of Chomskyan grammars were of this type, and are now considered too general and too uninformative for describing specific insights into natural languages.|275|Turing machine, unrestricted language, unrestricted grammar, Chomsky syntax 3741|Gibbon2017|:comment:`Introduces different types of recursion:` R1 *Recursion in general definitions of infinite sets of structures which can be represented by rooted tree graphs.* [...] [pb] R2 *Apparent recursion in strictly layered and other finite depth tree hierarchies.* [...] R3 *Recursion in purely head-recursive or purely tail-recursive grammars.* [...] R4 *Recursion over tree hierarchies, as permitted by general context-free grammars.* [...] 
R5 *Tree hierarchies with cross-connections between the branches.* |275f|recursion, introduction, types, definition 3742|Gibbon2017|Context-sensitive rules like these are often abbreviated in linguistic descriptions as A→BC/X__Y, especially in phonology. Phonological ‘context-sensitive’ rules are typically not context-free in this technical sense but have procedural models which define the phonology-phonetics mapping by means of finite state transducers with input-output symbol pairs rather than with single input or output symbols (Johnson 1972; Koskenniemi 1983; Kaplan and Kay 1994; Beesley and Karttunen 2003).|275|phonology, sound change, phonological rules, phonological alternation, sound change rules, 3743|Gibbon2017|Precursors are found most clearly in the rank scale of Halliday’s scale and category grammar (@Halliday<1961> 1961), or in the levels of tagmemes in tagmemics (@Pike<1967> 1967), or the concept of stratum in stratificational grammar (@Lamb<1966> 1966), as well as in the more restricted rank-like concepts of the duality (@Hockett<1958> 1958) or double articulation (@Martinet<1960> 1960) of words in terms of morphology and phonology. The clause-rank concept of @Jespersen<1924> (1924) is more related to phrasal linearity and hierarchy than to the present rank concept.|277|multilinear grammar, precursors, 3744|Gibbon2017|In the context of Multilinear Grammar it is relevant not only that the phonotactic systems of languages but also that many, perhaps all, sound laws in language change can be modelled by finite state transducers. This has been shown for Slavic languages by @Kilbury<1997> (1997), @Kilbury<2004> & Bontcheva (2004) and @Kilbury<2011> et al. (2011). Grimm’s Law, Verner’s Law and the High German Soundshift can also be modelled fairly straightforwardly by finite state transducers. 
In fact, transducers for Grimm’s Law and Verner’s Law can be composed into a single transducer, showing complementary environments for the two laws, thereby indicating that the laws are intrinsically unordered, whatever their actual temporal order, and not necessarily ordered, as the traditional controversy would have it. :comment:`[note that this point is essential when investigating the application of tiers: our approaches are always unordered, order is essentially not needed!]` This was first shown by @Kindt<1978> and Wirrer (1978) with their transducer-related separation of Stellenzuordnung (position assignment) and Korrelationsbeziehung (correlation connection). It is tempting to speculate that developments on the evolutionary time scale TSphyl are also subject to the same plausibility criteria of finite working memory and linear processing time as Kilbury’s models on the historical time scale TShist .|281|Grimm's Law, Verner's Law, High German Soundshift, finite state transducer, modelling, sound change, 3745|Doering2016|Am 27. und 28. November 2015 fand im Archiv der sozialen Demokratie in Bonn das #histocamp statt, das erste BarCamp für Geschichte im deutschsprachigen Raum. 1 Es wurde vom Verein Open History e. V. gemeinsam mit der Friedrich-Ebert-Stiftung, der Max Weber Stiftung Deutsche Geisteswissenschaftliche Institute im Ausland und der Stiftung Haus der Geschichte der Bundesrepublik Deutschland veranstaltet.|000|barcamp, conference, 3746|Doering2016|This article discusses a new form of scientific conference which seems to be more similar to a festival with open discussions or the camps organised by activists than academic conferences as we know them.|000|conference, barcamp 3747|Morin2017|Cultural forms are constrained by cognitive biases, and writing is thought to have evolved to fit basic visual preferences, but little is known about the history and mechanisms of that evolution. 
Cognitive constraints have been documented for the topology of script features, but not for their orientation. Orientation anisotropy in human vision, as revealed by the oblique effect, suggests that cardinal (vertical and horizontal) orientations, being easier to process, should be overrepresented in letters. As this study of 116 scripts shows, the orientation of strokes inside written characters massively favors cardinal directions, and it is organized in such a way as to make letter recognition easier: Cardinal and oblique strokes tend not to mix, and mirror symmetry is anisotropic, favoring vertical over horizontal symmetry. Phylogenetic analyses and recently invented scripts show that cultural evolution over the last three millennia cannot be the sole cause of these effects.|000|cultural evolution, writing systems, legibility, 3748|Morin2017|Paper provides evidence for "natural force" in the design of writing systems, i.e., that writing systems are strongly oriented towards legibility. This is interesting in the context of linguistic questions, given that we also know that certain design features of languages seem to be due to natural, rather than evolutionary forces, especially those concerned with what linguists call "linguistic universals".|000|writing systems, reasons for similarity, cultural evolution 3749|Holton2012|The historical relations of the Papuan languages scattered across the islands of the Alor archipelago, Timor, and Kisar in southeast Indonesia have remained largely conjectural. This paper makes a first step towards demonstrating that the languages of Alor and Pantar form a single genealogical group. Applying the comparative method to primary lexical data from twelve languages sampled across the islands of the Alor-Pantar archipelago, we use form-meaning pairings in basic cognate sets to establish regular sound correspondences that support the view these languages as genetically related. 
We reconstruct 97 Proto‒Alor-Pantar vocabulary items and propose an internal subgrouping based on shared innovations. Finally, we compare Alor-Pantar with Papuan languages of Timor and with Trans-New Guinea languages, concluding that there is no lexical evidence supporting the inclusion of Alor-Pantar languages in the Trans-New Guinea family.|000|Alor-Pantar, linguistic reconstruction, concept list 3750|Holton2012|Drawing from a comparative lexical database consisting of approximately 400 items, we identify 108 cognate sets reflecting regular sound correspondences. The range of semantic domains covered by the cognate sets and the number of sets represented in each domain are given in table 2. There are only 106 distinct meanings, as two of the meanings, ‘dog’ and ‘walk’, are found in more than one cognate set. :comment:`Is this not too few in the end? We should check with the Polynesian data in our sample. The question of how many cognate sets are needed to account for regularity should be investigated more generally.` |92|sound correspondences, regular sound change, Alor-Pantar, linguistic reconstruction, 3751|Holton2012|:comment:`Table showing sound correspondence patterns.` .. image:: static/img/Holton2012-93.png :width: 600px :comment:`It is interesting to observe that linguists usually equate correspondence patterns with proto-forms in reconstruction. In some sense, Meillet may be said to be right; however, it seems more likely that linguists read more into the idea of correspondence patterns, as they also accept irregular output.` |93|correspondence patterns, examples, 3752|Holton2012|The IPA transcriptions used in this paper differ from the Indonesian-based orthographies of Alor-Pantar languages we use in other publications.
Important differences include IPA /j/ = orthographic y, /tʃ/ = c, /dʒ / = j.|93/fn10|IPA, phonetic transcription, examples 3753|Park2017|Excavated bamboo or wooden manuscripts dating from the 5th to 3rd centuries BCE have now become important new sources of data for Old Chinese phonology. The ways these sources are interpreted are necessarily based on methodological assumptions for Old Chinese reconstruction. So when debated issues in the latter are involved, disparate observations about the same materials turn out to manifest differences in the methodologies themselves. The study of Old Chinese through excavated manuscripts seems to become further complicated by considerations of the nature of the ‘pre-Qin’ archaic script and the provenances of the manuscripts. In response to these problems, as argued in this article, it is essential to recognize that the writings from the ancient Chu and Qin regions, notwithstanding the impressive range of graphic variability reaffirm the logographic nature of the Chinese writing system. The imperial ‘script-unification’ of the Qin dynasty was primarily an orthographic standardization in accordance with the norms of the old [pb] Qin region, whereby distinct regional variants were purged and preexisting internal variants were diminished. This by no means implicated such a drastic change in the writing system as a syllabary transforming to a logography. It is therefore necessary that the principles of OC reconstruction should be applied consistently to both the excavated archaic-script writings and transmitted early Chinese textual sources. It should be maintained first of all that the xiesheng (shared-phonophorics) and Shijing (the Book of Odes) rhymes in principle converge on a single phonological system, even though the actual history of the former is most probably older than the latter. 
This leads us to suppose about the OC vowel system that the Rounded Vowel Hypothesis does not hold, and that excavated texts have not yielded any data suggesting otherwise. Instead, this article suggests an alternative analysis of Middle Chinese (MC) -w- < *-w- which can explain the problem in the conventional OC rhyme classification concerning the Hypothesis. In the same vein, Chu and Qin writings in each case exhibit the early Chinese xiesheng series which had been received until that time; the regional variants thereof can complement each other as evidence for the Old Chinese phonology. When elements of dialects are found sporadically in the late archaic script, whether in Chu or Qin manuscripts, one may reasonably suspect that they reflect dialect-borrowings layered within the Old Chinese language.|000|Old Chinese, Chinese writing system, linguistic reconstruction, Baxter-Sagart System, rhyme patterns, excavated manuscripts 3754|Park2017|What is potentially interesting in this article is the claim that we should have a definite theory of how Chinese characters emerged in order to reconstruct the ancient readings. What is clear in most cases is that the emergence of Chinese characters was a process which never really ended, and that, therefore, our reconstruction needs to take into account the evolutionary aspect of their formation. The article lists some points where one could follow up and show that the author's idea needs to be formulated much more rigorously. |000|Old Chinese, Chinese writing system, Chinese character formation, 3755|Kassian2017|This paper deals with the problem of linguistic homoplasy (parallel or backward development), how it can be detected, what kinds of linguistic homoplasy can be distinguished and which varieties of the phenomenon are the most deleterious for the reconstruction of language phylogeny. It is proposed that language phylogeny reconstruction should consist of two main stages.
Firstly, a strict consensus tree should be built on the basis of high-quality input data elaborated with the help of the main phylogenetic methods (such as Neighbor-joining, Bayesian MCMC, and Maximum parsimony), and ancestral character states, allowing us to reveal a certain number of homoplastic characters. Secondly, after the detected instances of homoplasy are eliminated from the input matrix, the consensus tree is to be compiled again. It is expected that after homoplastic optimization it will be possible to better resolve individual “problem clades”, and generally the homoplasy-optimized phylogeny should be more robust than the tree constructed initially. The proposed procedure is tested on the 110-item Swadesh wordlists of the Lezgian and Tsezic groups. The Lezgian and Tsezic results generally support theoretical expectations. The MLN (minimal lateral network) method, currently implemented in the LingPy software, is a helpful tool for the detection of linguistic homoplasy.|000|homoplasy, phylogenetic reconstruction, historical linguistics 3756|Kassian2017|Paper has interesting elaborations on what drives homoplasy in linguistics, and can be considered as one of the most elaborated ones in this regard.|000|homoplasy, phylogenetic reconstruction, language evolution, 3757|Jacques2017d|In Kiranti languages, rich alternations in verbal paradigms make internal reconstruction possible, and allow a better understanding of the vowels and codas of the proto-language than is possible for other parts of speech. This paper, using data from four representative languages (Wambule, Khaling, Bantawa, Limbu), proposes a new approach to Proto-Kiranti historical linguistics combining the comparative method and internal reconstruction, and taking morphological alternations and analogy into account.
It presents a comprehensive account of the sound correspondences between the four target languages and reconstructs more than 280 proto-Kiranti verb roots.|000|Kiranti, linguistic reconstruction, verb, 3758|Jacques2017d| .. image:: static/img/Jacques2017d-186.png :width: 600px :comment:`Verb derivations in Proto-Kiranti` |186|verb derivation, Kiranti, derivation pattern, 3759|Jacques2017d| .. image:: static/img/Jacques2017d-188.png :width: 600px :comment:`correspondence patterns in Kiranti`|188|examples, sound correspondences, correspondence patterns, Kiranti 3760|Jacques2017d|All proto-phonemes in Table 4 except *ʔw and *ʦh are attested by at least five etymologies. In this paper, as in other works on Kiranti historical linguistics, no attempt is made to account for the contrast between plain voiced and voiced aspirated obstruents in Khaling and Bantawa.|188|regular sound change, Kiranti, etymology, 3761|Jacques2017d|Paper contains supplementary material for the reconstructions and is thus amenable to further analysis using computational algorithms.|000|Kiranti, linguistic reconstruction, dataset, 3762|Fuhhop2016|Traditionally, contemporary German is considered to be rich in affixes which is displayed by a wide range of e. g. nominal suffixes (such as -ung, -heit, -nis, -tum, -sal). However, productivity tests, especially with non-native lexemes, challenge this view since many formal restrictions between affixes and different word classes can be formally identified – synchronically and diachronically – and which cannot be explained by traditional approaches. This paper questions the general morphological productivity of derivation coinciding with a decrease of nominal, adjectival and verbal affixation and, in parallel, pointing to morphological alternatives. In this view, a process of an increasing “syntactification” (as it will be called) is taking place resulting in a morphological preference for conversion.
Diachronically, the morphological development from compounding to derivation is well-described. The question as to why and how conversion emerges, especially in an inflectional language, and how it is linked to former or coexisting morphological types, here derivation, has never been asked – though important observations from language typology have been made. Against this background, the process of syntactification fills this research lacuna, also in a morpho-theoretical way, since it can be interpreted as an ongoing language change consisting of a change in linguistic encoding.|000|word derivation, productivity, German, word formation 3763|Fuhhop2016|Very interesting account of productivity, showing that most perceived cases of productive German word formation are in fact no longer productive. The authors also make clear that a single example, or a single form coined by one speaker, does not mean that a construction is productive. This is important, as we should ask the empirical question of when exactly productivity starts and when it ends.|000|productivity, German, corpus studies, word formation 3764|Fuhhop2016|In this article I review a number of 2017 web sites that can be thought of as databases for the study of historical Indo-European languages. Such sites form natural companions to the general (Krause 2015) and specifically lexical (Krause 2016) resources for the study of historical languages which I have reviewed earlier. The resources discussed here are quite heterogeneous. This stems from the fact that the notion of a database can be quite broad, depending on one’s particular research goals.|000|dataset, database, review, 3765|Gast2016|This article contains some thoughts on the role of bilingual cognition in the diachronic change of morphological paradigms, with a focus on contact-induced change.
In a first step, a general typology of paradigm change is proposed, based on a distinction between three levels of linguistic organization (the sign/Level 1, the category/Level 2, and the dimension/Level 3), and two types of change (neutralization and differentiation), thus distinguishing six types of paradigm change. Examples of these types (taken from the pertinent literature) are discussed, and two questions are addressed in each case: (i) To what extent does contact-induced paradigm change of a specific type differ from internal change? (ii) What are (potentially) the underlying cognitive processes motivating each type of change? The hypothesis is explored that there is a correlation between the three levels of analysis and three types of cognitive processes involved in paradigm change. It is suggested that change at Level 1 is typically based on analogy, change at Level 2 is often sensitive to frequency of use, and change at Level 3 may imply conceptual transfer, as discussed in recent work on weak relativity effects in the context of bilingual cognition.|000|paradigms, analogy, language change, morphological change 3766|Gast2016|Article contains some interesting ideas regarding the modeling and description of analogical processes in paradigm morphology and should be consulted or quoted when dealing with those cases in computational frameworks.|000|paradigms, word paradigm morphology, analogy, morphological change 3767|Brown2004|This paper describes a novel undertaking: comparing the relationship between grammatical ambiguity (syncretism) in nouns, as represented in a default inheritance hierarchy, with textual frequency distributions. In order to do this we consider a language with a reasonable number of grammatical distinctions and where syncretism occurs in different morphological classes. We investigated this relationship for Russian nouns. 
Our results suggest that there is an intricate relationship between textual frequency and inflectional syncretism.|000|syncretism, morphology, word paradigm morphology, database, corpus studies 3768|Mateo2014|Short interesting text that introduces how translation quality can be evaluated. |000|evaluation, translation quality, translation theory, translation quality assessment, metrics, TQA 3769|Mariana2015|Determining translation quality requires a precise measure of the traits being examined. This article evaluates a new framework for translation quality evaluation, Multidimensional Quality Metrics (MQM), to the task of grading student translations. It demonstrates the viability (i.e. the practicality, validity and reliability) of using the MQM framework by novice raters to judge translations based on the American Translators Association’s translator certification exam grading system. The data gathered for this study were drawn from 29 student translations of a single news story that were rated by nine novice and two expert raters. The study used average time on task, correlations between novices and experts and Many-Facet Rasch Measurement to identify the extent to which this use of the MQM framework was viable. The findings indicate that this implementation of MQM can be viable with novice raters under the right conditions.|000|translation, translation quality assessment, TQA, metrics, 3770|Papineni2002|Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.|000|evaluation, machine translation, metrics, BLEU score, evaluation metrics, 3771|Papineni2002|Well-known paper introducing the BLEU score, which has since been used by almost all papers on machine translation to assess accuracy.|000|BLEU score, machine translation, evaluation, metrics, evaluation metrics 3772|Hovy2002|This section of the workbook describes the principles and mechanism of an integrative effort in machine translation (MT) evaluation. Building upon previous standardization initiatives, above all ISO/IEC 9126, 14598 and EAGLES, we attempt to classify into a coherent taxonomy most of the characteristics, attributes and metrics that have been proposed for MT evaluation. The main articulation of this flexible framework is the link between a taxonomy that helps evaluators define a context of use for the evaluated software, and a taxonomy of the quality characteristics and associated metrics. The document overviews these elements and provides a perspective on ongoing work in MT evaluation.|000|evaluation metrics, machine translation, introduction, overview 3773|Noldus2015|We survey the concept of assortativity, starting from its original definition by Newman in 2002. Degree assortativity is the most commonly used form of assortativity. Degree assortativity is extensively used in network science. Since degree assortativity alone is not sufficient as a graph analysis tool, assortativity is usually combined with other graph metrics. Much of the research on assortativity considers undirected, non-weighted networks.
The research on assortativity needs to be extended to encompass also directed links and weighted links. In addition, the relation between assortativity and line graphs, complementary graphs and graph spectra needs further work, to incorporate directed graphs and weighted links. The present survey paper aims to summarize the work in this area and provides a new scope of research.|000|assortativity, degree assortativity, graph theory, 3774|Viti2017|Interesting paper insofar as the author tries to find near-synonymous doublets of words in Indo-European languages which often have a different origin. They are not studied from an onomasiological perspective, but this should definitely be done.|000|semantics, synonyms, 3775|Mountford2017|Studies of severe, monogenic forms of language disorders have revealed important insights into the mechanisms that underpin language development and evolution. It is clear that monogenic mutations in genes such as FOXP2 and CNTNAP2 only account for a small proportion of language disorders seen in children, and the genetic basis of language in modern humans is highly complex and poorly understood. In this review, we examine why we understand so little of the genetic landscape of language disorders, and how the genetic background of an individual greatly affects the way in which a genetic change is expressed. We discuss how the underlying genetics of language disorders has informed our understanding of language evolution, and how recent advances may obtain a clearer picture of language capacity in ancient hominins.|000|FOXP2, CNTNAP2, language evolution, language origin, genetic analysis, 3776|Mountford2017|The idea that a single gene has a distinct role or confers a single trait is an outdated concept. Similarly, the idea that a gene will have a single role in the cell has been dispelled.
We understand that non-coding variants can play a crucial role in gene regulation, and are highly likely to have an important function in DLDs, and other neurodevelopmental disorders. The genetic background and regulation of gene expression and function is dynamic, and depends greatly on individual cell types.|7|gene function, gene expression, FOXP2, CNTNAP2, language origin 3777|Mortensen2013|This paper presents a reconstruction of the rhyme system of Proto-Tangkhulic, the putative ancestor of the Tangkhulic languages, a Tibeto-Burman subgroup. A reconstructed rhyme inventory for the proto-language is presented. Correspondence sets for each of the members of the inventory are then systematically presented, along with supporting cognate sets drawn from four Tangkhulic languages: Ukhrul, Huishu, Kachai, and Tusom. This paper also summarizes the major sound changes that relate Proto-Tangkhulic to the daughter languages on which the reconstruction is based. It is concluded that Proto-Tangkhulic was considerably more conservative than any of these languages. It preserved the Proto-Tibeto-Burman length distinction in certain contexts and reflexes of final *-l, even though these are not preserved as such in Ukhrul, Huishu, Kachai, or Tusom. Proto-Tangkhulic is argued to be a potentially useful source of evidence in the reconstruction of Proto-Tibeto-Burman.|000|Tangkhul, finals, linguistic reconstruction, Sino-Tibetan, partial cognacy 3778|Mortensen2013|The paper is an interesting example of the importance of partial reconstruction: in Tangkhulic, too, we find many instances of only partially related words. These are not yet properly handled, but the authors try hard to make their matchings transparent.|000|Tangkhul, Sino-Tibetan, finals, linguistic reconstruction, partial cognacy 3779|Shizuka2016|The existence of discrete social clusters, or ‘communities’, is a common feature of social networks in human and nonhuman animals.
The level of such community structure in networks is typically measured using an index of modularity, Q. While modularity quantifies the degree to which individuals associate within versus between social communities and provides a useful measure of structure in the social network, it assumes that the network has been well sampled. However, animal social network data is typically subject to sampling errors. In particular, the associations among individuals are often not sampled equally, and animal social network studies are often based on a relatively small set of observations. Here, we extend an existing framework for bootstrapping network metrics to provide a method for assessing the robustness of community assignment in social networks using a metric we call community assortativity (r_com). We use simulations to demonstrate that modularity can reliably detect the transition from random to structured associations in networks that differ in size and number of communities, while community assortativity accurately measures the level of confidence based on the detectability of associations. We then demonstrate the use of these metrics using three publicly available data sets of avian social networks. We suggest that by explicitly addressing the known limitations in sampling animal social network, this approach will facilitate more rigorous analyses of population-level structural patterns across social systems.|000|networks, graph theory, assortativity, modularity, community structure 3780|Shizuka2016|At the level of the whole network, we can assess the robustness of community assignments using an index called ‘assortativity’, which is a correlation coefficient that measures the association patterns between different types of nodes (Farine, 2014; Newman, 2002, 2003).
We can use this coefficient of assortativity to measure the degree to which pairs assigned to the same community in the empirical network also occur in the same community in bootstrap replicate networks (see [pb] also Shizuka et al., 2014). Alternative indices for comparing community assignments (e.g. normalized mutual information: Danon, Diaz-Guilera, Duch, & Arenas, 2005) could also be used in a similar way to compare empirical and bootstrap replicate networks. Here, we describe the general method using the coefficient of assortativity, validate this method using simulations and provide empirical examples of its application to animal social networks.|238f|bootstrap, graph theory, community detection, community structure, assortativity, methodology 3781|Nakagawa1998|This article provides information on the clicks which characterize the consonantal systems of Khoisan languages and their neighbouring languages. It discusses some issues of the framework for the description of clicks, and presents a brief survey of click systems sampled from a wide range of Khoisan languages in terms of click types and click accompaniments.|000|clicks, phonetics, Khoisan, distinctive features 3782|Nakagawa1998|Article provides a chart in which features for clicks are proposed.|000|phonetics, distinctive features, phonology, Khoisan, clicks, 3783|Tseng2017|Investigations of the Sapir-Whorf hypothesis often ask whether there is a difference in the non-linguistic behavior of speakers of two languages, generally without modeling the underlying process. Such an approach leaves underexplored the relative contributions of language and universal aspects of cognition, and how those contributions differ across languages. We explore the naming and non-linguistic pile-sorting of spatial scenes across speakers of five languages via a computational model grounded in an influential proposal: that language will affect cognition when non-linguistic information is uncertain.
We report two findings. First, native language plays a small but significant role in predicting spatial similarity judgments across languages, consistent with earlier findings. Second, the size of the native-language role varies systematically, such that finer-grained semantic systems appear to shape similarity judgments more than coarser-grained systems do. These findings capture the tradeoff between language-specific and universal forces in cognition, and how that tradeoff varies across languages.|000|Sapir-Whorf hypothesis, computational approaches, spatial similarity judgments, inter-speaker-variation 3784|Quaternary2017|In the opinion of the journal, the accession and current housing of the Untermassfeld lithic and bone material described in this article cannot be ascertained, and its accessibility for further study cannot be verified.|000|sustainable research, sustainability, archaeology, examples 3785|Prince2015|The alienability distinction has long been recognized as one of the major factors driving differential marking of possession across languages. But opinions are divided on its exact nature. I will bring evidence from the Oceanic language Daakaka to support the hypotheses that the alienability distinction is not a lexical property of nouns but a property of possessive relations: speakers can choose between different constructions depending on the relation between the possessor and the possessed that they wish to express. Secondly, lexically non-relational nouns can be transitivized to express inalienable relations. And thirdly, the basic semantic notion behind alienable possession is control.|000|alienability, Daakaka, language typology, 3786|Prince2015|This language-specific finding also has implications for our understanding of the alienability distinction more generally. According to Nichols (1988:568), a cross-linguistically valid definition of the distinction between alienable and inalienable possession is not possible.
Chappell and McGregor (1996b) take the more optimistic stance that given sufficient background knowledge about the cultural and pragmatic context, predictions about the classification of specific relations should be possible. They suggest that only those relations that are conceptualized in a given culture as particularly close will be classified as inalienable. However, both accounts, as well as most of the literature on the alienability distinction, have focused on the difference between lexically relational and non-relational nouns. But lexically encoded information is bound to include unpredictable and idiosyncratic features. For example, in some languages, clothes and personal belongings pattern with prototypically inalienable relations, while in others they pattern with alienable ones (Chappell and McGregor 1996b). Such cross-linguistic differences may well reflect culture-specific preferences and ideas.|p2|island of order, alienability, lexicon, 3787|Nikolaev2017|The paper presents an overview of The Database of Eurasian Phonological Inventories, a new information resource and analytical tool for research in the field of distributional phonological typology, theoretical phonology, and areal linguistics.|000|database, phoneme inventory, North Eurasian languages, 3788|FerreriCancho2017|The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been claimed recurrently that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Here we quantify the amount of crossings of real sentences and compare it to the predictions of a series of baselines. We conclude that crossings are really scarce in real sentences. Their scarcity is unexpected by the hubiness of the trees.
Indeed, real sentences are close to linear trees, where the potential number of crossings is maximized.|000|crossing dependencies, syntax, empirical study, 3789|FerreriCancho2018|We have clarified the issue of the scarcity of crossing dependencies. We have provided the first evidence that the actual number of crossings is significantly small. From the perspective of planarity, the proportion of non-planar sentences can be “high” in certain languages (e.g., Dutch) but still significantly low. On the other hand, the mean number of crossings per sentence is a small number, consistently with the claim that crossings in real sentences are scarce [12,14,19–21,23] even in languages where non-planar sentences abound. However, whether a number is small or large is a matter of the scale or the units of measurement [43]. Therefore, statistical testing and a theory of crossings (Section 2) are vital. The former shows that crossings are significantly low. The latter helps to understand why and how.|322|crossing dependencies, syntax, tree bank, cross-linguistic study, 3790|Mortensen2012|While a few cases of the emergence of obstruents after high vowels are found in the literature (Burling 1966, 1967, Blust 1994), no attempt has been made to comprehensively collect instances of this sound change or give them a unified explanation. This paper attempts to resolve this gap in the literature by introducing a post-vocalic obstruent emergence (POE) as a recurring sound change with a phonetic (aerodynamic) basis. Possible cases are identified in Tibeto-Burman, Austronesian, and Grassfields Bantu.
Special attention is given to a novel case in the Tibeto-Burman language Huishu.|000|sound change, examples, post-vocalic obstruent emergence, exemplar theory, Sino-Tibetan, Tangkhul 3791|Mortensen2012|@Vennemann<1988> (1988) and others have argued that sound changes which insert segments most commonly do so in a manner that ‘improves’ syllable structure: they provide syllables with onsets, allow consonants that would otherwise be codas to syllabify as onsets, relieve hiatus between vowels, and so on. On the other hand, the deletion of consonants seems to be particularly common in final position, a tendency that is reflected not only in the diachronic mechanisms outlined by Vennemann, but also reified in synchronic mechanisms like the Optimality [pb] Theoretic NoCoda constraint (Prince & Smolensky 1993). It may seem surprising, then, that a small but robust class of sound changes exists in which obstruents (usually stops, but sometimes fricatives) emerge after word-final vowels. For example, the Tibeto-Burman language Huishu has ruk “breast” from earlier *ru and sik “blood” from earlier *ʃi (Mortensen & Miller 2011). |434f|sound change, reasons for sound change, syllable structure, 3792|Sugasawa2017|Hominins have been making tools for over three million years [1], yet the earliest known hooked tools appeared as recently as 90,000 years ago [2]. Hook innovation is likely to have boosted our ancestors’ hunting and fishing efficiency [3], marking a major transition in human technological evolution. The New Caledonian crow is the only non-human animal known to craft hooks in the wild [4, 5]. Crows manufacture hooked stick tools in a multi-stage process, involving the detachment of a branch from suitable vegetation; ‘‘sculpting’’ of a terminal hook from the nodal joint; and often additional adjustments, such as length trimming, shaft bending, and bark stripping [4, 6, 7].
Although tools made by a given population share key design features [4, 6, 8], they vary appreciably in overall shape and hook dimensions. Using wild-caught, temporarily captive crows, we experimentally investigated causes and consequences of variation in hook-tool morphology. We found that bird age, manufacture method, and raw-material properties influenced tool morphology, and that hook geometry in turn affected crows’ foraging efficiency. Specifically, hook depth varied with both detachment technique and plant rigidity, and deeper hooks enabled faster prey extraction in the provided tasks. Older crows manufactured tools of distinctive shape, with pronounced shaft curvature and hooks of intermediate depth. Future work should explore the interactive effects of extrinsic and intrinsic factors on tool production and deployment. Our study provides a quantitative assessment of the drivers and functional significance of tool shape variation in a non-human animal, affording valuable comparative insights into early hominin tool crafting [9].|000|New Caledonian Crows, hook tools, tool manufacture, 3793|Roebroeks2017|The database regarding the earliest occupation of Europe has increased significantly in quantity and quality of data points over the last two decades, mainly through the addition of new sites as a result of long-term systematic excavations and large-scale prospections of Early and early Middle Pleistocene exposures. The site distribution pattern suggests an ephemeral presence of hominins in the south of Europe from around one million years ago, with occasional short northward expansions along the western coastal areas when temperate conditions permitted. From around 600,000-700,000 years ago Acheulean artefacts appear in Europe and somewhat later hominin presence seems to pick up, with more sites and now some also present in colder climatic settings.
It is again only later, around 350,000 years ago, that the first sites show up in more continental, central parts of Europe, east of the Rhine. A series of recent papers on the Early Pleistocene palaeontological site of Untermassfeld (Germany) makes claims that are of great interest for studies of earliest Europe and are at odds with the described pattern: the papers suggest that Untermassfeld has yielded stone tools and humanly modified faunal remains, evidence for a one million years old hominin presence in European continental mid-latitudes, and additional evidence that hominins were well-established in Europe already around that time period. Here we evaluate these claims and demonstrate that these studies are severely flawed in terms of data on provenance of the materials studied and in the interpretation of faunal remains and lithics as testifying to a hominin presence at the site. In actual fact any reference to the Untermassfeld site as an archaeological one is unwarranted. Furthermore, it is not the only European Early Pleistocene site where inferred evidence for hominin presence is problematic. The strength of the spatiotemporal patterns of hominin presence and absence depend on the quality of the data points we work with, and data base maintenance, including critical evaluation of new sites, is crucial to advance our knowledge of the expansions and contractions of hominin ranges during the Pleistocene.|000|archaeology, data quality, sustainability, sustainable research 3794|Roebroeks2017|This paper was apparently the reason why, in @Quaternary2017, the journal issued a warning about a certain study that had used bad data.|000|sustainability, sustainable research, archaeology, data quality 3795|Harding2014|The article considers the nature and extent of salt production in prehistoric Europe, in the light of recent field work.
The biological needs of humans and animals are described, as this might have determined the extent to which ancient communities sought out salt if they did not have access to it locally. Three main zones of production, utilising solar evaporation, briquetage, and a technique involving wooden troughs, are described; deep mining seems only to have occurred in the Austrian Alps. Lastly consideration is given to the effects of salt production within and between communities, bearing in mind the widely expressed view that in prehistory richness in salt led to richness in other goods.|000|salt, Europe, prehistory, salt production, 3796|Vukovic2017|To help us live in the three-dimensional world, our brain integrates incoming spatial information into reference frames, which are based either on our own body (egocentric) or independent from it (allocentric). Such frames, however, may be crucial not only when interacting with the visual world, but also in language comprehension, since even the simplest utterance can be understood from different perspectives. While significant progress has been made in elucidating how linguistic factors, such as pronouns, influence reference frame adoption, the neural underpinnings of this ability are largely unknown. Building on the neural reuse framework, this study tested the hypothesis that reference frame processing in language comprehension involves mechanisms used in navigation and spatial cognition. We recorded EEG activity in 28 healthy volunteers to identify spatiotemporal dynamics in (1) spatial navigation, and (2) a language comprehension task (sentence-picture matching). By decomposing the EEG signal into a set of maximally independent activity patterns, we localised and identified a subset of components which best characterised perspective-taking in both domains.
Remarkably, we find individual co-variability across these tasks: people’s strategies in spatial navigation are also reflected in their construction of sentential perspective. Furthermore, a distributed network of cortical generators of such strategy-dependent activity responded not only in navigation, but in sentence comprehension. Thus we report, for the first time, evidence for shared brain mechanisms across these two domains - advancing our understanding of language’s interaction with other cognitive systems, and the individual differences shaping comprehension.|000|spatial navigation systems, cortical network, empirical study, neurolinguistics, shared brain mechanisms, cognition, 3797|Widmer2017|Some languages constrain the recursive embedding of NPs to some specific morphosyntactic types, allowing it, for example, only with genitives but not with bare juxtaposition. In Indo-European, every type of NP embedding—genitives, adjectivizers, adpositions, head marking, or juxtaposition—is unavailable for syntactic recursion in at least one attested language. In addition, attested pathways of change show that NP types that allow recursion can emerge and disappear in less than 1,000 years. This wide-ranging synchronic diversity and its high diachronic dynamics raise the possibility that at many hypothetical times in the history of the family recursive NP embedding could have been lost for all types simultaneously, parallel to what has occasionally been observed elsewhere (Everett 2005, Evans & Levinson 2009). Performing Bayesian phylogenetic analyses on a sample of fifty-five languages from all branches of Indo-European, we show, however, that it is extremely unlikely for such a complete loss to ever have occurred. When one or more morphosyntactic types become unavailable for syntactic recursion in an NP, an unconstrained alternative type is very likely to develop in the same language.
This suggests that, while diachronic pathways away from NP recursion clearly exist, there is a tendency—perhaps a universal one—to maintain or develop syntactic recursion in NPs. A likely explanation for this evolutionary bias is that recursively embedded phrases are not just an option that languages have (Fitch et al. 2005), but they are in fact preferred by our processing system.|000|Indo-European, recursion, embedding, syntax, 3798|FerreriCancho2018|:comment:`[Image illustrates crossing dependencies]` .. image:: static/img/FerreriCancho2018-312.png :width: 800px :name: image :comment:`[Figure 1]`|312|crossing dependencies, syntax, figure 3799|Whitney1884|We see, it may be further remarked, upon how narrow and imperfect a basis those comparative philologists build who are content with facile setting side by side words; whose materials are simple vocabularies, longer or shorter, of terms representing common ideas. There was a period in the history of linguistic science when this was the true method of investigation, and it still continues to be useful in certain departments of the field of research. It is the first experimental process; it determines the nearest and most obvious groupings, and prepares the way for more penetrating study.|246|word list, word lists, nice quote, critics 3800|Popper1945|Now the sciences which have this interest in specific events and in their explanation may, in contradistinction to the generalizing sciences, be called the historical sciences. This view of history makes it clear why so many students of history and its method insist that it is the particular event that interests them, and not any so-called universal historical laws. For from our point of view, there can be no historical laws. Generalization belongs simply to a different line of interest, sharply to be distinguished from that interest in specific events and their causal explanation which is the business of history.
Those who are interested in laws must turn to the generalizing sciences (for example, to sociology). Our view also makes it clear why history has so often been described as ‘the events of the past as they actually did happen’.|II:25:II:459|Karl Popper, historical sciences, history of science, philosophy of science, nice quote 3801|Stone2018|Since we encode the structural change information in the derivation itself, morphisms between derivations will keep track of this information. Similarly, we can describe grammatical relations just using derivational structure, since the explicit structural changes are part of the model.|197|word derivation, modeling, formal language theory, formal linguistics 3802|Ermolaeva2018|This research reorganizes the Distributed Morphology (DM, (Halle and Marantz, 1993)) framework to work over strings. That the morphological module should operate over strings is desirable, since it is assumed that most (arguably all) morphological processes can be modelled with regular languages (Karttunen et al. 1992). As is, DM typically operates on binary trees, with the syntax-morphology interface implicitly treated as a tree-transducer. We contend that using (binary) trees is overpowered, and predicts patterns which are unattested in natural language (e.g. iterable nested dependencies). If however, we restrict the morphological component to working on strings, we correctly predict that morphology can be modelled with regular string languages, and so we treat the morphological component as a finite-state string-transducer, i.e.
as a regular relation.|178|formal language theory, morphology, regular grammar 3803|Crain1998|Text introduces Donald Foster, a linguist who is working for the FBI on the task of author recognition.|000|author recognition, NLP, crime investigation, FBI, 3804|Cop2017|This article introduces GECO, the Ghent Eye-Tracking Corpus, a monolingual and bilingual corpus of the eyetracking data of participants reading a complete novel. English monolinguals and Dutch–English bilinguals read an entire novel, which was presented in paragraphs on the screen. The bilinguals read half of the novel in their first language, and the other half in their second language. In this article, we describe the distributions and descriptive statistics of the most important reading time measures for the two groups of participants. This large eyetracking corpus is perfectly suited for both exploratory purposes and more directed hypothesis testing, and it can guide the formulation of ideas and theories about naturalistic reading processes in a meaningful context. Most importantly, this corpus has the potential to evaluate the generalizability of monolingual and bilingual language theories and models to the reading of long texts and narratives. The corpus is freely available at http://expsy.ugent.be/downloads/geco|000|corpus studies, eye-tracking, monolingual, bilingual, sentence reading, bilingualism, dataset, database, corpus 3805|Dirix2017|In the present study, we investigated the effects of word-level age of acquisition (AoA) on natural reading. Previous studies, using multiple language modalities, showed that earlier-learned words are recognized, read, spoken, and responded to faster than words learned later in life. Until now, in visual word recognition the experimental materials were limited to single-word or sentence studies.
We analyzed the data of the Ghent Eye-tracking Corpus (GECO; Cop, Dirix, Drieghe, & Duyck, in press), an eyetracking corpus of participants reading an entire novel, resulting in the first eye movement megastudy of AoA effects in natural reading. We found that the ages at which specific words were learned indeed influenced reading times, above other important (correlated) lexical variables, such as word frequency and length. Shorter fixations for earlier-learned words were consistently found throughout the reading process, in both early (single-fixation durations, first-fixation durations, gaze durations) and late (total reading times) measures. Implications for theoretical accounts of AoA effects and eye movements are discussed.|000|eye-tracking, age of acquisition, speech norms, corpus studies, corpus, dataset 3806|Dirix2017|This paper uses the corpus by @Cop2017 for the investigation and links it with other data, like the one by @Kuperman2012 and the subtitle database by @Cuetos2011.|000|eye-tracking, corpus studies, age of acquisition, speech norms 3807|Marian2012|Past research has demonstrated cross-linguistic, cross-modal, and task-dependent differences in neighborhood density effects, indicating a need to control for neighborhood variables when developing and interpreting research on language processing. The goals of the present paper are two-fold: (1) to introduce CLEARPOND (Cross-Linguistic Easy-Access Resource for Phonological and Orthographic Neighborhood Densities), a centralized database of phonological and orthographic neighborhood information, both within and between languages, for five commonly-studied languages: Dutch, English, French, German, and Spanish; and (2) to show how CLEARPOND can be used to compare general properties of phonological and orthographic neighborhoods across languages. 
CLEARPOND allows researchers to input a word or list of words and obtain phonological and orthographic neighbors, neighborhood densities, mean neighborhood frequencies, word lengths by number of phonemes and graphemes, and spoken-word frequencies. Neighbors can be defined by substitution, deletion, and/or addition, and the database can be queried separately along each metric or summed across all three. Neighborhood values can be obtained both within and across languages, and outputs can optionally be restricted to neighbors of higher frequency. To enable researchers to more quickly and easily develop stimuli, CLEARPOND can also be searched by features, generating lists of words that meet precise criteria, such as a specific range of neighborhood sizes, lexical frequencies, and/or word lengths. CLEARPOND is freely-available to researchers and the public as a searchable, online database and for download at http://clearpond.northwestern.edu.|000|speech norms, neighborhood density, phonetic similarity, corpus, dataset, 3808|Gelitz2018|Article presents the eye-tracking studies based on the database by @Cop2017 and the experiments by @Dirix2017. The results seem to suggest that knowing a concept in one's mother tongue makes it easier to learn it in a foreign language.|000|age of acquisition, corpus studies, eye-tracking, bilingualism 3809|Statuts1866|La Société n'admet aucune communication concernant, soit l'origine du langage, soit la création d'une langue universelle. :translation:`[The Society does not accept any communication concerning either the origin of language or the creation of a universal language]`|III,|nice quote, language origin, origin of language, Société Linguistique de Paris 3810|Thouseau2017|Linguistic and genetic data have been widely compared, but the histories underlying these descriptions are rarely jointly inferred.
We developed a unique methodological framework for analysing jointly language diversity and genetic polymorphism data, to infer the past history of separation, exchange and admixture events among human populations. This method relies on approximate Bayesian computations that enable the identification of the most probable historical scenario underlying each type of data, and to infer the parameters of these scenarios. For this purpose, we developed a new computer program PopLingSim that simulates the evolution of linguistic diversity, which we coupled with an existing coalescent-based genetic simulation program, to simulate both linguistic and genetic data within a set of populations. Applying this new program to a wide linguistic and genetic dataset of Central Asia, we found several differences between linguistic and genetic histories. In particular, we showed how genetic and linguistic exchanges differed in the past in this area: some cultural exchanges were maintained without genetic exchanges. The methodological framework and the linguistic simulation tool developed here can be used in future work for disentangling complex linguistic and genetic evolutions underlying human biological and cultural histories.|000|ABC, Approximate Bayesian Computation, Indo-Iranian, Turkish, Central Asia, simulation studies 3811|Schippan1974|Article deals with lexical motivation, or semantic motivation, and uses an interesting set of terms that never seem to have made it into English but are useful for discussing the phenomenon of lexical motivation. For research on motivation structures, it is indispensable to elaborate further on this and similar work. |000|motivation, semantic motivation, object naming, expression, expressivity, semantics, 3812|Schippan1974|Die Bedeutung als eine sprachliche Kategorie ist bezogen auf das sprachliche Formativ, bildet eine Seite des sprachlichen Zeichens.
Sie stellt eine Struktur verschiedener Sememe dar, die als ideelle Abbilder durch die Beziehung auf das gemeinsame Formativ und gemeinsame Bedeutungselemente zur semantischen Einheit des Wortes zusammengeschlossen sind. Die durch die Sememe ein und desselben Wortes bezeichneten Erscheinungen, Gegenstände, Eigenschaften, Prozesse stehen oft in keiner tatsächlichen Beziehung zueinander.|213|meaning, sememe, semantics, definition 3813|Schippan1974|Damit erhalten die Bedeutungen einen historischen Aspekt. In ihren Elementen können auch Merkmale widergespiegelt sein, deren "Urbilder" überlebt sind. Bedeutungen sind *stabiler* als die individuellen Abbilder. Doch das ändert grundsätzlich nichts daran, daß sie objektiv bestimmt sind durch die Eigenschaften der Gegenstände der objektiven Realität.|215|diachronic explanations, meaning, word meaning, lexical meaning, semantics, 3814|Schippan1974|2. Die Bedeutungen sind weiter durch die Art und Weise bestimmt, in der sich die Gesellschaft die Wirklichkeit geistig aneignet. Wie die Menschen klassifizieren, welche Eigenschaften der Dinge ihnen wesentlich sind, ist abhängig vom Erkenntnisstand, aber auch -- und das in erster Linie -- vom *Interesse*, das die Gesellschaft oder eine ihrer Gruppen an einem Bereich der objektiven Realität hat.|215|cultural evolution, meaning, culture-language interaction, influence, lexical meaning, semantics, 3815|Schippan1974|3. Der dritte bedeutungsbestimmende Faktor ist das Sprachsystem selbst. Die Bedeutungen werden in ihrem Umfang und in ihrer Qualität durch Systembeziehungen mitbestimmt, denn jedes neue Wort wird dem lexikalischen Teilsystem nicht einfach hinzugefügt, sondern ihm inkorporiert.
Dadurch werden auch andere Wörter beeinflußt, ihre Bedeutungen werden begrenzt.|216|systemic processes, lexical meaning, meaning, influence, interaction, 3816|Schippan1974|Im Motivationsprozeß werden, als Resultat von Vergleichsoperationen, neue Zuordnungen von Formativen und Abbildern vorgenommen, indem bestimmte Merkmale des zu Benennenden mit ihren schon vorhandenen Morphemen benannt und damit hervorgehoben werden. Neubenennung muß nicht Erstbenennung sein, sondern ist auch Zweitbenennung aus Gründen stärkerer Expressivität, der Korrektur des Standpunktes, der Verdeutlichung oder Verhüllung, der höheren Verallgemeinerung oder der beabsichtigten größeren Massenwirksamkeit. Mit der Benennung wird in ein sprachlich-gedankliches System (Mikrosystem) eingeordnet. Zuordnungsbahnen werden geöffnet und somit alle Faktoren wirksam, die sich aus der Stellung der Benennungsmotive im Sprachsystem ergeben. |218|naming strategies, object naming, neologism, motivation, motivation structure, 3817|Kuemmel2017|The article investigates the agricultural lexicon of Indo-Iranian, especially its earlier records, and what it may tell us about the spread of farming. After some general remarks on “Neolithic” vocabulary, a short overview of the animal husbandry terminology shows that this field of vocabulary was evidently well-established in Proto-Indo-Iranian, with many cognate terms. Words for cattle, horses, sheep and goats are well developed and mostly inherited, while evidence for pigs is more limited, and the words for donkey and camel look like common loans. A more extensive discussion of plant terminology reveals that while some generic terms for grain are inherited, more specific words for different kinds of cereals show few inherited terms and/or irregular variation, and the same is even clearer for pulses and some other vegetables. The terminology for agricultural terminology is largely different from that of most European branches of Indo-European.
The conclusion is that the cultural background behind these linguistic data points to spreading of a mainly pastoralist culture in the case of Indo-Iranian.|000|linguistic palaeography, Indo-Iranian, pastoralism, agricultural terms, 3818|Kuemmel2017|The lexicon of Indo-Iranian, like that of Indo-European in general, presupposes a Neolithic stage of cultural development. The terminology for animal husbandry and pastoralism is well developed and easily reconstructible. Terminology for different aspects of plant cultivation is also present, including terms for grain, pulses, and technology such as ploughing – however, it is more difficult to reconstruct, as we shall see. While it is disputed if wheeled vehicles can be assumed for the Proto-Indo-European level, there is no questions that they were known already in Proto-Indo-Iranian, including the chariot (PII *rátha-).|277|Indo-European, wheel, linguistic palaeography 3819|Kuemmel2017|As already mentioned, this semantic field is well attested and contains many assured Proto-Indo-Iranian terms with rather fine-grained distinctions. This is valid for cattle, horses and sheep, while terms for goats are already a little more varied, and the words for the “southern” animals, donkey and camel, seem to be loanwords.|277|Indo-Iranian, animals, horse, sheep, cattle, 3820|Kortvelyessy2017b|The paper summarizes the fundamental principles of research into 73 European languages examined and evaluated on the basis of 100 word-formation characteristics. The focus of this paper is on affixation, in particular, the role of suffixation, prefixation, suffixal-prefixal derivation, circumfixation and infixation in forming new complex words. This objective is achieved by means of two main parameters, the parameter of structural complexity whose quantitative representation is saturation value (both absolute and relative) and the parameter of the measure of occurrence.
Based on the evaluated data the authors identify the SAE core and periphery for suffixation, prefixation and prefixal-suffixal derivation|000|word formation, affix, affixation, Standard Average European, European languages, typology, lexical typology 3821|Kortvelyessy2017b|It is based on an analysis of 100 word-formation characteristics that serve a comprehensive comparison of word-formation systems in various languages. They are labeled as ‘comparables’, i.e., linguistic features that are used to compare prototypical (theory-independent) manifestations of word-formation systems in sample languages. We distinguish two types of comparables, basic comparables that include basic word-formation techniques of coining new words, and complex comparables that include word-formation processes. Our 100 basic comparables represent 12 complex comparables, including prefixation, suffixation, prefixal-suffixal derivation, circumfixation, infixation, postfixation, compounding, [pb] conversion, reduplication, blending and internal modification.|18f|word formation, comparative concept, 3822|Kortvelyessy2017b|Both suffixation and prefixation are evaluated by means of 16 basic comparables defined by the word-class of the input and the output, and one comparable examining the possibility to form a new complex word by suffixation/prefixation of an already suffixed/prefixed word: .. image:: static/img/Kortvelyessy2017-19.png :width: 800px :name: bla :comment:`[Table 1]`|19|prefix, suffix, suffixation, word formation, comparative concept, 3823|Andrason2016| This article designs a method of improving traditional, qualitative semantic maps based on grammaticalisation paths, by including both quantitative data (frequency) and information concerning a gram’s environment (the relation to the other maps).
The incorporation of qualitative evidence transforms vectored maps into waves, while the introduction of the contextual factor combines waves organised along the same grammaticalisation template into a stream. The structure of a wave delivers, in turn, the statistical prototypicality of a gram (i.e. the prototypicality that is conditioned by the gram’s own wave), whereas the structure of the stream yields product prototypicality (i.e. the prototypicality that is a combination of the gram’s wave and the other waves of the stream). It is additionally hypothesised that the product prototypicality may be an overt indicator of the psychological perception of the grams by speakers.|000|semantic map, typology, empirical study, 3824|Andrason2016|On the one hand, senses are connected because they arise due to human cognitive mechanisms, being derived by means of metaphor, image-schema process, metonymy, analogy and abduction (Andersen 1973; Antilla 1989; Sweetser 1990; Lichtenberk 1991; Traugott and König 1991; Bybee, Perkins and Pagliuca 1994; Heine 1997; Radden and Kövecses 1999; Panther and Radden 1999; Croft and Cruse 2004; Steen 2007; Yu 2008, 2009; Carstairs-McCarthy 2010; Geeraerts 2010).|2|cognition, polysemy, semantic map, lexical typology 3825|Georgakopoulos2018|First, several scholars observed that areal factors possibly lead to the extension of the meaning of a linguistic form in a given language based on the meaning of a similar expression in a (prestigious) neighboring language (e.g., van der Auwera et al., 2009). This phenomenon, known as ‘polysemy copying,’ has been studied within the classical semantic map method and described with the labels ‘semantic map harmony’ (Tenser, 2008; 2016; see also Matras, 2009: 263–264) and ‘semantic map assimilation’ (Gast & van der Auwera, 2012).|p25|semantic change, polysemy copying, language contact 3826|Regier2013|Semantic maps are a means of representing universal structure underlying semantic variation.
However, no algorithm has existed for inferring a graph-based semantic map from cross-language data. Here, we note that this open problem is formally identical to the known problem of inferring a social network from disease outbreaks. From this identity it follows that semantic map inference is computationally intractable, but that an efficient approximation algorithm for it exists. We demonstrate that this algorithm produces sensible semantic maps from two existing bodies of data. We conclude that universal semantic graph structure can be automatically approximated from cross-language semantic data.|000|semantic map, algorithms, graph theory, automatic approach 3827|Regier2013|We assume that each such grouping picks out a connected region of an underlying universal network of semantic functions, but we are not given the connections of that network. Instead, we wish to infer the set of connections between semantic functions that best explains the observed groupings.|92|semantic map, inference, problem, 3828|Regier2013|Given a set V of vertices (representing semantic functions), and a set of constraints S_i ⊆ V (representing a set of language-specific groupings of these functions into categories), we wish to find the minimum set of edges E between the vertices of V such that each S_i picks out a connected subgraph of the graph G = (V, E).|92|problem, semantic map, inference, 3829|Regier2013|They wished to find [pb] the social network that could best account for the observed outbreaks – that is, the minimum set of edges E such that each constraint S_i picks out a connected subgraph of the overall social graph G = (V, E).
This social network inference problem is formally the same as the semantic map inference problem; therefore any formal results concerning one also apply to the other.|92f|graph theory, semantic map, problem, disease outbreak, social networks 3830|Angluin2010|We consider the problem of inferring the most likely social network given connectivity constraints imposed by observations of outbreaks within the network. Given a set of vertices (or agents) V and constraints (or observations) S_i ⊆ V we seek to find a minimum log-likelihood cost (or maximum likelihood) set of edges (or connections) E such that each S_i induces a connected subgraph of (V, E). For the offline version of the problem, we prove an Ω(log(n)) hardness of approximation result for uniform cost networks and give an algorithm that almost matches this bound, even for arbitrary costs. Then we consider the online problem, where the constraints are satisfied as they arrive. We give an O(n log(n))-competitive algorithm for the arbitrary cost online problem, which has an Ω(n)-competitive lower bound. We look at the uniform cost case as well and give an O(n^{2/3} log^{2/3}(n))-competitive algorithm against an oblivious adversary, as well as an Ω(√n)-competitive lower bound against an adaptive adversary. We examine cases when the underlying network graph is known to be a star or a path, and prove matching upper and lower bounds of Θ(log(n)) on the competitive ratio for them.|000|social networks, inference, disease outbreak, graph theory, algorithms 3831|Knight2008|Like a number of other Kimberley languages, Bunuba has very few morphologically simple verbs. Most verbs (including exponents of some semantic primes, such as WANT, SEE, and THERE IS) consist of an inflected auxiliary combined with an invariable coverb.
After a brief review of how other predicate primes are expressed in Bunuba, the main body of the chapter considers semantic primes SAY, DO, THINK, HAPPEN, and FEEL, which, it is argued, are all expressed by a single, morphologically simple Bunuba verb MA. Detailed language-internal evidence is adduced to support the existence of this striking five-way polysemy. It is shown that each of the five identifiable lexical units has a distinctive syntactic/semantic profile. These facts are incompatible with alternative analyses which posit a single general abstract meaning.|000|polysemy, hyperpolysemy, Bunuba, Australian languages 3832|Monaghan2010|Natural reading development gradually builds up to the adult vocabulary over a period of years. This has an effect on lexical processing: early-acquired words are processed more quickly and more accurately than later-acquired words. We present a connectionist model of reading, learning to map orthography onto phonology to simulate this natural reading development. The model learned early words more robustly than late words, and also showed interactions between age of acquisition and spelling-sound consistency that have been reported for skilled adult readers. In additional simulations, we demonstrated that age of acquisition effects are a consequence of incremental exposure to words in concert with changes in plasticity as learning proceeds, and are not due to uncontrolled differences in ease of reading between early and late-acquired words. Models which do not learn through cumulative training are unable to explain age of acquisition and related effects.|000|reading, reading development, age of acquisition, speech norms 3833|Dingemanse2016|A key finding of our study is that both segments and prosody contribute to the effect of sound symbolism, and that neither alone is sufficient to drive the effect.
This adds to the body of converging evidence suggesting that iconic associations between form and meaning are not to be sought in single phonemes and their supposed meanings, but in structural correspondences that recur across words, and that involve both segmental and suprasegmental information (Nuckolls 1996, Tufvesson 2011, Emmorey 2014).|128|supra-segmentals, phonological segments, sound symbolism, experimental study 3834|Kristiansen2017|During the history of archaeology we have witnessed how a new scientific breakthrough changed the balance between science and archaeology by producing new knowledge that had previously been unobtainable and thereby making former interpretations and theories obsolete.|000|archaeology, philosophy of science, scientific breakthrough 3835|Kristiansen2017|This was the case when C14 undermined the Neolithic chronology during the 1950s and 1960s, and many archaeologists therefore refused to accept the new result for quite an extended period.|p1|C14 dating, archaeology, history of science 3836|Kristiansen2017|And now we are here again. Since the beginning of the first decade after the millennia change, we have witnessed a new scientific breakthrough, from strontium isotopic tracing to more recently ancient DNA, which has produced fundamental new knowledge that shatters the prevailing paradigm.|p1|scientific revolution, archaeology, dating, ancient DNA 3837|Kristiansen2017|At the heart of his argument is the proposition that science and humanities do not share the same methodologies, which also has implications for theory and interpretation.|p1|archaeology, humanities, science, methodology, 3838|Kristiansen2017|Should archaeology be a historical discipline whose interpretations were anchored in a humanistic discourse of the particular (‘Romanticism’) or a science-based discipline whose interpretations were anchored in a scientific discourse of historical regularities (‘Rationalism’)?
This universal schism in the history of research, termed by Snow the ‘Two Cultures’, begs the question: why can the two not co-exist peacefully?|p2|archaeology, romanticism, rationalism, scientific revolution, 3839|Kristiansen2017|Within archaeology there exists a divide between relative and absolute dating: relative dating is purely archaeological and based on a stylistic or typological classification and ordering of time sequences.|p2|relative dating, archaeology, dating, history of science 3840|Kristiansen2017|When C14 was introduced it turned out that these guesses had been rather wrong. However, the point to be made here is that the radio carbon revolution indeed revolutionised archaeological research. Not only by providing absolute dating for all prehistory but, in doing so, also freeing a lot of research resources that could now be employed more productively to analyse and interpret the organisation of prehistoric societies.|p2|resources, scientific revolution, radio-carbon dating, C14 dating, dating, 3841|Kristiansen2017|Through this example I wish to emphasise two things: (1) interdisciplinary or multi-disciplinary collaboration is hard work that put demands on both sides to understand at least the basics of archaeology and science in order to be able to collaborate critically and productively and (2) both sides employ their own theoretical and methodological standards, some are shared, some not. Therefore, there exists a relative autonomy between scientific and humanistic [pb] research frameworks. In this I am in partial agreement with Tim Flohr Sørensen. 
But they share enough common ground to enable interdisciplinary collaboration whether about things or about humans.|p3f|interdisciplinary research, archaeology, humanities, science, 3842|Nichols2017|The category of person has both inflectional and lexical aspects, and the distinction provides a finely graduated grammatical trait, relatively stable in both families and areas, and revealing for both typology and linguistic geography. Inflectional behavior includes reference to speech-act roles, indexation of arguments, discreteness from other categories such as number or gender, assignment and/or placement in syntax, arrangement in paradigms, and general resemblance to closed-class items. Lexical behavior includes sharing categories and/or forms and/or syntactic behavior with major lexical classes (usually nouns) and generally resembling open-class items. Criteria are given here for typologizing person as more vs. less inflectional, some basic typological correlations are tested, and the worldwide linguistic-geographical distribution is mapped.|000|inflection, person, grammatical categories, typological study, correlational studies, 3843|Nichols2017|This seems to be an example of data dredging or p-hacking, as the study does not report significance levels and does not control for multiple testing.|000|p-hacking, data dredging, correlational studies, person marking, typological study, 3844|Head2015|A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. 
We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.|000|p-hacking, data dredging, meta study, introduction 3845|Head2015|There are two widely recognized types of researcher-driven publication bias: selection (also known as the “file drawer effect”, where studies with nonsignificant results have lower publication rates [7]) and inflation [12]. Inflation bias, also known as “p-hacking” or “selective reporting,” is the misreporting of true effect sizes in published studies (Box 1). It occurs when researchers try out several statistical analyses and/or data eligibility specifications and then selectively report those that produce significant results [12–15]. Common practices that lead to p-hacking include: conducting analyses midway through experiments to decide whether to continue collecting data [15,16]; recording many response variables and deciding which to report postanalysis [16,17], deciding whether to include or drop outliers postanalyses [16], excluding, combining, or splitting treatment groups postanalysis [2], including or excluding covariates postanalysis [14], and stopping data exploration if an analysis yields a significant p-value [18,19].|1/15|p-hacking, selective reporting, definition, introduction 3846|Fisher1922|This is the paper introducing what is known as Fisher's exact test. A test for the independence of findings in contingency tables.|000|statistical test, original paper, R. A. Fisher, Fisher's exact test, significance 3847|Brembs2013|Most researchers acknowledge an intrinsic hierarchy in the scholarly journals (“journal rank”) that they submit their work to, and adjust not only their submission but also their reading strategies accordingly. 
On the other hand, much has been written about the negative effects of institutionalizing journal rank as an impact measure. So far, contributions to the debate concerning the limitations of journal rank as a scientific impact assessment tool have either lacked data, or relied on only a few studies. In this review, we present the most recent and pertinent data on the consequences of our current scholarly communication system with respect to various measures of scientific quality (such as utility/citations, methodological soundness, expert ratings or retractions). These data corroborate previous hypotheses: using journal rank as an assessment tool is bad scientific practice. Moreover, the data lead us to argue that any journal rank (not only the currently-favored Impact Factor) would have this negative impact. Therefore, we suggest that abandoning journals altogether, in favor of a library-based scholarly communication system, will ultimately be necessary. This new system will use modern information technology to vastly improve the filter, sort and discovery functions of the current journal system.|000|journal rank, scientific practice, 3848|Leskien1876|Um nicht missverstanden zu werden, möchte ich noch hinzufügen: versteht man unter Ausnahmen solche Fälle, in denen der zu erwartende Lautwandel aus bestimmten erkennbaren Ursachen nicht eingetreten ist, z. B. das Unterbleiben der Verschiebung im deutschen in Lautgruppen wie *st* u.s.w., wo also gewissermassen eine Regel die andre durchkreuzt, so ist gegen den Satz, die Lautgesetze seien nicht ausnahmslos, natürlich nichts einzuwenden. Das Gesetz wird eben dadurch nicht aufgehoben und wirkt, wo diese oder andre Störungen, die Wirkungen andrer Gesetze nicht vorhanden sind, in der zu erwartenden Weise. 
Lässt man aber beliebige zufällige, unter einander in keinen Zusammenhang zu bringende Abweichungen zu, so erklärt man im Grunde damit, dass das Object der Untersuchung, die Sprache, der wissenschaftlichen Erkenntnis nicht zugänglich ist. :translation:`To avoid any misunderstanding I would add the following: if by exception we mean the cases in which the expected phonetic change has not occurred due to specific and identifiable causes […] – that is, when a rule interferes to some extent with another one – nothing evidently contradicts the principle that phonetic laws have no exceptions. The law is still present and when this or that disturbing factor – that is, the action of other laws – is not present, it continues to operate as expected. If instead we admit to random exceptions, of whatever nature, that can in no way be related among themselves, we are basically saying that the object at hand, that is language, is not accessible to scientific knowledge.` :comment:`[Translation by `@Formigari2018 :comment:`]` |xxviii|exceptionlessness hypothesis, Neogrammarian sound change, August Leskien, 3849|Coblin1983|All previous works in this area have now been superseded by Luo Changpei and Zhou Zumo's monumental *Han Wei Jin Nanbeichao yunbu yanbian yanjiu* 汉魏晋南北朝韵部演变研究 (Luo and Zhou 1958), which provides a comprehensive listing of Han rime sequences accompanied by extensive and detailed analysis and discussion. This work is the standard reference source for the riming practices of Han times, and the system of rime categories it proposes has usually formed the basis for subsequent discussions of the syllable finals of the Han period. Phonological reconstructions are not attempted by Luo and Zhou, but on the basis of their rime categories Ting (@1975: 235-60) has proposed tentative reconstructions for the Western and Eastern Han periods. 
These reconstructions are viewed as evolutionary stages through which Ting [pb] derives his reconstructed WJ finals from those of the OC system proposed by Li (@1971). |3f|rhyme patterns, rhyme analysis, Eastern Hàn, 3850|Coblin1983|The *duruo* glosses of SW were of great interest to scholars of Qing and early Republican times. Most of the major studies of this period are included in SWGL. |5|Shuōwén Jiězì, dúruò, Hàn time, sources, 3851|Branner2000|The words fǎn, fān, and qiè are simply markers that identify the preceding two syllables as a fǎnqiè sound-gloss. It is not known how far back these terms go.|38|fǎnqiè, terminology, origin, history of science 3852|Branner2000|Fǎnqiè were incorporated into various dictionaries, the most important of which is the Qièyùn, which will be discussed below. But we know of such large-scale compendia beginning only from the middle of medieval period – from around the 6th century. At first, fǎnqiè seem to have been used most often in the annotation of classical texts. These early materials are especially useful in trying to understand the principles of fǎnqiè construction. One of the best extant sources for early exegetic fǎnqiè is a book called the Jīngdiǎn shìwén, or “Exegeses on Classical [pb] Texts”, compiled by Lù Démíng (c. 550–630). The Shìwén is meant to be read along with one of the standard editions of the ten classical books most revered in that day. It supplies glosses and readings for individual characters in order as they appear in the original text. Lù Démíng gathered his material from hundreds of different sources, most of which are otherwise unknown to us. Sometimes enough fǎnqiè from an individual scholiast appear in the Shìwén that we can get a working idea of the outlines of a coherent phonological system from them, but most of the time we can recover no such context.|37f|Jīngdiǎn Shìwén, fǎnqiè, sources, Old Chinese, Hàn Chinese, Hàn time, 3853|Branner2000|For if there is any one characteristic of fǎnqiè, it is that only loose distinctions are made. 
There are many places, even in fairly consistent corpora of fǎnqiè, where the gloss and the glossed syllable do not quite match. |39|fǎnqiè, critics, examples, 3854|Branner2000|This theory was the work of a number of 5th century poets, but it is most saliently associated with the name of Shěn Yuē (441–513), who also gave us the names of the four canonical tone categories, píng, shǎng, qù, and rù. Shěn’s prosodic rules, the so-called sìshēng bābìng “four tones and eight prosodic defects” emphasize arranging the syllables of a poetic couplet and quatrain so that their tones, initials and finals all contrast as much as possible. Shěn and his followers developed the concept of contrasting the píng tone category with the so-called zè “non-level” category, embracing the shǎng, qù, and rù tones.|44|Chinese tones, sìshēng, píng shǎng qù rù, Shěn Yuē, history of science, Chinese linguistics 3855|Sun2017|Prior to the advent of fǎnqiè, there already existed two ways to show pronunciation as early as the Hàn dynasty (206 BCE–220 CE). First, there was the zhíyīnfǎ 直音法, literally “direct pronunciation method”, in which a character’s pronunciation was demonstrated by another, homophonous character.|p1|sound glosses, zhíyīnfǎ, terminology, definition 3856|Sun2017|Second, there was the bǐnǐfǎ 比擬法, literally “compare and imitate method”, in which a character was said to have a pronunciation similar to another character; for example: xí, dú ruò xī 郋, 讀若奚, i.e., xí 郋 is read like xī 奚, from the dictionary Shuōwén Jiězì 說文解字 of about 100 CE.|p1|dúruò, bǐnǐfǎ, Shuōwén Jiězì, terminology, definition 3857|Sun2017|After its invention in the late Hàn dynasty (2nd to 3rd c. CE), fǎnqiè became prevalent in exegesis, dictionaries and rhyme books. 
For example, in the character dictionary Yùpiān 玉篇 (543 CE), each character as an entry is immediately followed by a fǎnqiè notation.|p3|fǎnqiè, origin, history of science, 3858|Raffaelli2018|The paper explores the importance of the concept of ‘taste’ in the formation of the Croatian and the Turkish lexicon. The main goals of the paper are 1) to investigate differences and similarities in conceptual mappings based on the concept of ‘taste’ in two typologically different and genetically unrelated languages by analyzing the vocabulary based on the root kus in Croatian and the vocabulary based on the root tat in Turkish and 2) to see to what extent the formation of taste vocabulary differs with respect to lexicalization patterns in the two languages.|000|lexicalization, taste, onomasiological approach, Turkish, Croatian, motivation, semantic motivation 3859|Raffaelli2018|Paper uses the term *lexicalization* to denote what could also be labelled *motivation*. It is interesting to compare the vocabulary use further to make sure that the terminology is correctly understood.|000|lexicalization, motivation, semantic motivation, Turkish, terminology, Croatian 3860|Raffaelli2018|However, some of the studies done so far have shown that, besides regularities, there are also some cross-linguistic differences in the way lexemes and lexical structures of perception vocabulary extend their meanings into abstract domains, in spite of the fact that sense modalities are biologically common to all humans. 
Differences have been observed with regard to different language families and cultures.|22|linguistic differences, semantic motivation, lexicalization 3861|Raffaelli2018|Therefore, a comparative and contrastive analysis of two typologically and genetically unrelated languages could point to regularities and specificities in lexicalization processes operative in the formation of vocabularies related to the concept of ‘taste’.|22|terminology, lexicalization, semantic motivation 3862|Lindner2015|In the wake of the Indian grammatical tradition, Franz Bopp was the first to recognize that Indo-European words could generally be broken down into the structure root + derivational affix + inflectional affix. The identification of what was indistinctly called Grundform ‘basic form’, Stammform ‘stem form’, Stamm ‘stem’ or Thema ‘theme’, was so new and groundbreaking at the time that Bopp felt obliged to provide the following clarifications: .. pull-quote:: Die Indischen Grammatiker fassen die Nomina (sowohl Substantive, als Adjektive, Pronomina und Zahlwörter), in ihrem absoluten, von allen Casusverhältnissen unabhängigen, und von allen Casuszeichen entblößten Zustande auf, und nehmen daher eine Grund- oder Stammform an, zu welcher der Nominativ und die obliquen Casus der drei Zahlen sich als [pb] abgeleitet verhalten. Diese Grundform kommt häufig in zusammengesetzten Wörtern vor, indem die ersten Glieder eines Compositums aller Casusendungen beraubt, und somit identisch mit der Grundform sind. (@Bopp<1827> 1827: 23) [The Indian grammarians conceived of nouns (substantives, adjectives, pronouns and numerals) in their absolute state, independent of all relations and markers of case, assuming the existence of a basic or stem form, from which the nominative and the oblique cases of all three numbers were derived. This basic form often appears in compound words, the first members of compounds being deprived of all case endings and therefore identical with the basic form.] 
|38f|root, stem, affix, terminology, history of science, Franz Bopp 3863|Lindner2015|In the second half of the 19th century there was also some confusion concerning the notion Wortbildung ‘word-formation’ itself. The early Neogrammarians used Wortbildung in the sense of ‘inflection’, while what is called Wortbildung today went under the term Stammbildung ‘stem-formation’: .. pull-quote:: Es zerfällt demnach die wortbildungslehre (formen-, flexionslehre) in die lehre von der bildung der nomina und in die lehre von der bildung der verba. Jenen liegen nominal-, diesen verbalstämme zu grunde. Die lehre von der bildung der nomina nennt man declination, die lehre von der bildung der verba conjugation. (Miklosich 1876: 1; vgl. Miklosich 1875) [Word-formation (the study of forms, of inflection) therefore comprises the formation of nouns and verbs. The former are derived from noun stems, the latter from verb stems. The study of the formation of nouns is called declension, the study of the formation of verbs conjugation.]|40|terminology, word formation, definition, history of science 3864|Casule2017|Unreadable paper about Burushaski, a linguistic isolate, which the author thinks to be related to Indo-European.|000|Burushaki, Indo-European, genetic relationship, language isolate, 3865|Casule2017|Comparative historical studies have established over five hundred lexical correspondences between autochthonous Burushaski words and Indo-European as well as significant grammatical correlations. A genetic relationship has been proposed. Within these correspondences, the correlations of Burushaski with Slavic together with other branches are numerous and regular. These are not the subject of this paper. We concentrate exclusively on Burushaski isoglosses with words or meanings uniquely found in Slavic which consequently often have unclear, difficult or competing etymologies. The stratification of these isoglosses is complex. It appears that we might be dealing with various layers. 
In some cases, the phonetic and formal make up suggests a correlation of remote antiquity, yet in many instances it is difficult to establish a chronology. Most of the isoglosses involve cultural borrowing, with the direction of borrowing unclear, but a significant number (the considerable correspondences in the names of body parts, grammatical particles) may point to a closer genetic relationship.|000|Burushaki, language isolate, Indo-European, genetic relationship, long-range comparison 3866|Lieberherr2017|Tianshin Jackson Sun (Sun 1992, Sun 1993) was the first to suggest the phylogenetic relatedness of a number of highly divergent, endangered and poorly described languages of Western Arunachal Pradesh, later named the ‘Kho-Bwa cluster’ by Van Driem (2001). In this paper, we make use of what are predominantly new data from our own field work, covering a total of 22 linguistic varieties. In a list of 100 lexical entries, cognate roots were tagged and subsequently a pairwise “cognacy percentage” was computed which forms the basis for a hierarchic cluster analysis. The result of this analysis and some further considerations confirm earlier reported views of a phylogenetic relationship between these languages. The appendix contains the full data set with cognacy statements. All computer code is available and documented on Github (https://github.com/metroxylon/kho-bwa-lexicostat).|000|Kho-Bwa, dataset, Sino-Tibetan, lexicostatistics, concept list 3867|Meelen2017|This paper presents a new approach to two challenging NLP tasks in Classical Tibetan: word segmentation and Part-of-Speech (POS) tagging. We demonstrate how both these problems can be approached in the same way, by generating a memory-based tagger that assigns 1) segmentation tags and 2) POS tags to a test corpus consisting of unsegmented lines of Tibetan characters. We propose a three-stage workflow and evaluate the results of both the segmenting and the POS tagging tasks. 
We argue that the Memory-Based Tagger (MBT) and the proposed workflow not only provide an adequate solution to these NLP challenges, they are also highly efficient tools for building larger annotated corpora of Tibetan.|000|POS tagging, part of speech, Classical Tibetan, Tibetan, word segmentation 3868|Duanmu2017a|The “non-uniqueness” theory assumes that there is no best solution in phonemic analysis; rather, competing solutions can co-exist, each having its own advantages (Chao, Bulletin of the Institute of History and Philology 4: 363–398, 1934). The theory is based on the assumption that there is no common set of criteria to evaluate alternative solutions. I argue instead that such a set of criteria can be established and it is possible to find the best solution. The criteria include riming properties, rime structure, constraints on syllable gaps, phonemic economy, phonetics, syllable sizes, and feature theory. I illustrate the proposal with Chengdu. Four analyses are compared, the “CGV” segmentation, the “CV” segmentation, the “finest” segmentation, and the “CVX” segmentation, and CVX is shown to be the best.|000|non-uniqueness theory, Chao Yuenren, phonology, phonological theory, Chinese, Chéngdū Chinese 3869|Duanmu2017a|@Hockett<1960> (1960: 90) proposes that a defining property of human language is the use of two coding systems (the duality of patterning): (i) sentences are made of words (or morphemes) and (ii) words are made of phonemes (consonants and vowels).|1/23|duality of patterning, phoneme, morpheme, terminology 3870|Duanmu2017a|Aware of such ambiguity, @Chao<1934> (1934) argues that there is no best solution in phonemic analysis. 
Instead, phonemic analysis serves multiple functions, and each function may favor a different solution.|2/23|non-uniqueness theory, Chao Yuenren, problem, phonological theory 3871|Duanmu2017a|Article contains some interesting examples on problems of segmentation of words into phonemes, which illustrate the general problems and give hints to further literature.|000|Chéngdū Chinese, non-uniqueness theory, phonology, segmentation 3872|Duanmu2017a|For example, for Swadesh (1935: 149), English diphthongs are single phonemes, but for Trager and Bloch (1941: 234) and Pike (1947b: 151), each English diphthong is made of two phonemes. Similarly, for Wiese (1996), the German affricates [pf ts tʃ dʒ] are four phonemes and [pb] no further segmentation is needed, but for Kohler (1999), they are two phonemes each and should be segmented as [p+f t+s t+ʃ d+ʒ].|1f/23|German, English, affricates, phoneme segmentation 3873|Pyysalo2017|By the present day only a handful of models are left to compete for a solution concerning the reconstruction of the Proto-Indo-European (PIE) laryngeal and vowel system. The remaining hybrid versions of the laryngeal theory, as proposed by Eichner, Melchert, Kortlandt, and Rix, explain the Indo-European (IE) vocalisms with both the laryngeals *h₁ *h₂ *h₃ and at least two of the proto-vowels *e *o *a. Due to this dual fixation these models are inherently ambiguous, as in principle every IE vocalism can be explained with both a laryngeal and the respective vowel. This means that the laryngeal theory is ultimately incapable of solving the PIE laryngeal and vowel problem, and the only way out is a radical simplification of the framework. A simplification was first proposed by Oswald Szemerényi, who reconstructed a single glottal fricative PIE *h = Hitt. ḫ, accompanied by a near equivalent of the Neogrammarian vowel system *a e o ā ē ō å ǝ. 
Despite the need for additional work on a number of key problems, monolaryngealism, as proposed by him, remains the only realistic option for Indo-European linguistics in the 21st century.|000|laryngeal theory, Indo-European, linguistic reconstruction 3874|Hsu2017|This paper revisits Chinese counterfactuality with 要不是 yàobúshì 'if it were not' by presenting additional data from elementary school children and high school teenagers to determine the availability of counterfactual reasoning with psycholinguistic studies. Constituent comparisons of propositional representations were hypothesized in the mental model of sentence processing. Results indicate both developmental groups processed counterfactuals with yàobúshì similarly to college students, demonstrating that the mismatch of structures and semantics does not exist in counterfactuals with yàobúshì in Chinese. Relevance to theory of mind in processing counterfactuality with Chinese as an example is proposed and discussed.|000|counterfactuality, Chinese, Mandarin, yàobúshì, theory of mind 3875|Stevens2016|The period from the late third millennium BC to the start of the first millennium AD witnesses the first steps towards food globalization in which a significant number of important crops and animals, independently domesticated within China, India, Africa and West Asia, traversed Central Asia greatly increasing Eurasian agricultural diversity. This paper utilizes an archaeobotanical database (AsCAD), to explore evidence for these crop translocations along southern and northern routes of interaction between east and west. To begin, crop translocations from the Near East across India and Central Asia are examined for wheat (Triticum aestivum) and barley (Hordeum vulgare) from the eighth to the second millennia BC when they reach China. The case of pulses and flax (Linum usitatissimum) that only complete this journey in Han times (206 BC–AD 220), often never fully adopted, is also addressed. 
The discussion then turns to the Chinese millets, Panicum miliaceum and Setaria italica, peaches (Amygdalus persica) and apricots (Armeniaca vulgaris), tracing their movement from the fifth millennium to the second millennium BC when the Panicum miliaceum reaches Europe and Setaria italica Northern India, with peaches and apricots present in Kashmir and Swat. Finally, the translocation of japonica rice from China to India that gave rise to indica rice is considered, possibly dating to the second millennium BC. The routes these crops travelled include those to the north via the Inner Asia Mountain Corridor, across Middle Asia, where there is good evidence for wheat, barley and the Chinese millets. The case for japonica rice, apricots and peaches is less clear, and the northern route is contrasted with that through northeast India, Tibet and west China. Not all these journeys were synchronous, and this paper highlights the selective long-distance transport of crops as an alternative to demic-diffusion of farmers with a defined crop package.|000|crop dispersal, archaeology, peopling of South-East Asia, agriculture, dataset 3876|Liu2017|Here we present one of the world’s oldest examples of large-scale and formalized water management, in the case of the Liangzhu culture of the Yangtze Delta, dated at 5,300–4,300 years cal B.P. The Liangzhu culture represented a peak of early cultural and social development predating the historically recorded Chinese dynasties; hence, this study reveals more about the ancient origins of hydraulic engineering as a core element of social, political, and economic developments. Archaeological surveys and excavations can now portray the impressive extent and structure of dams, levees, ditches, and other landscape-transforming features, supporting the ancient city of Liangzhu, with an estimated size of about 300 ha. 
The results indicate an enormous collective undertaking, with unprecedented evidence for understanding how the city, economy, and society of Liangzhu functioned and developed at such a large scale. Concurrent with the evidence of technological achievements and economic success, a unique relationship between ritual order and social power is seen in the renowned jade objects in Liangzhu elite burials, thus expanding our view beyond the practicalities of water management and rice farming.|000|archaeology, hydraulic enterprise, dam construction, rice cultivation, China, 3877|Formigari2018|Paper discusses the regularity of sound correspondences in the debates between Schuchardt and the Neo-Grammarians.|000|history of science, Hugo Schuchardt, Wilhelm Wundt, sound law, regularity hypothesis 3878|Garnier2017|Recent evidence from archaeology and ancient DNA converge to indicate that the Yamnaya culture, often regarded as the bearer of the Proto-Indo-European language, underwent a strong population expansion in the late 4th and early 3rd millennia BCE. It suggests that the underlying reason for that expansion might be the then unique capacity to digest animal milk in adulthood. We examine the early Indo-European milk-related vocabulary to confirm the special role of animal milk in Indo-European expansions. We show that Proto-Indo-European did not have a specialized root for ‘to milk’ and argue that the IE root *h₂melg̑- ‘to milk’ is secondary and post-Anatolian. We take this innovation as an indication of the novelty of animal milking in early Indo-European society. 
Together with a detailed study of language-specific innovations in this semantic field, we conclude that the ability to digest milk played an important role in boosting Proto-Indo-European demography.|000|milk, archaeology, Indo-European, linguistic palaeography, Indo-European homeland 3879|Altmann2005|From time to time but ever more frequently, natural scientists intervene in linguistics showing a specific analogy in two disciplines or transferring methods or bringing in a new way of argumentation. Not without pleasure we ascertain that it was quantitative linguistics that attracted the attention of the representatives of "harder" sciences among whom there were physicists, mathematicians and those of "softer" sciences like psychology, musicology, documentation, scientometry, etc., the majority of whom has been inspired by G.K. Zipf, a linguist having left his traces in at least twenty different scientific disciplines. In many cases the "external linguists" solved relevant problems and enriched linguistics but it was not necessarily always so. The enticing idea of being able to say something about language because we all can speak is the same as the idea to be able to philosophize because we can think. We take the opportunity to express our dissatisfaction with some practice tacitly consented to hoping that the provoked physicists help us to leave the prescientific level in which many linguistic disciplines are settled. 
We shall discuss here very general problems leaving specific ones open.|000|analogy, biological parallels, physics, interdisciplinary research, nice quote 3880|Altmann2005|The point made in the introduction of this paper is already very nice, as it articulates what has long bothered me: everybody thinks they can talk about language just because they speak.|000|biological parallels, physics, analogy, philosophy of science, 3881|Brentari2017|Language emergence describes moments in historical time when nonlinguistic systems become linguistic. Because language can be invented de novo in the manual modality, this offers insight into the emergence of language in ways that the oral modality cannot. Here we focus on homesign, gestures developed by deaf individuals who cannot acquire spoken language and have not been exposed to sign language. We contrast homesign with (a) gestures that hearing individuals produce when they speak, as these cospeech gestures are a potential source of input to homesigners, and (b) established sign languages, as these codified systems display the linguistic structure that homesign has the potential to assume. We find that the manual modality takes on linguistic properties, even in the hands of a child not exposed to a language model. But it grows into full-blown language only with the support of a community that transmits the system to the next generation.|000|language emergence, sign language, deaf individuals, gesture, homesign, 3882|Brentari2017|This article is particularly interesting for its terminology, since language emergence is not always used as a term in this strict sense.|000|terminology, language emergence, homesign, overview, 3883|Fruehwald2017|This article reviews the role phonology plays in phonetic changes. 
After first establishing what kinds of changes qualify as phonetic changes for the purposes of discussion, and laying out the theoretical outlook that is adopted here, I review the most obvious cases in which phonology plays a role in phonetic change. These include (a) the way phonological contrast can lead to phonetic dispersion, (b) the way phonological natural classes can define a set of segments to undergo a parallel phonetic shift, and (c) how phonological biases may lead to instances of underphonologization. Throughout, I discuss alternative approaches to these phenomena.|000|phonology, sound change, reasons for sound change, overview 3884|Fruehwald2017|1) Phonetic change refers exclusively to a change in speakers’ knowledge of how speech sounds are implemented in a continuous phonetic space. 2) Phonological influence refers to any kind of effect that speakers’ knowledge of discrete, categorical phonological representations and processes can have on causing, constraining, or preventing phonetic change as defined in definition 1.|2/18|phonology, sound change, reasons for sound change, systemic processes 3885|Goldsmith2017|This article reviews research on the unsupervised learning of morphology, that is, the induction of morphological knowledge with no prior knowledge of the language beyond the training texts. This is an area of considerable activity over the period from the mid 1990s to the present. 
It is of particular interest to linguists because it provides a good example of a domain in which complex structures must be induced by the language learner, and successes in this area have all relied on quantitative models that in various ways focus on model complexity and on goodness of fit to the data.|000|unsupervised method, morphology, morpheme detection, morpheme segmentation 3886|Goldsmith2017|From a practical point of view, we need to better understand exactly how well our current methods of morpheme segmentation work, based on some reliable measurements in several dozen languages. In addition, we need to address the challenge of learning the morphosyntactic features [pb] that organize both the inflectional morphology and the interface between syntax and morphology. Current and recent research on category induction will help with this task, just as methods of induction of rules of morphophonology will help provide simpler computational models of morphology per se.|17f/24|challenges, morpheme detection, evaluation, 3887|Kemp2018|Crosslinguistic research on domains including kinship, color, folk biology, number, and spatial relations has documented the different ways in which languages carve up the world into named categories. Although word meanings vary widely across languages, unrelated languages often have words with similar or identical meanings, and many logically possible meanings are never observed. We review research suggesting that this pattern of constrained variation is explained in part by the need for words to support efficient communication. This research includes several recent studies that have formalized efficient communication in computational terms and a larger set of studies, both classic and recent, that do not explicitly appeal to efficient communication but are nevertheless consistent with this notion. 
The efficient communication framework has implications for the relationship between language and culture and for theories of language change, and we draw out some of these connections.|000|semantic typology, review, overview, communicative efficiency, 3888|Saussure1916|Cela est si bien l'essentiel qu'on pourrait désigner les éléments phoniques d'un idiome à reconstituer par des chiffres ou des signes quelconques.|303|Ferdinand de Saussure, abstractionist-realist debate, nice quote 3889|Meillet1903|Les «restitutions» ne sont rien autre chose que les signes par lesquels on exprime en abrégé les correspondances.|23|sound correspondences, Antoine Meillet, proto-form, abstractionist-realist debate, nice quote 3890|Gasiorowski1999|The parallels between biological and linguistic evolution have been noticed and commented upon for well over a century, both by students of language - suffice it to mention August Schleicher's (@1863) celebrated tree (Stammbaum) model of the Indo-European family - and by natural scientists, from Charles Darwin's remarks on the subject in The Origin of Species (1859) to a number of contributions, in recent years, by the geneticist Luigi L. Cavalli-Sforza.|000|cladistics, biological parallels, genetic classification, family tree, introduction, analogy 3891|Gasiorowski1999|A very detailed and interesting article, especially given that it predates the quantitative turn in linguistics. Many things are mentioned already, which makes this a must-cite on many occasions.|000|cladistics, genetic classification, family tree, biological parallels, analogy, introduction 3892|Gasiorowski1999|If at a certain point in the evolution of a group of languages a diachronically stable innovation arises in one language and is passed on to its descendants, it can be used as evidence that the languages that share it form a clade. 
The more such common innovations, the better the case for grouping the languages in question into a single taxon.|53|shared innovation, cladistics, genetic classification, 3893|Gasiorowski1999|However, the more naive notion of 'shared resemblance' is somehow difficult to eradicate.|53|shared innovation, shared traits, problem, cladistics, family tree, 3894|Gasiorowski1999|One such change is *s > h in Iranian and Greek. We cannot group the two taxa into a clade on account of this single innovation, not only because the contexts in which it occurs are slightly different in [pb] both, but also because numerous common innovations show Iranian to be more closely related to Indic, in which there is no comparable change.|53f|parallel development, shared innovation, cladistics, genetic classification 3895|Gasiorowski1999|Generally, the odder an evolutionary step is, the lower the likelihood that it was taken independently in two or more lineages. In other words, a non-trivial change has a better chance of representing a plausible synapomorphy. Changes like the final devoicing of obstruents, vowel nasalisation before syllable-final nasals, or the palatalisation of velars before front vowels, are commonplace in completely unrelated languages; their diagnostic value is therefore practically nil.|54|shared innovation, diagnostic value, cladistics, genetic classification, 3896|Gasiorowski1999|If we succeed in reconstructing a realistic family tree of languages, our protolanguage reconstructions will be seriously affected. The current practice is to project onto the Proto-Indo-European plane any archaic-looking features of the attested languages, with particular emphasis on Greek and Sanskrit.|55|phylogenetic reconstruction, ancestral state reconstruction, linguistic reconstruction, cladistics, genetic classification, 3897|Mendes2017|When misfortune befalls another, humans may feel distress, leading to a motivation to escape. 
When such misfortune is perceived as justified, however, it may be experienced as rewarding and lead to motivation to witness the misfortune. We explored when in human ontogeny such a motivation emerges and whether the motivation is shared by chimpanzees. Chimpanzees and four- to six-year-old children learned through direct interaction that an agent was either prosocial or antisocial and later saw each agent’s punishment. They were given the option to invest physical effort (chimpanzees) or monetary units (children) to continue watching. Chimpanzees and six-year-olds showed a preference for watching punishment of the antisocial agent. An additional control experiment in chimpanzees suggests that these results cannot be attributed to more generic factors such as scene coherence or informational value seeking. This indicates that both six-year-olds and chimpanzees have a motivation to watch deserved punishment enacted.|000|punishment, chimpanzees, pre-school children, antisocial behaviour 3898|Francois2017|Linguistic diffusion is commonly equated with contact, and contrasted with genealogy. This article takes a new perspective, by showing how diffusion lies in fact at the heart of language genealogy itself. Indeed, the Comparative method has taught us to identify genetic subgroups based on sets of shared innovations; but each of these innovations necessarily had to diffuse from speaker to speaker across a network of then mutually intelligible idiolects. Such a diffusionist approach to language genealogy allows us to model language change as it really took place in the social and geographical space of past societies. Crucially, the entangled isoglosses typical of dialect continuums and linkages (Ross 1988) cannot be handled by the Tree model, which is solely based on divergence; but they are easily captured by a diffusionist approach such as the Wave model, where the key process is convergence. 
After comparing the theoretical underpinnings of these two models, I introduce Historical Glottometry, a new quantitative approach aiming to free the Comparative Method from any cladistic assumption, and to reconcile it with a wave-based analysis. Finally, data from a group of Oceanic languages from Vanuatu illustrate the powerful potential of Glottometry as a new method for linguistic subgrouping.|000|historical glottometry, comparative method, wave theory, linguistic diffusion 3899|Lee2016|This paper introduces Linguistica 5, a software for unsupervised learning of linguistic structure. It is a descendant of Goldsmith’s (2001, 2006) Linguistica. Open-source and written in Python, the new Linguistica 5 is both a graphical user interface software and a Python library. While Linguistica 5 inherits its predecessors’ strength in unsupervised learning of natural language morphology, it incorporates significant improvements in multiple ways. Notable new features include tools for data visualization as well as straightforward extensions for both its components and embedding in other programs.|000|software, Linguistica, open source, linguistic structure, inference, automatic approach 3900|List2007|The term “motivation” is usually used in derivational morphology and refers to the ‘degree to which [the complex word] can be understood as the sum of the parts of its meanings and their manners of combination’ :comment:`[Metzler Lexikon Sprache is the source quoted]`. Motivation is a gradual concept and implies perspicuity and comprehensibility, but it is neither regular nor explicit in the sense that it only gives one possibility of expression. How the speakers of one language have decided to express certain concepts may be understandable, but it is never predictable. 
:translation:`Metzler Lexikon Sprache („Motiviertheit”, 458): ‘Ausmaß, in dem [das komplexe Wort] sich als Summe der Bedeutungen seiner Teile und der Weise ihrer Zusammenfügung verstehen lässt’ (my translation).`|p1|motivation, semantic motivation, definition, expression 3901|Schwarz1996|Die sprachliche Referenz wird von drei Aspekten geprägt: von der Gebundenheit an die Ausdrücke einer Sprache, von der Determination durch die lexikalischen Bedeutungen, die mit den Ausdrücken konventionell verbunden sind und die das jeweilige Referenzpotential (d.h. die Klasse aller möglichen Referenten) eines Ausdrucks festlegen, und von dem Gebrauch sprachlicher Ausdrücke in bestimmten Situationen durch einen Sprecher.|175|reference potential, expression, meaning, denotation 3902|Baxter2017|This is a direct reply to @Schuessler2015, in which he criticizes the Old Chinese reconstructions of @Baxter2014.|000|Old Chinese, Old Chinese phonology, linguistic reconstruction, methodology, review, reply 3903|Schuessler2015|A reply to this review of @Baxter2014 is available in @Baxter2017.|000|Old Chinese, linguistic reconstruction, phonology, review 3904|Burridge2018|We provide a unified mathematical explanation of two classical forms of spatial linguistic spread. The wave model describes the radiation of linguistic change outwards from a central focus. Changes can also jump between population centres in a process known as hierarchical diffusion. It has recently been proposed that the spatial evolution of dialects can be understood using surface tension at linguistic boundaries. Here we show that the inclusion of long-range interactions in the surface tension model generates both wave-like spread, and hierarchical diffusion, and that it is surface tension that is the dominant effect in deciding the stable distribution of dialect patterns. 
We generalize the model to allow population mixing which can induce shrinkage of linguistic domains, or destroy dialect regions from within.|000|dialect history, dialect evolution, wave theory, geographic spread, simulation studies 3905|Zheng2018|The Min dialect group is easily characterized, in contrast to other dialect groups in China. Min varieties do, however, exhibit a considerable degree of diversity. The internal relationship among these dialects has remained a puzzle, and the study of Min classification is still considered one of the most challenging tasks in Chinese dialectology. The present article applies a new approach, subgrouping, to solve the problem of classifying the 12 representative Min dialects. The study begins with a review of 0210; 0215 ; 0230 Proto-Min system and revises this system based on more comprehensive data. The revised Proto-Min system is then compared with the modern Min dialects to arrive at 39 shared innovations that constitute criteria for subgrouping. A phylogenetic tree model processed by the PENNY program (from PHYLIP, a package of computer programs for inferring phylogenies; Phylogeny Inference Package, version 3.695) displays the result of Min subgrouping based on maximum parsimony. The phylogenetic results derived through this method are shown to differ significantly from the traditional classification of Min dialects. The model also pinpoints the position of some controversial dialects within the Min family.|000|maximum parsimony, subgrouping, Mǐn, Chinese dialects, computational approaches, phylogenetic reconstruction 3906|Zheng2018|Article uses the approach by @Baxter2006 with new characters to do subgrouping of Mǐn. 
It is surprising how this article made it into the journal.|000|Mǐn, subgrouping, Chinese dialects, phylogenetic reconstruction, maximum parsimony 3907|BerezKroeker2018|The difference between reproducible research and replicable research is that the latter produces new data, which can then ostensibly be analyzed for either confirmation or disconfirmation of previous results; the former provides access to the original data for independent analysis.|4|reproducibility, replicability, definition, terminology, validity 3908|BerezKroeker2018|In Bird and Simons’ seminal 2003 article on portability for linguistic data in the digital age, the authors present at least four domains of data management that directly support reproducible research as it is understood here: citation, discovery, access, and preservation. Of particular interest to the present discussion among these is citation. Bird and Simons advocate a robust citation practice: “[w]e value the ability of users of a resource to give credit to its creators, as well as learn the provenance of the sources on which it is based” (2003: 572).|7|reproducibility, citation, replicability, validity, open research 3909|BerezKroeker2018|The paper's title is more promising than its content. In a nutshell, all they claim is that data should be cited and acknowledged. This does not go in the direction of the open research which I was envisaging earlier.|000|open research, reproducibility, replicability, dataset, citation 3910|BerezKroeker2018|This paper is a position statement on reproducible research in linguistics, including data citation and attribution, that represents the collective views of some 41 colleagues. Reproducibility can play a key role in increasing verification and accountability in linguistic research, and is a hallmark of social science research that is currently under-represented in our field. 
We believe that we need to take time as a discipline to clearly articulate our expectations for how linguistic data are managed, cited, and maintained for long-term access.|000|dataset, reproducibility, position-paper, replicability, open research 3911|Zhang2018|Han Chinese experienced substantial population migrations and admixture in history, yet little is known about the evolutionary process of Chinese dialects. Here, we used phylogenetic approaches and admixture inference to explicitly decompose the underlying structure of the diversity of Chinese dialects, based on the total phoneme inventories of 140 dialect samples from seven traditional dialect groups: Mandarin, Wu, Xiang, Gan, Hakka, Min and Yue. We found a north-south gradient of phonemic differences in Chinese dialects induced from historical population migrations. We also quantified extensive horizontal language transfers among these dialects, corresponding to the complicated socio-genetic history in China. We finally identified that the middle latitude dialects of Xiang, Gan and Hakka were formed by admixture with other four dialects. Accordingly, the middle-latitude areas in China were a linguistic melting pot of northern and southern Han populations. Our study provides a detailed phylogenetic and historical context against family-tree model in China.|000|STRUCTURE, population genetics, biological parallels, Chinese dialects, sound inventories 3912|Zhang2018|Extremely poorly written paper in which pop-gen methods are applied in the wild to find signals that are not there.|000|sound inventories, Chinese dialects, population genetics, biological parallels, 3913|Reali2018|Languages with many speakers tend to be structurally simple while small communities sometimes develop languages with great structural complexity. Paradoxically, the opposite pattern appears to be observed for non-structural properties of language such as vocabulary size. 
These apparently opposite patterns pose a challenge for theories of language change and evolution. We use computational simulations to show that this inverse pattern can depend on a single factor: ease of diffusion through the population. A population of interacting agents was arranged on a network, passing linguistic conventions to one another along network links. Agents can invent new conventions, or replicate conventions that they have previously generated themselves or learned from other agents. Linguistic conventions are either Easy or Hard to diffuse, depending on how many times an agent needs to encounter a convention to learn it. In large groups, only linguistic conventions that are easy to learn, such as words, tend to proliferate, whereas small groups where everyone talks to everyone else allow for more complex conventions, like grammatical regularities, to be maintained. Our simulations thus suggest that language, and possibly other aspects of culture, may become simpler at the structural level as our world becomes increasingly interconnected.|000|grammar, lexicon, language evolution, linguistic complexity, simulation studies, artificial agents 3914|Michailovsky1994|Bhutan is home to perhaps a dozen Tibeto-Burman languages; the three major ones, from west to east, are Dzongkha, the official language, linguistically a Tibetan dialect, Bumthap, and Sharchop (or Tshangla).|000|Bumthang, Tibeto-Burman, Sino-Tibetan, Tshangla, Bhutan, grammar sketch 3915|Wu2018|The genus Citrus, comprising some of the most widely cultivated fruit crops worldwide, includes an uncertain number of species. 
Here we describe ten natural citrus species, using genomic, phylogenetic and biogeographic analyses of 60 accessions representing diverse citrus germ plasms, and propose that citrus diversified during the late Miocene epoch through a rapid southeast Asian radiation that correlates with a marked weakening of the monsoons. 
A second radiation enabled by migration across the Wallace line gave rise to the Australian limes in the early Pliocene epoch. Further identification and analyses of hybrids and admixed genomes provides insights into the genealogy of major commercial cultivars of citrus. Among mandarins and sweet orange, we find an extensive network of relatedness that illuminates the domestication of these groups. Widespread pummelo admixture among these mandarins and its correlation with fruit size and acidity suggests a plausible role of pummelo introgression in the selection of palatable mandarins. This work provides a new evolutionary framework for the genus Citrus.|000|genetics, Citrus fruit, plant evolution, plant domestication 3916|Sankoff2018|Understanding the relationship between language change and variation has progressed considerably over the last several decades, but less is known about how speakers at different life stages deal with ongoing change in their speech communities. Longitudinal studies of individuals and groups reveal three trajectory types postadolescence: stability (the most common), adopting (to some degree) a change led by younger people (the next most common trajectory), or swimming against the community current by reverting to an older pattern in later life (the least common trajectory). Declining plasticity over the life course places limits on possible trajectories, which are also subject to social and cultural influences. This article reviews relevant studies from historical linguistics as well as panel studies on African American English and dialect contact, proposing that future progress will be made by interdisciplinary research combining psycholinguistic and sociolinguistic perspectives. Lifespan trajectories in situations of community stability are also discussed.|000|language change, lifespan, introduction, review, overview 3917|Schmidt2017|Article introduces a database for spoken German. 
Maybe interesting to keep in mind when writing up database articles, although this article is way too traditional in some sense.|000|database, spoken German, spoken vernacular, data paper 3918|Kaltz2015|In Greco-Latin grammatical theory as well as in the first grammars of European vernacular languages, word-formation was treated within the presentation of word classes. This article aims to show how specific theories of derivation and composition were progressively developed by grammarians of German and French.|000|word formation, history of science, introduction, overview 3919|Libben2015|Most words of a language will contain more than one morpheme. Thus, word-formation is central to the understanding of how words are represented in the mind and brain, how they are created, and how acts of word-formation, diachronically in the history of the language and synchronically in acts of language processing, are linked to cognitive processes and patterns of brain activity. Early studies on morphological processing in the visual domain revealed that morphological constituents are activated. It is also the case, however, that effects of the characteristics of whole-word forms have been found. These have given rise to models that capture the processing of multimorphemic words as both fully integrated units and as decomposed representations. Recent proposals have called into question the extent to which traditional theoretical constructs such as morpheme, morphological structure, and mental lexicon actually play a role in lexical representation and processing from a psycholinguistic and neurocognitive perspective. 
It has been suggested that we need to recast those constructs in a manner that accords with a more dynamic view of words in the mind and brain and the role that learning and experience play in shaping how multimorphemic words are represented.|000|word formation, psycholinguistics, neurolinguistics, overview 3920|Stekauer2015|This article establishes that the relation between inflectional morphology and derivational morphology is that of cline. The main reasons for this situation are briefly outlined. The fuzzy nature of the boundary between the two areas of morphology is illustrated by means of selected criteria for their delimitation that are each confronted with counterexamples taken from various languages of the world.|000|word formation, word derivation, inflection, introduction, overview 3921|Mugdan2015|The article discusses the notion of word and the smaller units that words are composed of (morph and morpheme, root and affix, stem). It provides surveys of different types of affixes and morphological processes (in particular, reduplication, substitution and subtraction), with a focus on terms and definitions. It also addresses the question of which units can be inputs to word-formation rules, with special attention to inflected forms and phrases as well as the role of word classes.|000|word formation, word, lexeme, terminology, description, overview 3922|Simpson1999|The approach to phonological comparison adopted in UPSID as well as other studies fails to recognize the abstract nature of even the most phonetically based definition of a phonemic system. Phonemes receive a simple phonetic translation based on one allophone. Phonological comparison is therefore carried out using no more than arbitrary selections of the phonetics of the languages involved. 
Phonemic systems belonging to the phonological level of comparison are being compared in phonetic terms, misrepresenting the abstract relational nature of a phonological system and at the same time grossly oversimplifying the complex phonetic patterns employed in languages to bring about differences in meaning. While the appeal of many aspects of UPSID is recognized, the need for a more complex demarcation of three levels of phonetic and phonological comparison requiring different types and quantities of information is argued. |000|phoneme inventory, problem, critics, database, phonology, 3923|Simpson1999|Important and early article criticising databases of phoneme inventories.|000|critics, phoneme inventory, database, phonology 3924|Lotz2017|Clear research questionnaires ultimately help to ensure the reliability and comparability of the data that they gather (Fowler 1992; Lenzner 2012; Moroney and Cameron 2016). This paper explores the intersection of best practices in the fields of questionnaire design and intralingual translation as a means to ensure clarity and comprehensibility in research questionnaires. The questionnaire design perspective on comprehensibility (as represented by the 2010, 2011 and 2012 studies by Lenzner and colleagues, and work done by Knäuper et al. (1997) and Krosnik (1991)) essentially requires intralingual translation for questionnaires that do not meet the clarity requirement. To illustrate how intralingual translation in the form of plain language practice can operationalise comprehensibility (Nisbeth Jensen 2015), a short case study is presented. It chronicles a case of interlingual translation that has evolved into an intralingual translation endeavour. A client had a copyrighted medical research questionnaire, originally in American English, translated into Afrikaans and isiXhosa. Initially, the language service provider was not allowed any interventions in the source text. 
Testing of this questionnaire and its translations then revealed that the questionnaires were incomprehensible to their respondents. In this paper, the intralingual interventions required to improve comprehensibility of the questionnaire are classified in terms of the four parameters that Zethsen (2009) has identified in this regard, namely knowledge, time, culture and space. In addition, a fourfold text assessment checklist for ensuring clarity in questionnaires is proposed. This checklist may prove valuable for highlighting areas in questionnaires that need intralingual translation – whether used as motivation for a client or as a starting point for an intralingual intervention itself.|000|questionnaire, research questionnaire, design, intralingual translation, position-paper 3925|Locatell2018|Cognitive and generative approaches to linguistics have taken a different perspective on grammatical polysemy and grammaticalization. While the former see polysemy as a core characteristic of language and a necessary result of grammaticalization within idiolects, the latter see it as a less interesting phenomenon peripheral to linguistics proper. Grammaticalization is seen as a phenomenon of language acquisition which does not disturb the homogeneity of idiolects. These differing perspectives have generated much debate between the two approaches and are even in large part responsible for the different programmatic focuses of each. While the disagreement over grammatical polysemy between these two approaches to language is rooted in entrenched commitments on each side that are perhaps irreconcilable, at least some common ground does seem to be possible. Specifically, when it comes to intergenerational corpora, it seems that both cognitive and generative approaches to linguistics can agree that the universal phenomenon of grammaticalization would result in polysemy at least at the language community level. 
This can serve as a common ground on which both generative and cognitive linguists can join efforts in describing and explaining usage profiles of grammatically polysemous forms at the corpus level according to prototypicality, even if disagreement persists on the nature of the idiolect.|000|polysemy, grammar, cognitive linguistics, generative grammar, 3926|Locatell2018|Strange paper with an extremely skewed German example for the usage of "weil". Probably interesting in the context of offering a description of the differences between generative linguistics and cognitive linguistics, as well as insofar as it points to the importance of using variation (polysemy in grammar) as a basic object of research.|000|cognitive linguistics, generative linguistics, generative grammar, synchronic variation, polysemy, grammar, 3927|Hualde2015|Affective or expressive palatalization is a phenomenon in Basque whereby certain consonants are replaced by (pre)palatals to create diminutives. Affective palatalization is also found in child-directed speech, where it may have its origin.|000|affective palatalization, diminutive, Basque language, introduction, 3928|Serrano-Dolader2015|Parasynthesis is a word-formation process that Romance languages have inherited from Latin. It is characterised by the simultaneous and joint attachment of two affixes (a prefix and a suffix) to a lexical base. In order to define the concept of parasynthesis, several theoretical tenets (e.g., the transcategorisation power of prefixes, the binary branching hypothesis, etc.) must be taken into account. In Romance languages, verbs are the most representative cases of this morphological process; there are, however, other non-verbal formations that have been included in this category.|000|Romance, parasynthesis, word formation, word derivation, introduction 3929|Koonin2017|Complementarity between nucleic acid molecules is central to biological information transfer processes. 
Apart from the basal processes of replication, transcription and translation, complementarity is also employed by multiple defense and regulatory systems. All cellular life forms possess defense systems against viruses and mobile genetic elements, and in most of them some of the defense mechanisms involve small guide RNAs or DNAs that recognize parasite genomes and trigger their inactivation. The nucleic acid-guided defense systems include prokaryotic Argonaute (pAgo)-centered innate immunity and CRISPR-Cas adaptive immunity as well as diverse branches of RNA interference (RNAi) in eukaryotes. The archaeal pAgo machinery is the direct ancestor of eukaryotic RNAi that, however, acquired additional components, such as Dicer, and enormously diversified through multiple duplications. In contrast, eukaryotes lack any heritage of the CRISPR-Cas systems, conceivably, due to the cellular toxicity of some Cas proteins that would get activated as a result of operon disruption in eukaryotes. The adaptive immunity function in eukaryotes is taken over partly by the PIWI RNA branch of RNAi and partly by protein-based immunity. 
In this review, I briefly discuss the interplay between homology and analogy in the evolution of RNA- and DNA-guided immunity, and attempt to formulate some general evolutionary principles for this ancient class of defense systems.|000|convergent evolution, virus evolution, prokaryotic evolution, eukaryotic evolution, overview, review 3930|Koonin2017|Article may be interesting in the context of discussing convergent evolution of languages.|000|convergent evolution, introduction, virus evolution 3931|Byrd2016|In this paper I contend that the plausibility of those sounds and sound sequences reconstructed for a proto-language may be measured by following three simple rules when undertaking an etymology for a proto-language|000|linguistic reconstruction, methodology, 3932|Byrd2016|Paper is much less technical than expected and focuses a bit too much on Indo-European, but it is important for the discussion of methodology in the field.|000|linguistic reconstruction, methodology, 3933|Caha2017|This article reviews some of the main theoretical claims made in Jonathan David Bobaljik’s 2012 book, which deals with root suppletion in adjectival degree expressions. My first goal is to make the reader familiar with a coherent fragment of the overall system and the data that motivate it. The second goal is to discuss one specific part of the account, namely that words, understood as complex heads, are special for suppletion in the sense that suppletion is impossible beyond this domain. I argue that it is possible to abandon this assumption with no loss of descriptive coverage, and argue that in doing so, we can formulate a unified theory which covers both suppletion and morpheme order.|000|suppletion, word formation, morphology, 3934|Caha2017|Article has some interesting examples of suppletion in different languages. 
Therefore probably worth a read.|000|suppletion, examples, word formation, morphology 3935|Albert2000|Many complex systems display a surprising degree of tolerance against errors. For example, relatively simple organisms grow, persist and reproduce despite drastic pharmaceutical or environmental interventions, an error tolerance attributed to the robustness of the underlying metabolic network [1]. Complex communication networks [2] display a surprising degree of robustness: although key components regularly malfunction, local failures rarely lead to the loss of the global information-carrying ability of the network. The stability of these and other complex systems is often attributed to the redundant wiring of the functional web defined by the systems’ components. Here we demonstrate that error tolerance is not shared by all redundant systems: it is displayed only by a class of inhomogeneously wired networks, called scale-free networks, which include the World-Wide Web [3–5], the Internet [6], social networks [7] and cells [8]. We find that such networks display an unexpected degree of robustness, the ability of their nodes to communicate being unaffected even by unrealistically high failure rates. However, error tolerance comes at a high price in that these networks are extremely vulnerable to attacks (that is, to the selection and removal of a few nodes that play a vital role in maintaining the network’s connectivity). 
Such error tolerance and attack vulnerability are generic properties of communication networks.|000|scale-free network, connectivity, social networks, small world network 3936|Albert2000|One of the early papers arguing that scale-free networks are relevant for describing phenomena of our daily lives (social networks, World Wide Web, etc.).|000|scale-free network, small world network, graph theory, introduction 3937|Schreier2018|This article explores major processes that operate in new dialect formation, with a focus on the effects of various kinds of isolation of communities and their speakers. A sociolinguistic approach concentrates on competition and selection between transplanted dialect features and mechanisms such as mixing, leveling, simplification, and reallocation. With reference to the established (yet not uncontroversial) concept of the feature pool, the question is how contact between distinct systems gives rise to localized forms. From a more socially oriented perspective, the question is how the integration and segregation of groups contributes to or shapes the emergence and disappearance of dialects. Isolation as a sociolinguistic concept is not thoroughly defined, particularly at the intergroup, societal, or even national level; therefore, geographic, social, and psychological factors need to be discussed and assessed. The processes investigated here are koinéization and dialect/language shift, illustrated with examples from all over the world.|000|dialect formation, isolated communities, review, overview, koinéization, koine, sociolinguistic isolation 3938|Hartwigsen2017|The adaptive potential of the language network to compensate for lesions remains elusive. We show that perturbation of a semantic region in the healthy brain induced suppression of activity in a large semantic network and upregulation of neighbouring phonological areas. 
After perturbation, the disrupted area increased its inhibitory influence on another semantic key node. The inhibitory influence predicted the individual delay in response speed, indicating that inhibition at remote nodes is functionally relevant. Individual disruption predicted the upregulation of semantic activity in phonological regions. In contrast, perturbation over a phonological region suppressed activity in the network and disrupted behaviour without inducing upregulation. The beneficial contribution of a neighbouring network might thus depend on the level of functional disruption and may be interpreted to reflect a differential compensatory potential of distinct language networks. These results might reveal generic mechanisms of plasticity in cognitive networks and inform models of language reorganization.|000|neurolinguistics, semantic network, phonological network, 3939|Hartwigsen2017|It seems that they use a completely different terminology from us. When talking with neurolinguists, one needs to keep this in mind.|000|neurolinguistics, semantic network, phonological network 3940|Rougier2017|Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer-reviews. Existing journals have been slow to adapt: source codes are rarely requested and are hardly ever actually executed to check that they produce the results advertised in the article. 
ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from other traditional scientific journals. ReScience resides on GitHub where each new implementation of a computational study is made available together with comments, explanations, and software tests.|000|sustainability, sustainable research, open research, computer science 3941|Broido2018|A central claim in modern network science is that real-world networks are typically “scale free,” meaning that the fraction of nodes with degree k follows a power law, decaying like k^(−α), often with 2 < α < 3. However, empirical evidence for this belief derives from a relatively small number of real-world networks. We test the universality of scale-free structure by applying state-of-the-art statistical tools to a large corpus of nearly 1000 network data sets drawn from social, biological, technological, and informational sources. We fit the power-law model to each degree distribution, test its statistical plausibility, and compare it via a likelihood ratio test to alternative, non-scale-free models, e.g., the log-normal. Across domains, we find that scale-free networks are rare, with only 4% exhibiting the strongest-possible evidence of scale-free structure and 52% exhibiting the weakest-possible evidence. Furthermore, evidence of scale-free structure is not uniformly distributed across sources: social networks are at best weakly scale free, while a handful of technological and biological networks can be called strongly scale free. 
These results undermine the universality of scale-free networks and reveal that real-world networks exhibit a rich structural diversity that will likely require new ideas and mechanisms to explain.|000|scale-free network, graph theory, small world network, 3942|Broido2018|Paper argues against the idea proposed by @Albert2000 that many networks in daily life are scale-free.|000|scale-free network, graph theory 3943|Wright1976|We have agreed with de Saussure on one point then. There is some basis for this claim for arbitrariness in there being a *neutral* sensory element. However, from another point of view the arbitrariness claim does *not* hold. |512|arbitrariness, motivation, linguistic sign 3944|Wright1976|True 'universals of language' are those that will hold logically for any being anywhere in the universe -- the most general of statements. Those who think they are establishing 'universals' by showing how onomatopoeic effects might hold higher than chance across terrestrial languages are not necessarily discussing a universal: it may only be a 'terrestrial'.|512|universals, language universals, terrestrials, 3945|Kim2016|This work systematically analyzes the smoothing effect of vocabulary reduction for phrase translation models. We extensively compare various word-level vocabularies to show that the performance of smoothing is not significantly affected by the choice of vocabulary. This result provides empirical evidence that the standard phrase translation model is extremely sparse. Our experiments also reveal that vocabulary reduction is more effective for smoothing large-scale phrase tables.|000|smoothing, phrases, NLP, phrase translation model, automatic translation 3946|Singh2016|Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test-time, leaving these words without embeddings. 
To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate embeddings. While the existing methods use computationally-intensive rule-based (Soricut and Och, 2015) or tool-based (Botha and Blunsom, 2014) morphological analysis to generate embeddings, our system applies a computationally-simpler sub-word search on words that have existing embeddings. Embeddings of the sub-word search results are then combined using string similarity functions to generate rare word embeddings. We augmented pre-trained word embeddings with these novel embeddings and evaluated on a rare word similarity task, obtaining up to 3 times improvement in correlation over the original set of embeddings. Applying our technique to embeddings trained on larger datasets led to on-par performance with the existing state-of-the-art for this task. Additionally, while analysing augmented embeddings in a log-bilinear language model, we observed up to 50% reduction in rare word perplexity in comparison to other more complex language models.|000|word embeddings, trigrams, NLP, low-resource languages, semantic similarity 3947|Singh2016|Since they use sub-strings in their research, this may be interesting in the context of directed compoundhood networks (or partial colexification networks).|000|partial colexification, NLP, substring similarity, sequence comparison 3948|Schuermann1996|We discuss algorithms for estimating the Shannon entropy h of finite symbol sequences with long range correlations. In particular, we consider algorithms which estimate h from the code lengths produced by some compression algorithm. Our interest is in describing their convergence with sequence length, assuming no limits for the space and time complexities of the compression algorithms. A scaling law is proposed for extrapolation from finite sample lengths. 
This is applied to sequences of dynamical systems in non-trivial chaotic regimes, a 1-D cellular automaton, and to written English texts.|000|entropy, introduction, overview, NLP, 3949|Takahira2016|One of the fundamental questions about human language is whether its entropy rate is positive. The entropy rate measures the average amount of information communicated per unit time. The question about the entropy of language dates back to experiments by Shannon in 1951, but in 1990 Hilberg raised doubt regarding a correct interpretation of these experiments. This article provides an in-depth empirical analysis, using 20 corpora of up to 7.8 gigabytes across six languages (English, French, Russian, Korean, Chinese, and Japanese), to conclude that the entropy rate is positive. To obtain the estimates for data length tending to infinity, we use an extrapolation function given by an ansatz. Whereas some ansatzes were proposed previously, here we use a new stretched exponential extrapolation function that has a smaller error of fit. Thus, we conclude that the entropy rates of human languages are positive but approximately 20% smaller than without extrapolation. Although the entropy rate estimates depend on the script kind, the exponent of the ansatz function turns out to be constant across different languages and governs the complexity of natural language in general. In other words, in spite of typological differences, all languages seem equally hard to learn, which partly confirms Hilberg’s hypothesis.|000|entropy, corpus studies, NLP, Shannon 3950|Takahira2016|This article provides a new way to measure entropy of natural languages. This may be interesting with respect to some questions.|000|Shannon, entropy, NLP, entropy estimation, corpus studies 3951|Koplenig2017|Languages employ different strategies to transmit structural and grammatical information. 
While, for example, grammatical dependency relationships in sentences are mainly conveyed by the ordering of the words for languages like Mandarin Chinese, or Vietnamese, the word ordering is much less restricted for languages such as Inupiatun or Quechua, as these languages (also) use the internal structure of words (e.g. inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in almost 1,200 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information and vice versa. Or put differently, if less information is carried within the word, more information has to be spread among words in order to communicate successfully. In addition, we find that – despite differences in the way information is expressed – there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of the book, the less informative its word structure and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways. On the other hand, content-related and stylistic features are statistically encoded in very similar ways.|000|bible corpus, word order, word structure, correlational studies, 3952|Koplenig2017|What is interesting about this article is the finding that word order seems to play a more crucial role when it comes to different chapters of the Bible. It seems that this is due to the age of the chapters. 
If the chapters have little variation in content, word order will be rather fixed. But this needs a more thorough treatment.|000|word order, word structure, correlational studies, bible corpus, linguistic complexity 3953|Castilho2016|We introduce the third major release of WebAnno, a generic web-based annotation tool for distributed teams. New features in this release focus on semantic annotation tasks (e.g. semantic role labelling or event annotation) and allow the tight integration of semantic annotations with syntactic annotations. In particular, we introduce the concept of slot features, a novel constraint mechanism that allows modelling the interaction between semantic and syntactic annotations, as well as a new annotation user interface. The new features were developed and used in an annotation project for semantic roles on German texts. The paper briefly introduces this project and reports on experiences performing annotations with the new tool. On a comparative evaluation, our tool reaches significant speedups over WebAnno 2 for a semantic annotation task.|000|web-based tool, interfaces, linguistic annotation, syntactic annotation, semantic annotation, tools 3954|Castilho2016|In the context of rhyme annotation, this tool may come in handy, or we could take some inspiration from it.|000|tools, rhyme patterns, rhyme analysis, interfaces, linguistic annotation 3955|Harvey2017|Evaluation of hypotheses on genetic relationships depends on two factors: database size and criteria on correspondence quality. For hypotheses on remote relationships, databases are often small. Therefore, detailed consideration of criteria on correspondence quality is important. Hypotheses on remote relationships commonly involve greater geographical and temporal ranges. 
Consequently, we propose that there are two factors which are likely to play a greater role in comparing hypotheses of chance, contact and inheritance for remote relationships: (i) spatial distribution of corresponding forms; and (ii) language specific unpredictability in related paradigms. Concentrated spatial distributions disfavour hypotheses of chance, and discontinuous distributions disfavour contact hypotheses, whereas hypotheses of inheritance may accommodate both. Higher levels of language-specific unpredictability favour remote over recent transmission. We consider a remote relationship hypothesis, the Proto-Australian hypothesis. We take noun class prefixation as a test dataset for evaluating this hypothesis against these two criteria, and we show that inheritance is favoured over chance and contact.|000|noun-class prefix, Australian languages, 3956|Harvey2017|What is interesting in this context is the question of how to infer whether noun class prefixes really exist, and how this could be substantiated statistically. A random test seems to be in order, to see whether this really holds for statistical significance tests. This would likewise have implications for measure words in Chinese.|000|noun-class prefix, lexical semantics, Australian languages, measure words 3957|Ramiro2018|Human language relies on a finite lexicon to express a potentially infinite set of ideas. A key result of this tension is that words acquire novel senses over time. However, the cognitive processes that underlie the historical emergence of new word senses are poorly understood. Here, we present a computational framework that formalizes competing views of how new senses of a word might emerge by attaching to existing senses of the word. We test the ability of the models to predict the temporal order in which the senses of individual words have emerged, using an historical lexicon of English spanning the past millennium. 
Our findings suggest that word senses emerge in predictable ways, following an historical path that reflects cognitive efficiency, predominantly through a process of nearest-neighbor chaining. Our work contributes a formal account of the generative processes that underlie lexical evolution.|000|word meaning, semantic change, semantic shift, lexical change, automatic approach, simulation studies 3958|Saarimaki2016|Categorical models of emotions posit neurally and physiologically distinct human basic emotions. We tested this assumption by using multivariate pattern analysis (MVPA) to classify brain activity patterns of 6 basic emotions (disgust, fear, happiness, sadness, anger, and surprise) in 3 experiments. Emotions were induced with short movies or mental imagery during functional magnetic resonance imaging. MVPA accurately classified emotions induced by both methods, and the classification generalized from one induction condition to another and across individuals. Brain regions contributing most to the classification accuracy included medial and inferior lateral prefrontal cortices, frontal pole, precentral and postcentral gyri, precuneus, and posterior cingulate cortex. Thus, specific neural signatures across these regions hold representations of different emotional states in multimodal fashion, independently of how the emotions are induced. 
Similarity of subjective experiences between emotions was associated with similarity of neural patterns for the same emotions, suggesting a direct link between activity in these brain regions and the subjective emotional experience.|000|emotion concepts, neurolinguistics, 3959|Saarimaki2016|After the scanning, the participants viewed the movie clips again and chose the emotion (disgust, fear, happiness, sadness, neutral, anger, surprise) that best described their feelings during each movie, and they rated the intensity (1–9) of the elicited emotion.|2564|basic vocabulary, emotion concepts, neurolinguistics, concept list 3960|Hoffmann2018|The extent and nature of symbolic behavior among Neandertals are obscure. Although evidence for Neandertal body ornamentation has been proposed, all cave painting has been attributed to modern humans. Here we present dating results for three sites in Spain that show that cave art emerged in Iberia substantially earlier than previously thought. Uranium-thorium (U-Th) dates on carbonate crusts overlying paintings provide minimum ages for a red linear motif in La Pasiega (Cantabria), a hand stencil in Maltravieso (Extremadura), and red-painted speleothems in Ardales (Andalucía). Collectively, these results show that cave art in Iberia is older than 64.8 thousand years (ka). This cave art is the earliest dated so far and predates, by at least 20 ka, the arrival of modern humans in Europe, which implies Neandertal authorship.|000|Neandertal, cave art, U-Th dating, dating, archaeology 3961|Montenegro2006|The sweet potato is a plant native to the Americas, and its pre-historic presence in Polynesia is a long-standing anthropological problem. Here we use computer-driven drift simulations to model the trajectories of vessels and seed pods departing from a segment of coast between Mexico and Chile. 
The experiments demonstrate that accidental drift voyages could have been the mechanism responsible for the pre-historic introduction of the sweet potato from the Americas to Polynesia. While present results do not relate to the feasibility of a transfer by purposeful navigation, they do indicate that this type of voyaging is not required in order to explain the introduction of the crop into Polynesia. The relatively high probability of occurrence and relatively short crossing times of trips from Northern Chile and Peru into the Marquesas, Tuamotu and Society groups are in agreement with the general consensus that this region encompasses the area of original arrival and subsequent dispersal of the sweet potato in Polynesia.|000|sweet potato, Polynesia, lexical borrowing, Southern America, prehistoric contact 3962|Kazenin2001|The passive voice is one of the most important types of voice alternations attested across languages. The majority of languages with voice alternations also have the passive voice.|000|introduction, passive voice, syntactic typology, typology, 3963|Nedjalkov2001|The prototypical resultative (or resultative proper) is defined as a verb form or a more or less regular derivative from terminative verbs that expresses a state implying a previous event (action or process) it has resulted from.|000|resultative construction, introduction, overview, handbook 3964|Maiza2017|A social network enables individuals to communicate with each other by posting information, comments, messages, images, etc. In most applications, a social network is modelled by a graph with vertices and edges. Vertices represent individuals and edges represent social interactions between the individuals. A social network is said to have community structure if the nodes of the network can be grouped into sets of nodes such that each set is densely connected internally. 
The investigation of the community structure in the social network is an important issue in many domains and disciplines such as marketing and bio-informatics. Community detection in social networks can be considered as a graph clustering problem where each community corresponds to a cluster in the graph. The goal of conventional community detection methods is to partition a graph such that every node belongs to exactly one cluster. However, in many social networks, nodes participate in multiple communities. Therefore, a node’s communities can be interpreted as its social circles. Thus, it is likely that a node belongs to multiple communities. We propose in this paper a new overlapping community detection method which can be adopted for several real world social networks requiring non-disjoint community detection.|000|community detection, fuzzy clustering, overlapping community detection, overlapping community, algorithms 3965|Soerensen2005|A minimum spanning tree of an undirected graph can be easily obtained using classical algorithms by Prim or Kruskal. A number of algorithms have been proposed to enumerate all spanning trees of an undirected graph. Good time and space complexities are the major concerns of these algorithms. Most algorithms generate spanning trees using some fundamental cut or circuit. In the generation process, the cost of the tree is not taken into consideration. This paper presents an algorithm to generate spanning trees of a graph in order of increasing cost. By generating spanning trees in order of increasing cost, new opportunities appear. In this way, it is possible to determine the second smallest or, in general, the k-th smallest spanning tree. The smallest spanning tree satisfying some additional constraints can be found by checking at each generation whether these constraints are satisfied. Our algorithm is based on an algorithm by Murty (1967), which enumerates all solutions of an assignment problem in order of increasing cost. 
Both time and space complexities are discussed.|000|minimum spanning tree, algorithms, 3966|Prim1957|The basic problem considered is that of interconnecting a given set of terminals with a shortest possible network of direct links. Simple and practical procedures are given for solving this problem both graphically and computationally. It develops that these procedures also provide solutions for a much broader class of problems, containing other examples of practical interest.|000|minimum spanning tree, algorithms, 3967|Zhou2015a|Since both the minimum spanning tree and the minimum spanning network (as the union of all minimum spanning trees) can be constructed efficiently, previous heuristics have approximated RSMT by adding additional Steiner nodes to the minimum spanning network [1,2] or to the minimum spanning tree [30].|111|minimum spanning tree, definition, minimum spanning network, algorithms 3968|Nature2018|Computer code written by scientists forms the basis of an increasing number of studies across many fields — and an increasing number of papers that report the results. So, more papers should include these executable algorithms in the peer-review process. From this week, Nature journal editors handling papers in which code is central to the main claims or is the main novelty of the work will, on a case-by-case basis, ask reviewers to check how well the code works, and report back.|000|code, guidelines, Nature journal, sustainability, replicability 3969|Greenberg1987|However, a proto-language with, say, 125 phonemes is completely implausible on typological grounds. This shows in any case that if we are to consider a reconstruction proof relationship, we need tests for the plausibility of the reconstruction system, in which typological factors will figure prominently. 
:comment:`Quote from https://cipanglo.hypotheses.org/640`|12|typology, proto-language, methodology, linguistic reconstruction 3970|Schleicher1852|Bei dem vergleichen von sprachformen zweier verwanten sprachen suche ich vor allem die verglichenen [pb] formen beide auf ire mutmaßsliche grundform, d. i. die gestalt, die sie abgesehen von den späteren lautgesetzen haben müßsen, zurückzufüren oder doch überhaupt auf eine gleiche stufe der lautverhältnisse zu bringen. da uns auch die ältesten sprachen unseres stammes, selbst das sanskrit, nicht in irer ältesten lautlichen gestaltung vorliegen, da ferner die verschiedenen sprachen in ser verschiedenen altersttufen bekant sind, so mußs diese altersverschiedenheit nach tunlichkeit erst aufgehoben werden ehe verglichen werden kann, die gegebenen größsen müßsen erst auf einen gemeinsamen außsdruck gebracht werden ehe sie zu einer gleichung aŋesetzt werden können, sei dieser gleiche lautliche außsdruck der zu erschließsende älteste beider zusammengestellten sprachen oder die lautform der einen derselben. :comment:`goes on talking about what we might call root-forms, i.e., forms where morphology is mostly stripped off or traced back to the ancestral language. In some sense, Schleicher directly hints at unalignable parts in words!`|iv|Old Church Slavonic, phonetic alignment, methodology, linguistic reconstruction, August Schleicher 3971|Liddel1989|Paper introduces American Sign Language along with a way to transcribe it in a form reminiscent of the multi-tiered sequence representation. 
This is interesting in the context of discussing how to advance or propose multi-tiers as a valid format for representing sound sequences in linguistics.|000|American Sign Language, multi-tiers, sequence modeling, 3972|Hoenigswald1950|These examples illustrate a fundamental assumption of comparative grammar: *partially like sets occurring in mutually exclusive environments are taken to be continuations of one and the same phoneme of the proto-language.* The assumption has one important corollary: if, after the reconstruction has been made, we describe the relations between the phonemes of the proto-language and those of only one of the two or more daughter languages (ignoring the rest), our statements constitute the *historical phonology* of this language in terms of sound changes.|359|sound correspondences, correspondence patterns, methodology, sound change, linguistic reconstruction 3973|Stodden2018|A key component of scientific communication is sufficient information for other researchers in the field to reproduce published findings. For computational and data-enabled research, this has often been interpreted to mean making available the raw data from which results were generated, the computer code that generated the findings, and any additional information needed such as workflows and input parameters. Many journals are revising author guidelines to include data and code availability. This work evaluates the effectiveness of journal policy that requires the data and code necessary for reproducibility be made available postpublication by the authors upon request. We assess the effectiveness of such a policy by (i) requesting data and code from authors and (ii) attempting replication of the published findings. We chose a random sample of 204 scientific papers published in the journal Science after the implementation of their policy in February 2011. 
We found that we were able to obtain artifacts from 44% of our sample and were able to reproduce the findings for 26%. We find this policy—author remission of data and code postpublication upon request—an improvement over no policy, but currently insufficient for reproducibility.|000|reproducibility, code, replicability, open research, sustainability 3974|Anttila1972|:comment:`Author shows both correspondence patterns and alignments for a couple of words. This is worth quoting and should probably also be typed up.`|246f|sound correspondences, correspondence patterns, methodology, linguistic reconstruction, Indo-European 3975|Anttila1972|We have seen one important aspect of the comparative method: the number of reconstructed proto-units is independent (1) of the number of units in any of the languages being used and (2) of the number of sets of correspondences among the languages being used. The decisive factor is the number of contrasts within the sets; that is, all noncontrasting sets are grouped together into one proto-unit. [...] :comment:`Talks also about the importance of contrasts (do the sounds occur in the same position?)` To choose appropriate labels, the linguist would have to use his sense of patterning and his experience. What is important is that the method lays out the contrasts, the relevant distinctions needed to derive the outcomes for both languages, it unveils the *relations* between the units, but it does not choose the labels for the units. This has to be done by the linguist. 
Thus, although the method is rather simple and mechanistic, the linguist is needed for the postediting of its product, as has already been noted many times.|243|sound correspondences, linguistic reconstruction, proto-form, correspondence patterns, 3976|Trask1999|As a result of this, the words of related languages often exhibit a set of conspicuous patterns, each of the following general form: if word W1 in language L1 contains a sound S1 in a particular position, then word W2 of the same meaning in language L2 will contain the sound S2 in the same position. Such a pattern is a systematic correspondence.|204|sound correspondences, definition 3977|Beekes1995|:comment:`Nice examples on correspondence patterns across Indo-European languages.`|124-159|sound correspondences, correspondence patterns, Indo-European, examples 3978|Gauch2003|:comment:`Introduces parsimony and efficiency as a principle in scientific theory.`|269-326|philosophy of science, parsimony, Occam's razor, 3979|Evans2017|Accounts of language evolution have largely suffered from a monolingual bias, assuming that language evolved in a single isolated community sharing most speech conventions. Rather, evidence from the small-scale societies who form the best simulacra available for ancestral human communities suggests that the combination of small societal scale and out-marriage pushed ancestral human communities to make use of multiple linguistic systems. Evolutionary innovations would have occurred in a number of separate communities, distributing the labor of structural invention between populations, and would then have been pooled gradually through multilingually mediated horizontal transfer to produce the technological package we now regard as a natural ensemble.|000|language origin, language evolution, bilingualism, 3980|Mehrotra1996|The order in which the nodes are to be considered can be specified in our MWIS algorithm. 
We have found that ordering the nodes in order of nonincreasing weights or in order of nonincreasing degree is not as efficient as ordering them by considering both at the same time. In our experiments we order the nodes in nonincreasing values of square root of the degree of the node times the weight of the node.|347|graph coloring, sorting, algorithms, linear time 3981|Kaplan2017|This essay analyzes the data of prehistorical work in comparative linguistics. These data typically derive from the sector of “basic vocabulary”—words thought to be especially frequent, universal, and resistant to change over time. I show how basic vocabulary data have facilitated transfer between languages, methods, and disciplinary groups. The essay focuses on the standardization of these wordlists by the anthropological linguist Morris Swadesh (1909–67). It argues that the history of basic vocabulary exemplifies “data drag” rather than data-driven change: the labor-intensiveness of wordlist compilation and calibration has only reinforced the use of basic wordlists despite foundational criticisms straddling the move to electronic computing.|000|history of science, Swadesh list, concept list, overview, 3982|Kaplan2017|Interesting article that provides many insights into the question of basic vocabulary and similar aspects during the time of Swadesh, before and after. It criticizes modern approaches but disregards the most recent standardization efforts that have been addressed with the concepticon project.|000|basic vocabulary, Swadesh list, history of science 3983|Stricker2016|Myriads of epigenomic features have been comprehensively profiled in health and disease across cell types, tissues and individuals. Although current epigenomic approaches can infer function for chromatin marks through correlation, it remains challenging to establish which marks actually have causative roles in gene regulation and other processes. 
After revisiting how classical approaches have addressed this question in the past, we discuss the current state of epigenomic profiling and how functional information can be indirectly inferred. We also present new approaches that promise definitive functional answers, which are collectively referred to as ‘epigenome editing’. In particular, we explore CRISPR-based technologies for single-locus and multi-locus manipulation. Finally, we discuss which level of function can be achieved with each approach and introduce emerging strategies for high-throughput progression from profiles to function.|000|profiling, epigenomic markers, genomic markers, modelling, multi-tiers 3984|Stricker2016|Review is potentially important for multi-tier approaches in linguistics, as they integrate in a "profile" in a similar way different layers of information.|000|profiling, genomic markers, epigenomic markers, multi-tiers 3985|Stricker2016|:comment:`Figure illustrating genomic profiling.` .. image:: static/img/stricker-2016-56.png :width: 1000px :name: bla :comment:`multi-tiered representation as a "profile"`|56|profiling, genomic markers, epigenomic markers, multi-tiers 3986|Pelechrinis2016|Network connections have been shown to be correlated with structural or external attributes of the network vertices in a variety of cases. Given the prevalence of this phenomenon network scientists have developed metrics to quantify its extent. In particular, the assortativity coefficient is used to capture the level of correlation between a single-dimensional attribute (categorical or scalar) of the network nodes and the observed connections, i.e., the edges. Nevertheless, in many cases a multi-dimensional, i.e., vector feature of the nodes is of interest. Similar attributes can describe complex behavioral patterns (e.g., mobility) of the network entities. To date little attention has been given to this setting and there has not been a general and formal treatment of this problem. 
In this study we develop a metric, the vector assortativity index (VA-index for short), based on network randomization and (empirical) statistical hypothesis testing that is able to quantify the assortativity patterns of a network with respect to a vector attribute. Our extensive experimental results on synthetic network data show that the VA-index outperforms a baseline extension of the assortativity coefficient, which has been used in the literature to cope with similar cases. Furthermore, the VA-index can be calibrated (in terms of parameters) fairly easy, while its benefits increase with the (co-)variance of the vector elements, where the baseline systematically over(under)estimate the true mixing patterns of the network.|000|assortativity, graph theory, 3987|KuemmelTria2017|Ancestor–descendent relations play a cardinal role in evolutionary theory. Those relations are determined by rooting phylogenetic trees. Existing rooting methods are hampered by evolutionary rate heterogeneity or the unavailability of auxiliary phylogenetic information. Here we present a rooting approach, the minimal ancestor deviation (MAD) method, which accommodates heterotachy by using all pairwise topological and metric information in unrooted trees. We demonstrate the performance of the method, in comparison to existing rooting methods, by the analysis of phylogenies from eukaryotes and prokaryotes. MAD correctly recovers the known root of eukaryotes and uncovers evidence for the origin of cyanobacteria in the ocean. MAD is more robust and consistent than existing methods, provides measures of the root inference quality and is applicable to any tree with branch lengths.|000|rooting of phylogenetic trees, automatic approach, minimal ancestor deviation 3988|Pyysalo2018|:comment:`Quotes this as Fick's rule for reconstruction:` > Durch zweier Zeugen Mund wird alle Wahrheit kund. 
:comment:`refers to the original publication by` @Fick1870 |000|three witness theory, evidence, linguistic reconstruction, ancestral state reconstruction, 3989|Fick1890|Es versteht sich von selbst, sei hier aber ausdrücklich bemerkt, dass in dem vorliegenden Werke sich viel fremdes Gut findet. Eine Zeit lang bestand die Absicht, in dieser neuen Auflage die Urheber der bedeutenderen Etymologien namhaft zu machen, zumal des eigenen Gutes der Herausgeber doch genug übrig geblieben wäre, doch liessen wir den Gedanken fallen, weil wir alsbald auf Prioritätsfragen stiessen, deren Entscheidung viel Zeit in Anspruch genommen hätte. |VI|etymology, citation, scientific practice, Indo-European, nice quote 3990|Fick1890|Dabei erheben sich zwei Fragen: welche Wörter und Wortformen sind der Ursprache zuzuweisen? und wie war ihre Lautgestalt? Beschränken wir uns zunächst auf die erste dieser beiden Fragen, so scheint es eine ganz natürliche und unverfängliche Antwort: alle Wörter und Wortformen, welche sich in allen ig. Haupt-Sprachen vorfinden. Aber welche sind diese Hauptsprachen? [goes on talking about the major 12 branches of IE] [pb] [pb] Wollte man nur solches Sprachgut, das sich in allen diesen 12 Sprachen vorfindet, der indogermanischen Ursprache zuweisen, so würde man eine so geringe Ausbeute finden, dass man den Wiederaufbau der alten Spracheinheit aufgeben müsste. [examples follow] [pb] :comment:`Talks on about the problem of identifying which words go back to Indo-European but does not seem to find a solution` |XIII-XVI|history of science, ancestral state reconstruction, Indo-European, three witness theory, 3991|Fick1890|Von der ältesten Scheidung in Arier und Nichtarier aus könnte man auch einen passenden Namen für die Gesammtheit der Sprachen und Völker unseres Stammes gewinnen, wenn man die "Indogermanen" nicht gelten lassen will. 
|XXV|Indo-European, phylogenetic reconstruction, Arian languages, 3992|Goethe1808|Durch zweier Zeugen Mund Wird allerwegs die Wahrheit kund. :comment:`See for more information:` https://www.aphorismen.de/zitat/478|-|Johann Wolfgang von Goethe, Faust, nice quote, three witness theory 3993|Pyysalo2013|See also @Bammesberger<1984> (1984:11):“Um ein linguistisches Phänomen der Grundsprache zuschreiben zu können, muß es in mindestens zwei verschiedenen Sprachgruppen unverkennbare Spuren hinterlassen haben.” |.19 |three witness theory, ancestral state reconstruction, linguistic reconstruction 3994|Pyysalo2013|Throughout the study, ‘Fick’s rule’ is used as the principle of postulation to justify the entire reconstruction. According to this key principle of the comparative method, two independent witnesses are always required. :comment:`The quote by Fick is actually taken from` @Goethe1808|.19|Fick's rule, three witness theory, ancestral state reconstruction, methodology 3995|Zhang2013|A protein‐coding gene is composed of a series of nucleotide triplets – the codons – that encrypt not only the protein content but also the start and stop signals. There are 64 (4³) codons in the canonical genetic code, which encode 20 amino acids with redundancy. Hence, there are synonymous codons that encode the same amino acids, and they are used at different frequencies among different species. The resultant codon‐usage biases reveal complex interplays of mutation and selection. Protein‐coding genes can be organised into families of similar function, structure and sequence, according to their shared evolutionary histories. Individual proteins are modularly constructed of domains, which are often rearranged on evolutionary timescales to create functionally novel proteins. 
|000|protein coding, protein evolution, proteins, bioinformatics, introduction, overview, review 3996|Zhang2013|Paper summarizes a few important points on protein coding (taken from the official summary of the paper): * A protein‐coding gene consists of a series of nucleotide triplets. * The genetic code defines the relationship between codons and amino acids. * The genetic code can be organised into two halves and four quarters, which manifest distinct physiochemical features. * Codon usage bias, a phenomenon in which synonymous codons (encoding the same amino acid) are used at different frequencies in different species, is a result of complex interplays between mutation and selection. * Protein‐coding genes are organised into families of similar function, structure and sequence, according to their shared evolutionary histories. * Individual proteins are modularly constructed from domains, which are often rearranged on evolutionary timescales to create functionally novel proteins. |000|protein coding, bioinformatics, introduction, overview, 3997|Callaway2018|Genomanalysen revolutionieren die Forschung zur Frühgeschichte des Menschen, belasten aber auch die Beziehung zwischen Archäologen und Genetikern.|000|archaeology, archaeogenetics, discussion, methodology 3998|Callaway2018|Interesting overview (popular science) about the discussion between archaeology and archaeogenetics and the methodology used in the field.|000|archaeogenetics, discussion, archaeology, methodology, overview 3999|Cathcart2018|This paper takes a detailed look at some popular models of evolution used in contemporary diachronic linguistic research, focusing on the continuous-time Markov model, a particularly popular choice. I provide an exposition of the math underlying the CTM model, seldom discussed in linguistic papers. 
I show that in some work, a lack of explicit reference to the underlying computation creates some difficulty in interpreting results, particularly in the domain of ancestral state reconstruction. I conclude by adumbrating some ways in which linguists may be able to exploit these models to investigate a suite of factors which may influence diachronic linguistic change.|000|phylogenetic reconstruction, ancestral state reconstruction, methodology, introduction, historical linguistics 4000|Cathcart2018|Very useful overview on phylogenetic and ancestral state reconstruction in linguistics. Criticizes recent paper by @Dunn<2017> for selecting a model that confirms their hypothesis.|000|ancestral state reconstruction, introduction, historical linguistics, overview 4001|Szewczyk2013|Für deutsche Ohren haben manche Sprachen eindeutige Charaktere. Russisch ist hart, Spanisch hektisch, Französisch klingt eher charmant. Forscher untersuchen, welche Komponenten eine Sprache ausmachen.|000|perception, cross-linguistic study, introduction, popular science 4002|Jin2012|:comment:`Nice visualization of characters and xiesheng readings: table plus middle chinese readings, very easy to follow and to detect certain patterns across different series.`|363|visualization, Middle Chinese, Old Chinese, linguistic reconstruction, Old Chinese phonology, 4003|Barret2016|The science of emotion has been using folk psychology categories derived from philosophy to search for the brain basis of emotion. The last two decades of neuroscience research have brought us to the brink of a paradigm shift in understanding the workings of the brain, however, setting the stage to revolutionize our understanding of what emotions are and how they work. In this article, we begin with the structure and function of the brain, and from there deduce what the biological basis of emotions might be. 
The answer is a brain-based, computational account called the theory of constructed emotion.|000|emotion concepts, emotion, psycholinguistics, psychology, 4004|Kaufmann2018|In this paper we present Kratylos, at www.kratylos.org/, a web application that creates searchable multimedia corpora from data collections in diverse formats, including collections of interlinearized glossed text (IGT) and dictionaries. There exists a crucial lacuna in the electronic ecology that supports language documentation and linguistic research. Vast amounts of IGT are produced in stand-alone programs without an easy way to share them publicly as dynamic databases. Solving this problem will not only unlock an enormous amount of linguistic information that can be shared easily across the web, it will also improve accountability by allowing us to verify analyses across collections of primary data. We argue for a two-pronged approach to sharing language documentation, which involves a popular interface and a specialist interface. Finally, we briefly introduce the potential of regular expression queries for syntactic research.|000|Kratylos, web-based tool, corpus studies, 4005|Powell2017|The goal of this paper is to use string edit distance to describe the synchronic relationship between the Tibetan speech varieties located on the Northeastern edge of the Tibetan Plateau. String edit distance provides a statistical way to compare a large number of linguistic features, in essence producing a statistical bundle of isoglosses. In this way, it can be used as a tool in dialect mapping and synchronic clustering. In this paper, the aggregate distance matrix produced by string edit distance reveals that the great degree of phonetic continuity on the grasslands of the northeastern edge of the plateau is matched by an equal degree of phonetic discontinuity in the mountains forming the eastern border of the plateau. 
While the dialects located on the grasslands can be grouped together into one cluster, the dialects in the mountains can be grouped together into six clusters.|000|edit distance, aggregate distances, genetic classification, Tibetan, dialect classification, 4006|Powell2017|Paper was published without any supporting data, but apparently uses some kind of a Swadesh list and the typical GabMap workflow.|000|Tibetan, dialect classification, edit distance, missing data 4007|Boye2018|The distinction between grammatical and lexical words is standardly dealt with in terms of a semantic distinction between function and content words or in terms of distributional distinctions between closed and open classes. This paper argues that such distinctions fall short in several respects, and that the grammar-lexicon distinction applies even within the same word class. The argument is based on a recent functional and usage-based theory of the grammar-lexicon distinction (Boye & Harder 2012) and on the assumption that aphasic speech data represent the ideal testing ground for theories and claims about this contrast. A theoretically-based distinction between grammatical and lexical instances of Dutch modal verb forms and the verb form hebben was confronted with agrammatic and fluent aphasic speech. A dissociation between the two aphasia types was predicted and confirmed.|000|function words, aphasia, grammar, lexicon, empirical study 4008|Boye2018|The two dominating theoretical positions propose solutions to this problem that are to some degree reductionist. Chomskyan linguistics focuses on the former of these two aspects and tries to fit grammatical items into a general view of grammar as procedures/rules/templates, dealing with them as rule-governed or as “functional” phrase-structural “heads” (e.g. Cinque 1999). 
Construction Grammar – as the most prominent of the functional-cognitive theories in opposition to Chomskyan ones – focuses on the item aspect to a degree where also templates for combination are dealt with as items (e.g. Croft 2001). Both positions are problematic. On the one hand, the view of grammatical items as rule-governed may be seen as nothing but a stipulation (albeit a theoretically motivated one). On the other hand, the treatment of the combination aspect of grammar on a par with lexical items is at odds with neurolinguistic evidence (Pulvermüller et al. 2013).|1/18|construction grammar, Chomsky syntax, problem 4009|Boye2018|This means that also this second distinction between closed- and open-class words cannot be co-extensive with the distinction between grammatical and lexical words (nobody would consider Latin praenomina grammatical). Although it is a clear empirical difference, then, it cannot be anchored in a theoretical distinction between grammar and lexicon. Accordingly, it is incompatible with theoretically anchored hypothesis formation.|3/18|function words, lexicon, closed class, open class, 4010|Boye2018|Interesting article that contrasts Chomskyan syntax with construction grammar and proposes an alternative theory that emphasizes that both grammar and lexicon share words, and that words can function grammatically and purely lexically. This allows one, to some extent, to draw a continuum between the different domains.|000|construction grammar, function words, lexical words, Chomsky syntax, grammatical theory 4011|Bloom1998|The major known inefficiencies in the canonical PPM modeling algorithm are systematically removed by careful consideration of the method. The zero frequency problem is helped by secondary modeling of escape probabilities. The poor performance of PPM`*` is remedied with beneficial handling of deterministic context. A dramatic improvement in PPM is made with the successful use of local order estimation. 
An average of 2.10 bpc is achieved on the Calgary Corpus.|000|data compression, Markov model, finite context prediction, 4012|Lee2017|It is widely believed that different parts of a classical Chinese poem vary in syntactic properties. The middle part is usually parallel, i.e. the two lines in a couplet have similar sentence structure and part of speech; in contrast, the beginning and final parts tend to be non-parallel. Imagistic language, dominated by noun phrases evoking images, is concentrated in the middle; propositional language, with more complex grammatical structures, is more often found at the end. We present the first quantitative analysis on these linguistic phenomena—syntactic parallelism, imagistic language, and propositional language—on a treebank of selected poems from the Complete Tang Poems. Written during the Tang Dynasty between the 7th and 9th centuries CE, these poems are often considered the pinnacle of classical Chinese poetry. Our analysis affirms the traditional observation that the final couplet is rarely parallel; the middle couplets are more frequently parallel, especially at the phrase rather than the word level. Further, the final couplet more often takes a non-declarative mood, uses function words, and adopts propositional language. In contrast, the beginning and middle couplets employ more content words and tend toward imagistic language.|000|Táng poems, poetry, Chinese poetry, Classical Chinese, quantitative analysis, rhyme patterns 4013|Lee2017|Authors describe quantitative analyses applied to a treebank of Táng poems. |000|Táng poems, tree bank, Classical Chinese, quantitative analysis, rhyme patterns 4014|Lee2017|‘Coupling’ is a universal principle behind poetic structure (Levin, 1962), and has been recognized in diverse languages such as Hebrew (Kugel, 1981) and Russian (Jakobson, 1966). 
It is also a pervasive phenomenon in classical Chinese poems, which are composed of ‘couplets’—each couplet is a pair of adjacent lines, with an identical number of characters.|1/14|couplet, definition, poetry, 4015|Greenhill2018|What role does speaker population size play in shaping rates of language evolution? There has been little consensus on the expected relationship between rates and patterns of language change and speaker population size, with some predicting faster rates of change in smaller populations, and others expecting greater change in larger populations. The growth of comparative databases has allowed population size effects to be investigated across a wide range of language groups, with mixed results. One recent study of a group of Polynesian languages revealed greater rates of word gain in larger populations and greater rates of word loss in smaller populations. However, that test was restricted to 20 closely related languages from small Oceanic islands. Here, we test if this pattern is a general feature of language evolution across a larger and more diverse sample of languages from both continental and island populations. We analyzed comparative language data for 153 pairs of closely-related sister languages from three of the world’s largest language families: Austronesian, Indo-European, and Niger-Congo. We find some evidence that rates of word loss are significantly greater in smaller languages for the Indo-European comparisons, but we find no significant patterns in the other two language families. These results suggest either that the influence of population size on rates and patterns of language evolution is not universal, or that it is sufficiently weak that it may be overwhelmed by other influences in some cases. 
Further investigation, for a greater number of language comparisons and a wider range of language features, may determine which of these explanations holds true.|000|population size, language change, lexical change, 4016|Greenhill2018|Study contains a very detailed description of tests for genetic independence (Galton's problem) of changes. This principle can be easily applied to other problems and is conceptually easier to investigate than alternative scenarios.|000|Galton's problem, statistical independency, genetic classification, methodology 4017|Coblin2018|In Jerry Norman’s earlier reconstructed Proto-Mǐn (PM) sound system only one syllable final, i.e., `*`-iu, has a high back rounded main vowel. But in his newer Common Mǐn (CM) system there are two such finals, `*`-u and `*`-iu, which together correspond to PM `*`-iu. The old reconstruction is systemically more economical, and the reasons for Norman’s shift in reconstructive strategy to a two-final solution are unknown. This paper is in the main a consideration of this problem. But in the concluding section, we shall also turn to a different but related question involving another set of interacting PM/CM finals, `*`-aŋ/`*`-iaŋ, which are reconstructed as a pair in both of the two systems but nonetheless show the same basic type of anomaly seen in the case of CM `*`-u and `*`-iu.|000|Mǐn, Proto-Mǐn, linguistic reconstruction, Chinese dialects, 4018|LopezCozar2012|The launch of Google Scholar Citations and Google Scholar Metrics may cause a revolution in the research evaluation field as it places within every researcher’s reach tools that allow them to measure their output. However, the data and bibliometric indicators offered by Google’s products can be easily manipulated. 
In order to alert the research community, we present an experiment in which we manipulate the Google Citations’ profiles of a research group through the creation of false documents that cite their documents, and consequently, the journals in which they have published, modifying their H-index. For this purpose we created six documents authored by a faked author and we uploaded them to a researcher’s personal website under the University of Granada’s domain. The result of the experiment meant an increase of 774 citations in 129 papers (six citations per paper) increasing the authors and journals' H-index. We analyse the malicious effect this type of practices can cause to Google Scholar Citations and Google Scholar Metrics. Finally, we conclude with several deliberations over the effects these malpractices may have and the lack of control these tools offer.|000|GoogleScholar, citation metrics, manipulation, problem 4019|CurrieHall2012|This paper presents a probabilistic model of phonological relationships, recasting the traditional relationships of “contrast” and “allophony” in terms of a gradient scale of predictability, based on the Information-Theoretic concept of entropy (uncertainty). The model is applied to the case of [ɑɪ] and [ʌɪ] in Canadian English to demonstrate its utility in describing relationships that are neither perfectly allophonic nor fully contrastive.|000|contrastiveness, allophony, allophones, phonological theory 4020|CurrieHall2012|The PPRM makes use of three different probabilistic measures to create a picture of contrastiveness: bias, environment-specific contrastiveness, and systemic contrastiveness. The basic assumption of the model is that in any given environment e, a particular choice can be made between two sounds a and b –either a occurs in e, or b occurs in e. 
The bias and environment-specific contrastiveness measures focus on this choice within an environment and indicate which of a or b is more likely in that environment (bias) and how much uncertainty there is in the choice between a and b in that environment (environment-specific contrastiveness). The third measure, systemic contrastiveness, is a measure of how much uncertainty there is in the choice between a and b across all environments in the language where either a or b can occur. Each of these will be described in more detail below.|5/14|contrastiveness, measure, phonological theory, allophony 4021|CurrieHall2012|The paper describes a measure for contrastiveness that is based on probabilities (frequencies) of contrasts in corpora. This is interesting with respect to the work on multi-tiers and other aspects of phonology and historical linguistics.|000|allophony, contrastiveness, measure, empirical study 4022|Carvalho2017|This paper seeks to rigorously evaluate a set of claims that lexical items in Southern Arawak languages are loanwords from Tupi-Guarani languages. I show that, in most cases, these hypotheses can be rejected because the Arawak forms in question either have clear internal etymologies or because the noted similarities are too superficial and no coherent or plausible picture for the phonological deviation between the putative loans and their presumed source forms can be offered. In advancing internal etymologies for the target Arawak forms I will also try to cast light on aspects of the historical developments of these languages, as well as raise some so far unacknowledged issues for future research. Next, I consider some plausible cases of Guarani loans in one Southern Arawak language, Terena, explicitly arguing for these contact etymologies and placing these loanwords within a chronological stratum in Terena history. 
Complications related to dissimilar sources in Arawak-Tupi-Guarani contact and to the status of Wanderwörter are also briefly addressed.|000|Tupi-Guarani, loan word, lexical borrowing, Arawakan, 4023|Carvalho2017|Study is interesting as it deals with borrowing between unrelated languages, which is an important field in historical linguistics that has so far been only poorly studied.|000|Tupi-Guarani, Arawakan, lexical borrowing, methodology 4024|Seifart2018|Many drum communication systems around the world transmit information by emulating tonal and rhythmic patterns of spoken languages in sequences of drumbeats. Their rhythmic characteristics, in particular, have not been systematically studied so far, although understanding them represents a rare occasion for providing an original insight into the basic units of speech rhythm as selected by natural speech practices directly based on beats. Here, we analyse a corpus of Bora drum communication from the northwest Amazon, which is nowadays endangered with extinction. We show that four rhythmic units are encoded in the length of pauses between beats. We argue that these units correspond to vowel-to-vowel intervals with different numbers of consonants and vowel lengths. By contrast, aligning beats with syllables, mora or only vowel length yields inconsistent results. Moreover, we also show that Bora drummed messages conventionally select rhythmically distinct markers to further distinguish words. The two phonological tones represented in drummed speech encode only few lexical contrasts. Rhythm thus appears to crucially contribute to the intelligibility of drummed Bora.|000|drummed language, speech rhythm, Amazonia, corpus studies, 4025|Seifart2018|Study is interesting with respect to corpus annotation and multi-tiers as they need to encode different signals.|000|drummed language, speech rhythm, corpus studies 4026|Brooman2017|Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. 
Focusing on the data entry and storage aspects, this paper offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, don’t leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, don’t include calculations in the raw data files, don’t use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.|000|dataset, data organization, annotation, spreadsheet, best practice, data format 4027|Brooman2017|Paper is important in so far as the authors discuss best practices of data annotation when using spreadsheets.|000|dataset, annotation, spreadsheet, best practice, data format 4028|Zacarella2017|Language comes in utterances in which words are bound together according to a simple rule-based syntactic computation (merge), which creates linguistic hierarchies of potentially infinite length—phrases and sentences. In the current functional magnetic resonance imaging study, we compared prepositional phrases and sentences—both involving merge—to word lists—not involving merge—to explore how this process is implemented in the brain. We found that merge activates the pars opercularis of the left inferior frontal gyrus (IFG; Brodmann Area [BA] 44) and a smaller region in the posterior superior temporal sulcus (pSTS). Within the IFG, sentences engaged a more anterior portion of the area (pars triangularis, BA 45)—compared with phrases—which showed activity peak in BA 44. As prepositional phrases, in contrast to sentences, do not contain verbs, activity in BA 44 may reflect structure-building syntactic processing, while the involvement of BA 45 may reflect the encoding of propositional meaning initiated by the verb. 
The pSTS appears to work together with the IFG during thematic role assignment not only at the sentential level, but also at the phrasal level. The present results suggest that merge, the process of binding words together into syntactic hierarchies, is primarily supported by BA 44 in the IFG.|000|merge, neurolinguistics, syntactic hierarchy, Chomsky syntax, 4029|Zacarella2017|Paper claims to have found where the brain accounts for Chomsky's MERGE procedure, namely the procedure that fuses units into higher units in grammar.|000|merge, Chomsky syntax, neurolinguistics, syntactic hierarchy 4030|Gale1995|Linguists and speech researchers who use statistical methods often need to estimate the frequency of some type of item in a population containing items of various types. A common approach is to divide the number of cases observed in a sample by the size of the sample; sometimes small positive quantities are added to divisor and dividend in order to avoid zero estimates for types missing from the sample. These approaches are obvious and simple, but they lack principled justification, and yield estimates that can be wildly inaccurate. I.J. Good and Alan Turing developed a family of theoretically well‐founded techniques appropriate to this domain. Some versions of the Good‐Turing approach are very demanding computationally, but we define a version, the Simple Good‐Turing estimator, which is straightforward to use. Tested on a variety of natural‐language‐related data sets, the Simple Good‐Turing estimator performs well, absolutely and relative both to the approaches just discussed and to other, more sophisticated techniques.|000|Good-Turing, frequency, Markov chain 4031|Houben2018|“We know that Middle Indian (Middle Indo-Aryan) makes its appearance in epigraphy prior to Sanskrit: this is the great linguistic paradox of India.” In these words Louis Renou (1956: 84) referred to a problem in Sanskrit studies for which so far no satisfactory solution had been found. 
I will here propose that the perceived “paradox” derives from the lack of acknowledgement of certain parameters in the linguistic situation of Ancient India which were insufficiently appreciated in Renou’s time, but which are at present open to systematic exploration with the help of by now well established sociolinguistic concepts, notably the concept of “diglossia”. Three issues will here be addressed in the light of references to ancient and classical Indian texts, Sanskrit and Sanskritic. A simple genetic model is inadequate, especially when the ‘linguistic area’ applies also to what can be reconstructed for earlier periods. The so-called Sanskrit “Hybrids” in the first millennium CE, including the Prakrits and Epics, are rather to be regarded as emerging “Ausbau” languages of Indo-Aryan with hardly any significant mutual “Abstand” before they will be successfully “roofed,” in the second half of the first millennium CE, by “classical” Sanskrit.|000|Sanskrit, diglossia, roof language, linguistic variation, 4032|Tanenbaum2012|This culminates in a quasi-circular pattern among the tones of Southern Min, where in sandhi position first tone becomes seventh tone, seventh third, third second, and second first [...].|1|Mǐn, tone sandhi, 4033|Papakitsos2018|This paper presents an attempt to reconstruct the most basic features of the language of Homo Sapiens, following the principle of monogenesis, namely the viewpoint that since humans share a common biological ancestry, they also share a common linguistic one. Considering this issue, the basic methods of comparative linguistics are briefly presented first, along with the methodological approach utilized herein, named Qualitative Inquiry. The results of the reconstructing process are presented, classified in terms of phonological, morphological, lexical, grammatical and syntactic aspects. 
Only bordering to the scope of this paper, a brief comparison of this treatise to previous studies reveals both convergence and discrepancy concerning the features of the language.|000|crazy paper, proto-world, linguistic reconstruction 4034|Papakitsos2018|The Proto-Sapiens grammar was so simple that the sporadic references in previous paragraphs have essentially described it. The prime importance of sound symbolism for the people of nature should be noted again before we further detail that the vowel “E” was felt as indicating the “yin” element, passivity, femininity etc., while “O” indicated the “yang” element, activeness, masculinity etc.; “A” was neutral or spiritual, indicating things conceived by the mind and emotions rather than with the physical senses.|8|proto-world, crazy paper, linguistic reconstruction 4035|Chipman1998|In this article we put forward a Bayesian approach for finding classification and regression tree (CART) models. The two basic components of this approach consist of prior specification and stochastic search. The basic idea is to have the prior induce a posterior distribution that will guide the stochastic search toward more promising CART models. As the search proceeds, such models can then be selected with a variety of criteria, such as posterior probability, marginal likelihood, residual sum of squares or misclassification rates. Examples are used to illustrate the potential superiority of this approach over alternative methods.|000|classification tree, regression tree, CART models, stochastic search, 4036|Brdar2017|.. image:: static/img/Brdar2017-16.png :width: 800px :name: bla :comment:`Figure 1 in text` :comment:`Image on morpheme types` |16|morphology, typology, morpheme types, introduction 4037|BenGal2014|The construction of efficient decision and classification trees is a fundamental task in Big Data analytics which is known to be NP-hard. 
Accordingly, many greedy heuristics were suggested for the construction of decision-trees, but were found to result in local-optimum solutions. In this work we present the dual information distance (DID) method for efficient construction of decision trees that is computationally attractive, yet relatively robust to noise. The DID heuristic selects features by considering both their immediate contribution to the classification, as well as their future potential effects. It represents the construction of classification trees by finding the shortest paths over a graph of partitions that are defined by the selected features. The DID method takes into account both the orthogonality between the selected partitions, as well as the reduction of uncertainty on the class partition given the selected attributes. We show that the DID method often outperforms popular classifiers, in terms of average depth and classification accuracy.|000|decision tree, algorithm, 4038|Bancock1996|k-Decision lists and decision trees play important roles in learning theory as well as in practical learning systems. k-Decision lists generalize classes such as monomials, k-DNF, and k-CNF, and like these subclasses they are polynomially PAC-learnable [R. Rivest, Mach. Learning 2 (1987), 229–246]. This leaves open the question of whether k-decision lists can be learned as efficiently as k-DNF. We answer this question negatively in a certain sense, thus disproving a claim in a popular textbook [M. Anthony and N. Biggs, ``Computational Learning Theory,'' Cambridge Univ. Press, Cambridge, UK, 1992]. Decision trees, on the other hand, are not even known to be polynomially PAC-learnable, despite their widespread practical application. We will show that decision trees are not likely to be efficiently PAC-learnable. We summarize our specific results. 
The following problems cannot be approximated in polynomial time within a factor of $2^{\log^{\delta} n}$ for any $\delta<1$, unless NP $\subset$ DTIME[$2^{\mathrm{polylog}\, n}$]: a generalized set cover, k-decision lists, k-decision lists by monotone decision lists, and decision trees. Decision lists cannot be approximated in polynomial time within a factor of $n^{\delta}$, for some constant $\delta>0$, unless NP=P. Also, k-decision lists with $l$ 0–1 alternations cannot be approximated within a factor $\log^{l} n$ unless NP $\subset$ DTIME[$n^{O(\log \log n)}$] (providing an interesting comparison to the upper bound obtained by A. Dhagat and L. Hellerstein [in ``FOCS '94,'' pp. 64–74]).|000|lower bound, decision tree, algorithm 4039|Hyafil1976|We demonstrate that constructing optimal binary decision trees is an NP-complete problem, where an optimal tree is one which minimizes the expected number of tests required to identify the unknown object.|000|NP-complete, decision tree, algorithm, proof 4040|Bybee2017|The question of whether or not grammatical factors can condition or block sound change has been discussed from many perspectives for more than a century without resolution (Melchert, 1975). Here we consider studies of sound change in progress which show that words or phrases that are used frequently in the phonetic environment for change undergo the change before those whose use is less frequent in these contexts. Because words of different categories and with different structures also have different distributions, they may occur preferentially in certain phonetic environments. 
Thus, some apparent cases of influence by grammatical and lexical factors can be explained by phonetic factors if we expand our notion of phonetic environment to include frequency within the environment for change, which includes the segmental environment as well as factors that affect the degree of prominence a word receives in context.|000|sound change, reasons for sound change, explanation of sound change, word frequency 4041|Bybee2017|Interesting claim, that seemingly grammatical triggers of sound change can in fact be more easily explained with frequency effects, exemplified by a study of sound change in progress.|000|sound change in progress, sound change, explanation of sound change, sound change 4042|Schlenker2017|We argue that a gesture replacing an English verb—a ‘gestural verb’— displays some properties of ‘agreement verbs’ in American Sign Language (ASL). Specifically, gestural verbs involving (among others) slapping and punching can be realized as targeting the addressee (SLAP-2, PUNCH-2) if the object is second person, or as targeting some other position (SLAP-a, PUNCH-a) if the object is third person. This property is shared with ASL verbs that display object agreement. Strikingly, in both cases the object agreement marker can be disregarded under ellipsis and under the focus-sensitive particle only, a behavior which is shared with phi-features in spoken language, and is not entirely reducible to the presuppositional nature of the marker. The main findings are based on introspective judgments, but crucial examples are validated by an experimental approach. 
In sum, we provide initial evidence that English gestural verbs have a grammar, and that it partly mirrors that of some sign language constructions.|000|gesture, agreement, grammaticalization, sign language 4043|Schlenker2017|As far as one can tell from the abstract, the article argues that certain cases of agreement in gesture systems (addressed to the listener or to a third person) are similar in sign languages and in normal "language" of gesture among humans.|000|gesture, sign language, agreement, 4044|Haspelmath1999a|Interesting review of @Lightfoot<1999>'s book on language change, which argues against the basic theory and shows examples of language change in adults.|000|language change, reasons for language change, Chomsky syntax, review, critics 4045|Wiener1987|There is no good evidence for or against the holophyly of language. A single origin cannot be assumed with confidence because the relevant characters have been lost with the passage of time. Relationships can be analyzed only when a common ancestor can be confidently posited on the basis of recurring correspondences between sound and meaning in basic vocabulary items and grammatic paradigms.|220|monophyly, holophyly, biological parallels, origin of language 4046|Schietecat2018|Although researchers have repeatedly shown that the meaning of the same concept can vary across different contexts, it has proven difficult to predict when people will assign which meaning to a concept, and which associations will be activated by a concept. Building on the affective theory of meaning (Osgood, Suci, & Tannenbaum, 1957) and the polarity correspondence principle (Proctor & Cho, 2006), we propose the dimension-specificity hypothesis with the aim to understand and predict the context-dependency of cross-modal associations. 
We present three sets of experiments in which we use the dimension-specificity hypothesis to predict the cross-modal associations that should emerge between aggression-related concepts and saturation and brightness. The dimension-specificity hypothesis predicts that cross-modal associations emerge depending upon which affective dimension of meaning (i.e., the evaluation, activity, or potency dimension) is most salient in a specific context. The salience of dimensions of meaning is assumed to depend upon the relative conceptual distances between bipolar opposed concept pairs (e.g., good vs. bad). The dimension-specificity hypothesis proposes that plus and minus polarities will be attributed to the bipolar concepts, and associations between concrete and affective abstract concepts that share plus or minus polarities will become activated. Our data support the emergence of dimension-specific polarity attributions. We discuss the potential of dimension-specific polarity attributions to understand and predict how the context influences the emergence of cross-modal associations.|000|concepts, cognitive linguistics, word association, empirical study 4047|Schietecat2018|Study's introduction has some interesting accounts on concepts and ambiguity, similar to the fact that a concept can cross-linguistically colexify differently. This study could be quoted in this context, as colexifications are not handled by them, but they report about similar phenomena.|000|colexification, word association, concepts, cognitive linguistics, empirical study 4048|Schietecat2018|Although researchers have repeatedly shown that the meaning of the same concept can vary across different contexts, it has proven difficult to predict when people assign which meaning to a concept. 
Incorporating context effects in models of human cognition is one of the major future challenges in psychological research.|1/20|context, concepts, meaning, cognitive linguistics 4049|Honey2018|Aus seiner Perspektive erzeugt REM-Schlaf eine virtuelle Realität, ein Protobewusstsein, wie er es nennt, »inklusive eines imaginären Agenten (des Protoselbst), das sich durch einen vom Gehirn erzeugten Raum bewegt und dabei starke Emotionen erfährt«. Dabei aktiviere sich das Gehirn auf eine Weise selbst, die Erlebnisse im Wachzustand vorwegnehme – im träumenden Fötus als Vorbereitung auf das Wachsein nach der Geburt, später dann zunehmend als Mittel, um das interne Modell der Realität zu optimieren.|37|human brain, dreaming, simulation, reality, 4050|MunozRodriguez2018|The sweet potato is one of the world’s most widely consumed crops, yet its evolutionary history is poorly understood. In this paper, we present a comprehensive phylogenetic study of all species closely related to the sweet potato and address several questions pertaining to the sweet potato that remained unanswered. Our research combined genome skimming and target DNA capture to sequence whole chloroplasts and 605 single-copy nuclear regions from 199 specimens representing the sweet potato and all of its crop wild relatives (CWRs). We present strongly supported nuclear and chloroplast phylogenies demonstrating that the sweet potato had an autopolyploid origin and that Ipomoea trifida is its closest relative, confirming that no other extant species were involved in its origin. Phylogenetic analysis of nuclear and chloroplast genomes shows conflicting topologies regarding the monophyly of the sweet potato. The process of chloroplast capture explains these conflicting patterns, showing that I. trifida had a dual role in the origin of the sweet potato, first as its progenitor and second as the species with which the sweet potato introgressed so one of its lineages could capture an I. trifida chloroplast. 
In addition, we provide evidence that the sweet potato was present in Polynesia in pre-human times. This, together with several other examples of long-distance dispersal in Ipomoea, negates the need to invoke ancient human-mediated transport as an explanation for its presence in Polynesia. These results have important implications for understanding the origin and evolution of a major global food crop and question the existence of pre-Columbian contacts between Polynesia and the American continent.|000|sweet potato, linguistic evidence, linguistic palaeography 4051|MunozRodriguez2018|Interesting article claims that the origin of the sweet potato does not go back to early exchange between the Americas and Polynesia. It cannot explain the linguistic similarity between names for sweet potato, so it falls short on explaining this, but it argues that the origin of the sweet potato in Polynesia does not go back to humans bringing crops over the ocean.|000|linguistic palaeography, plant genetics, plant domestication, sweet potato, 4052|Haspelmath2018a|Another striking example from linguistics is the shortness of frequent words, which is surely adaptive. But there are quite diverse paths to shortness. According to Zipf (1935), shorter words are shorter because they underwent clipping processes (e.g. laboratory > lab), and according to Bybee (2007: 12), short words are short because “high-frequency words undergo reductive changes at a faster rate than low-frequency words... the major mechanism is gradual phonetic reduction”. But actually in most cases, rarer words are longer because they are (originally) complex elements, consisting of multiple morphs, e.g. horse vs. hippopotamus, car vs. cabriolet, church vs. cathedral. 
Drastic shortening of longer words seems to occur primarily in the modern age with its large number of technical and bureaucratic innovations, but even here, clipping is only one of many possibilities; for example, Ronneberger-Sibold (2014) discusses a number of fairly diverse “shortening techniques” in German. What unites all of these processes is only one feature: the outcomes of the changes, which are functionally adapted.|12/17|grammaticalization, language change, functional-adaptive constraint, constraints, 4053|Elimam2018|Words can be matched with the concept of sign (correspondence of a signifier to a signified) as long as they act as symbol-words endowed with some semantic self-sufficiency. But in discourse, they lose their wholeness as symbol-words and metamorphose into wording-symbols. They, suddenly, appear as mere signifier entities with a more or less loose allusion to their status as cultural symbols. In discourse, words are no longer signs but tools covering ephemeral collections of neurosemes: the link of the sign breaks as soon as discourse takes over. The referential potential is no longer the schematic meaning issued from culture, but the universe of discourse under construction. This is why any attempt to account for meaning in language must integrate the neural process of meaning creation. It is now established that meaning is not the result of language activity but the result of cognition. However, what language does, via discourse, is to make this meaning communicable. For all these reasons, the task of linguistics should be to investigate the relationship between cognition and linguistic output in order to shed light on all the cognitive traces left within the surface strings. 
The role of morphosyntax thus has to be re-evaluated in this light.|000|linguistic sign, reference potential, morphosyntax, cognition, opinion paper 4054|Verhagen2018|While theories on predictive processing posit that predictions are based on one’s prior experiences, experimental work has effectively ignored the fact that people differ from each other in their linguistic experiences and, consequently, in the predictions they generate. We examine usage-based variation by means of three groups of participants (recruiters, job-seekers, and people not (yet) looking for a job), two stimuli sets (word sequences characteristic of either job ads or news reports), and two experiments (a Completion task and a Voice Onset Time task). We show that differences in experiences with a particular register result in different expectations regarding word sequences characteristic of that register, thus pointing to differences in mental representations of language. Subsequently, we investigate to what extent different operationalizations of word predictability are accurate predictors of voice onset times. A measure of a participant’s own expectations proves to be a significant predictor of processing speed over and above word predictability measures based on amalgamated data. These findings point to actual individual differences and highlight the merits of going beyond amalgamated data. We thus demonstrate that it is feasible to empirically assess the variation implied in usage-based theories, and we advocate exploiting this opportunity. |000|experimental study, predictive language processing, usage, 4055|Himmelmann2018| This study is concerned with the identifiability of intonational phrase boundaries across familiar and unfamiliar languages. Four annotators segmented a corpus of more than three hours of spontaneous speech into intonational phrases. The corpus included narratives in their native German, but also in three languages of Indonesia unknown to them. 
The results show significant agreement across the whole corpus, as well as for each subcorpus. We discuss the interpretation of these results, including the hypothesis that it makes sense to distinguish between phonetic and phonological intonational phrases, and that the former are a universal characteristic of speech, allowing listeners to segment speech into intonational phrase-sized units even in unknown languages. |000|universals, universality, intonation, phrases, empirical study, German, Indonesian, corpus studies 4056|Pepper2018|Dissertation draft offers rich material for investigating questions we discuss in CLICS and beyond. Especially the terminology, e.g., on binomial lexemes, is highly interesting. And his work contains a questionnaire with 100-200 meanings.|000|binomial lexemes, semantics, colexification, cross-linguistic study 4057|Labov2018|The spread of the new quotative be like throughout the English-speaking world is a change from above for each community that receives it. Diffusion of this form into Philadelphia is traced through the yearly interviews of the Philadelphia Neighborhood Corpus, beginning with young adults in 1979 and spreading to adolescents in 1990, a generation later. The first users of be like form the Avant Garde, young adults with extensive awareness of linguistic patterns within and without the city. The use of this quotative in Philadelphia is favored by constraints that are found elsewhere, particularly to introduce inner speech that is not intended to be heard by others and to cite exemplars of a range of utterances. Not previously reported is a strong tendency to be favored for quotations with initial exclamations, prototypically expressions of surprise and alarm such as “Oh” and “Oh my god!”.|000|quotative, language change in progress, empirical study, language change 4058|Labov2018|The Avant Garde does not appear to be a social network or a community of practice. 
It is an uncollected collection of influentials, whose children inherit what they have done, and do more with it.|20|language change, language change in progress, empirical study, nice quote 4059|Hanks2018|This chapter is set in the context of Corpus Pattern Analysis (CPA), a technique developed by Patrick Hanks to map meaning onto word patterns found in corpora. The main output of CPA is the Pattern Dictionary of English Verbs (PDEV), currently describing patterns for over 1,600 verbs, many of which are acknowledged to be multiword expressions (MWEs) such as phrasal verbs or idioms. PDEV entries are manually produced by lexicographers, based on the analysis of a substantial sample of concordance lines from the corpus, so the construction of the resource is very time-consuming. The motivation for the work presented in this chapter is to speed up the discovery of these word patterns, using methods which can be transferred to other languages. This chapter explores the benefits of a detailed contrastive analysis of MWEs found in English and French corpora with a view on English-French translation. The comparative analysis is conducted through a case study of the pair (bite, mordre), to illustrate both CPA and the application of statistical measures for the automatic extraction of MWEs. The approach taken in this chapter takes its point of departure from the use of statistics developed initially by Church & Hanks (1989). Here we look at statistical measures which have not yet been tested for their ability to discover new collocates, but are useful for characterizing verbal MWEs already found. In particular we propose measures to characterize the mean span, rigidity, diversity, and idiomaticity of a given MWE.|000|multiword expressions, corpus studies, linguistic annotation 4060|Hanks2018|Study is interesting on the background of hand-crafted annotation by experts using corpus annotation software. 
Since they also encode semantics, it is definitely worth a read.|000|multiword expressions, corpus studies, linguistic annotation 4061|Hanks2018|The alternative view mentioned here is supported by lexicographers such as Atkins et al. (2001), Kilgarriff et al. (2004), and Hanks (2000). These lexicographers argue that much of the meaning of an utterance is carried by underlying patterns of co-selection of the words actually used, rather than by simple concatenations. These conclusions overlap to some extent with the tenets of Construction Grammar, though the methodologies are very different. In corpus linguistics, Sinclair declared, after a lifetime’s empirical research into texts, corpora, and meaning, “Many if not most meanings require the presence of more than one word for their normal realisation” (Sinclair 1998: 4).|94|context, meaning, corpus studies, construction grammar 4062|Hanks2018|More to the point is the fact that many other expressions, that at first sight might be considered compositional, are associated with a limited phraseology. They do not vary freely, but employ selectional variations drawn from within a (usually quite small) lexical set. Such patterns are found for many expressions that intuitions alone might encourage us to classify as fixed. Corpus evidence shows that people not only grasp at straws, they also clutch at straws and even seize on straws. Moon (1998) observes that shiver in one’s shoes (meaning ‘to be afraid’) may at first seem to be a fixed expression, but in fact corpus evidence shows that every lexical item in the expression allows a modicum of variation: people quake in their boots, shake in their sandals, and she even found a mention of policemen quaking in their size fourteens.|95|corpus studies, multiword expressions, 4063|Goncalves2018|As global political preeminence gradually shifted from the United Kingdom to the United States, so did the capacity to culturally influence the rest of the world. 
In this work, we analyze how the world-wide varieties of written English are evolving. We study both the spatial and temporal variations of vocabulary and spelling of English using a large corpus of geolocated tweets and the Google Books datasets corresponding to books published in the US and the UK. The advantage of our approach is that we can address both standard written language (Google Books) and the more colloquial forms of microblogging messages (Twitter). We find that American English is the dominant form of English outside the UK and that its influence is felt even within the UK borders. Finally, we analyze how this trend has evolved over time and the impact that some cultural events have had in shaping it.|000|geography, American English, British English, geospatial mapping, Google N-Grams, corpus studies 4064|Moravec1988|Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it. The deliberate process we call reasoning is, I believe, the thinnest veneer of human thought, effective only because it is supported by this much older and much more powerful, though usually unconscious, sensorimotor knowledge. We are all prodigious olympians in perceptual and motor areas, so good that we make the [pb] difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.|15f|artificial intelligence, machine learning, nice quote, evolution of the brain, evolution of cognition 4065|Wu2016|Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. 
Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference – sometimes prohibitively so in the case of very large data sets and large models. Several authors have also charged that NMT systems lack robustness, particularly when input sentences contain rare words. These issues have hindered NMT’s use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google’s Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using residual connections as well as attention connections from the decoder network to the encoder. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. To directly optimize the translation BLEU scores, we consider refining the models by using reinforcement learning, but we found that the improvement in the BLEU scores did not reflect in the human evaluation. On the WMT’14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. 
Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system.|000|machine translation, Google, neural network, machine learning, 4066|Padhi2017|We address the problem of learning comprehensive syntactic profiles for a set of strings. Real-world datasets, typically curated from multiple sources, often contain data in various formats. Thus any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify various formats is infeasible in standard big-data scenarios. We present a technique for generating comprehensive syntactic profiles in terms of user-defined patterns that also allows for interactive refinement. We define a syntactic profile as a set of succinct patterns that describe the entire dataset. Our approach efficiently learns such profiles, and allows refinement by exposing a desired number of patterns. Our implementation, FlashProfile, shows a median profiling time of 0.7 s over 142 tasks on 74 real datasets. We also show that access to the generated data profiles allow for more accurate synthesis of programs, using fewer examples in programming-by-example workflows.|000|syntactic profiling, data profiling, machine learning, computer-aided approaches 4067|Padhi2017|Data profiling is the process of automatically discovering useful metadata (typically as a succinct summary) for the data [...]. In this work, we focus on syntactic profiling, i.e. learning a succinct structural description of the data. We present FlashProfile, a novel technique for learning syntactic profiles that satisfy the following three desirable properties: *Comprehensive*: We expose the syntactic profile as a set of patterns, which cover 100% of the data. *Refinable*: Users can interactively refine the granularity of profiles by requesting the desired number of patterns. 
*Extensible*: Each pattern is a sequence of atomic patterns, or atoms. Our pattern learner L P includes a default set of atoms (e.g., digits and identifiers), and users can extend it with appropriate domain-specific atoms for their datasets.|1/14|syntactic profiling, data profiling, definition, computer-aided approaches, computer-assisted analysis 4068|Grant2018|Steinthal’s 1875 chapter in a scientific manual for explorers (von Neumayer 1875) presented a phonetic alphabet and long lists of words and phrases to be translated into the local languages which the explorers encountered, in order to provide material for subsequent analysis and as an aid to learning the language. This approach informed much of the linguistic work of scholars, such as the descriptivist linguist and typologist Franz Nikolaus Finck (1867-1910), but many of his descriptive writings can be read with some detachment from this theory.|92|fieldwork, phonetic alphabet, history of science, language description 4069|Gulwani2011|We describe the design of a string programming/expression language that supports restricted forms of regular expressions, conditionals and loops. The language is expressive enough to represent a wide variety of string manipulation tasks that end-users struggle with. We describe an algorithm based on several novel concepts for synthesizing a desired program in this language from input-output examples. The synthesis algorithm is very efficient taking a fraction of a second for various benchmark examples. The synthesis algorithm is interactive and has several desirable features: it can rank multiple solutions and has fast convergence, it can detect noise in the user input, and it supports an active interaction model wherein the user is prompted to provide outputs on inputs that may have multiple computational interpretations. The algorithm has been implemented as an interactive add-in for Microsoft Excel spreadsheet system. 
The prototype tool has met the golden test - it has synthesized part of itself, and has been used to solve problems beyond author’s imagination.|000|string programming, regular expression, programming languages, algorithms 4070|Gulwani2011|This paper may be very relevant for handling sound change as a sub-task of string manipulation.|000|sound change, regular expression 4071|Gulwani2011|We have identified a string expression language that is expressive enough to describe various string manipulation tasks succinctly, while at the same time concise enough to be amenable for efficient learning. There is a tradeoff between the expressiveness of a search space, and the complexity of finding simple consistent hypotheses within that space [6, 18]. In general, the more expressive a search space, the harder the task of finding consistent hypotheses within that search space. However, it is also worth-mentioning that the expressiveness-complexity tradeoff is not as simple as it seems, as an expressive language can sometimes make a simple theory fit the data, whereas restricting the expressiveness of the language means that any consistent theory must be very complex. Our string expression language seems to enjoy the right tradeoff. We present a core version of this language; extensions that enable easy adaptation of the underlying algorithm are mentioned later in Section 4.7.1.|2/13|string expression language, algorithms, regular expression, string manipulation, 4072|Calve1995|In this paper, dissimilarity relations are defined on triples rather than on dyads. We give a definition of a three-way distance analogous to that of the ordinary two-way distance. It is shown, as a straightforward generalization, that it is possible to define three-way ultrametric, three-way star, and three-way Euclidean distances. Special attention is paid to a model called the semi-perimeter model. 
We construct new methods analogous to the existing ones for ordinary distances, for example: principal coordinates analysis, the generalized Prim (1957) algorithm, hierarchical cluster analysis.|000|distances, three-way distances, classification, theory of classification 4073|Stachowski1995|This paper argues that automatic phonetic comparison will only return true results if the languages in question have similar and comparably lenient phonologies. In the situation where their phonologies are incompatible and / or restrictive, linguistic knowledge of both of them is necessary to obtain results matching human perception. Whilst the case is mainly exemplified by Levenshtein distance and Russian loanwords in Dolgan, the conclusion is also applicable to the approach as a whole.|000|edit distance, computer-based approaches, evaluation, lexical borrowing, Russian, Dolgan, 4074|Bacovcin2018|This paper makes two contributions. First, we introduce a new method for computational cladistics that produces a rooted tree by minimising the number of homoplasies. This method is compared with lexicostatistics and maximum parsimony. We validate the method on Indo-European data and show that the tree derived is consistent with current understanding of the internal cladistics of that family. Secondly, we turn the method to treat the less well studied problem of the internal cladistics of Afro-Asiatic. We show that there is good evidence for a North/South division in Afro-Asiatic with Berber, Egyptian and Semitic in the North and Chadic, Cushitic and Omotic in the South. There is also tentative evidence for further grouping of Egyptian with Semitic and Cushitic with Omotic.|000|computational cladistics, cladistics, genetic classification, Afroasiatic languages, subgrouping 4075|Bacovcin2018|Our new method (the minimum homoplasy method) attempts to apply Hennig’s 1966 notion of clade defining state directly. 
As discussed above, Hennig proposed that a clade is defined by possessing a shared innovation (i.e. jointly inheriting a character state). These clade defining states are called synapomorphies, while non-clade defining traits (e.g. from accidental parallel development or borrowing) are called homoplasies. Our method relies on the assumption that true clades are likely share a cluster of synapomorphies, since the ancestor of clades often remain together long enough to develop a number of shared innovations before dividing into smaller units. Our method optimizes for the rooted tree such that we minimize the number of states that must be treated as homoplasies.|7|cladistics, computer-based approaches, phylogenetic reconstruction, 4076|Bacovcin2018|Not clear to which degree the method differs from parsimony. The authors claim, parsimony would not work on rooted trees, but this is not true, given that one can use step matrices in Paup. Code and data are also not shared, so it's not clear how they implement the method.|000|cladistics, Afroasiatic languages, phylogenetic reconstruction, methodology, algorithms 4077|Salmons2018|Typical words in some East Asian languages, including Chinese, have reduced historically from disyllabic (CV·CV(C)) to monosyllabic (C(C)V(C)) and then open monosyllables (CV). More recently, in some of those languages, many monosyllabic CV forms again appear as disyllabic (CV·CV). The former developments result from a variety of apparently unconnected segmental changes. In the latter, they often reflect morphological innovations, like compounding and affixation. That is, apparently disparate segmental phonological processes reduced monosyllabic word templates and apparently disparate morphological and phonological processes have created new disyllables, which can all be captured in terms of preferred prosodic templates. 
We integrate these areal and genetic patterns into a growing literature on prosodic templates in diachrony, expanding the set of languages and patterns. That body of work has focused on sound changes that bring words into alignment with templates while our cases studies also involve clear changes in the templatic structures themselves. Finally, the patterns reviewed here resemble cycles of prosodic change, driven by tensions between reduction and minimal word constraints; we suggest that these phenomena show ‘bare’ prosodic cyclicity without the grammatical or functional ramifications of familiar cycles of change.|000|prosody, template phonology, Chinese, cycles of change, South-East Asian languages, sound change, 4078|Jespersen1917|The history of negative expressions in various languages makes us witness the following curious fluctuation: the original negative adverb is first weakened, then found insufficient and therefore strengthened, generally through some additional word, and this in its turn may be felt as the negative proper and may then in course of time be subject to the same development as the original word. [cited after @Salmons2018 551]|4|cycles of change, negation, language change, lexical change 4079|Salmons2018|A much smaller body of work has explored the role of prosodic templates in sound change and this paper adds additional examples and types of examples to that set. 
While earlier work (crucially Murray and Vennemann 1983; @Vennemann<1988> 1988), make a case for the role of syllabic templates in sound change, work on Mixtec (Macken and Salmons 1997) argues that word structure can be organized around a foot template, patterns which cannot be captured by syllable-level processes alone.|552|prosodic template, sound change, 4080|Salmons2018|:comment:`[Presents the classical statistics on monosyllabicity in Chinese]`|554|Chinese, Old Chinese, monosyllabicity, polysyllabification 4081|Salmons2018| * Homonymy/homophony avoidance on East Asian languages (Michaud 2012: 127), especially Chinese (Guo 1938: 2; Lü 1963: 21; Li and Thompson 1981: 14, 44, 392; Kaplan 2015: 269), * Addition of new lexical material, e.g., in Chinese (Duanmu 1999: 1; Cheng 2008: 36–39) and in Thai (Yang and He 2012), * Addition of phonological material, often epenthesis aimed at preserving or creating a particular prosodic structure (Itô 1989; Hall 2011), or * A historical “foot shift” (Feng 1998: 230, Feng 2000: 128–129, and Feng 2009: 37–40).|555|Old Chinese, polysyllabification, Chinese, sound change 4082|Salmons2018|This is exactly what Macken and Salmons (1997: 42) propose, namely that “Crosslinguistically, feet are bimoraic in quantity-sensitive languages and disyllabic in quantity-insensitive languages.” Archaic Chinese was quantity-sensitive, and its feet were bimoraic; Modern Chinese, however, being quantity-insensitive, can have disyllabic feet.|564|Chinese, foot, prosody, quantity-sensitive language, quantity-insensitive language, language typology, 4083|Jin2018|This paper analyzes a hitherto unnoticed semantic change process in Chinese, in which lexical (adjectival) materials develop into superlative operators, and subsequently turn into definiteness markers. Our analysis focuses on the semantic factors that underlie this meaning change trajectory. 
Specifically, we argue that frequent association of gradable adjectives with superlative implication leads to pragmatic strengthening in which the superlative implication conventionally enters the literal meaning. Furthermore, we show that a further change in the extension of the nominal part of superlative phrases leads to a maximality reanalysis that is compatible with the semantics of definite NPs. This paper contributes to the burgeoning field of applying truth-conditional semantics to theories of grammaticalization.|000|semantic change, Chinese, superlative 4084|Retzlaff2018|Multiple sequence alignments are an essential tool in bioinformatics and computational biology, where they are used to represent the mutual evolutionary relationships and similarities between a set of DNA, RNA, or protein sequences. More recently they have also received considerable interest in other application domains, in particular in comparative linguistics. Multiple sequence alignments can be seen as a generalization of the string-to-string edit problem to more than two strings. With the increase in the power of computational equipment, exact, dynamic programming solutions have become feasible in practice also for 3- and 4-way alignments. For the pairwise (2-way) case, there is a clear distinction between local and global alignments. As more sequences are considered, this distinction, which can in fact be made independently for both ends of each sequence, gives rise to a rich set of partially local alignment problems. So far these have remained largely unexplored. Here we introduce a general formal framework that gives raise to a classification of partially local alignment problems. 
This leads to a generic scheme that guides the principled design of exact dynamic programming solutions for particular partially local alignment problems.|000|multiple sequence alignment, local alignment, algorithms, 4085|Retzlaff2018|Article is interesting in so far as it describes multiple local alignment as a potentially useful idea for different kinds of multiple alignments. This could be interesting for the task of morpheme or word family detection.|000|morpheme detection, multiple sequence alignment, local alignment, word family 4086|Mortensen2016|This paper contributes to a growing body of evidence that—when coupled with appropriate machine-learning techniques—linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data. In particular, we show that phonological features outperform character-based models using the PanPhon resource. PanPhon is a database relating over 5,000 IPA segments to 21 subsegmental articulatory features. We show that this database boosts performance in various NER-related tasks. Phonologically aware, neural CRF models built on PanPhon features are able to perform comparably to character-based models on monolingual Spanish and Turkish NER tasks. On transfer models (as between Uzbek and Turkish) they have been shown to perform better. Furthermore, PanPhon features also contribute measurably to Orthography-to-IPA conversion tasks.|000|software, dataset, IPA, phonetic transcription, distinctive features 4087|Pepper2011|What is the role of ontologies in language documentation theory and practice? This paper clarifies the meaning of the term ‘ontology’ in the context of information management and the Web, and emphasizes the importance of distinguishing between knowledge representation and knowledge organization. 
It then examines how the term ‘ontology’ has been applied in the field of linguistics, focusing on a particular kind of ontology that is regarded as especially relevant in the context of language documentation. The General Ontology for Linguistic Description (GOLD) is presented in some detail, along with criticisms that have been raised against it. Finally it is suggested that the discipline of language documentation has more need for a knowledge organization system, and a shared thesaurus, than for an ontology-based knowledge representation system.|000|ontology, introduction, definition 4088|Nickels2007|The notion ‘ontology for linguistics’ refers to those conceptualizations of the domain of language and languages that are used to ‘talk linguistics’, to express and describe linguistic phenomena with the help of the corresponding concepts and the relations between them. The linguistic codings of these concepts are often, but by no means exclusively, technical terms of linguistics. :comment:`[quoted after` @Pepper2011 :comment:`]`|39|ontology, linguistics, definition 4089|Pepper2011|Instead what is needed is a kind of thesaurus – a knowledge organization system – consisting of a set of concepts with globally unique identifiers that can be used as the values of thick metadata. In order to account for gradience, those concepts should not be defined more precisely than necessary, and any hierarchies into which they are organized should not be based on subsumption. Such a thesaurus would improve the findability of documentations and lead to more efficient use of resources. 
It would not necessarily improve their documentation value as such (except, possibly, in encouraging greater consistency), but it can be claimed that the value of a documentation – like that of any information – resides as much in its findability as in its actual content: a language documentation, whatever its quality, is of no value at all if its content cannot be located.|216|thesaurus, ontology, linguistics, methodology, description 4090|Kershenbaum2014|Many animals produce vocal sequences that appear complex. Most researchers assume that these sequences are well characterized as Markov chains (i.e. that the probability of a particular vocal element can be calculated from the history of only a finite number of preceding elements). However, this assumption has never been explicitly tested. Furthermore, it is unclear how language could evolve in a single step from a Markovian origin, as is frequently assumed, as no intermediate forms have been found between animal communication and human language. Here, we assess whether animal taxa produce vocal sequences that are better described by Markov chains, or by non-Markovian dynamics such as the ‘renewal process’ (RP), characterized by a strong tendency to repeat elements. We examined vocal sequences of seven taxa: Bengalese finches Lonchura striata domestica, Carolina chickadees Poecile carolinensis, free-tailed bats Tadarida brasiliensis, rock hyraxes Procavia capensis, pilot whales Globicephala macrorhynchus, killer whales Orcinus orca and orangutans Pongo spp. The vocal systems of most of these species are more consistent with a non-Markovian RP than with the Markovian models traditionally assumed. 
Our data suggest that non-Markovian vocal sequences may be more common than Markov sequences, which must be taken into account when evaluating alternative hypotheses for the evolution of signalling complexity, and perhaps human language origins.|000|Markov chain, bird song, animal vocal sequence, language origin, 4091|Kershenbaum2014|Paper is interesting in so far as it claims that there is an alternative process compared to simple Markov chains. The question is, however, to which degree this claim holds in the light of more complex sequence models that still use the Markovian property but circumvent it (as our multi-tiered sequence representation).|000|multi-tiers, Markov chain, bird song, 4092|Penagarikano2004|This paper presents the theoretical basis of layered Markov models (LMM), which integrate all the knowledge levels commonly used in automatic speech recognition (acoustic, lexical and language levels) in a single model. Each knowledge level is represented by a set of Markov models (or even hidden Markov models) and all these sets are arranged in a layered structure. Given that common supervised training and recognition paradigms can be also expressed as simple Markov models, they can be formalized and integrated into the model as an extra knowledge layer. In addition, it is shown that hidden Markov models (HMM) and newer HMM2 can be considered as particular instances of LMM.|000|layered Markov model, Markov chain, multi-tiers, NLP, speech recognition 4093|Kershenbaum2014|Both RP and PHM models are considered non-Markovian because they do not rely on finite memory. In RP models, a particular behaviour (for instance, production of a particular vocal syllable) is repeated for some probabilistically determined time. Transitions between syllables of different types are still defined by a transition table as with a pFSA, but the number of repeats of each syllable in between transitions may be drawn from a distribution (e.g. Poisson). 
Although at first surprising, it can be shown that the sequence generated by such a process is non-Markovian [23] and cannot be well described by a pFSA. The RP does not fit the Markovian paradigm of finite memory, since the Poisson tail is unbounded. The PHM also relies on a nominally unbounded memory; in this case, the probability of a particular syllable occurring increases with the time since its last occurrence, and then falls to a minimum as soon as the syllable is used.|2/9|bird song, renewal process model, sequence modeling, 4094|Kershenbaum2014|pFSAs remain popular for characterizing animal vocal sequences [11,14], as the mechanism for producing Markov chains is easily understood, and simple neural mechanisms for implementing them have been postulated, based on neuroanatomical observations [17,18]. However, Markov chains are insufficient for producing the complexity of any human language [9], and there exist grammatical structures that no pFSA can generate, in particular tree-like syntax such as ‘the hyrax ate the grass that grew near the rock under the tree’ [11]. Furthermore, no intermediate grammatical form exists between pFSA models and the CFG of human language [9]. It is not clear what adaptive force could drive the gradual evolution of CFGs in a species that uses only pFSA vocal communication. In computer science, the addition of register memory, which provides the ability to count the number of repetitions of a syllable, appears to be a simple transition from regular to context-free automata [19]. However, such models have not been described in animal communication.|2/9|finite state transducer, bird song, context-free grammar, CFG, 4096|Kershenbaum2014|:comment:`Workflow for evaluation (Figure 1)` .. 
image:: static/img/Kershenbaum2014-1.png :width: 80% :name: image :comment:`Figure 1` |4/9|edit distance, evaluation, predictive power, grammar model 4097|Guthrie2006|Data sparsity is a large problem in natural language processing that refers to the fact that language is a system of rare events, so varied and complex, that even using an extremely large corpus, we can never accurately model all possible strings of words. This paper examines the use of skip-grams (a technique where by n-grams are still stored to model language, but they allow for tokens to be skipped) to overcome the data sparsity problem. We analyze this by computing all possible skip-grams in a training corpus and measure how many adjacent (standard) n-grams these cover in test documents. We examine skip-gram modelling using one to four skips with various amount of training data and test against similar documents as well as documents generated from a machine translation system. In this paper we also determine the amount of extra training data required to achieve skip-gram coverage using standard adjacent tri-grams.|000|skip-grams, n-gram model, language model, skip-gram modeling, data sparsity, 4098|Navarro2009|Wiktionary, a satellite of the Wikipedia initiative, can be seen as a potential resource for Natural Language Processing. It requires however to be processed before being used efficiently as an NLP resource. After describing the relevant aspects of Wiktionary for our purposes, we focus on its structural properties. Then, we describe how we extracted synonymy networks from this resource. We provide an in-depth study of these synonymy networks and compare them to those extracted from traditional resources. 
Finally, we describe two methods for semi-automatically improving this network by adding missing relations: (i) using a kind of semantic proximity measure; (ii) using translation relations of Wiktionary itself.|000|Wiktionary, synonyms, synonym networks, semantic similarity, automatic approach 4099|Navarro2009|Paper describes an algorithm to assess the semantic proximity if links are not already established by the network. This algorithm may be interesting to be used for colexification networks as well.|000|synonym networks, automatic approach, semantic similarity, Wiktionary 4100|Jelbert2018|Cumulative cultural evolution occurs when social traditions accumulate improvements over time. In humans cumulative cultural evolution is thought to depend on a unique suite of cognitive abilities, including teaching, language and imitation. Tool-making New Caledonian crows show some hallmarks of cumulative culture; but this claim is contentious, in part because these birds do not appear to imitate. One alternative hypothesis is that crows’ tool designs could be culturally transmitted through a process of mental template matching. That is, individuals could use or observe conspecifics’ tools, form a mental template of a particular tool design, and then reproduce this in their own manufacture – a process analogous to birdsong learning. Here, we provide the first evidence supporting this hypothesis, by demonstrating that New Caledonian crows have the cognitive capacity for mental template matching. Using a novel manufacture paradigm, crows were first trained to drop paper into a vending machine to retrieve rewards. They later learnt that only items of a particular size (large or small templates) were rewarded. At test, despite being rewarded at random, and with no physical templates present, crows manufactured items that were more similar in size to previously rewarded, than unrewarded, templates. 
Our results provide the first evidence that this cognitive ability may underpin the transmission of New Caledonian crows’ natural tool designs.|000|template, New Caledonian Crows, hook tools, tool manufacture, 4101|Dediu2018|One of the best-known types of non-independence between languages is caused by genealogical relationships due to descent from a common ancestor. These can be represented by (more or less resolved and controversial) language family trees. In theory, one can argue that language families should be built through the strict application of the comparative method of historical linguistics, but in practice this is not always the case, and there are several proposed classifications of languages into language families, each with its own advantages and disadvantages. A major stumbling block shared by most of them is that they are relatively difficult to use with computational methods, and in particular with phylogenetics. This is due to their lack of standardization, coupled with the general non-availability of branch length information, which encapsulates the amount of evolution taking place on the family tree. In this paper I introduce a method (and its implementation in R ) that converts the language classifications provided by four widely-used databases (Ethnologue, WALS, AUTOTYP and Glottolog) into the de facto Newick standard generally used in phylogenetics, aligns the four most used conventions for unique identifiers of linguistic entities (ISO 639-3, WALS, AUTOTYP and Glottocode), and adds branch length information from a variety of sources (the tree’s own topology, an externally given numeric constant, or a distance matrix). 
The R scripts, input data and resulting Newick trees are available under liberal open-source licenses in a GitHub repository (https://github.com/ddediu/lgfam-newick), to encourage and promote the use of phylogenetic methods to investigate linguistic diversity and its temporal dynamics.|000|Newick, phylogenetic tree, reference tree, gold standard, Glottolog, branch lengths 4103|Dediu2018|Interesting article in so far as it presents data and code to impute branch lengths on Glottolog trees. This means essentially that the data can now also be used for evaluation with phylogenetic algorithms in more detail, in so far as the branch lengths were so far always lacking.|000|branch lengths, phylogenetic tree, Glottolog, gold standard 4104|Seifart2008|This paper addresses a set of issues related to language documentation that are not often explicitly dealt with in academic publications, yet are highly important for the development and success of this new discipline. These issues include embedding language documentation in the socio-political context not only at the community level but also at the national level, the ethical and technical challenges of digital language archives, and the importance of regional and international cooperation among documentation activities. These issues play a major role in the initiative to set up a network of regional language archives in three South American countries, which this paper reports on. Local archives for data on endangered languages have recently been set up in Iquitos (Peru), Buenos Aires (Argentina), and in various locations in Brazil. An important feature of these is that they provide fast and secure access to linguistic and cultural data for local researchers and the language communities. 
They also make data safer by allowing for regular update procedures within the network.|000|language archive, dataset, language documentation, overview 4105|Wang2018a|In Mantauran (Rukai), noun+noun structures are not easy to identify as compounds or nominal juxtapositions. According to Zeitoun (2007), these two constructions share certain structural similarities. Linking elements such as the coordinator la ‘and’ usually do not intervene between the two nominal constituents. They exhibit free word order and bear independent lexical stresses. They differ in that the head of compounds cannot take the genitive pronoun and cannot be coordinated. A more refined investigation is required to validate Zeitoun’s (2007) analysis, however. Further pieces of morphosyntactic evidence are provided for distinguishing compounds from nominal juxtapositions. A true compound exhibits none of the following three properties: 1) internal genitive marking, 2) ellipsis and 3) (multiple) modification. By contrast, nominal juxtapositions generally are characterized by the three properties stated above. This paper also shows that Mantauran exhibits a specific pattern in its classification of compounds. More specifically, it exhibits the attributive and subordinate types (Bisetto & Scalise 2005), while the coordinate type is not observed; most of subordinate compounds express a theme-location relation.|000|Rukai, Formosan, Austronesian, noun-noun compounds, compounding, 4106|Wang2018a|What is interesting with this paper is that it describes a language where compounding really seems to reflect the interface between syntax and morphology, or morphosyntax. 
It really seems difficult to make a clear distinction between phrase and word, especially in compounding.|000|Rukai, Formosan, Austronesian, compounding, noun-noun compounds, morphosyntax 4107|Gaume2008|The node r can be linked to the node s by many short paths (there is a strong confluence going from r to s).|851|graph theory, definition, measure, terminology 4108|Gaume2008|Prox is a stochastic method to map the local and global structures of real-world complex networks, which are called small worlds. Prox transforms a graph into a Markov chain; the states of which are the nodes of the graph in question. Particles wander from one node to another within the graph by following the graph’s edges. It is the dynamics of the particles’ trajectories that map the structural properties of the graphs that are studied. Concrete examples are presented in a graph of synonyms to illustrate this approach.|000|synonym networks, Wiktionary, real world networks, small world network, automatic approach 4109|Grady2004|One of the most striking aspects of conceptual integration (or ‘blending’) is that it seeks to unify an extremely broad variety of conceptual phenomena – from the most startling feats of imagination and invention to the most mundane instances of conceptual composition – and treats them all as the products of a single cognitive process (or closely related suite of processes). This article focuses on metaphoric blends, and assesses properties that distinguish them from other blends, and lend them their particular quality. In particular, the paper examines the nature of metaphoric counterpart connections, and especially the “ready-made” connections, i.e., entrenched metaphoric correspondences between concepts, that provide the basis for the real-time construction of metaphoric blends. 
The paper argues that primary metaphors constitute a distinctive class of counterpart connections and they require an explanation not found among blending theory’s other technical apparatus. These patterns of metaphoric association cannot be explained by mechanisms such as analogy, nor by relations such as cause-effect or identity, which underlie other sorts of blends. Instead they derive from recurring correlations between particular types of mental experiences.|000|metaphor, cognition, conceptualization, 4110|Wang2018b|Diffusion of Tibeto-Burman populations across the Tibetan Plateau led to the largest human community in a high-altitude environment and has long been a focus of research on high-altitude adaptation, archeology, genetics, and linguistics. However, much uncertainty remains regarding the origin, diversification, and expansion of Tibeto-Burman populations. In this study, we analyzed a 7.0M bp region of 285 Y-chromosome sequences, including 81 newly reported ones, from male samples from Tibeto-Burman populations and other related Eastern Asian populations. We identified several paternal lineages specific to Tibeto-Burman populations, and most of these lineages emerged between 6000 and 2500 years ago. A phylogenetic tree and lineage dating both support the hypothesis that the establishment of Tibeto-Burman ancestral groups was triggered by Neolithic expansions from the middle Yellow River Basin and admixtures with local populations on the Tibetan Plateau who survived the Paleolithic Age. Furthermore, according to the geographical distributions of the haplogroups, we propose that there are two Neolithic expansion origins for all modern Tibeto-Burman populations. 
Our research provides a clear scenario about the sources, admixture process and later diffusion process of the ancestor population of all Tibeto-Burman populations.|000|Sino-Tibetan, Tibeto-Burman, Y-chromosome, population genetics, 4111|Winter2017a|Some spoken words are iconic, exhibiting a resemblance between form and meaning. We used native speaker ratings to assess the iconicity of 3001 English words, analyzing their iconicity in relation to part-of-speech differences and differences between the sensory domain they relate to (sight, sound, touch, taste and smell). First, we replicated previous findings showing that onomatopoeia and interjections were highest in iconicity, followed by verbs and adjectives, and then nouns and grammatical words. We further show that words with meanings related to the senses are more iconic than words with abstract meanings. Moreover, iconicity is not distributed equally across sensory modalities: Auditory and tactile words tend to be more iconic than words denoting concepts related to taste, smell and sight. Last, we examined the relationship between iconicity (resemblance between form and meaning) and systematicity (statistical regularity between form and meaning). We find that iconicity in English words is more strongly related to sensory meanings than systematicity. Altogether, our results shed light on the extent and distribution of iconicity in modern English.|000|iconicity, speech norms, dataset 4112|Winter2017a|What is interesting in this paper is that it explicitly discusses the problem of overlap in different norms. They present their own new norm dataset and also compare it with existing norms and their overlap.|000|speech norms, dataset, 4113|Yanowich2018|Lexical datasets used for computational phylogenetic inference suffer a unique type of data error. 
Some words actually present in a language may be absent from the dataset at no fault of its curators: especially for lesser-studied languages, a word may be missing from all available sources such as dictionaries. It is thus important to be able to (i) check how robust one’s inferences are to dictionary omission errors, and (ii) incorporate the knowledge that such errors may be present into one’s inference. I introduce two simple techniques that work towards those goals, and study the possible effects of dictionary omission errors in two real-life case studies on the Lezgian and Uralic datasets from Kassian (2015) and Syrjänen et al. (2013), respectively. The effects of dictionary omission turn out to be moderate (Lezgian) to negligible (Uralic), and certainly far less significant than the possible effects of modeling choices, including priors, on the inferred phylogeny, as demonstrated in the Uralic case study. Assessing the possible effects of dictionary omissions is advisable, but severe problems are unlikely. Collecting significantly larger lexical datasets, in order to overcome sensitivity to priors, is likely more important than expending resources on verifying data against dictionary omissions.|000|missing data, phylogenetic reconstruction, Bayesian approaches 4114|Yanowich2018|If I understand this properly, the main finding is that missing data (words that are absent from the available dictionaries) is less of a problem in linguistic approaches, as long as one has enough words to test.|000|missing data, phylogenetic reconstruction, methodology 4115|Yenni2018|Data management and publication are core components of the research process. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. "Living data" present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. 
We developed a living data workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by leveraging existing tools to: 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow uses two tools from software development, version control and continuous integration, to create a modern data management system that automates the pipeline.|000|data curation, workflow, biology, data managment, standardization 4116|Yenni2018|What is interesting about this paper is that it proposes standards similar to the ones we now try to establish within the LexiBank project.|000|LexiBank, data, standardization, workflow, data curation 4117|Berkemer2017|Where string grammars describe how to generate and parse strings, tree grammars describe how to generate and parse trees. We show how to extend generalized algebraic dynamic programming to tree grammars. The resulting dynamic programming algorithms are efficient and provide the complete feature set available to string grammars, including automatic generation of outside parsers and algebra products for efficient backtracking. The complete parsing infrastructure is available as an embedded domain-specific language in Haskell. In addition to the formal framework, we provide implementations for both tree alignment and tree editing. Both algorithms are in active use in, among others, the area of bioinformatics, where optimization problems on trees are of considerable practical importance. 
This framework and the accompanying algorithms provide a beneficial starting point for developing complex grammars with tree- and forest-based inputs.|000|tree alignment, sequence alignment, grammar, inside-outside algorithm, dynamic programming 4118|Berkemer2017|Important paper describing the concept of tree alignment. If understood properly, the idea of tree alignment might allow us to score morphological structures more properly, since handling morphological structures as trees might allow us to compare two or more structures (compounds, for example) with each other and score their differences properly.|000|tree alignment, grammar, sequence alignment, dynamic programming, 4119|Bradley2018|Speaking and singing are two modes of the same system. These modes are subject to similar constraints, but have different goals. This study examined the acoustic vowel spaces, as defined by formant frequencies, used by singers when they were singing or speaking the same linguistic content. Overall, formant values decreased during singing compared to speaking. This resulted in compression of the vowel space, with more overlapping vowel regions during singing. However, this was not consistent for all vowels and all singers. Differences between the modes are partially explained by known articulatory processes used during singing, such as larynx lowering. This may reflect the way that speakers balance communicative versus aesthetic concerns when articulating lyrics.|000|speech acoustics, song, human voice, vowel acoustics, singing, 4120|Baik2018|The present study aims to investigate whether there are any effects of conceptual distinctions on semantic memory retrieval, and if so, how different concepts play out in cued-recall. Semantic memory is one of the core features characterizing humans, and includes all acquired knowledge about the world. We conducted a semantic memory cued-recall study comparing action-associated and literal sentences with non-action and metaphoric ones. 
Here, we report that action-related sentences are better recalled than their non-action counterparts. This result is attributable to more sensory-motor activation of action-related utterances leading to a better maintenance in memory, which is in support of the Grounded Cognition (henceforth, GC) theory. In addition, we observed a literal sentence advantage during the same task, given that literal sentences are remembered to a greater extent than metaphoric sentences. This finding is also accounted for by the GC model in a way that the more concrete a concept is, the more activation in the sensory-motor cortex it will engage during comprehension, thereby inducing a more effective recall.|000|grounded theory of concepts, embodiment, empirical study, concreteness 4121|Chacon2017|This paper analyzes several linguistic traits that are evidences for ancient and continuous contacts between Arawakan, Tukanoan and neighboring languages from the Northwest Amazon. It is shown that Arawakan-Tukanoan contacts have a long-term duration and since ancient times have been shaping the languages from both families in terms of direct and indirect diffusion processes, with an overall tendency for Arawakan dominance in the exchange of linguistic traits. Broader patterns of areal relationships are also explored, showing evidence for large-scale multilingual regional systems in pre-history, as well as suggesting that Arawakan and Tukanoan had the most intense and prolonged contact situation in the region. These results contribute to our overall understanding of the cultural history and the complex regional systems in Amazonia.|000|Tukano languages, Arawakan, dataset, phylogenetic reconstruction, lexical borrowing 4122|Denz2002|This dictionary (volume 1) contains an interesting explanation of the internal structure, which explains how things are interlinked and intended. 
This is important with respect to the general structuring of information, showing that people do indeed try to be systematic, but that they lose this systematicity by publishing a book rather than a database.|000|dictionary, Bavarian, lexicography, 4123|Ratliff2006|Linguists who attempt to reconstruct the histories of the Mon-Khmer, Tibeto-Burman, Hmong-Mien, and Tai-Kadai families, and even Chinese, have all had to face the problem of initial minor syllable ("prefix") variation. Reconstruction is especially difficult when there are different semantically empty prefixes floating around in a family, functioning more like initial syllable word formatives than meaningful prefixes in the usual sense, and when there is little consistency in the association of a particular prefix with a particular word or semantic class of words from language to language. The existence of these prefixes, and their variation, raises two questions for the historical linguist: (1) what difficulties do these prefixes represent for the successful application of the comparative method, and (2) what is reconstructible of these prefixes themselves. This paper represents an attempt to answer these questions as they pertain to the nominal prefixes of Hmong-Mien.|000|prefix, prefix reconstruction, linguistic reconstruction, South-East Asian languages, Hmong-Mien, methodology 4124|Joseph2006|Paper deals (according to title) with the question of reconstructing variation back to the proto-languages. This is very interesting, as it is less often discussed, but methodologically a concrete possibility, even if one tries to avoid it when working on linguistic reconstructions.|000|methodology, Germanic, linguistic reconstruction, language variation, nature of the proto-language 4125|Xu2009|To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. 
Han Chinese, the largest ethnic group in the world, composing 20% of the entire global human population, is largely underrepresented in such studies. A well-recognized challenge is the fact that population structure can cause spurious associations in GWAS. In this study, we examined population substructures in a diverse set of over 1700 Han Chinese samples collected from 26 regions across China, each genotyped at ~160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. However, simulated case-control studies showed that genetic differentiation among these clusters, although very small (F_ST = 0.0002~0.0009), is sufficient to lead to an inflated rate of false-positive results even when the sample size is moderate. The top two SNPs with the greatest frequency differences between the northern Han and southern Han clusters (F_ST > 0.06) were found in the FADS2 gene, which associates with the fatty acid composition in phospholipids, and in the HLA complex P5 gene (HCP5), which associates with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most differentiated genes among clusters are involved in cardiac arteriopathy (p < 10^-101). These signals indicating significant differences among Han Chinese subpopulations should be carefully explained in case they are also detected in association studies, especially when sample sources are diverse.|000|Chinese, Chinese dialects, population genetics, empirical study, 4126|Gorrie2014|Very weird dissertation, which makes new claims that are hard to grasp when reading quickly. 
It basically uses statistical analyses based on typological features to study, among other things, the classification of Chinese dialects, but it is not clear how these methods are actually used and what they mean.|000|Chinese dialects, grammatical feature, typology, empirical study, methodology 4127|Stadler2016|Potentially interesting work on language change, as it tests simulation accounts, directionality, and also actuation problems. Good in order to get a quick overview on major issues.|000|language change, introduction, actuation problem, s-curve, lexical diffusion, simulation studies 4128|Zhang2006|Book on the phonology of the Shaoxing Chinese dialect of the Wú subgroup. It describes the pronunciation of "muddy" initials as breathy voiceless sounds, which is interesting in the context of the cross-linguistic transcription systems initiative.|000|Shaoxing dialect, Chinese dialects, phoneme inventory, Wú, breathy voice, phonology 4129|Ferlus2009|Determining the nature of the four Divisions of the Qièyùn is a fundamental problem in the study of the phonetic history of Chinese. Analyses by Baxter and Pulleyblank make it possible to bring out two major changes from Old Chinese to Middle Chinese: a two-way split of the vowel system, and later the lenition of medial -r-. The use of models drawn from Mon-Khmer voice type register languages made it possible to reconstruct the phonetic bases of the four divisions. Div. I consists of tense rimes, Div. II consists of tense, velarized rimes resulting from medial -r-, Div. III consists of lax/breathy rimes. As for Div. 
IV, it consists of (non velarized) rimes with the diphthong ie.|000|Middle Chinese, děng, four divisions, linguistic reconstruction, 4130|Wen2004|The spread of culture and language in human populations is explained by two alternative models: the demic diffusion model, which involves mass movement of people; and the cultural diffusion model, which refers to cultural impact between populations and involves limited genetic exchange between them. The mechanism of the peopling of Europe has long been debated, a key issue being whether the diffusion of agriculture and language from the Near East was concomitant with a large movement of farmers. Here we show, by systematically analysing Y-chromosome and mitochondrial DNA variation in Han populations, that the pattern of the southward expansion of Han culture is consistent with the demic diffusion model, and that males played a larger role than females in this expansion. The Han people, who all share the same culture and language, exceed 1.16 billion (2000 census), and are by far the largest ethnic group in the world. The expansion process of Han culture is thus of great interest to researchers in many fields.|000|Hàn population, Chinese, population genetics, demography, expansion, migration 4131|Wen2004|This paper is interesting for a graphic illustrating attested population movements in the history of the Hàn population (Chinese).|000|Hàn population, demography, migration, 4132|Davletshin2012|The Uto-Aztecan language family is one of the largest genetically related groups of the Americas, whose speakers inhabited a vast territory, extending from the state of Oregon to Panama. The paper is based on the observation that six Proto-Uto-Aztecan animal names received the augment *-yoː in Proto-Aztecan. This augment can be interpreted as a suffix of abstract possession which derives abstract nouns and indicates possession of the object or quality. 
Thus, Proto-Aztecan ‘coyote’ *koyoː- literally means ‘one of the coyote’s, somewhat like the coyote’, ‘owl’ *tkoloː- ‘one of the owl’s, somewhat like the owl’, etc. This change in meaning implies that the Proto-Uto-Aztecan homeland must have been ecologically different from the place to which speakers of Proto-Aztecan later migrated.|000|dataset, Uto-Aztekan, word list, Swadesh list 4133|Springer2018|Short summary on problems of fair algorithms in machine learning. The basic point is: if you want the algorithms to be good at prediction, they cannot be fair, as they reflect what society does, e.g., the high number of poor people in prisons, etc. This is reminiscent of cognate detection algorithms that were trained on human data, but also of translation algorithms. If they are trained by humans, they can only do as well as humans, not better, and this means that they will reflect the same problems that humans encounter when trying to translate, or to find cognates.|000|machine learning, fairness, philosophy of science, bias 4134|Denworth2018|So kontrovers Joels Studie auch diskutiert würde, die Quintessenz ihrer Aussage sei richtig, sagt dazu die Molekularbiologin Catherine Dulac von der Harvard University in Boston. Sie hatte nach Geschlechterunterschieden im Mäusegehirn gesucht und ist zu dem gleichen Ergebnis gekommen wie Joel: »Die Vielfältigkeit zwischen den Individuen ist enorm.« :translation:`However controversially Joel’s study may be discussed, the essence of her claim is correct, says the molecular biologist Catherine Dulac of Harvard University in Boston. She had searched for sex differences in the mouse brain and arrived at the same result as Joel: “The diversity between individuals is enormous.”`|47|human brain, cognition, sex differences, gender differences, biology, sex, gender, 4135|Saussure1911|On peut comparer un mot composé à une molécule construite au moyen de trois sortes d’atomes (radicaux, préfixes, suffixes) ; l’analyse et la synthèse logique des mots est alors comparable à l’étude d’une molécule dont les atomes sont connus, et le double problème que nous cherchons à résoudre peut s’énoncer : «Trouver l’idée exprimée par une molécule donnée», ou réciproquement «construire la molécule représentant une idée donnée». 
:translation:`A compound word can be compared to a molecule built by means of three sorts of atoms (roots, prefixes, suffixes); the analysis and logical synthesis of words is thus comparable to the study of a molecule of which the atoms are known, and the double problem which we are trying to solve can be formulated as “Find the idea that a given molecule expresses” or inversely “construct the molecule that represents a given idea.”`|5|nice quote, compounding, semantic molecules, 4136|Anderson2018a|What matters here is the fact that René de Saussure (1911) enunciates categorically the view that all morphological elements, roots and affixes alike, constitute parallel atomic sound-meaning pairings. In this regard, such elements are uniformly of the type Ferdinand de Saussure (1916 [1974]) would analyze as minimal signs: arbitrary, irreducible associations of expression (sound, gesture, orthography) with content. As pointed out by Matthews (2001), the observation that such associations are a core characteristic of natural language was by no means completely original with Saussure, but his importance lies in having made them the center of attention in the study of language.|211|word formation, René de Saussure, compounding, linguistic sign 4137|Saussure1916|Une unité telle que désireux se compose en deux sous-unités (désir-eux), mais ce ne sont pas deux parties indépendantes ajoutées simplement l’une à l’autre (désir+eux). C’est un produit, une combinaison de deux éléments solidaires, qui n’ont de valeur que par leur action réciproque dans une unité supérieure (désir×eux). Le suffixe, pris isolément, est inexistant; ce qui lui confère sa place dans la langue, c’est une série de termes usuels tels que chaleur-eux, chanc-eux, etc. À son tour, le radical n’est pas autonome; il n’existe que par combinaison avec un suffixe; dans roul-is, l’élément roul-[pb]n’est rien sans le suffixe qui le suit. 
Le tout vaut par ses parties, les parties valent aussi en vertu de leur place dans le tout, et voilà pourquoi le rapport syntagmatique de la partie au tout est aussi important que celui des parties entre elles. :translation:`A unit such as désireux is composed of two sub-units (désir-eux), but these are not two independent parts simply added to one another (désir+eux). It is a product, a combination of two interdependent elements that have value only through their reciprocal action within a higher unit (désir×eux). The suffix, taken in isolation, does not exist; what gives it its place in the language is a series of common terms such as chaleur-eux, chanc-eux, etc. The root, in turn, is not autonomous; it exists only in combination with a suffix; in roul-is, the element roul- is nothing without the suffix that follows it. The whole has value through its parts, the parts also have value by virtue of their place in the whole, and that is why the syntagmatic relation of the part to the whole is as important as that of the parts to one another.` :comment:`Quoted after `@Anderson2018a :comment:`211f`|176f|word formation, compounding, Ferdinand de Saussure 4138|Anderson2018a|We can categorize the difference between the views of the two Saussure brothers, at least roughly, in terms of two useful dimensions of theories as distinguished by @Stump<2001> (2001: 1). On the first of these, theories can be lexical, and treat all form-content associations as listed; or they can be inferential, in treating form-content relations in complex words as more holistic.|212|word formation, compounding, introduction, definition 4139|Anderson2018a|**Lexical** theories are those where associations between (morphosyntactic) content and (phonological) form are listed in a lexicon. Each such association is discrete and local with respect to the rest of the lexicon, and constitutes a morpheme of the classical sort. **Inferential** theories treat the associations between a word’s morphosyntactic properties and its morphology as expressed by rules or formulas.|213|lexical word formation theory, inferential word formation theory, definition, introduction, word formation 4140|Stump2001|This book gives an interesting introduction to the different theories of morphology, whether they are *lexical* or *inferential*, as discussed by @Anderson2018a|000|morphology, inflection, lexical word formation theory, inferential word formation theory, introduction 4141|Wilson2018|Cartesian Genetic Programming (CGP) has previously shown capabilities in image processing tasks by evolving programs with a function set specialized for computer vision. A similar approach can be applied to Atari playing. 
Programs are evolved using mixed type CGP with a function set suited for matrix operations, including image processing, but allowing for controller behavior to emerge. While the programs are relatively small, many controllers are competitive with state of the art methods for the Atari benchmark set and require less training time. By evaluating the programs of the best evolved individuals, simple but effective strategies can be found.|000|evolutionary programming, algorithms, computer science 4142|Greed2018|Evidentiality is a widely researched category in contemporary linguistics, both from the viewpoint of grammatical expression and also that of semantics/pragmatics. Amongst markers expressing information source is the illocutionary evidential quotative, which codes a speech report with an explicit reference to the quoted source. This article investigates the quotative particle tip in Bashkir, a Kipchak-Bulgar Turkic language spoken in the Russian Federation. In its default quotative meaning, tip signals direct speech and functions as a syntactic complementiser. This function was found to have extended from spoken utterances to coding thoughts and experiences in the context of semi-direct speech. A separate function of tip is its use as an adverbialiser signalling a logical relation and conveying the meaning of intention/purpose.|000|evidentiality, quotative, teatative 4143|Barnes1939|To many students, a thing is known and understood as soon as a label has been attached to it. There is little wonder that biology is so subject to the reproach that it is nothing but a jumble of difficult words.|478|problem, terminology, biology, nice quote 4144|Pranav2018|Ranking functions in information retrieval are often used in search engines to recommend the relevant answers to the query. This paper makes use of this notion of information retrieval and applies onto the problem domain of cognate detection. 
The main contributions of this paper are: (1) positional segmentation, which incorporates the sequential notion; (2) graphical error modelling, which deduces the transformations. The current research work focuses on classification problem; which is distinguishing whether a pair of words are cognates. This paper focuses on a harder problem, whether we could predict a possible cognate from the given input. Our study shows that when language modelling smoothing methods are applied as the retrieval functions and used in conjunction with positional segmentation and error modelling gives better results than competing baselines, in both classification and prediction of cognates.|000|phonetic alignment, algorithms, sequence alignment, sequence comparison, cognate detection 4145|Pranav2018|Our main contribution is to design an information retrieval based scoring function (see section 4) which can capture the complex morphological [pb] shifts between the cognates. We tackled this by proposing a shingling (chunking) scheme which incorporates positional information (see section 2) and a graph-based error modelling scheme to understand the transformations (see section 3). Our test harness focuses not only on distinguishing between a pair of cognates, but also the ability to predict the cognate for a target language (see section 5).|1f|cognate detection, morphological change, algorithms 4146|Pranav2018|The algorithm only performs pairwise cognate detection using the test set by @Ciobanu2014. It is thus not particularly interesting, especially since it works on orthography.|000|pairwise alignment, cognate detection, algorithms 4147|Behr2018|Text gives a brief biography of Gerhard Schmitt (1933–2017), a sinologist.|000|biography, Gerhard Schmitt, 4148|Morrison2018|Multiple sequence alignment is a basic procedure in molecular biology. 
The goal is often stated to be to juxtapose nucleotides (or their derivatives, such as amino acids) that have been inherited from a common ancestral nucleotide (although other goals are also possible). However, this is not an operational definition, because homology (in this sense) refers to unique and unobservable historical events, and so there can be no objective mathematical function to optimize. Consequently, almost all algorithms developed for multiple sequence alignment are based on optimizing some sort of compositional similarity (similarity = homology + analogy). As a result, many, if not most, practitioners either manually modify computer-produced alignments or they perform de novo manual alignment, especially in the field of phylogenetics. So, if homology is the goal, then multiple sequence alignment is not yet a solved computational problem. In this review, I summarize the criteria that have been developed by biologists to help them identify potential homologies (compositional, ontogenetic, topographical and functional similarity, plus conjunction and congruence), and explain how these criteria can be applied to molecular data. Current computer programs do implement one (or occasionally two) of these criteria, but no program implements them all. What is needed is a program that evaluates all of the evidence for the sequence homologies, optimizes their combination, and thus produces the best hypotheses of homology. This is basically an inference problem not an optimization problem.|000|multiple sequence alignment, theory of alignment, methodology, sequence comparison 4149|Roux2018|This document presents our research on the correct formation of a Classical Tibetan syllable. It was triggered by attempts at defining the boundaries of well-formed syllables in Classical Tibetan for spell checking purposes. Formalizing the formation of the syllable led us to inspect the small differences among grammar books, both in Western and Tibetan language. 
We then checked these differences against the Tibetan dictionaries we consider reliable, and also against the Kangyur. Our inquiry finally led us to study the way to decompose a syllable, discussing the ambiguous cases, as well as the formation of the Dzongkha syllable.|000|spelling correction, spell-checking, Tibetan, 4150|Ran2018|To determine whether a linguistic variety is a language or a dialect is a complex issue involving many aspects of both linguistic and sociolinguistic factors. In this study, we aim to provide a more objective basis for the distinction between languages and dialects. Quantitative approaches are used to explore the boundary between languages and dialects in terms of the degree of difference of language per se. For this purpose, the doculects in ASJP database are taken as the object and normalized Levenshtein distances are computed between language variants. Based on the findings obtained from computation, we propose that the value of LDN=0.48 can be used as the cut-off value to distinguish languages and dialects. In addition to what are described in this study, Ethnologue’s overdifferentiation and underdifferentiation between languages and dialects are discussed in this paper.|000|dialects, languages, dialectology, algorithms, ASJP 4151|Grollmann2018|The Nachiring language belongs to the Kiranti branch of the Trans-Himalayan language family (a.k.a Tibeto-Burman or Sino-Tibetan) and is spoken in the Himalayan foothills of eastern Nepal. Within the Kiranti branch, Nachiring has been classified as belonging to the Khambu unit of the Central Kiranti subgroup, but no linguistic fieldwork has been undertaken so far and the language remains undocumented and undescribed. 
The present paper constitutes a first sociolinguistic survey of the Nachiring language, based on an initial field trip, and presents updates on the number of speakers, location, language usage and attitude, as well as a first linguistic inspection of the relationship between Nachiring and the closely related Kulung language. Nachiring is a highly endangered language and thus in severe need of linguistic documentation.|000|Kiranti, Nachiring, description, field work, Sino-Tibetan 4152|Urban2018|The ESA probe Mars Express photographed the southern polar cap of Mars in 2015, among other occasions. In winter it is considerably larger. Italian researchers now suspect a lake 20 kilometers across beneath the ice.|000|Mars, proof, indirect proof, evidence, physics, astronomy 4153|Urban2018|What is interesting about the way the researchers construct their proof is that the evidence for water would also admit alternative explanations, but the scholars are very sure that these do not apply, because the different pieces of evidence all point in the same direction. This is consilience par excellence.|000|consilience, proof, astronomy, examples 4154|Schrader2018|Since the 1970s, Niède Guidon has been shaking the doctrinal edifice of anthropology from northeastern Brazil. With her finds in the Serra da Capivara National Park she supports her thesis that the first settlers came from Africa as early as 100,000 years ago, not only 13,000 years ago from Siberia. Born in 1933, the grande dame of archaeology talks at her home in São Raimundo Nonato about rock paintings, fossilized feces, corrupt politicians, and an airport without airplanes.|000|Out-of-Africa, peopling of South America, archaeology, interview, 4155|Schrader2018|Interesting interview with archaeologist Niède Guidon on the peopling of South America. Her findings seem to provide evidence against the theory that first settlers came from Asia about 13000 years ago. 
|000|archaeology, peopling of South America, discussion, Niède Guidon 4156|Zaslavsky2018|We derive a principled information-theoretic account of cross-language semantic variation. Specifically, we argue that languages efficiently compress ideas into words by optimizing the information bottleneck (IB) trade-off between the complexity and accuracy of the lexicon. We test this proposal in the domain of color naming and show that (i) color-naming systems across languages achieve near-optimal compression; (ii) small changes in a single trade-off parameter account to a large extent for observed cross-language variation; (iii) efficient IB color-naming systems exhibit soft rather than hard category boundaries and often leave large regions of color space inconsistently named, both of which phenomena are found empirically; and (iv) these IB systems evolve through a sequence of structural phase transitions, in a single process that captures key ideas associated with different accounts of color category evolution. These results suggest that a drive for information-theoretic efficiency may shape color-naming systems across languages. This principle is not specific to color, and so it may also apply to cross-language variation in other semantic domains.|000|color terms, color categories, empirical study, simulation studies, cross-linguistic study, World Color Survey 4157|BarHillel1957|The terminology, in which this insight will be formulated here, is in part already quite customary among psychologists, with the remainder coined in the investigation of Professor Rudolf Carnap on the methodological character of theoretical concepts,[@Carnap1956] in which this insight has found its concise formulation. Let me give a rough outline of the main ideas of this investigation, insofar as they are of relevance to our present problem. 
Many methodologists of science, though not all, distinguish between two parts in the language of science, the observational part, on the one hand, and the theoretical part, on the other.|329|terminology, descriptive term, methodology 4158|BarHillel1957|Since the terms of the observational sublanguage are ordinary words and phrases (say, of English) or their one-to-one symbolic counterparts, and their combination into sentences follows the rules of ordinary syntax (or, again, their simple symbolic counterparts), no problems arise as to the interpretation of the sentences of this sublanguage. The situation is different with regard to the theoretical sublanguage. Unfortunately, it is impossible, without presupposing a considerable amount of knowledge in modern logic, to describe in detail the logical structure of this sublanguage. A certain loss of preciseness in the following discussion is the inevitable result. It is hoped, however, that this loss will not seriously impair the value of this discussion.|330|descriptive term, terminology, descriptive terminology, methodology 4159|Carnap1956|In discussions on the methodology of science, it is customary and useful to divide the language of science into two parts, the observation language and the theoretical language. The observation language uses terms designating observable properties and relations for the description of observable things or events. The theoretical language, on the other hand, contains terms which may refer to unobservable events, unobservable aspects or features of events, e.g., to micro-particles like electrons or atoms, to the electromagnetic field or the gravitational field in physics, to drives and potentials of various kinds in psychology, etc.|000|Rudolf Carnap, descriptive term, terminology, explanation, observation, methodology 4160|McInerney2017|The existence of large amounts of within-species genome content variability is puzzling. 
Population genetics tells us that fitness effects of new variants—either deleterious, neutral or advantageous—combined with the long-term effective population size of the species determines the likelihood of a new variant being removed, spreading to fixation or remaining polymorphic. Consequently, we expect that selection and drift will reduce genetic variation, which makes large amounts of gene content variation in some species so puzzling. Here, we amalgamate population genetic theory with models of horizontal gene transfer and assert that pangenomes most easily arise in organisms with large long-term effective population sizes, as a consequence of acquiring advantageous genes, and that the focal species has the ability to migrate to new niches. Therefore, we suggest that pangenomes are the result of adaptive, not neutral, evolution.|000|pangenome, bacterial evolution, terminology, introduction 4161|Martin2018|Sceptics such as myself contend that most claims for eukaryote LGT are more easily explained as bacterial contaminations, misinterpretations, data analysis artefacts, differential loss, or combinations thereof. The most serious cause for scepticism about eukaryote LGT is that it produces no detectable cumulative effects. Even if LGT to eukaryotes was occurring in such a way as to be neutral rather than adaptive, LGT would still produce a pangenome structure to eukaryote species and populations.[@McInerney2017]|000|eukaryotes, bad data, interpretation, bacterial evolution 4162|Dyen1990|Different sets of cognates distributed over exactly the same set of languages are said to be homomerous and to be homomerously distributed. 
All of a collection of interhomomerous cognate sets constitute a homomery.|212|homomerous cognate sets, cognate sets, definition, terminology, shared innovation 4163|Dyen1990|The term cognate properly used is applied to a set of features that are individually direct continuations — regardless of alterations — in two or more related languages of a feature of a common proto-language. It is, however, convenient to apply this term to words of sets that satisfy the conditions indicated above well enough to establish a reasonable likelihood that they are cognate.|214|definition, cognate set, cognacy, terminology 4164|Watkins1990|In the case of German Messer 'knife', the charm lies in the phonological and semantic attrition of a close compound. We have the clear and semantically transparent living compounds Old High German mezzi-sahs and Old English mete-seax, both 'food-knife, "meat"-knife'. But the more common Old High German form is mezzi-rahs, with rhotacism rendering the second member sahs totally opaque. From mezzirahs comes Middle High German mezzeres, modern Messer. (English hussy 'house-wife' is comparable both in typology and charm.)|296|Old High German, compounding, etymology, opacity 4165|Hoenigswald1990b|One is the Verner paper of 1876. It is certainly "comparative". The comparanda under observation include such pairs as Vedic Sanskrit/Germanic (1) /'t/ /þ/ and (2) /°t/ /d/ (as in the words for 'brother' and 'father', respectively), but the more crucial discovery lies in the fact that (1) and (2) are also found to alternate, for instance in the singular as against the plural of the "perfect/preterit" (the further fact being that Germanic /d/ occurs in yet two more correspondences, (3) /'dh/ /d/ and (4) /°dh/ /d/). 
By using notations like /'t/ ('accent on vowel + [t]') or /°dh/ ('no accent on vowel + [dh]') we have here adopted an entirely unusual ad hoc segmentation on which to base phonological entities; it is more customary to say, separately, that as we go (synchronically) from singular to plural, a) the Vedic accent shifts, and b) the Germanic /þ/ shifts to /d/ (while the Germanic /d/ does not shift).|379|Verner's Law, multi-tiers, linguistic reconstruction, methodology 4166|Ogrady1990|:comment:`Article uses alignments to illustrate sound correspondences.`|451f|sequence alignment, examples, phonetic alignment, multiple sequence alignment, nice quote 4167|Hoenigswald1973|Sound change, as classically thought of, is stated in a characteristic form, namely in terms of replacement, for instance: IE d > E t; IE r > E r, IE st > E st, or alternatively, IE s > s /-t; IE t > t /s- (read: IE s before t, yields or remains s; IE t after s, yields or remains t); [...] More generally, I a/1 > II m/101, where I = older stage, II = later stage, a = an element of the older stage, m = an element of the later stage; 1, 2 = environments stated in terms of elements of the older stage (a, b, c, ...); 101, 102 = environments stated in terms of elements of the later stage (m, n, o, ...) in a certain order; in addition to the position '-' with regard to which the environment occurs. |p 1|sound change, sound change rules, linguistic reconstruction, methodology 4168|Hoenigswald1973|In the interest of maintaining continuity with the work of the past which it is our aim to comment upon it will be well to adhere to two requirements: a weaker one, that discourses judged by speakers as different not be represented as being alike; and a stronger, homonymy-serving one, that discourses judged identical be represented in one and the same way. 
|p 1|distinguishability, linguistic reconstruction, homophony, methodology 4169|Hoenigswald1973|Since it is true that no two phonemic entities should normally have identical ranges, inasmuch as they would then be totally interchangeable and hence reduced to "free variants", (any more than two entities can have mutually exclusive ranges and not be potential positional variants), a given phonemic entity may be regarded as characterized by the range of its environments. In English, t is that phoneme the frames for which include #s- Vowel and #s-r but not #s-l. |2|context, environment, phonology, linguistic reconstruction, multi-tiers 4170|Hoenigswald1973|It is generally held that any two languages are translatable into each other. Presumably there exists a theory of translation. Such a theory may be expected to have something to say about formal relationships between discourses (and their constituents) and their translations in the other language. We can here only operate with the notion that for any meaningful string in one language there are acceptable translations in the form of meaningful strings, in the other.|3|translatability, translation, theory of translation, linguistic reconstruction, 4171|Hoenigswald1973|In addition to their defining relationship, texts and their translations may exhibit certain regularities. Some of these interest the typologist and the student of potential language universals. [... :comment:`mentions also semantic regularities...`] but then, another variety of regularity stands out: the recurrence of phonological correspondences [pb] among morphs that are translations of each other. :comment:`Note that he mentions explicitly that we are dealing with the same meaning.` In those extensive sectors of language where the arbitrariness of the linguistic sign is unimpaired -- that is, outside, say, of onomatopoeia and of parts of sentence intonation - such regularities are to be understood as products of specific history. 
:comment:`Includes borrowing relations in the remainder.`|2f|language typology, language history, similarity, parallel development, linguistic reconstruction, sound change, lexical borrowing 4172|Hoenigswald1973|It is, however, possible to distinguish between the following two kinds of relationship: one, in which one language is either a source of borrowing on the part of the other or an older stage of the other; and a second kind, in which the two languages are borrowers from, or descendants of, a third language. It is the 'comparative' method, applied to correspondences, which decides this. If 'comparative' reconstruction from the two, or from particular correspondences within the two, yields a source identical with either, then the language with which the reconstruction is identical is the source, that is, either the ancestor, or the model of the borrowing, as the case may be. If, on the other hand, the reconstruction differs from both languages, each proceeds separately from the reconstructed language.|3|borrowing, sound change, sound correspondences, borrowing detection, comparative method, methodology 4173|Hoenigswald1973|As the problem is really one of ordering alleged change events, a clearer way of expressing it is to ask whether those two mergers must remain unordered or whether it is possible to show that either of two orderings is preferable -- always assuming that each merger process is indeed a well-defined whole. |11|sound change, relative chronology, linguistic reconstruction, 4174|Lake2018|Interesting paper containing a rather impressive graphic of circular processes. 
|000|nice figure, nice illustration, reticulate evolution, tree of life 4175|Maddison1997a|Lineage sorting could also be called deep coalescence, the failure of ancestral copies to coalesce (looking backwards in time) into a common ancestral copy until deeper than previous speciation events.|523|deep coalescence, incomplete lineage sorting, terminology, definition, introduction 4176|Hsu2013|Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature’s blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.|000|protein domain architecture, domains, introduction 4177|Hsu2013|The term "protein domain architecture" is important with respect to "compounding", as we could also say that we have some underlying architecture of compoundhood phenomena in the languages of the world.|000|biological parallels, compoundhood, protein domain architecture 4178|Degnan2018|Simultaneously modeling hybridization and the multispecies coalescent is becoming increasingly common, and inference of species networks in this context is now implemented in several software packages. 
This article addresses some of the conceptual issues and decisions to be made in this modeling, including whether or not to use branch lengths and issues with model identifiability. This article is based on a talk given at a Spotlight Session at Evolution 2017 meeting in Portland, Oregon. This session included several talks about modeling hybridization and gene flow in the presence of incomplete lineage sorting. Other talks given at this meeting are also included in this special issue of Systematic Biology.|000|phylogenetic network, reticulation processes, introduction, overview 4179|Degnan2018|What is interesting here is a table showing an overview of the processes that yield hybridization and how they surface. This table is inspiring for linguistic approaches: we could make it for other aspects and also get a clearer impression as to what we want to model.|000|biological parallels, hybridization, lateral gene transfer, 4180|Cousijn2018|This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies. It was developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE program. The structure of the roadmap presented here follows the “life of a paper” workflow and includes the categories Pre-submission, Submission, Production, and Publication. The roadmap is intended to be publisher-agnostic so that all publishers can use this as a starting point when implementing JDDCP-compliant data citation. 
Authors reading this roadmap will also better know what to expect from publishers and how to enable their own data citations to gain maximum impact, as well as complying with what will become increasingly common funder mandates on data transparency.|000|dataset, citation, scientific practice, data curation 4181|Burigo2018|Spatial descriptions such as “The spider is behind the bee” inform the listener about the location of the spider (the located object) in relation to an object whose location is known (i.e., the bee, also called the reference object). If the geometric properties of the reference object have been shown to affect how people use and understand spatial language (Carlson & Van Deman, 2008; Carlson-Radvansky & Irwin, 1994), the geometric features carried by the located object have been deemed irrelevant for spatial language (Landau, 1996; Talmy, 1983). This view on the (ir)relevance of the located object has been recently questioned by works showing that presenting the located object in misalignment with the reference object has consequences for spatial language understanding (Burigo, Coventry, Cangelosi, & Lynott, 2016; Burigo & Sacchi, 2013). In the reported study we aimed to investigate which geometric properties of the located object affect the apprehension of a spatial description, and to disentangle whether the information|000|spatial language, spatial navigation systems, language use, 4182|Dellert2018|Methods for automated cognate detection in historical linguistics invariably build on some measure of form similarity which is designed to capture the remaining systematic similarities between cognate word forms after thousands of years of divergence. A wide range of clustering and classification algorithms has been explored for the purpose, whereas possible improvements on the level of pairwise form similarity measures have not been the main focus of research. 
The approach presented in this paper improves on this core component of cognate detection systems by a novel combination of information weighting, a technique for putting less weight on reoccurring morphological material, with sound correspondence modeling by means of pointwise mutual information. In evaluations on expert cognacy judgments over a subset of the IPA-encoded NorthEuraLex database, the combination of both techniques is shown to lead to considerable improvements in average precision for binary cognate detection, and modest improvements for distance-based cognate clustering.|000|cognate detection, algorithms, information-based modeling, phonetic alignment 4183|Ciobanu2018|Language change across space and time is one of the main concerns in historical linguistics. In this paper, we develop a language evolution simulator: a web-based tool for word form production to assist in historical linguistics, in studying the evolution of the languages. Given a word in a source language, the system automatically predicts how the word evolves in a target language. The method that we propose is language-agnostic and does not use any external knowledge, except for the training word pairs.|000|simulation studies, Romanian, Romance, simulation, sound change 4184|Ciobanu2018|Article only employs pairwise sound change models for prediction, based on orthographies, so not entirely interesting for any language-independent approach, as they need larger test sets and specific pairwise cognates.|000|simulation studies, sound change, Romance, Romanian 4185|Elimam2017|Words can be matched with the concept of sign (correspondence of a signifier to a signified) as long as they act as symbol-words endowed with some semantic self-sufficiency. But in discourse, they lose their wholeness as symbol-words and metamorphose into wording-symbols. They, suddenly, appear as mere signifier entities with a more or less loose allusion to their status as cultural symbols. 
In discourse, words are no longer signs but tools covering ephemeral collections of neurosemes: the link of the sign breaks as soon as discourse takes over. The referential potential is no longer the schematic meaning issued from culture, but the universe of discourse under construction. This is why any attempt to account for meaning in language must integrate the neural process of meaning creation. It is now established that meaning is not the result of language activity but the result of cognition. However, what language does, via discourse, is to make this meaning communicable. For all these reasons, the task of linguistics should be to investigate the relationship between cognition and linguistic output in order to shed light on all the cognitive traces left within the surface strings. The role of morphosyntax thus has to be re-evaluated in this light.|000|morphosyntax, language faculty, linguistic sign, discourse, opinion paper 4186|Hobolth2007|The genealogical relationship of human, chimpanzee, and gorilla varies along the genome. We develop a hidden Markov model (HMM) that incorporates this variation and relate the model parameters to population genetics quantities such as speciation times and ancestral population sizes. Our HMM is an analytically tractable approximation to the coalescent process with recombination, and in simulations we see no apparent bias in the HMM estimates. We apply the HMM to four autosomal contiguous human–chimp–gorilla–orangutan alignments comprising a total of 1.9 million base pairs. We find a very recent speciation time of human–chimp (4.1 ± 0.4 million years), and fairly large ancestral effective population sizes (65,000 ± 30,000 for the human–chimp ancestor and 45,000 ± 10,000 for the human–chimp–gorilla ancestor). Furthermore, around 50% of the human genome coalesces with chimpanzee after speciation with gorilla. 
We also consider 250,000 base pairs of X-chromosome alignments and find an effective population size much smaller than 75% of the autosomal effective population sizes. Finally, we find that the rate of transitions between different genealogies correlates well with the region-wide present-day human recombination rate, but does not correlate with the fine-scale recombination rates and recombination hot spots, suggesting that the latter are evolutionarily transient.|000|coalescent theory, hidden markov models, population genetics, chimpanzees, human, gorilla, algorithms 4187|Hockett1958|:comment:`Discusses correspondence patterns for multiple languages.`|487-489|comparative method, sound correspondences, correspondence patterns, methodology 4188|Huber2016|Phylogenetic networks are rooted, labelled directed acyclic graphs which are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network N can be “unfolded” to obtain a MUL-tree U(N) and, conversely, a MUL-tree T can in certain circumstances be “folded” to obtain a phylogenetic network F(T) that exhibits T. In this paper, we study properties of the operations U and F in more detail. In particular, we introduce the class of stable networks, phylogenetic networks N for which F(U(N)) is isomorphic to N, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network N can be related to displaying the tree in the MUL-tree U(N). To do this, we develop a phylogenetic analogue of graph fibrations. 
This allows us to view U(N) as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in U(N) and reconciling phylogenetic trees with networks.|000|phylogenetic network, multi-labelled tree, phylogenetic tree, methodology, modeling 4189|Huber2006|It is now quite well accepted that the evolutionary past of certain species is better represented by phylogenetic networks as opposed to trees. For example, polyploids are typically thought to have resulted through hybridization and duplication, processes that are probably not best represented as bifurcating speciation events. Based on the knowledge of a multi-labelled tree relating collection of polyploids, we present a canonical construction of a phylogenetic network that exhibits the tree. In addition, we prove that the resulting network is in some well-defined sense a minimal network having this property.|000|multi-labelled tree, phylogenetic network, methodology 4190|Iersel2018|Phylogenetic networks are well suited to represent evolutionary histories comprising reticulate evolution. Several methods aiming at reconstructing explicit phylogenetic networks have been developed in the last two decades. In this article, we propose a new definition of maximum parsimony for phylogenetic networks that permits to model biological scenarios that cannot be modeled by the definitions currently present in the literature (namely, the “hardwired” and “softwired” parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.|000|phylogenetic network, maximum parsimony, modeling, methodology 4191|Kehoe2017|This study examines the acquisition of /r/ in German and Spanish monolingual and bilingual children. German and Spanish are characterized by different /r/s. German has a uvular approximant whereas Spanish has an alveolar tap and trill. 
Words containing /r/ were extracted from longitudinal recordings of the children, aged 1;9 to 3;6. Results indicate that monolingual German children acquired uvular /r/ earlier than monolingual Spanish children acquired the tap and trill. The bilingual children acquired uvular /r/ similarly to the monolingual children or, in the case of /r/ clusters, they were mildly delayed. They were advanced in the acquisition of alveolar tap and they produced more /r/-like errors for the trill. Transfer patterns were observed in one child but they could not be explained by markedness or language dominance. Findings are consistent with cross-linguistic interaction in the acquisition of /r/, in which the phonological systems of the bilinguals approximate each other.|000|perception, production, age of acquisition, rhotic sounds, Spanish, German 4192|Zhao2018|This study adopted a corpus-based approach to examine the synaesthetic metaphors of gustatory adjectives in Mandarin. Based on the distribution of synaesthetic uses in the corpus, we found that: (1) the synaesthetic metaphors of Mandarin gustatory adjectives exhibited directionality; (2) the directionality of Mandarin synaesthetic gustatory adjectives showed both commonality and specificity when compared with the attested directionality of gustatory adjectives in English, which calls for a closer re-examination of the claim of cross-lingual universality of synaesthetic tendencies; and (3) the distribution and directionality of Mandarin synaesthetic gustatory adjectives could not be predicted by a single hypothesis, such as the embodiment-driven approach or the biological association-driven approach. 
Thus, linguistic synaesthesia was constrained by both the embodiment principle and the biological association mechanism.|000|Chinese, synaesthesia, adjectives, Mandarin, Chinese 4193|Sayyari2017|Species tree reconstruction from genome-wide data is increasingly being attempted, in most cases using a two-step approach of first estimating individual gene trees and then summarizing them to obtain a species tree. The accuracy of this approach, which promises to account for gene tree discordance, depends on the quality of the inferred gene trees. At the same time, phylogenomic and phylotranscriptomic analyses typically use involved bioinformatics pipelines for data preparation. Errors and shortcomings resulting from these preprocessing steps may impact the species tree analyses at the other end of the pipeline. In this article, we first show that the presence of fragmentary data for some species in a gene alignment, as often seen on real data, can result in substantial deterioration of gene trees, and as a result, the species tree. We then investigate a simple filtering strategy where individual fragmentary sequences are removed from individual genes but the rest of the gene is retained. Both in simulations and by reanalyzing a large insect phylotranscriptomic data set, we show the effectiveness of this simple filtering strategy|000|gene tree reconciliation, missing data, phylogenetic reconstruction 4194|LLeo2018|This study investigates the acquisition of grammatical gender in French by German L1 children (age of onset of acquisition (AO) 2;8-4,0). The analysis of spontaneous production data of 24 children gathered longitudinally and a gender assignment test administered to 8 of these children at ages 6;7-8;3 and to 9 children (AO 2,11-3;8) at ages 3;2-5;1 revealed that some of them resembled L1 learners whereas others behaved like adult L2 learners. The turning point is at around AO 3;6. 
AO is thus a crucial factor determining successive language acquisition.|000|second language learning, second language acquisition, gender systems, 4195|Meisel2018|The present study analyzes percentages of target-like production of Spanish spirantization and assimilation of coda nasals place of articulation, in three groups of bilingual children simultaneously acquiring German and Spanish: two very young groups, one living in Germany and another one in Spain, and a group of 7-year-old bilinguals from Germany. There were monolingual Spanish and monolingual German control groups. The comparison between groups shows that the Spanish of bilinguals is different from that of monolinguals; and the Spanish of bilinguals in Germany is different from that of bilinguals in Spain. Results lead to the conclusion that the Spanish competence of the bilinguals from Germany is still incomplete, and influenced by transfer of the majority language (German). Only bilingual children living in Germany show influence of the majority language onto the heritage language, whereas transfer does not operate on the Spanish competence of the bilingual children from Spain.|000|bilingualism, heritage language, Spanish, incomplete transition, tran 4196|Oldman2017|Phylogenetic networks are a generalization of evolutionary trees that can be used to represent reticulate processes such as hybridization and recombination. Here, we introduce a new approach called TriLoNet (Trinet Level-one Network algorithm) to construct such networks directly from sequence alignments which works by piecing together smaller phylogenetic networks. More specifically, using a bottom up approach similar to Neighbor-Joining, TriLoNet constructs level-1 networks (networks that are somewhat more general than trees) from smaller level-1 networks on three taxa. In simulations, we show that TriLoNet compares well with Lev1athan, a method for reconstructing level-1 networks from three-leaved trees. 
In particular, in simulations we find that Lev1athan tends to generate networks that overestimate the number of reticulate events as compared with those generated by TriLoNet. We also illustrate TriLoNet’s applicability using simulated and real sequence data involving recombination, demonstrating that it has the potential to reconstruct informative reticulate evolutionary hist|000|software, phylogenetic network, triples, 4197|SolisLemus2017|PhyloNetworks is a Julia package for the inference, manipulation, visualization, and use of phylogenetic networks in an interactive environment. Inference of phylogenetic networks is done with maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with possible bootstrap analysis. PhyloNetworks is the first software providing tools to summarize a set of networks (from a bootstrap or posterior sample) with measures of tree edge support, hybrid edge support, and hybrid node support. Networks can be used for phylogenetic comparative analysis of continuous traits, to estimate ancestral states or do a phylogenetic regression. The software is available in open source and with documentation at https://github.com/crsl4/PhyloNetworks.jl|000|software, phylogenetic network, Julia, 4198|Levickij2009|The semantics of nouns, adjectives and verbs as components of compound words was studied; also the peculiarities of their combination in the models N+N, A+N, V+N were analyzed. The main characteristics of the combinatory were established as combinatory range, strength (intensity) of a tie, and the number of significant ties.|000|compounding, compositionality, semantics, nouns, German, 4199|Pardi2015|Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. 
An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. 
While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.|000|distinguishability, phylogenetic network, methodology, phylogenetic reconstruction 4200|Zhou2015|We investigated cognitive and metalinguistic correlates of Chinese word reading in children with L2 Chinese learning experience and compared these to those in L1 Chinese speaking children. In total, 102 third and fourth grade children were recruited for the study. We examined a range of Chinese and English word reading related cognitive and metalinguistic skills. Compared to the native Chinese speaking group (NCS), the non-native Chinese speaking group (NNCS) only performed better in English vocabulary knowledge and English working memory. On Chinese word reading related skills the NNCS group performed significantly worse than the NCS group. Hierarchical regression analyses revealed that the unique correlates of Chinese word reading for both groups were Chinese vocabulary, working memory, lexical tone awareness, and orthographic skills. For the NNCS group only, visual skills were also unique correlates of word reading skills. The results suggest cognitive similarities and differences in reading among native and non-native Chinese speakers.|000|reading, Chinese, cognition, bilingualism, second language learning 4201|Wang2018|We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions in the format of Gene Ontology (GO) terms. PANDA at first executes profile-profile alignment algorithm to search against PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt for homologue search. PANDA integrates a domain architecture inference algorithm based on the Bayesian statistics that calculates the probability of having a GO term. 
All the candidate GO terms are pooled and filtered based on Z-score. After that, the remaining GO terms are clustered using an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked the performance of all the baseline predictors PANDA integrates and also for every pooling and filtering step of PANDA. It can be found that PANDA achieves better performances in terms of area under the curve for precision and recall compared to the baseline predictors. PANDA can be accessed from http://dna.cs.miami.edu/PANDA/.|000|prediction, protein structure, protein functions, software, protein domain architecture, 4202|Wang2018|What is interesting is that scholars use sequence similarity to try and predict the functions of proteins in biology. This would not work in linguistics, due to the arbitrariness of the linguistic sign. |000|arbitrariness, biological parallels, predictability, prediction, retrodiction 4203|Zheng2018|Background: We evaluated the sensitivity of the D-statistic, a parsimony-like method widely used to detect gene flow between closely related species. This method has been applied to a variety of taxa with a wide range of divergence times. However, its parameter space and thus its applicability to a wide taxonomic range has not been systematically studied. Divergence time, population size, time of gene flow, distance of outgroup and number of loci were examined in a sensitivity analysis. Result: The sensitivity study shows that the primary determinant of the D-statistic is the relative population size, i.e. the population size scaled by the number of generations since divergence. This is consistent with the fact that the main confounding factor in gene flow detection is incomplete lineage sorting by diluting the signal. The sensitivity of the D-statistic is also affected by the direction of gene flow, size and number of loci. 
In addition, we examined the ability of the f-statistics, $\hat{f}_G$ and $\hat{f}_{hom}$, to estimate the fraction of a genome affected by gene flow; while these statistics are difficult to implement to practical questions in biology due to lack of knowledge of when the gene flow happened, they can be used to compare datasets with identical or similar demographic background. Conclusions: The D-statistic, as a method to detect gene flow, is robust against a wide range of genetic distances (divergence times) but it is sensitive to population size. The D-statistic should only be applied with critical reservation to taxa where population sizes are large relative to branch lengths in generations.|000|gene flow, D-statistic, population genetics, population size, simulation studies 4204|Zheng2018|Question that we should ask ourselves: can the idea of "gene flow" find a counterpart in linguistics? I.e., by assuming that words can easily jump boundaries? Or the mutual intelligibility of dialects?|000|mutual intelligibility, biological parallels, gene flow, borrowability, 4205|VanderBeeken2017|Despite an increase in bilingualism and the use of English as a medium of instruction, little research has been done on bilingual memory for learnt information. In a previous study, we found an L2 recall cost but equal recognition performance in L2 versus L1 when students studied short expository texts (Vander Beken & Brysbaert, 2017). In this paper, we investigate whether there is a recognition cost after a longer delay, which would indicate that the memory trace is weaker in L2. Results showed equal performance in L1 and L2, suggesting that the recall cost is either located at the production level, or that the levels-of-processing effect is mediated by language, with unaffected surface encoding leading to effective MARGINAL KNOWLEDGE on the one hand, and hampered deep encoding leading to ineffective (uncued) recall. 
This paper also contains the Dutch vocabulary test we used for native speakers.|000|reading, second language learning, memory, psycholinguistics 4206|Ullrich2017|Various complaints about the consistent use of a non-epistemological ‘norm of progress’ (also known as ‘Scala Naturae’) can be found frequently in recent evolution of language and communication literature. Affiliated to earlier studies that addressed quantification of some overt indicators such as word combinations of ‘high + species’, the current account aims to go beyond the obvious in describing the presumed phenomena. Using a mixed-methodology approach, we quantify the general use of vocabulary, range of study species, amount of ‘progressionist attributes’ and subsequently qualify the context of some key words. Investigating 915 peer-reviewed articles from a species-comparative evolution of language and communication discourse, we found that articles focussing on species groups historically regarded as ‘high’ make more use of attributes implying directed progress than otherwise. We subdivided all articles in two distinct corpora. Articles using the term ‘language’ or ‘speech’ in title, abstract or keywords were labelled ‘language’. Those using other terms than language were labelled ‘communication’. We could identify a more diverse focus on studied species groups and a more behaviouristic vocabulary in corpus ‘communication’ as compared to the corpus ‘language’. Additionally, articles from the latter corpus tend to stress a narrative of human uniqueness. 
Our results, taken together, do not provide clear evidence for a structural and active promotion of a ‘norm of progress’, but hint towards historical aftermaths exercising indirect influence and worthy of further study.|000|scala naturae, progress, language evolution, discussion, 4207|Ullrich2017|Article is potentially interesting with respect to the discussion about progress in evolution concerning the evolution / origin of language.|000|language evolution, language origin, scala naturae, progress, discussion 4208|Osterkamp2018|Researchers have known the mysterious Homo floresiensis only since 2003, when they found parts of a skeleton in a cave on the Indonesian island of Flores. From that moment on, they have been arguing about where the species should be placed in the family tree of humankind.|000|homo floresiensis, origin, homo sapiens, 4209|Osterkamp2018|Article states that the appearance of Homo floresiensis was due to adaptation rather than independent origins.|000|adaptation, homo floresiensis, homo sapiens, origin, 4210|Haspelmath2018b|Review of a book on polysynthesis. Very interesting regarding questions of wordhood. Mentions also the concept of promiscuity, but not clear where this stems from.|000|promiscuity, morphology, polysynthesis, typology, review, methodology, wordhood 4211|Szeto2018|This study explores the range and diversity of the typological features of Mandarin, the largest dialect group within the Sinitic branch of the Sino-Tibetan family. 
Feeding the typological data of 42 Sinitic varieties into the phylogenetic program NeighborNet, we obtained network diagrams suggesting a north-south divide in the Mandarin dialect group, where dialects within the Amdo Sprachbund cluster at one end and those in the Far Southern area cluster at the other end, highlighting the impact of language contact on the typological profiles of various Mandarin dialects.|000|Mandarin, Chinese dialects, Neighbor-Net, genetic classification, dialect classification, bad data, typology 4212|Szeto2018|Very bad example for the abuse of Neighbor-Net to classify and analyse dialect differences. Data is not accessible in any useful form, and I wonder why they didn't share the Nexus file directly.|000|Neighbor-Net, bad example, bad data, data sharing, Mandarin, dialect classification, Chinese dialects, typology 4213|Haider2018|We present the first supervised approach to rhyme detection with Siamese Recurrent Networks (SRN) that offer near perfect performance (97% accuracy) with a single model on rhyme pairs for German, English and French, allowing future large scale analyses. SRNs learn a similarity metric on variable length character sequences that can be used as judgement on the distance of imperfect rhyme pairs and for binary classification. For training, we construct a diachronically balanced rhyme goldstandard of New High German (NHG) poetry. For further testing, we sample a second collection of NHG poetry and set of contemporary Hip-Hop lyrics, annotated for rhyme and assonance. 
We train several high-performing SRN models and evaluate them qualitatively on selected sonnets.|000|rhyme annotation, rhyme patterns, rhyme analysis, automatic rhyme annotation, corpus studies 4214|Haider2018|Authors promise a test set, but this is not available on GitHub yet.|000|rhyme patterns, rhyme analysis, dataset, missing data 4215|Derungs2018|Linguistic diversity is a key aspect of human population diversity and shapes much of our social and cognitive lives. To a considerable extent, the distribution of this diversity is driven by environmental factors such as climate or coast access. An unresolved question is whether the relevant factors have remained constant over time. Here, we address this question at a global scale. We approximate the difference between pre- versus post-Neolithic populations by the difference between modern hunter–gatherer versus food-producing populations. Using a novel geostatistical approach of estimating language and language family densities, we show that environmental—chiefly climate factors—have driven the language density of food-producing populations considerably more strongly than the language density of hunter–gatherer populations. Current evidence suggests that the population dynamics of modern hunter–gatherers is very similar to that of what can be reconstructed from the Palaeolithic record. Based on this, we cautiously infer that the impact of environmental factors on language densities underwent a substantial change with the transition to agriculture. 
After this transition, the environmental impact on language diversity in food-producing populations has remained relatively stable since it can also be detected—albeit in slightly weaker form—in models that capture the reduced linguistic diversity during large-scale language spreads in the Mid-Holocene.|000|cultural evolution, language diversity, environmental factors, Glottolog, uniformitarianism 4216|Atkinson2018|FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.|000|FOXP2, natural selection, language origin, 4217|Atkinson2018|The article finds no evidence that FOXP2 was recently selected, so it is not that important for "speaking".|000|FOXP2, natural selection, language origin 4218|Betts2018|Establishing a unified timescale for the early evolution of Earth and life is challenging and mired in controversy because of the paucity of fossil evidence, the difficulty of interpreting it and dispute over the deepest branching relationships in the tree of life. 
Surprisingly, it remains perhaps the only episode in the history of life where literal interpretations of the fossil record hold sway, revised with every new discovery and reinterpretation. We derive a timescale of life, combining a reappraisal of the fossil material with new molecular clock analyses. We find the last universal common ancestor of cellular life to have predated the end of late heavy bombardment (>3.9 billion years ago (Ga)). The crown clades of the two primary divisions of life, Eubacteria and Archaebacteria, emerged much later (<3.4 Ga), relegating the oldest fossil evidence for life to their stem lineages. The Great Oxidation Event significantly predates the origin of modern Cyanobacteria, indicating that oxygenic photosynthesis evolved within the cyanobacterial stem lineage. Modern eukaryotes do not constitute a primary lineage of life and emerged late in Earth’s history (<1.84 Ga), falsifying the hypothesis that the Great Oxidation Event facilitated their radiation. The symbiotic origin of mitochondria at 2.053–1.21 Ga reflects a late origin of the total-group Alphaproteobacteria to which the free living ancestor of mitochondria belonged.|000|early life, origin of life, eukaryotes, dating, molecular clock 4219|Betts2018|The dataset consists of 102 species and 29 universally distributed, protein-coding genes (see Supplementary Information). All our data and scripts are available at https://bitbucket.org/bzxdp/betts_et_al_2017. Proteomes were downloaded from GenBank 62 and putative orthologues were identified using BLAST 63 . The top hits were compiled and aligned into gene-specific files in MUSCLE 64 and trimmed to remove poorly aligned sites using Trimal 65 . 
To minimize the possible inclusion of paralogues and laterally transferred genes, we generated gene trees (under CAT-GTR + G) in PhyloBayes 66 and excluded sequences when the tree topology suggested that they might have been paralogues.|5/10|paralog, sequence alignment, trimming, molecular clock 4220|CapellaGutierrez2009|Summary: Multiple sequence alignments are central to many areas of bioinformatics. It has been shown that the removal of poorly aligned regions from an alignment increases the quality of subsequent analyses. Such an alignment trimming phase is complicated in large-scale phylogenetic analyses that deal with thousands of alignments. Here, we present trimAl, a tool for automated alignment trimming, which is especially suited for large-scale phylogenetic analyses. trimAl can consider several parameters, alone or in multiple combinations, for selecting the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of amino acid similarity and, if several alignments for the same set of sequences are provided, the level of consistency across different alignments. Moreover, trimAl can automatically select the parameters to be used in each specific alignment so that the signal-to-noise ratio is optimized. Availability: trimAl has been written in C++, it is portable to all platforms. trimAl is freely available for download (http://trimal.cgenomics.org) and can be used online through the Phylemon web server (http://phylemon2.bioinfo.cipf.es/). Supplementary Material is available at http://trimal.cgenomics.org/publications. |000|trimming, sequence alignment, tools, software, 4221|CapellaGutierrez2009|We use similar methods for trimming in linguistics, or, say, we should try to have them and find them. 
So it is useful to have a paper to quote that deals with the problem in biology.|000|trimming, sequence alignment, software, 4222|Reali2009|Scientists studying how languages change over time often make an analogy between biological and cultural evolution, with words or grammars behaving like traits subject to natural selection. Recent work has exploited this analogy by using models of biological evolution to explain the properties of languages and other cultural artefacts. However, the mechanisms of biological and cultural evolution are very different: biological traits are passed between generations by genes, while languages and concepts are transmitted through learning. Here we show that these different mechanisms can have the same results, demonstrating that the transmission of frequency distributions over variants of linguistic forms by Bayesian learners is equivalent to the Wright–Fisher model of genetic drift. This simple learning mechanism thus provides a justification for the use of models of genetic drift in studying language evolution. In addition to providing an explicit connection between biological and cultural evolution, this allows us to define a ‘neutral’ model that indicates how languages can change in the absence of selection at the level of linguistic variants. We demonstrate that this neutral model can account for three phenomena: the s-shaped curve of language change, the distribution of word frequencies, and the relationship between word frequencies and extinction rates.|000|biological parallels, words, allels, simulation studies, language evolution, cultural evolution, Wright-Fischer model 4223|Reali2009|Paper makes the analogy between words and alleles and claims to be able to explain both cultural or language evolution and biological evolution in a neutral model within the same framework. 
Quite interesting, as they also say they can handle s-curves in language change, word frequency distributions, and similar.|000|biological parallels, s-curve, language change, word frequency, simulation studies 4224|Townsend2018|A key step in understanding the evolution of human language involves unravelling the origins of language’s syntactic structure. One approach seeks to reduce the core of syntax in humans to a single principle of recursive combination, MERGE, for which there is no evidence in other species. We argue for an alternative approach. We review evidence that beneath the staggering complexity of human syntax, there is an extensive layer of nonproductive, nonhierarchical syntax that can be fruitfully compared to animal call combinations. This is the essential groundwork that must be explored and integrated before we can elucidate, with sufficient precision, what exactly made it possible for human language to explode its syntactic capacity, transitioning from simple nonproductive combinations to the unrivalled complexity that we now have.|000|compositionality, language origin, Chomsky syntax, merge 4225|Auer2018|In this paper, we first present a close analysis of conversational data, capturing the variety of non-addressee deictic usages of du in contemporary German. From its beginnings, it has been possible to use non-addressee deictic du not only for generic statements, but also for subjective utterances by a speaker who mainly refers to his or her own experiences. We will present some thoughts on the specific inferences leading to this interpretation, making reference to Bühler’s deixis at the phantasm. In the second part of the paper, we show that non-addressee deictic du (‘thou’) as found in present-day German is not an innovation but goes back at least to the 18th century. However, there is some evidence that this usage has been spreading over the last 50 years or so. 
We will link non-addressee deictic du back historically to the two types of “person-shift” for du discussed by Jakob Grimm in his 1856 article “Über den Personenwechsel in der Rede” [On person shift in discourse]. Grimm distinguishes between person shift in formulations of “rules and law” on the one hand, and person shift in what he calls “thou-monologue” on the other. The subjective interpretation of non-addressee-deictic du in present-day German may have originated from these “thou-monologues”.|000|second person singular, German, functional linguistics, pragmatics, 4226|Auer2018|Interesting paper, insofar as it discusses how the pragmatic shift emerged in German that allows speakers to use the second person to refer to themselves.|000|pragmatics, language change, second person singular, German, 4227|Detges2018|This paper deals with the question of how and why resultative constructions change into anteriors. This discussion will be based on synchronic data concerning tener + past participle, a resultative construction used in modern Spanish. One of the latter's most frequent is te lo tengo dicho 'I have (already) told you'. This is remarkable since decir 'to tell' is a non-transitional verb; te lo tengo dicho thus violates the requirement that resultatives should only combine with transitional verbs. In the literature, such mismatches between the semantics of a given construction and the meaning of its lexical filler have been claimed to normally trigger coercion, i.e. an inferential repair mechanism giving rise to special meaning effects. Thus, coercion - despite being conceived as a purely synchronic mechanism - is a prime candidate for an explanation of the change from resultative to anterior. In line with this hypothesis, occurrences of te lo tengo dicho are attested in my corpus where the latter is specified by quantifying adverbials such as muchas veces 'many times'. 
However, speaker judgements indicate that even te lo tengo dicho muchas veces is not an iterative anterior construction, but still a resultative. Based on synchronic data taken from the CREA-corpus, it will be shown that in the vast majority of its occurrences, te lo tengo dicho is part of a dialogal discourse pattern where certain argumentative effects based on its resultative meaning are highly relevant. Crucially, therefore, in such "strong" uses a coercive shift towards an anterior meaning is excluded. On a more abstract level, it will be shown that coercion is controlled by pragmatic factors; in the case of te lo tengo dicho muchas veces, conceptual/semantic plausibility is systematically overridden by pragmatic relevance.|000|resultative construction, coercion, grammaticalization, Spanish, pragmatics 4228|SanRoque2018|Apart from references to perception, words such as see and listen have shared, non-literal meanings across diverse languages. Such cross-linguistic meanings have not been systematically investigated as they appear in their natural home — informal spoken interaction. We present a qualitative examination of the semantic associations of perception verbs based on recorded everyday conversation in thirteen diverse languages. Across these diverse communities, spontaneous interaction provides evidence for two commonly-discussed extensions of perception verbs — perception~cognition, hearing~linguistic communication — as well as illustrating other meanings and functions (e.g., the use of perception verbs as discourse markers) that have been less appreciated heretofore. The range of usage that is readily observable in informal conversation makes it clear that this type of data must take center stage for the empirically grounded study of semantics. Moreover, these data suggest that commonalities in polysemous meanings may rely not only on universal cognition, but also on the universal exigencies of social interaction. 
|000|polysemy, universals, cognition, perception verbs, pragmatics 4229|SanRoque2018|To investigate transfield meanings of perception words, we began in phase one with a bottom-up approach, looking at the “free translations” provided by researchers for each example. We collapsed different grammatical forms of the same verb (e.g., the separate translations ‘think’ and ‘thinking’) as one item and grouped semantically similar items together, examining the original context where necessary to understand the meaning. The groupings were made independently by four native English speakers (the authors), with discrepancies then resolved by consensus, resulting in the groups of meanings shown in Table 1. |380|polysemy, cross-linguistic study, polysemy detection 4230|SanRoque2018|In phase two, for each meaning group, researchers were asked to re-examine the one-hour sample and identify examples where a sight, hearing, and/or multi-sense term was used with one of the meanings listed in the second column of Table 1. This process confirmed the presence of candidate meanings in the data, and allowed for clearer comparison across languages, as everyone used the same criteria (and the same proxy meta-language, English) for the identification of polysemies. Researchers were also asked to identify examples of any polysemies for verbs of any sense modality that were not included in the meaning groups shown in Table 1. This ensured we did not miss less frequent sense extensions. Where there were discrepancies between “singleton” meanings identified in phases one and two, these were checked again in consultation with the relevant researcher(s).|381|polysemy, polysemy detection, corpus studies 4231|SanRoque2018|Interesting paper, describes a manual method to identify polysemies in corpora, and offers all examples found as supplementary material. Data could also be linked to Concepticon. 
What is strange is that the number of examples found is so small: this barely allows for a valid abstraction, although the findings intuitively hold.|000|polysemy, corpus studies, polysemy detection, perception verbs, cognition 4232|Hill2009|The paper deals with the so-called Preference Theory developed in the works of Theo Vennemann and Robert Murray within the scope of historical phonology. The first part of the paper examines the constituting assumptions and claims of the theory. The goal of the preference-based historical phonology – uncovering the motivation for sound changes which the Neogrammarian methodology can merely describe – will be achieved only if the universal preferences are reliably established. It is shown that the procedures which are employed to extract the universal preferences from empirical data do not lead to reliable results. The reason for this is the failure of the Preference Theory to distinguish in a non-arbitrary way between the alleged universally preferred structures and the mere by-products of sound changes with different or unknown motivation. The second part of the paper examines a recently suggested modification of the traditional notion of the exceptionlessness of sound changes. According to Vennemann, the traditional exceptionless sound changes are in fact to be considered as non-exclusive tendencies towards universally more preferred phonological structures. The paper shows that this position is neither based on the core assumptions of the Preference Theory nor supported by the adduced empirical evidence.|000|preference theory of sound change, sound change, regular sound change, universals 4233|Hill2009|Important summary and introduction of the preference theory of sound change which claims that there are universal tendencies to be found, going back to @Vennemann1989. 
|000|Theo Vennemann, preference theory of sound change, discussion, introduction 4234|Axelsen2014|Global linguistic diversity (LD) displays highly heterogeneous distribution patterns. Though the origin of the latter is not yet fully understood, remarkable parallelisms with biodiversity distribution suggest that environmental variables should play an essential role in their emergence. In an effort to construct a broad framework to explain world LD and to systematize the available data, we have investigated the significance of 14 variables: landscape roughness, altitude, river density, distance to lakes, seasonal maximum, average and minimum temperature, precipitation and vegetation, and population density. Landscape roughness and river density are the only two variables that universally affect LD. Overall, the considered set accounts for up to 80% of African LD, a figure that decreases for the joint Asia, Australia and the Pacific (69%), Europe (56%) and the Americas (53%). Differences among those regions can be traced down to a few variables that permit an interpretation of their current states of LD. Our processed datasets can be applied to the analysis of correlations in other similar heterogeneous patterns with a broad spatial distribution, the clearest example being biological diversity. The statistical method we have used can be understood as a tool for cross-comparison among geographical regions, including the prediction of spatial diversity in alternative scenarios or in changing environments.|000|river density, landscape roughness, linguistic diversity, correlational studies, Ethnologue 4235|Axelsen2014|Scholars use Ethnologue data for their study.|000|river density, landscape roughness, Ethnologue, linguistic diversity, 4236|Tallavaara2017|Because of complex cumulative culture, human populations are often considered to be divorced from the environment and not be under the same ecological forcing as other species. 
However, this study shows that key environmental parameters net primary productivity, biodiversity, and environmental pathogen stress have strong influence on the global pattern of hunter-gatherer population density. Productivity and biodiversity exert the strongest influence in high and midlatitudes, whereas pathogens become more important in tropics. The most suitable conditions for preagricultural humans are found in temperate and subtropical biomes. Our results show that cultural evolution has not freed human hunter-gatherers from strong biotic and abiotic forcing.|000|hunter gatherers, biodiversity, correlational studies, pathogens, population density 4237|Michaud2018|Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons, such as that the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained over five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. 
Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.|000|automatic transcription, Na language, Sino-Tibetan, computer-assisted analysis, corpus studies 4238|MilaGarcia2018|Corpus annotation is generally considered an added value (McEnery et al. 2006), since it provides additional information about the original text and eases and expands the search possibilities that the corpus can undergo. |271|corpus annotation, corpus studies, nice quote, linguistic annotation, 4239|MilaGarcia2018|Within the recently-coined sub-field of corpus pragmatics, one of the areas of interest is the study of speech acts and, specifically, how it can profit from the adoption of this methodological approach. However, the acknowledged lack of correspondence between speech acts and linguistic forms makes basic form-based corpus searches unreliable in retrieving speech acts from a corpus. In fact, function-to-form corpus research can prove much more fruitful in carrying out this kind of study, but it usually requires time-consuming manual annotation, which in turn means that there have been few attempts to employ this methodology. As a contribution in this new direction, this study will showcase a function-to-form approach to investigating speech acts of agreement and disagreement in spoken Catalan. Through this example, this paper aims to show the benefits of designing, compiling, transcribing and, especially, annotating one’s own corpus for the study of speech acts. In order to annotate data for the study of speech acts, a complex and multi-layered annotation system was designed and manually applied, so that all the different aspects that play a relevant role in the expression of agreement and disagreement could be covered. 
In addition to discussing the findings from this study, it is argued that the possibilities of exploitation provided by the resulting annotated corpus far outweigh the time cost and open the door to in-depth analyses of speech acts and politeness in naturally occurring spoken data.|000|multi-layered annotation, corpus studies, corpus annotation, linguistic annotation 4240|DeCastroArrazola2018|This dissertation is about verse, some of its recurrent features, and cognitive aspects which can explain their prevalence. Verse includes a range of verbal phenomena, most typically songs and poems, but also nursery rhymes, religious chants or demonstration slogans. Compared to everyday speech, all these forms show additional layers of structure, like a regular alternation of accented syllables, a fixed melody, or a systematic number of syllables per utterance. Every linguistic community in the world engages in verse, but certain features seem suspiciously widespread. On the one hand, I have developed computational tools in order to assess systematically how widespread individual verse features are. On the other hand, I have conducted behavioural experiments to investigate to which extent these widespread features may stem from properties of human cognition. Using these two approaches, the thesis examines three aspects of verse. The first part deals with constituent structure in verse and how it can emerge in the process of iterative learning. The second part measures final strictness in several languages, and proposes that it is a consequence of reduced attention at the beginning of lines. The last part develops a method to describe how linguistic and musical features are aligned in songs, and how to test the intuitions of native speakers experimentally. Although verse constitutes a prototypically creative activity subject to extensive cultural variability, it is nonetheless bound and shaped by our cognitive system. 
|000|verse, rhyme patterns, computational approaches, cross-linguistic study, 4241|Ide2017|The Handbook of Linguistic Annotation provides a comprehensive survey of the development and state-of-the-art for linguistic annotation of language resources, including methods for annotation scheme design, annotation creation, physical format considerations, annotation tools, annotation use, evaluation, etc. The volume is divided into two parts: Part I includes survey chapters on the various phases and considerations for an annotation project, and Part II consists of thirty-nine case studies describing major annotation projects for a broad range of linguistic phenomena.|000|linguistic annotation, introduction, overview 4242|Pustejovsky2017|In this chapter, we describe the method and process of transforming the theoretical formulations of a linguistic phenomenon, based on empirical observations, into a model that can be used for the development of a language annotation specification. We outline this procedure generally, and then examine the steps in detail by specific example. We look at how this methodology has been implemented in the creation of TimeML (and ISO-TimeML), a broad-based standard for annotating temporal information in natural language texts. Because of the scope of this effort and the richness of the theoretical work in the area, the development of TimeML illustrates very clearly the methodology of the early stages of the MATTER annotation cycle, where initial models and schemas cycle through progressively mature versions of the resulting specification. Furthermore, the subsequent effort to convert TimeML into an ISO compliant standard, ISO-TimeML, demonstrates the utility of the CASCADES model in distinguishing between the concrete syntax of the schema and abstract syntax of the model behind it.|000|linguistic annotation, annotation scheme, introduction, overview, 4243|Wilcock2017|This chapter outlines the evolution of linguistic annotation frameworks. 
The aim is tutorial, describing older approaches that introduced basic ideas before showing how their various contributions have been combined and integrated into more modern frameworks. After a summary of typical annotation tasks and some open source tools that can perform them, we present two older ways to organize the tools into pipelines that ensure the annotation tasks are done in the correct order, first using traditional Linux scripts, then XML-based Ant buildfiles which give independence across operating systems. Manual and automatic annotation processes were integrated in WordFreak, which supported interactive visualization and editing of annotations through its graphical user interface, and also used a stand-off XML annotation format. These developments (pipeline configuration, platform independence, graphical interface, stand-off XML mark up) were successfully integrated into GATE and UIMA, the main large-scale modern annotation frameworks. UIMA added a type system that supports automatic validation of inputs and outputs between components in the pipeline. We present examples from both GATE and UIMA, and illustrate interoperability between frameworks with another older approach using XSLT transformations. The chapter ends by discussing the differences between annotation toolkits and annotation frameworks.|000|linguistic annotation, text annotation, introduction, overview, 4244|Wheeler2018|Phylogenetic methods offer a promising advance for the historical study of language and cultural relationships. Applications to date, however, have been hampered by traditional approaches dependent on unfalsifiable authority statements: in this regard, historical linguistics remains in a similar position to evolutionary biology prior to the cladistic revolution. Influential phylogenetic studies of Bantu languages over the last two decades, which provide the foundation for multiple analyses of Bantu sociocultural histories, are a major case in point. 
Comparative analyses of basic lexica, instead of directly treating written words, use only numerical symbols that express non-replicable authority opinion about underlying relationships. Building on a previous study of Uto-Aztecan, here we analyse Bantu language relationships with methods deriving from DNA sequence optimization algorithms, treating basic vocabulary as sequences of sounds. This yields finer-grained results that indicate major revisions to the Bantu tree, and enables more robust inferences about the history of Bantu language expansion and/or migration throughout sub-Saharan Africa. “Early-split” versus “late-split” hypotheses for East and West Bantu are tested, and overall results are compared to trees based on numerical reductions of vocabulary data. Reconstruction of language histories is more empirically based and robust than with previous methods.|000|Bantu languages, phylogenetic reconstruction, sequence alignment, dataset, 4245|Berdicevskis2018|We evaluate corpus-based measures of linguistic complexity obtained using Universal Dependencies (UD) treebanks. We propose a method of estimating robustness of the complexity values obtained using a given measure and a given treebank. The results indicate that measures of syntactic complexity might be on average less robust than those of morphological complexity. We also estimate the validity of complexity measures by comparing the results for very similar languages and checking for unexpected differences. We show that some of those differences that arise can be diminished by using parallel treebanks and, more importantly from the practical point of view, by harmonizing the language-specific solutions in the UD annotation.|000|universal dependencies, treebank, corpus studies, linguistic complexity, cross-linguistic study 4246|Ariel2018|Or constructions introduce a set of alternatives into the discourse. But alternativity does not exhaust speakers’ intended messages. 
Speakers use the profiled or alternatives as a starting point for expressing a variety of readings. Ever since (Grice, H. Paul. 1989. Studies in the way of words. Cambridge, MA: Harvard University Press) and (Horn. 1972. On the semantic properties of the logical operators in English. Los Angeles, CA: University of California Los Angeles dissertation), the standard approach has assumed that or has an inclusive lexical meaning and a predominantly exclusive use, thus focusing on two readings. While another, “free choice”, reading has been added to the repertoire, accounting for the exclusive reading remains a goal all or theorists must meet. We here propose that both “inclusive” and “exclusive” interpretations, as currently defined, do not capture speakers’ intended readings, which we equate with the relevance-theoretic explicature. Adopting a usage-based approach to language, we examined all the or occurrences in the Santa Barbara Corpus of spoken American English (1053 tokens), and found that speakers use or utterances for a far richer variety of readings than has been recognized. In line with Cognitive Linguistics, we propose that speakers’ communicated intentions are better analyzed in terms of subjective construals, rather than the objective conditions obtaining when the or proposition is true. We argue that in two of these readings speakers are not necessarily committed to even one of the alternatives being the case. In the most frequent reading, the overt disjuncts only serve as pointers to a higher-level concept, and it is that concept that the speaker intends to refer to.|000|or-construction, or sentence, pragmatics, truth value, corpus studies 4247|Abrusan2018|Counter to the often assumed division of labour between content and function words, we argue that both types of words have lexical content in addition to their logical content. We propose that the difference between the two types of words is a difference in degree. 
We conducted a preliminary study of quantificational determiners with methods from Distributional Semantics, a computational approach to natural language semantics. Our findings have implications both for distributional and formal semantics. For distributional semantics, they indicate a possible avenue that can be used to tap into the meaning of function words. For formal semantics, they bring into light the context-sensitive, lexical aspects of function words that can be recovered from the data even when these aspects are not overtly marked. Such pervasive context-sensitivity has profound implications for how we think about meaning in natural language.|000|function words, linguistic theory, grammar, semantics, 4248|Quinn2018|A developmental relationship between symbolic play and language has been long proposed, going as far back as the writings of Piaget and Vygotsky. In the current paper we build on recent qualitative reviews of the literature by reporting the first quantitative analysis of the relationship. We conducted a three-level meta-analysis of past studies that have investigated the relationship between symbolic play and language acquisition. Thirty-five studies (N = 6848) met the criteria for inclusion. Overall, we observed a significant small-to-medium association between the two domains (r = .35). Several moderating variables were included in the analyses, including: (i) study design (longitudinal, concurrent), (ii) the manner in which language was measured (comprehension, production), and (iii) the age at which this relationship is measured. 
The effect was weakly moderated by these three variables, but overall the association was robust, suggesting that symbolic play and language are closely related in development.|000|symbolic play, play, language acquisition, language origin 4249|Talmy2018|For early pre-language hominins, the vocal-auditory channel of communication as then organized may have been unable to accommodate any enhancement in the transmission of conceptual content due to three limitations: comparatively low degrees of parameter diversity, iconicity, and fidelity. We propose that these limitations were overcome by an evolutionary development that enabled an advance from the fixed holophrastic calls of earlier species to the open-ended spoken language of our own species. What developed was a “combinant” form of organization. Such combinance is a system in which smaller units combine to form larger units. At its smallest scale, this process yields a “clave”. In a clave, generally, units from an inventory at a lower tier combine to form the units of an inventory at the next higher tier in accord with a particular set of constraints. In turn, such claves function as the smaller units that combine to form a larger unit, a concatenation, where the higher tier of one clave serves as the lower tier of the next. The longest such concatenation in language consists of six successive claves. Phonetic features combine to form phonemes under the constraints of feature assembly; phonemes combine to form morphemes under the constraints of phonotactics; morphemes combine to form complex words under the constraints of morphology; morphemes and complex words combine to form expressions under the constraints of syntax; expressions combine to form a single speaker’s “monolog” under the constraints of discourse rules; and such monologs combine to form an exchange between speakers under the constraints of turn-taking. 
Our analysis characterizes communication at its most general and contrasts different channels of communication. In particular, the vocal-auditory channel of spoken language is extensively contrasted with the somatic-visual channel of signed language, whose classifier system largely lacks the three limitations of the former. To show this difference, the limitations are analyzed in detail (e.g., iconicity is shown to be based on six properties: prorepresentation, covariation, proportionality, proportional directness, cogranularity, and codomainality). In accord with this difference, the signed classifier system demonstrates the cognitive feasibility of communicating advanced conceptual content with little combinance, but the vocal-auditory channel is seen to have needed the incorporation of combinance for spoken language to evolve. |000|combinance, compositionality, language origin, language evolution, double articulation 4250|BermudezOtero2018|The surface realization of a linguistic expression can often be predicted from the form of paradigmatically related items that are not contained within it: in Latin, the nominative singular of a noun can often be inferred from the genitive; in French, the final consonant of a prenominal masculine adjective in liaison can typically be predicted from the feminine; in Romanian, the plural form of a noun determines whether its stem will exhibit palatalization before the derivational suffix /-ist/. Such instances of phonological paradigmatic dependence without containment have been claimed to challenge cyclic models of the morphosyntax-phonology interface. In this article, however, they are shown to be established indirectly through the acquisition of underlying representations. 
This approach correctly predicts that phonological paradigmatic dependencies are never systematically extended to new items if they involve suppletive allomorphy rather than regular alternation, whilst those surface phonological properties of derivatives that are under strict phonotactic control evade paradigmatic dependence on the inflectional forms of their bases. Theories relying on surface-to-surface computation fail to recover these empirical predictions because they are inherently nonmodular, positing generalizations that promiscuously mix phonological, morphosyntactic, and lexical information. Underlying representations, therefore, remain indispensable as a means of establishing a necessary modular demarcation between regular phonology and suppletive allomorphy.|000|underlying representations, allomorphy, phonology, phonological theory, morphology, 4251|BermudezOtero2018|What is interesting about this is that we could see this as a new task: try to identify the underlying representations in phonology.|000|phonology, underlying representations, morphology, allomorphy 4252|Cladistics2016|Phylogenetic data sets submitted to this journal should be analysed using parsimony.|p1|maximum parsimony, cladistics, nice quote 4253|Cladistics2016|Phylogenetic data sets submitted to this journal should be analysed using parsimony.|p1|maximum parsimony, cladistics, nice quote 4254|Hammarstroem2018a|The world harbors a diversity of some 6,500 mutually unintelligible languages. As has been increasingly observed by linguists, many minority languages are becoming endangered and will be lost forever if not documented. Urgently indeed, many efforts are being launched to document and describe languages. This undertaking naturally has the priority toward the most endangered and least described languages. 
For the first time, we combine world-wide databases on language description (Glottolog) and language endangerment (ElCat, Ethnologue, UNESCO) and provide two online interfaces, GlottoScope and GlottoVis, to visualize these together. The interfaces are capable of browsing, filtering, zooming, basic statistics, and different ways of combining the two measures on a world map background. GlottoVis provides advanced techniques for combining cluttered dots on a map. With the tools and databases described we seek to increase the overall knowledge of the actual state of language endangerment and description worldwide.|000|tools, visualization, Glottolog, language endangerment 4255|Kukekova2018|Strains of red fox (Vulpes vulpes) with markedly different behavioural phenotypes have been developed in the famous long-term selective breeding programme known as the Russian farm-fox experiment. Here we sequenced and assembled the red fox genome and re-sequenced a subset of foxes from the tame, aggressive and conventional farm-bred populations to identify genomic regions associated with the response to selection for behaviour. Analysis of the re-sequenced genomes identified 103 regions with either significantly decreased heterozygosity in one of the three populations or increased divergence between the populations. A strong positional candidate gene for tame behaviour was highlighted: SorCS1, which encodes the main trafficking protein for AMPA glutamate receptors and neurexins and suggests a role for synaptic plasticity in fox domestication. Other regions identified as likely to have been under selection in foxes include genes implicated in human neurological disorders, mouse behaviour and dog domestication. 
The fox represents a powerful model for the genetic analysis of affiliative and aggressive behaviours that can benefit genetic studies of behaviour in dogs and other mammals, including humans.|000|fox, red fox, vulpes vulpes, domestication, animal evolution, symbiosis 4256|Fourment2006|Phylogenies are commonly used to analyse the differences between genes, genomes and species. Patristic distances calculated from tree branch lengths describe the amount of genetic change represented by a tree and are commonly compared with other measures of mutation to investigate the substitutional processes or the goodness of fit of a tree to the raw data. Up until now no universal tool has been available for calculating patristic distances and correlating them with other genetic distance measures. PATRISTICv1.0 is a java program that calculates patristic distances from large trees in a range of file formats and allows graphical and statistical interpretation of distance matrices calculated by other programs. The software overcomes some logistic barriers to analysing signals in sequences. In addition to calculating patristic distances, it provides plots for any combination of matrices, calculates commonly used statistics, allows data such as isolation dates to be entered and reorders matrices with matching species or gene labels. It will be used to analyse rates of mutation and substitutional saturation and the evolution of viruses. It is available at http://biojanus.anu.edu.au/programs/ and requires the Java runtime environment.|000|patristic distances, distances, phylogenetic distance, phylogenetic reconstruction 4257|Sims-Williams2018|The Neogrammarian approach to historical phonology involves propounding sound-change laws and explaining exceptions by means such as sub-laws, rearranging the relative chronology, and appeal to special factors such as analogy, borrowing, and sporadic phenomena like metathesis. 
Progress is mostly made manually, but in the second half of the twentieth century some linguists looked forward to the ‘triumph of the electronic Neogrammarian’. Although this has not been realized yet, there are opportunities to make important advances. This paper offers a critical survey of the field since the 1950s and suggestions for the future.|000|history of science, cognate detection, sound law induction, rule induction, overview, automatic linguistic reconstruction 4258|Sims-Williams2018|Very detailed review of different aspects of past attempts to automatize the comparative method.|000|comparative method, history of science, overview, review, automatic approach 4259|Goldberg2017|One of the promises of deep learning is that it vastly simplifies the feature-engineering process by allowing the model designer to specify a small set of core, basic, or “natural” features, and letting the trainable neural network architecture combine them into more meaningful higher-level features, or representations. However, one still needs to specify a suitable set of core features, and tie them to a suitable architecture. |18|feature engineering, feature selection, machine learning, neural network, definition 4260|Cohen2018|The Anatolian Dissimilation Rule (ADR) was first introduced in an oral presentation by us in 2006 and first published by us in 2012, though it had, in several fundamental aspects, been prefigured in articles by, e.g., Gillian Hart and Birgit Olsen. The ADR expresses the following sound change(s): Proto-Indo-European *h3 > {Hittite š; Luvian t/d; Lycian, Milyan t; Lydian s} / ## __ X Labiovelar Y, where X and Y are arbitrary (possibly null) phone strings and X does not contain #. 
There are five PIE roots/words with attested reflexes in Anatolian that are subject to the ADR, and all of them exhibit the appropriate outcomes: *h3okw- ‘eye’, *h3ēh2u̯r̥ ‘urine’, *h3n̥gwh- ‘fingernail, toenail’, *h3óngwn̥ ‘fat, butter, oil, salve’, *h3(o)rh2u̯ent- ‘innards, intestine(s)’. The ADR covers all relevant items exceptionlessly; nevertheless, it has not been widely accepted. Potential reasons—both Anatolian-specific and more generally phonological—will be discussed and rebutted below, in the light of our previous arguments/suggestions and some newly added and upgraded ones.|000|Anatolian, sound law, Anatolian dissimilation rule, examples 4261|Cohen2018|The Anatolian Dissimilation Rule (ADR) was first introduced in an oral presentation by us in 2006 and first published by us in 2012, though it had, in several fundamental aspects, been prefigured in articles by, e.g., Gillian Hart and Birgit Olsen. The ADR expresses the following sound change(s): Proto-Indo-European *h3 > {Hittite š; Luvian t/d; Lycian, Milyan t; Lydian s} / ## __ X Labiovelar Y, where X and Y are arbitrary (possibly null) phone strings and X does not contain #. There are five PIE roots/words with attested reflexes in Anatolian that are subject to the ADR, and all of them exhibit the appropriate outcomes: *h3okw- ‘eye’, *h3ēh2u̯r̥ ‘urine’, *h3n̥gwh- ‘fingernail, toenail’, *h3óngwn̥ ‘fat, butter, oil, salve’, *h3(o)rh2u̯ent- ‘innards, intestine(s)’. The ADR covers all relevant items exceptionlessly; nevertheless, it has not been widely accepted. 
Potential reasons—both Anatolian-specific and more generally phonological—will be discussed and rebutted below, in the light of our previous arguments/suggestions and some newly added and upgraded ones.|000|Anatolian, sound law, Anatolian dissimilation rule, examples 4262|Cohen2018|Interesting example for a sound law in Indo-European, in so far as there are only five known cases for this law, so the data and the judgments are quite sparse.|000|Anatolian, sound law, Anatolian dissimilation rule 4263|Babinski2018|A crucial question for historical linguistics has been why some sound changes happen but not others. Recent work on the foundations of sound change has argued that subtle distributional facts about segments in a language, such as functional load, play a role in facilitating or impeding change. Thus not only are sound changes not all equally plausible, but their likelihood depends in part on phonotactics and aspects of functional load, such as the density of minimal pairs. Tests of predictability on the chance of phoneme merger suggest that phonemes with low functional load (as defined by minimal pair density) are more likely to merge, but this has been investigated only for a small number of languages with very large corpora and well attested mergers. Here we present work suggesting that the same methods can be applied to much smaller corpora, by presenting the results of a preliminary exploration of nine Australian languages, with a particular focus on Bardi, a Nyulnyulan language from Australia’s northwest. Our results support the hypothesis that the probability of merger is higher when phonemes distinguish few minimal pairs.|000|missing data, Australian languages, missing code, replicability, sound change, functional load 4264|Tambets2018| Background The genetic origins of Uralic speakers from across a vast territory in the temperate zone of North Eurasia have remained elusive. 
Previous studies have shown contrasting proportions of Eastern and Western Eurasian ancestry in their mitochondrial and Y chromosomal gene pools. While the maternal lineages reflect by and large the geographic background of a given Uralic-speaking population, the frequency of Y chromosomes of Eastern Eurasian origin is distinctively high among European Uralic speakers. The autosomal variation of Uralic speakers, however, has not yet been studied comprehensively. Results Here, we present a genome-wide analysis of 15 Uralic-speaking populations which cover all main groups of the linguistic family. We show that contemporary Uralic speakers are genetically very similar to their local geographical neighbours. However, when studying relationships among geographically distant populations, we find that most of the Uralic speakers and some of their neighbours share a genetic component of possibly Siberian origin. Additionally, we show that most Uralic speakers share significantly more genomic segments identity-by-descent with each other than with geographically equidistant speakers of other languages. We find that correlated genome-wide genetic and lexical distances among Uralic speakers suggest co-dispersion of genes and languages. Yet, we do not find long-range genetic ties between Estonians and Hungarians with their linguistic sisters that would distinguish them from their non-Uralic-speaking neighbours. Conclusions We show that most Uralic speakers share a distinct ancestry component of likely Siberian origin, which suggests that the spread of Uralic languages involved at least some demic component. |000|missing data, missing code, Uralic languages, population genetics, correlational studies, dataset 4265|Mahowald2018|Zipf famously stated that, if natural language lexicons are structured for efficient communication, the words that are used the most frequently should require the least effort. 
This observation explains the famous finding that the most frequent words in a language tend to be short. A related prediction is that, even within words of the same length, the most frequent word forms should be the ones that are easiest to produce and understand. Using orthographics as a proxy for phonetics, we test this hypothesis using corpora of 96 languages from Wikipedia. We find that, across a variety of languages and language families and controlling for length, the most frequent forms in a language tend to be more orthographically well‐formed and have more orthographic neighbors than less frequent forms. We interpret this result as evidence that lexicons are structured by language usage pressures to facilitate efficient communication.|000|missing data, missing code, word frequency, word length, dataset, Wikipedia 4266|Mahowald2018|Interestingly, the data are not missing but shared openly on OSF; the authors simply did not mention this in their paper.|000|missing data, missing code, word frequency, Wikipedia, corpus 4267|Lieber2016|Compounding is a subject that has received extensive attention in morphological literature in recent years, with volumes like Lieber and Štekauer (2009) and Scalise and Vogel (2010) giving prominent overviews. Both structural and semantic properties of compounding have been explored in many different frameworks, but disagreement still exists on the best way of modeling the interpretation of various kinds of compounds. This chapter will provide a general introduction to the treatment of compounds within the lexical semantic framework developed in Lieber (2004, 2006, 2009, 2010, forthcoming).|000|compounding, overview, lexical semantic framework, semantics, compositionality 4268|Stekauer2016|This chapter is supposed to discuss compounding from an onomasiological point of view. 
This seems to be, however, a contradictory requirement, because the onomasiological approach as outlined and discussed mainly in Štekauer (1998, 2005b) does not recognize the term compounding in the same way, as there is no place for the traditional word formation terms like prefixation, suffixation, back-formation, blending, conversion, reduplication. The reason is simple and logical – the traditional terminology stems from the classical and deep-rooted semasiological approach to word formation which takes the form of complex words as its point of departure. [...] First, a general theoretical background will be outlined to set the scene for an onomasiological account of the interrelation between semantics and what has traditionally been labelled as the process of compounding.|000|onomasiological approach, compounding 4269|Schaar2018|The article describes the mathematics of chirotopes, that is, attempts to infer locations from minimal information such as "left of X" or "right of X", and whether the constellation can be inferred at all. This matters for the question of orientational prefixes across different languages.|000|orientational prefixes, orientation, lexical semantics, geography 4270|Baker2010|Corpora are often annotated (or tagged) with additional information, allowing more complex calculations to be performed on them. Such information can take several forms, for example, individual texts within a corpus are often stored as separate files and each one can contain a ‘header’ which gives information about the text such as its author, date of publication, genre, etc. This information can be useful in allowing researchers to focus on particular types of texts (e.g. just newspaper articles) or carry out comparisons between different types (e.g. male vs female authors). 
Such annotation sometimes employs standard generalized mark-up language (SGML), whereby tags take the form of codes (known as elements) inside matching angle brackets < >.|96|corpus studies, tagging, linguistic annotation, annotation, Standardized Mark-Up Language 4271|Baker2010|This chapter examines how corpus linguistics techniques can be used to aid a range of linguistic analyses. The chapter begins by defining corpus linguistics and describes some of the theoretical concepts surrounding the field (such as the importance of using large bodies of naturalistic data in order to investigate language usage, and the distinction between corpus-based and corpus-driven approaches). I then discuss various principles that are useful to take into account when building and annotating a corpus, as well as the different types of corpora that can be built, their relationship to the various fields of linguistics that corpus research has contributed to, and the sorts of research questions that corpus linguistics can enable us to ask. Then, a number of techniques of analysis are demonstrated on general corpora of British English. These include comparisons of word frequencies, a keyword analysis, and examinations of collocates and concordances. The chapter ends with a critical discussion of issues that need to be considered when carrying out corpus analysis, noting that corpus methods should not be considered as only quantitative, but rather an approach which can combine both qualitative and quantitative processes.|000|corpus studies, methodology, introduction 4272|Dellert2018b|Based on a recently published large-scale lexicostatistical database, we rank 1,016 concepts by their suitability for inclusion in Swadesh-style lists of basic stable concepts. For this, we define separate measures of basicness and stability. 
Basicness in the sense of morphological simplicity is measured based on information content, a generalization of word length which corrects for distorting effects of phoneme inventory sizes, phonotactics and non-stem morphemes in dictionary forms. Stability against replacement by semantic shift or borrowing is measured by sampling independent language pairs, and correlating the distances between the forms for the concept with the overall language distances. In order to determine the relative importance of basicness and stability, we optimize our combination of the two partial measures towards similarity with existing lists. A comparison with and among existing rankings suggests that concept rankings are highly data-dependent and therefore less well-grounded than previously assumed. To explore this issue, we evaluate the robustness of our ranking against language pair resampling, allowing us to assess how much volatility can be expected, and showing that only about half of the concepts on a list based on our ranking can safely be assumed to belong on the list independently of the data.|000|Swadesh list, concept list, ranked concept list 4273|Baese-Berk2018|Phonological knowledge is influenced by a variety of cues that reflect predictability (e.g. semantic predictability). Listeners utilize various aspects of predictability when determining what they have heard. In the present paper, we ask how aspects of the acoustic phonetic signal (e.g. speaking rate) interact with other knowledge reflecting predictability (e.g. lexical frequency and collocation strength) to influence how speech is perceived. Specifically, we examine perception of function words by native and non-native speakers. 
Our results suggest that both native and non-native speakers are sensitive to factors that influence the predictability of the signal, including speaking rate, frequency, and collocation strength, when listening to speech, and use these factors to predict the phonological structure of stretches of ambiguous speech. However, reliance on these cues differs as a function of their experience and proficiency with the target language. Non-native speakers are less sensitive to some aspects of the acoustic phonetic signal (e.g. speaking rate). However, they appear to be quite sensitive to other factors, including frequency. We discuss how these results inform our understanding of the interplay between predictability and speech perception by different listener populations and how use of features reflecting predictability interacts with recovery of phonological structure of spoken language.|000|perception, listener, speaker, second language learning, experimental study 4274|Bott2018|Particle verbs represent a type of multi-word expression composed of a base verb and a particle. The meaning of the particle verb is often, but not always, derived from the meaning of the base verb, sometimes in quite complex ways. In this work, we computationally assess the levels of German particle verb compositionality by applying distributional semantic models. Furthermore, we investigate properties of German particle verbs at the syntax-semantics interface that influence their degrees of compositionality: (i) regularity in semantic particle verb derivation and (ii) transfer of syntactic subcategorization from base verbs to particle verbs. 
Our distributional models show that both superficial window co-occurrence models as well as theoretically well-founded syntactic models are sensitive to subcategorization frame transfer and can be used to predict degrees of particle verb compositionality, with window models performing better even though they are conceptually and computationally simpler.|000|particle verbs, German, computational approaches, syntax-semantics interface, compositionality 4275|Cohen2018|The Anatolian Dissimilation Rule (ADR) was first introduced in an oral presentation by us in 2006 and first published by us in 2012, though it had, in several fundamental aspects, been prefigured in articles by, e.g., Gillian Hart and Birgit Olsen. The ADR expresses the following sound change(s): Proto-Indo-European *h3 > {Hittite š; Luvian t/d; Lycian, Milyan t; Lydian s} / ## __ X Labiovelar Y, where X and Y are arbitrary (possibly null) phone strings and X does not contain #. There are five PIE roots/words with attested reflexes in Anatolian that are subject to the ADR, and all of them exhibit the appropriate outcomes: *h3okw- ‘eye’, *h3ēh2u̯r̥ ‘urine’, *h3n̥gwh- ‘fingernail, toenail’, *h3óngwn̥ ‘fat, butter, oil, salve’, *h3(o)rh2u̯ent- ‘innards, intestine(s)’. The ADR covers all relevant items exceptionlessly; nevertheless, it has not been widely accepted. Potential reasons—both Anatolian-specific and more generally phonological—will be discussed and rebutted below, in the light of our previous arguments/suggestions and some newly added and upgraded ones.|000|Anatolian, Hittite, dissimilation, sound change, Indo-European, linguistic reconstruction, methodology 4276|Cohen2018|What is interesting about this article is that it lists only five examples in favor of the rule, but this may well be representative of the usual sample size in Indo-European linguistics. 
|000|sample size, Indo-European, methodology, linguistic reconstruction 4277|Hawkins2018|A study describing what people think about political correctness (maybe disputed, in terms of results), essentially claiming that only a few rich people care about it.|000|political correctness, empirical study 4278|Ellestroem2018|The aim of this article is to form a new communication model, which is centered on the intermediate stage of communication, here called medium. The model is intended to be irreducible, to highlight the essential communication entities and their interrelations, and potentially to cover all conceivable kinds of communication of meaning. It is designed to clearly account for both verbal and nonverbal meaning, the different roles played by minds and bodies in communication, and the relation between presemiotic and semiotic media features. As a result, the model also pinpoints fundamental obstacles for communication located in media products themselves, and demonstrates how Shannon’s model of transmission of computable data can be incorporated in a model of human communication of meaning.|000|communication, semiotics, modeling 4279|Ellestroem2018|In “A mathematical theory of communication” (1948), Claude Shannon distinctly declared that “[t]he fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem” (@1948: 379).|272|communication model, Shannon, theory of communication 4280|Shannon1948|The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. 
Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. :comment:`Quoted after` @Ellestroem2018 (272) |379|Shannon, communication model, theory of communication 4281|Ellestroem2018|Roman Jakobson’s communication model (Figure 2), presented in the article “Closing statement: Linguistics and poetics” (1960), crosses the border between linguistics and literary studies. His aim was to investigate language “in all the variety of its functions”; an exploration that “demands a concise survey of the constitutive factors in any speech event, in any act of verbal communication” (@1960: 353).|273|theory of communication, Roman Jakobson 4282|Jakobson1960|The ADDRESSER sends a MESSAGE to the ADDRESSEE. To be operative the message requires a CONTEXT [that is] seizable by the addressee, and either verbal or capable of being verbalized; a CODE fully, or at least partially, common to the addresser and addressee (or in other words, to the encoder and decoder of the message); and, finally, a CONTACT, a physical channel and psychological connection between the addresser and the addressee, enabling both of them to enter and stay in communication. :comment:`Quoted after` @Ellestroem2018 273f|353|communication model, Roman Jakobson, nice quote 4283|Ellestroem2018|The paper is a nice introduction to different communication models, including @Jakobson1960 and @Shannon1948, and thus a very good overview of different theories.|000|communication model, theory of communication, Shannon, Roman Jakobson, overview, introduction 4284|Giuliano2011|Native tone language experience has been linked with alterations in the production and perception of pitch in language, as well as with the brain response to linguistic and non-linguistic tones. 
Here we use two experiments to address whether these changes apply to the discrimination of simple pitch changes and pitch intervals. Event related potentials (ERPs) were recorded from native Mandarin speakers and a control group during a same/different task with pairs of pure tones differing only in pitch height, and with pure tone pairs differing only in interval distance. Behaviorally, Mandarin speakers were more accurate than controls at detecting both pitch and interval changes, showing a sensitivity to small pitch changes and interval distances that was absent in the control group. Converging evidence from ERPs obtained during the same tasks revealed an earlier response to change relative to no-change trials in Mandarin speakers, as well as earlier differentiation of trials by change direction relative to controls. These findings illustrate the cross-domain influence of language experience on the perception of pitch, suggesting that the native use of tonal pitch contours in language leads to a general enhancement in the acuity of pitch representations.|000|pitch, tone, discrimination of tones, perception, neurolinguistics 4285|Gast2015|GraphAnno is a configurable tool for multi-level annotation which caters for the entire workflow from corpus import to data export and thus provides a suitable environment for the manual annotation of modals in their sentential contexts. Given its generic data model, it is particularly suitable for enriching existing corpora, e.g. by adding semantic annotations to syntactic ones. 
In this contribution, we present the functionalities of GraphAnno and make a concrete proposal for the treatment of modals in a corpus, with a focus on scope interactions.|000|GraphAnno, linguistic annotation, graph theory, annotation, multi-level annotation 4286|Gast2015b|In this paper, we propose an annotation scheme for the manual annotation of tense and aspect in natural language corpora, as well as an implementation using GraphAnno, a configurable tool for manual multi-level annotation.|000|tense, aspect, linguistic annotation, annotation framework, GraphAnno 4287|Scerri2018|We challenge the view that our species, Homo sapiens, evolved within a single population and/or region of Africa. The chronology and physical diversity of Pleistocene human fossils suggest that morphologically varied populations pertaining to the H. sapiens clade lived throughout Africa. Similarly, the African archaeological record demonstrates the polycentric origin and persistence of regionally distinct Pleistocene material culture in a variety of paleoecological settings. Genetic studies also indicate that present-day population structure within Africa extends to deep times, paralleling a paleoenvironmental record of shifting and fractured habitable zones. We argue that these fields support an emerging view of a highly structured African prehistory that should be considered in human evolutionary inferences, prompting new interpretations, questions, and interdisciplinary research directions.|000|Out-of-Africa, population genetics, discussion 4288|Kokko2018|Does the progress in understanding evolutionary theory depend on the species that is doing the investigation? This question is difficult to answer scientifically, as we are dealing with an n = 1 scenario: every individual who has ever written about evolution is a human being. 
I will discuss, first, whether we get the correct answer to questions if we begin with ourselves and expand outwards, and second, whether we might fail to ask all the interesting questions unless we combat our tendencies to favour taxa that are close to us. As a whole, the human tendency to understand general biological phenomena via ‘putting oneself in another organism’s shoes’ has upsides and downsides. As an upside, our intuitive ability to rethink strategies if the situation changes can lead to ready generation of adaptive hypotheses. Downsides occur if we trust this intuition too much, and particular danger zones exist for traits where humans are an unusual species. I argue that the levels of selection debate might have proceeded differently if human cooperation patterns were not so unique, as this brings about unique challenges in biology teaching; and that theoretical insights regarding inbreeding avoidance versus tolerance could have spread faster if we were not extrapolating our emotional reactions to incest disproportionately depending on whether we study animals or plants. I also discuss patterns such as taxonomic chauvinism, i.e. less attention being paid to species that differ more from human-like life histories. Textbooks on evolution reinforce such biases insofar as they present, as a default case, systems that resemble ours in terms of life cycles and other features (e.g. gonochorism). Additionally, societal norms may have led to incorrect null hypotheses such as females not mating multiply.|000|human-centrism, cultural evolution, adaptation, interpretation, taxonomic bias 4289|Kokko2018|When discussing taxonomic bias, i.e., the skewed selection of species used to study evolution, this is an interesting article to cite, as it points to the problem of biased selection. 
In the context of archaic languages, this may also be worth mentioning.|000|taxonomic bias, nice paper 4290|McLeod2018| Purpose The aim of this study was to provide a cross-linguistic review of acquisition of consonant phonemes to inform speech-language pathologists' expectations of children's developmental capacity by (a) identifying characteristics of studies of consonant acquisition, (b) describing general principles of consonant acquisition, and (c) providing case studies for English, Japanese, Korean, and Spanish. Method A cross-linguistic review was undertaken of 60 articles describing 64 studies of consonant acquisition by 26,007 children from 31 countries in 27 languages: Afrikaans, Arabic, Cantonese, Danish, Dutch, English, French, German, Greek, Haitian Creole, Hebrew, Hungarian, Icelandic, Italian, Jamaican Creole, Japanese, Korean, Malay, Maltese, Mandarin (Putonghua), Portuguese, Setswana (Tswana), Slovenian, Spanish, Swahili, Turkish, and Xhosa. Results Most studies were cross-sectional and examined single word production. Combining data from 27 languages, most of the world's consonants were acquired by 5;0 years;months old. By 5;0, children produced at least 93% of consonants correctly. Plosives, nasals, and nonpulmonic consonants (e.g., clicks) were acquired earlier than trills, flaps, fricatives, and affricates. Most labial, pharyngeal, and posterior lingual consonants were acquired earlier than consonants with anterior tongue placement. However, there was an interaction between place and manner where plosives and nasals produced with anterior tongue placement were acquired earlier than anterior trills, fricatives, and affricates. Conclusions Children across the world acquire consonants at a young age. Five-year-old children have acquired most consonants within their ambient language; however, individual variability should be considered. 
|000|age of acquisition, consonants, phoneme inventory, dataset 4291|Henrich2003|Humans are unique in their range of environments and in the nature and diversity of their behavioral adaptations. While a variety of local genetic adaptations exist within our species, it seems certain that the same basic genetic endowment produces arctic foraging, tropical horticulture, and desert pastoralism, a constellation that represents a greater range of subsistence behavior than the rest of the Primate Order combined. The behavioral adaptations that explain the immense success of our species are cultural in the sense that they are transmitted among individuals by social learning and have accumulated over generations. Understanding how and when such culturally evolved adaptations arise requires understanding of both the evolution of the psychological mechanisms that underlie human social learning and the evolutionary (population) dynamics of cultural systems.|000|social learning, cultural evolution, overview 4292|Montavon2018|This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. As a tutorial paper, the set of methods covered here is not exhaustive, but sufficiently representative to discuss a number of questions in interpretability, technical challenges, and possible applications. 
The second part of the tutorial focuses on the recently proposed layer-wise relevance propagation (LRP) technique, for which we provide theory, recommendations, and tricks, to make most efficient use of it on real data.|000|blackbox methods, neural network, interpretation 4293|Moorkens2018|Book deals with the assessment of translation quality, which is a generally interesting topic.|000|machine translation, translation, translation quality assessment 4294|Reali2010|Scientists studying how languages change over time often make an analogy between biological and cultural evolution, with words or grammars behaving like traits subject to natural selection. Recent work has exploited this analogy by using models of biological evolution to explain the properties of languages and other cultural artefacts. However, the mechanisms of biological and cultural evolution are very different: biological traits are passed between generations by genes, while languages and concepts are transmitted through learning. Here we show that these different mechanisms can have the same results, demonstrating that the transmission of frequency distributions over variants of linguistic forms by Bayesian learners is equivalent to the Wright–Fisher model of genetic drift. This simple learning mechanism thus provides a justification for the use of models of genetic drift in studying language evolution. In addition to providing an explicit connection between biological and cultural evolution, this allows us to define a ‘neutral’ model that indicates how languages can change in the absence of selection at the level of linguistic variants. 
We demonstrate that this neutral model can account for three phenomena: the s-shaped curve of language change, the distribution of word frequencies, and the relationship between word frequencies and extinction rates.|000|biological parallels, analogy, language evolution, simulation studies, s-curve 4295|Lapuschkin2016|Fisher vector (FV) classifiers and Deep Neural Networks (DNNs) are popular and successful algorithms for solving image classification problems. However, both are generally considered ‘black box’ predictors as the non-linear transformations involved have so far prevented transparent and interpretable reasoning. Recently, a principled technique, Layer-wise Relevance Propagation (LRP), has been developed in order to better comprehend the inherent structured reasoning of complex nonlinear classification models such as Bag of Feature models or DNNs. In this paper we (1) extend the LRP framework also for Fisher vector classifiers and then use it as analysis tool to (2) quantify the importance of context for classification, (3) qualitatively compare DNNs against FV classifiers in terms of important image regions and (4) detect potential flaws and biases in data. All experiments are performed on the PASCAL VOC 2007 and ILSVRC 2012 data sets.|000|blackbox methods, heatmap, neural network, evaluation 4296|Lapuschkin2016|Standard paper introducing the heatmap method that shows the bias with the horses where the copyright mark was not removed from the images.|000|heatmap, neural network, evaluation, blackbox methods 4297|Wilkins2014|Charles Darwin, while trying to devise a general theory of heredity from the observations of animal and plant breeders, discovered that domesticated mammals possess a distinctive and unusual suite of heritable traits not seen in their wild progenitors. Some of these traits also appear in domesticated birds and fish. The origin of Darwin’s “domestication syndrome” has remained a conundrum for more than 140 years. 
Most explanations focus on particular traits, while neglecting others, or on the possible selective factors involved in domestication rather than the underlying developmental and genetic causes of these traits. Here, we propose that the domestication syndrome results predominantly from mild neural crest cell deficits during embryonic development. Most of the modified traits, both morphological and physiological, can be readily explained as direct consequences of such deficiencies, while other traits are explicable as indirect consequences. We first show how the hypothesis can account for the multiple, apparently unrelated traits of the syndrome and then explore its genetic dimensions and predictions, reviewing the available genetic evidence. The article concludes with a brief discussion of some genetic and developmental questions raised by the idea, along with specific predictions and experimental tests.|000|domestication, domestication syndrome 4298|VanHoey2019|From a typological perspective, Chinese meteorological expressions are argument-oriented. However, using a lexical semantic approach, based on corpus data as well as dictionaries and Chinese WordNet, a taxonomical lexical field can be established to further analyze the basic level items. Five main clusters of meteorological expressions are identified: precipitation, wind, thunder, sunshine and cloud. A comparison of these clusters with frames derived from the English FrameNet shows that Chinese has a narrower conception of weather phenomena than English. There is significant influence from the script on the linguistic system, at least in relation to meteorological expressions. It is shown that Chinese uses iconicity in its writing system where it is lacking in its phonology. A special case study are weather-related ideophones, where two strata are found: those that are phonologically and semantically motivated and receive iconic support from the writing system vs. 
those that do not receive this support.|000|weather terms, Mandarin, iconicity, examples, qualitative data 4299|Shaw2018|Many papers in this special issue grew out of the talks given at the Symposium “The role of predictability in shaping human language sound patterns,” held at Western Sydney University (Dec. 10–11, 2016). Some papers were submitted in response to an open call; others were invited contributions. This introduction aims to contextualize the papers in the special issue within a broader theoretical context, focusing on what it means for phonological theory to incorporate gradient predictability, what questions arise as a consequence, and how the papers in this issue address these questions.|000|predictability, phonology, introduction 4300|Shaw2018|Predictability has always been central to understanding sound patterns in human language. The modern theoretical landscape features two kinds of predictability: (1) the general notion of probability, which we will refer to as gradient predictability, and (2) the theory-specific notion of predictability that is dichotomous, as developed in early generative phonology.|1|predictability, phonology, phonological theory, sound patterns 4301|Kapust2018|In prokaryotes, known mechanisms of lateral gene transfer (transformation, transduction, conjugation, and gene transfer agents) generate new combinations of genes among chromosomes during evolution. In eukaryotes, whose host lineage is descended from archaea, lateral gene transfer from organelles to the nucleus occurs at endosymbiotic events. Recent genome analyses studying gene distributions have uncovered evidence for sporadic, discontinuous events of gene transfer from bacteria to archaea during evolution. Other studies have used traditional models designed to investigate gene family size evolution (Count) to support claims that gene transfer to archaea was continuous during evolution, rather than involving occasional periodic mass gene influx events. 
Here, we show that the methodology used in analyses favoring continuous gene transfers to archaea was misapplied in other studies and does not recover known events of single simultaneous origin for many genes followed by differential loss in real data: plastid genomes. Using the same software and the same settings, we reanalyzed presence/absence pattern data for proteins encoded in plastid genomes and for eukaryotic protein families acquired from plastids. Contrary to expectations under a plastid origin model, we found that the methodology employed inferred that gene acquisitions occurred uniformly across the plant tree. Sometimes as many as nine different acquisitions by plastid DNA were inferred for the same protein family. That is, the methodology that recovered gradual and continuous lateral gene transfer among lineages for archaea obtains the same result for plastids, even though it is known that massive gains followed by gradual differential loss is the true evolutionary process that generated plastid gene distribution data. Our findings caution against the use of models designed to study gene family size evolution for investigating gene transfer processes, especially when transfers involving more than one gene per event are possible.|000|model misspecification, problem, protein families, lateral gene transfer, methodology 4302|Kapust2018|They describe the program Count (@Csuroes2010), which is used to study the evolution of protein family sizes across species, which seems quite interesting to think about as well in linguistics.|000|protein evolution, protein families, 4303|Csuroes2010|SUMMARY: Count is a software package for the analysis of numerical profiles on a phylogeny. It is primarily designed to deal with profiles derived from the phyletic distribution of homologous gene families, but is suited to study any other integer-valued evolutionary characters. 
Count performs ancestral reconstruction, and infers family- and lineage-specific characteristics along the evolutionary tree. It implements popular methods employed in gene content analysis such as Dollo and Wagner parsimony, propensity for gene loss, as well as probabilistic methods involving a phylogenetic birth-and-death model. AVAILABILITY: Count is available as a stand-alone Java application, as well as an application bundle for MacOS X, at the web site http://www.iro.umontreal.ca/~csuros/gene_content/count.html. It can also be launched using Java Webstart from the same site. The software is distributed under a BSD-style license. Source code is available upon request from the author.|000|software, protein evolution, protein families 4304|Csuroes2010|Some aspects of genome evolution are best captured by integer quantities. Given a phylogeny with terminal taxa 𝒳, such a quantity forms a numerical profile, which extends the so-called phylogenetic profile of presence–absence (Koonin and Galperin, 2002; Pellegrini et al., 1999) Φ : 𝒳 ↦ {0, 1, 2, …}. In a typical application, Φ[x] denotes the number of genes in genome x ∈ 𝒳 for a certain homolog gene family: a homolog family comprises all descendants of the same ancestral gene (Fitch, 2000) in evolutionary lineages. Such families are routinely identified by pairwise sequence comparisons, coupled with the clustering of postulated homolog pairs (Alexeyenko et al., 2006; Tatusov et al., 1997). In other interesting examples, Φ[x] might be the size (Caetano-Anollés, 2005) of genome x or a sequence length polymorphism in population x (Witmer et al., 2003).|1910|protein families, gene families, protein evolution, 4305|Orlandi2018|This paper analyses and evaluates the alleged genetic relationship between Sino-Tibetan and Austronesian, proposed by the French sinologist Laurent Sagart. 
The aim of the following paper is neither to prove, nor to disprove the Sino-Tibeto-Austronesian superphylum but to argue whether the data presented in favour of this proposed genetic relationship do or do not stand the scrutiny of a historical linguist. This paper also considers the hypothetical homeland of Proto-Sino-Tibeto-Austronesian people, with an eye towards competing hypotheses, such as Sino-Indo-European. It is concluded that Sagart’s approach may be insufficient for proof of controversial cases of disputed genetic relationship, given the non-obvious relatedness of the languages he is comparing.|000|Sino-Tibetan, Sino-Austronesian, review, critics 4306|Martin2017|The origin of mitochondria was a crucial event in eukaryote evolution. A recent report claimed to provide evidence, based on branch length variation in phylogenetic trees, that the mitochondrion came late in eukaryotic evolution. Here, we reinvestigate their claim with a reanalysis of the published data. We show that the analyses underpinning a late mitochondrial origin suffer from multiple fatal flaws founded in inappropriate statistical methods and analyses, in addition to erroneous interpretations.|000|methodology, artefacts, evolutionary biology, mitochondria, eukaryotes 4307|Kaplan2017|Black-boxed in these accounts and decidedly not new, however, are the actual data in question—typically, wordlists of “basic” vocabulary. Words like all, louse, seed, blood, claw, belly, bite, know, sun, yellow, night, new, and round have been selected for their apparent frequency, universality, and resistance to change over time. Defined in opposition to “cultural” vocabulary, this lexicon has been rigorously sampled in an effort to construct “diagnostic” lists, typically some 100 or 200 items in length. 
While embedded assumptions about historical process and translatability have been challenged since these wordlists were first proposed in the mid-twentieth century, they have become such a staple of routine fieldwork and comparative practice that even their loudest critics can be found endorsing their expediency on record.|204|comparative wordlist, basic vocabulary, history of science 4308|Kaplan2017|With such emphasis on the very givens, it challenges claims that computing technology has caused a radical break in the development of the language sciences. To the contrary, researchers have consistently gone back to their philological roots in an effort to improve wordlists on the basis of what is already known.|204|nice quote, Swadesh list, basic vocabulary 4309|Kaplan2017|Linguist Lyle Campbell points to a lively tradition, born during the late Renaissance, of compiling “large-scale word collections for language comparisons.” Konrad Gesner (1555), Johan Christoph Adelung (1782, 1806), Lorenzo Hervas y Panduro (1784, 1800), and Peter Simon Pallas (1786) are each known to have made notable contributions in this respect.|206|Renaissance, Swadesh list, concept list, word list, history of science 4310|Kaplan2017|:comment:`After quoting different ancient concept lists proposed by scholars` While it is not my focus here, these examples call for an analysis of the politics of sourcing and reuse, such as the one Joanna @Radin<2017> offers in her contribution to this volume.|-|concept list, Swadesh list, history of science, word list, reuse, sourcing, 4311|Radin2017|This case considers the politics of reuse in the realm of “Big Data.” It focuses on the history of a particular collection of data, extracted and digitized from patient records made in the course of a longitudinal epidemiological study involving Indigenous members of the Gila River Indian Community Reservation in the American Southwest. 
The creation and circulation of the Pima Indian Diabetes Dataset (PIDD) demonstrates the value of medical and Indigenous histories to the study of Big Data. By adapting the concept of the “digital native” itself for reuse, I argue that the history of the PIDD reveals how data becomes alienated from persons even as it reproduces complex social realities of the circumstances of its origin. In doing so, this history highlights otherwise obscured matters of ethics and politics that are relevant to communities who identify as Indigenous as well as those who do not.|000|sourcing, reuse, quoting, 4312|Kaplan2017|While proponents of historical lexicostatistics have argued since the 1950s that the very definition of basic wordlists reduces the possibility of noise due to chance and borrowing, critics have repeatedly opened up intuitions about the frequency, universality, and stability of basic vocabulary to empirical scrutiny. These critical assessments, examined below, have paradoxically reinforced the category, contributing to the phenomenon of data drag.|208|Swadesh list, history of science, quantitative turn, basic vocabulary, problem, critics 4313|Kaplan2017|Russell Gray has done just that, sourcing data from Dyen, Kruskal, and Black for some of his most important and controversial papers on prehistorical “phylolinguistics.” [@Gray2003] While most of his discussion has focused on the novelty of the computational phylogenetic methods involved, with any luck this essay has shown that the data in use are much older and thoroughly cooked. Decades of controversy with respect [pb] to the key claims that basic vocabulary words are especially frequent, universal, and stable over time have reified basic vocabulary as a tool that can be picked up and applied to new stochastic models of human prehistory. 
If absolute time depths held promise to reduce disciplinary friction in the 1950s, phylogenetic networks enjoy similar attention today.|221f|phylogenetic reconstruction, lexicostatistics, dataset, concept list, Swadesh list 4314|Kaplan2017|This essay has demonstrated significant continuity in the history of basic vocabulary. The intuition that basic vocabulary hangs together and is especially “criterial” of genetic relationship has been remarkably stable over time. Moreover, the philological foundations on which diagnostic lists of basic vocabulary were first developed go back hundreds of years and have been calibrated over decades of testing and modification. Digitization in recent years has been a fairly straightforward process of direct transcription, give or take a diacritic. Far from a radical break in the history of comparative linguistics, the advent of computing can be seen as the next step in a long line of attempts to make intuitive linguistic knowledge explicit—assumptions about the constancy and homogeneity of the rate of change; the universality of basic terms; and, lately, model parameters and significance criteria.|222|phylogenetic reconstruction, Swadesh list, history of science, data, 4315|Whittaker2018|The system of writing employed by the Aztec Empire and its immediate neighbours in early 16th-century Mesoamerica has a number of characteristics that make it highly unusual in comparative perspective. Among them is the fact that it was almost entirely restricted to the recording of names or, more precisely, of personal and place names, titles and professions. All other areas of information were recorded by means of iconography and a numeral-based notation system. 
This article will discuss the nature of Aztec names and the manner in which they are recorded in Aztec writing.|000|Aztec writing system, Aztec hieroglyphics, Aztecan, writing systems 4316|Whitney1894|Article discusses sporadic sound change, i.e., lexical diffusion, and should be re-read to understand if he really means that.|000|lexical diffusion, history of science, sound change, sporadic sound change 4317|Nichols2018|An attractor, in complex systems theory, is any state that is more easily or more often entered or acquired than departed or lost; attractor states therefore accumulate more members than non-attractors, other things being equal. In the context of language evolution, linguistic attractors include sounds, forms, and grammatical structures that are prone to be selected when sociolinguistics and language contact make it possible for speakers to choose between competing forms. The reasons why an element is an attractor are linguistic (auditory salience, ease of processing, paradigm structure, etc.), but the factors that make selection possible and propagate selected items through the speech community are non-linguistic. This paper uses the consonants in personal pronouns to show what makes for an attractor and how selection and diffusion work, then presents a survey of several language families and areas showing that the derivational morphology of pairs of verbs like fear and frighten, or Turkish korkmak ‘fear, be afraid’ and korkutmak ‘frighten, scare’, or Finnish istua ‘sit’ and istutta ‘seat (someone)’, or Spanish sentarse ‘sit down’ and sentar ‘seat (someone)’ is susceptible to selection. Specifically, the Turkish and Finnish pattern, where ‘seat’ is derived from ‘sit’ by addition of a suffix—is an attractor and a favored target of selection. 
This selection occurs chiefly in sociolinguistic contexts of what is defined here as linguistic symbiosis, where languages mingle in speech, which in turn is favored by certain demographic, sociocultural, and environmental factors here termed frontier conditions. Evidence is surveyed from northern Eurasia, the Caucasus, North and Central America, and the Pacific and from both modern and ancient languages to raise the hypothesis that frontier conditions and symbiosis favor causativization.|000|correlational studies, causation, missing data, missing code, 4318|Adams2017|Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even in cases where text data is missing, there are some languages for which bilingual lexicons are available, since creating lexicons is a fundamental task of documentary linguistics. We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences. The method involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models. Results across a number of languages show that language models are improved by this pre-training. Application to Yongning Na, a threatened language, highlights challenges in deploying the approach in real low-resource environments.|000|word embeddings, cross-linguistic study, 4319|Ahnert2015|Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. 
Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering.|000|protein structure, protein assembly, protein evolution, domain emergence, biological parallels 4320|Albright2007|Phonological judgments are often gradient: blick > ?bwick > *bnick. The theoretical interpretation of gradient acceptability remains controversial, however, with some authors maintaining that it is a performance/task effect based on similarity to the lexicon (neighborhood effects), and others attributing it to a probabilistic grammar regulating possible sequences (phonotactics). In a study that directly compared the predictions of similarity-based and sequential models against experimental ratings of non-words, Bailey and Hahn (2001) argued that both types of knowledge are needed, though the relative contribution of sequential models was quite small. In this paper, additional phonotactic models are considered, including the widely used positional phonotactic probability model of Vitevitch and Luce (2004), and a model based on phonological features and natural classes. The performance of all models is tested against Bailey and Hahn’s data and against data from Albright and Hayes (2003). 
The results show that probabilistic phonotactic models do not play a minor role; in fact, they may actually account for the bulk of gradient phonological acceptability judgments.|000|phonological theory, pseudo word, grammaticality, phonology, dataset 4321|Apic2001|There is a limited repertoire of domain families that are duplicated and combined in different ways to form the set of proteins in a genome. Proteins are gene products, and at the level of genes, duplication, recombination, fusion and fission are the processes that produce new genes. We attempt to gain an overview of these processes by studying the evolutionary units in proteins, domains, in the protein sequences of 40 genomes. The domain and superfamily definitions in the Structural Classification of Proteins Database are used, so that we can view all pairs of adjacent domains in genome sequences in terms of their superfamily combinations. We find 783 out of the 859 superfamilies in SCOP in these genomes, and the 783 families occur in 1307 pairwise combinations. Most families are observed in combination with one or two other families, while a few families are very versatile in their combinatorial behaviour; 209 families do not make combinations with other families. This type of pattern can be described as a scale-free network. We also study the N to C-terminal orientation of domain pairs and domain repeats. The phylogenetic distribution of domain combinations is surveyed, to establish the extent of common and kingdom-specific combinations. Of the kingdom-specific combinations, significantly more combinations consist of families present in all three kingdoms than of families present in one or two kingdoms. Hence, we are led to conclude that recombination between common families, as compared to the invention of new families and recombination among these, has also been a major contribution to the evolution of kingdom-specific and species-specific functions in organisms in all three kingdoms. 
Finally, we compare the set of the domain combinations in the genomes to those in the RCSB Protein Data Bank, and discuss the implications for structural genomics|000|domain emergence, protein structure, protein evolution, overview 4322|Baluska2011|Biological evolution represents one of the most successful, but also controversial scientific concepts. Ever since Charles Darwin formulated his version of evolution via natural selection, biological sciences experienced explosive development and progress. First of all, although Darwin could not explain how traits of organisms, selected via natural selection, are inherited and passed down along generations; his theory stimulated research in this respect and resulted in the establishment of genetics and still later in the discovery of DNA and genome sequencing some hundred years after his evolutionary theory. Nevertheless, there are several weaknesses in classical Darwinian as well as Neodarwinian gene-centric views of biological evolution. The most serious drawback is its narrow focus: the modern evolutionary synthesis, as formulated in the 20th Century, is based on the concept of gene and on the mathematical/statistical analysis of populations. While Neodarwinism is still generally considered a valid theory of biological evolution, its narrow focus and incompatibility with several new findings and discoveries calls for its update and/ or transformation. Either it will be replaced with an updated version or, if not flexible enough, it will be replaced by a new theory. In his book “Evolution – A New View from the 21st Century,” 1 James A. Shapiro discusses these problems as well as newly emerging results which are changing our understanding of biological evolution. 
This new book joins a row of several other recent books highlighting the same issues.|000|review, epigenetics, evolutionary theory, paradigm shift 4323|Castillo2018|Most charring experiments are carried out in the muffle furnace in highly controlled conditions and tackle the taphonomic issues of seed shrinkage and distortion caused by carbonisation. This paper presents the results from charring experiments conducted using real fire conditions. The objective of this study is to reproduce the charring processes that occur naturally and so address the issue of preservation biases which occurred in prehistoric contexts of carbonisation. This is particularly important in addressing the possible overrepresentation of rice over other taxa in the archaeological record. Prior charring experiments focus on Old World crops, but in this study the taxa used are East, South and Southeast Asian cereals and pulses. An ethnographic study conducted in Thailand examining the rice processing stages of dehusking and winnowing is also included since differential preservation may also result from crop processing. Archaeological results from sites in Mainland Southeast Asia are then interpreted using the results of the charring experiments and the ethnographic data.|000|rice cultivation, preservation bias, archaeology, human prehistory, 4324|Flad2007|The history of prehistoric domesticated animal exploitation in the northwestern areas of modern China is complex and involves different processes for each of the various animals that have been documented. This chapter comprehensively summarizes the zooarchaeological evidence for animal domestication in this region and summarizes our current understanding of dog, pig, cattle, water buffalo, yak, sheep/goat, camel, donkey, and horse exploitation.|000|China, domestication, archaeology, 4325|Griffiths2015|Several authors have argued that causes differ in the degree to which they are ‘specific’ to their effects. 
Woodward has used this idea to enrich his influential interventionist theory of causal explanation. Here we propose a way to measure causal specificity using tools from information theory. We show that the specificity of a causal variable is not well defined without a probability distribution over the states of that variable. We demonstrate the tractability and interest of our proposed measure by measuring the specificity of coding DNA and other factors in a simple model of the production of mRNA.|000|causal inference, specificity, DNA sequence, coding DNA, 4326|Frandsen2015|Background: Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses. Results: We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. 
We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias. Conclusions: Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.|000|partitioning, biological parallels, phylogenetic reconstruction, 4327|Frandsen2015|This could be quoted in the context of the sound correspondence pattern algorithm, where we also partition the data into blocks that should evolve at the same pace.|000|partitioning, sound correspondences, correspondence patterns, biological parallels 4328|Himmelmann2012|This paper presents a new definition of documentary linguistics, based on a typology of linguistic data types. It clarifies the distinction between raw, primary, and structural data and argues that documentary linguistics is concerned with raw and primary data and their interrelationships, while descriptive linguistics is concerned with the relations between primary and structural data. The fact that primary data are of major concern in both fields reflects the fact that the two fields are very closely interlinked and difficult to separate in actual practice. 
The details of their interaction in actual practice, however, are still a matter for further discussion and investigation, as the second main part of the paper attempts to make clear.|000|structural data, primary data, datatypes, linguistic data, introduction, overview, 4329|Hudson2018|Language is thought to be a crucial element behind Pleistocene expansions of Homo sapiens but our understanding of language change over the very long term is still poor. There have been two main approaches to language dynamics in this context. One assumes a continual ebb and flow of local human populations and languages, leading to high levels of ‘patchiness’ in both genes and languages. Another approach argues that long-term equilibrium leads not to patchiness but to areal diffusion and convergence. Both of these approaches assume equilibrium to be the norm. However, research in ecology since the 1970s has found that ecosystems have multiple potential states rather than a single equilibrium point. Under the name of resilience theory, such thinking is being increasingly applied to coupled socio-ecological systems using the concept of the adaptive cycle. This article proposes a model of long-term language change based on the adaptive cycle of resilience theory.|000|resilience theory, resilience, language change, language origin, uniformitarianism, overview 4330|Loebner2004|The principle of compositionality (PC) states that the meaning of a complex expression is a function of the lexical meanings of its components and the syntactic structure of the whole. PC is usually considered necessary for explaining the apparent ability of human language users to interpret arbitrary regular complex expressions efficiently and uniformly. The meaning of a syntactically regular expression derives from the meanings of its components in a regular way. Any given lexical expression is not just of a certain type, but belongs to hierarchies of types in the sense of Carpenter. 
The possible types form a semi-lattice ordered by the partial ordering relation of subsumption. The most specific, or minimal, type consists of solely an individual lexical expression. The most general type comprises all lexical expressions indiscriminately.|000|compositionality, overview, semantics, 4331|Pietrowski2004|Many concepts, which can be constituents of thoughts, are somehow indicated with words that can be constituents of sentences. But this assumption is compatible with many hypotheses about the concepts lexicalized, linguistic meanings, and the relevant forms of composition. The lexical items simply label the concepts they lexicalize, and that composition of lexical meanings mirrors composition of the labeled concepts, which exhibit diverse adicities. If a phrase must be understood as an instruction to conjoin monadic concepts that correspond to the constituents, lexicalization must be a process in which non-monadic concepts are used to introduce monadic analogues. But given such analogues, along with some thematic concepts, conjunctions can mimic the effect of saturating polyadic concepts. The lexical items efface conceptual adicity distinctions, making it possible to treat a recursive combination of expressions as a sign of monadic predicate conjunction.|000|semantic monadicity, semantic theory, compositionality, overview 4332|Ruder2017|Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. 
We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.|000|cross-linguistic study, word embeddings, semantics, 4333|Nature2018|At Nature, we recognize that our peer reviewers have certain ‘rights’. One of the most well known is the right to anonymity. Less widely known is that referees have the right to view the data and code that underlie a work if it would help in the evaluation, even if these have not been provided with the submission. Yet few referees exercise this right. They should do so.|000|missing data, missing code, review practice, review, referee, open research 4334|Sagart2018|Background: Genetic data for traditional Taiwanese (Formosan) agriculture is essential for tracing the origins on the East Asian mainland of the Austronesian language family, whose homeland is generally placed in Taiwan. Three main models for the origins of the Taiwanese Neolithic have been proposed: origins in coastal north China (Shandong); in coastal central China (Yangtze Valley), and in coastal south China. A combination of linguistic and agricultural evidence helps resolve this controversial issue. Results: We report on botanically informed linguistic fieldwork of the agricultural vocabulary of Formosan aborigines, which converges with earlier findings in archaeology, genetics and historical linguistics to assign a lesser role for rice than was earlier thought, and a more important one for the millets. We next present the results of an investigation of domestication genes in a collection of traditional rice landraces maintained by the Formosan aborigines over a hundred years ago. The genes controlling awn length, shattering, caryopsis color, plant and panicle shapes contain the same mutated sequences as modern rice varieties everywhere else in the world, arguing against an independent domestication in south China or Taiwan. 
Early and traditional Formosan agriculture was based on foxtail millet, broomcorn millet and rice. We trace this suite of cereals to northeastern China in the period 6000–5000 BCE and argue, following earlier proposals, that the precursors of the Austronesians, expanded south along the coast from Shandong after c. 5000 BCE to reach northwest Taiwan in the second half of the 4th millennium BCE. This expansion introduced to Taiwan a mixed farming, fishing and intertidal foraging subsistence strategy; domesticated foxtail millet, broomcorn millet and japonica rice; a belief in the sacredness of foxtail millet; ritual ablation of the upper incisors in adolescents of both sexes; domesticated dogs; and a technological package including inter alia houses, nautical technology, and loom weaving. Conclusion: We suggest that the pre-Austronesians expanded south along the coast from that region after c. 5000 BCE to reach northwest Taiwan in the second half of the 4th millennium BCE.|000|rice cultivation, Chinese, Sino-Austronesian, archaeology, linguistic palaeography 4335|Tordai2005|Originally the term ‘protein module’ was coined to distinguish mobile domains that frequently occur as building blocks of diverse multidomain proteins from ‘static’ domains that usually exist only as stand-alone units of single-domain proteins. Despite the widespread use of the term ‘mobile domain’, the distinction between static and mobile domains is rather vague as it is not easy to quantify the mobility of domains. In the present work we show that the most appropriate measure of the mobility of domains is the number of types of local environments in which a given domain is present. Ranking of domains with respect to this parameter in different evolutionary lineages highlighted marked differences in the propensity of domains to form multidomain proteins. 
Our analyses have also shown that there is a correlation between domain size and domain mobility: smaller domains are more likely to be used in the construction of multidomain proteins, whereas larger domains are more likely to be static, stand-alone domains. It is also shown that shuffling of a limited set of modules was facilitated by intronic recombination in the metazoan lineage and this has contributed significantly to the emergence of novel complex multidomain proteins, novel functions and increased organismic complexity of metazoa.|000|protein structure, protein evolution, overview, domains, domain emergence 4336|Moravec2018|Where a newly-married couple lives, termed post-marital residence, varies cross-culturally and changes over time. While many factors have been proposed as drivers of this change, among them general features of human societies like warfare, migration and gendered division of subsistence labour, little is known about whether changes in residence patterns exhibit global regularities. Here, we study ethnographic observations of post-marital residence in societies from five large language families (Austronesian, Bantu, Indo-European, Pama-Nyungan and Uto-Aztecan), encompassing 371 ethnolinguistic groups ranging widely in local ecologies and lifeways, and covering over half the world's population and geographical area. We apply Bayesian comparative methods to test the hypothesis that post-marital residence patterns have evolved in similar ways across different geographical regions. By reconstructing past post-marital residence states, we compare transition rates and models of evolution across groups, while integrating the historical descent relationships of human societies. 
We find that each language family possesses its own best fitting model, demonstrating that the mode and pace of post-marital residence evolution is lineage-specific rather than global.|000|character mapping, marriage patterns, cultural evolution, correlational studies 4337|Moravec2018|Maybe interesting for some other contexts, as they describe a method for testing whether patterns of change differ across lineages.|000|character mapping, marriage patterns, cultural evolution, methodology 4338|Ku2015|Chloroplasts arose from cyanobacteria, mitochondria arose from proteobacteria. Both organelles have conserved their prokaryotic biochemistry, but their genomes are reduced, and most organelle proteins are encoded in the nucleus. Endosymbiotic theory posits that bacterial genes in eukaryotic genomes entered the eukaryotic lineage via organelle ancestors. It predicts episodic influx of prokaryotic genes into the eukaryotic lineage, with acquisition corresponding to endosymbiotic events. Eukaryotic genome sequences, however, increasingly implicate lateral gene transfer, both from prokaryotes to eukaryotes and among eukaryotes, as a source of gene content variation in eukaryotic genomes, which predicts continuous, lineage-specific acquisition of prokaryotic genes in divergent eukaryotic groups. Here we discriminate between these two alternatives by clustering and phylogenetic analysis of eukaryotic gene families having prokaryotic homologues. 
Our results indicate (1) that gene transfer from bacteria to eukaryotes is episodic, as revealed by gene distributions, and coincides with major evolutionary transitions at the origin of chloroplasts and mitochondria; (2) that gene inheritance in eukaryotes is vertical, as revealed by extensive topological comparison, sparse gene distributions stemming from differential loss; and (3) that continuous, lineage-specific lateral gene transfer, although it sometimes occurs, does not contribute to long-term gene content evolution in eukaryotic genomes.|000|lateral gene transfer, prokaryotic evolution, eukaryotic evolution, methodology 4339|Preminger2017|This paper argues that the filtration-based approach to syntactic competence adopted in the context of minimalist syntax (Chomsky 1995, 2000, 2001), where freely-assembled syntactic outputs are filtered at the interfaces with the sensorimotor (SM) and conceptual-intentional (C-I) systems, is empirically wrong. The solution, I argue, is a return to a non-generation alternative, of the kind put forth in Syntactic Structures (Chomsky 1957).|000|Chomsky syntax, critics, language model, 4340|Grossmann2018|This article shows that a hitherto unattested construction type – namely, adverbial subordinator prefixes – is in fact attested in several languages. While Dryer’s 659-language convenience sample does not turn up any clear example of such a construction, we argue that this is in part due to arbitrary coding choices that a priori exclude potential constructions of this type. In order to document the existence of adverbial subordinator prefixes, we present a number of languages with different genealogical and areal affiliations, each of which shows solid synchronic evidence for what appears to be a universally dispreferred feature. 
Furthermore, we identify some diachronic pathways through which adverbial subordinator prefixes grammaticalize.|000|universals, grammatical coding, database, cross-linguistic study, language typology 4341|Wiens2006|Concerns about the deleterious effects of missing data may often determine which characters and taxa are included in phylogenetic analyses. For example, researchers may exclude taxa lacking data for some genes or exclude a gene lacking data in some taxa. Yet, there may be very little evidence to support these decisions. In this paper, I review the effects of missing data on phylogenetic analyses. Recent simulations suggest that highly incomplete taxa can be accurately placed in phylogenies, as long as many characters have been sampled overall. Furthermore, adding incomplete taxa can dramatically improve results in some cases by subdividing misleading long branches. Adding characters with missing data can also improve accuracy, although there is a risk of long-branch attraction in some cases. Consideration of how missing data does (or does not) affect phylogenetic analyses may allow researchers to design studies that can reconstruct large phylogenies quickly, economically, and accurately.|000|missing data, phylogenetic reconstruction, 4342|Wiens2011|This paper will attempt to resolve some controversies about the effects of missing data on phylogenetic analysis. Whether missing data are generally problematic is a critical issue in modern phylogenetics, especially as wildly different amounts of molecular data become available for different taxa, ranging from entire genomes, to single genes, to none (e.g., fossils). Our perception of the impact of missing data (or lack thereof) may strongly influence which taxa and characters we include in a phylogenetic analysis (Wiens 2006) and may lead to a diversity of serious errors. 
For example, if we think that missing data are problematic when they are not, then we may exclude taxa and characters that would otherwise benefit our analyses, given the abundant evidence that increasing numbers of both taxa and characters can potentially improve the accuracy of phylogenetic analyses (e.g., Huelsenbeck 1995; Rannala et al. 1998; Poe 2003), where accuracy is generally defined as the similarity between the estimated tree and the correct, known phylogeny. In contrast, if missing data cells are themselves intrinsically problematic (e.g., Huelsenbeck 1991), including taxa or characters with many missing data cells may lead to inaccurate phylogenetic estimates.|000|missing data, phylogenetic reconstruction 4343|Driem2018|This invited response to a piece by @LaPolla<2016a>, published in issue 39/2 of LTBA, addresses both LaPolla’s misrepresentations of the history of linguistics and his flawed understanding of historical linguistics. The history of linguistic thought with regard to the Tibeto-Burman or Trans-Himalayan language family vs. the Indo-Chinese or “Sino-Tibetan” family tree model is elucidated and juxtaposed against the remarkable robustness of certain ahistorical myths and the persistence of unscientific argumentation by vocal proponents of the Sino-Tibetanist paradigm, such as LaPolla.|000|Sino-Tibetan, Trans-Himalayan, subgrouping, methodology, debate, critics 4344|LaPolla2016a|There have been challenges to the received view of the structure of the Sino-Tibetan language family. This is all well and good, as we should constantly challenge our most basic assumptions. 
In this paper I look at the arguments presented with a view to convincing us to change our conception of Sino-Tibetan and to change the name of the family to “Trans-Himalayan”, and find them less than convincing, due to problems of fact and argumentation.|000|methodology, Sino-Tibetan, history of science, subgrouping, debate 4345|VanGijn2017|The article discusses the spread of shared features (structural data) among South American languages in comparison with distances derived from the river systems. Instead of using straight-line (as-the-crow-flies) distances to calculate geographic distances, they consider the role that rivers play, and they also identify ecological regions in South America, i.e. zones where people's lifestyles are similar with respect to food production, the environment, and the like.|000|geographic distance, South American languages, structural data, river systems, 4346|Blevins2016|The modern revival of the word and paradigm model dates, for all intents and purposes, from the publication of Hockett’s Two models of grammatical description in 1954. This revival is something of an unintended consequence, given that Hockett’s study is mainly an extended comparison of two variants of morphemic analysis and, in many ways, represents the high-water mark of the morpheme-based tradition. Bloomfield (1933) had earlier provided the foundation for models of morphemic analysis by decomposing words into minimal units of lexical form (morphemes) and minimal features of ‘arrangement’ (taxemes). But Bloomfield’s proposals seemed programmatic and obscure in many respects, and it fell to his successors to develop his approach into a general model of morphemic analysis. The most influential line of development led to what @Hockett<1954> (1954) termed the ‘item and arrangement’ (IA) model. 
By reducing Bloomfield’s diverse features of arrangement to features of ‘order’ and ‘selection’, Harris (1942) and @Hockett<1947> (1947) arrived at a simple model in which word structure could be represented by linear sequences of morphemes. The remaining features of arrangement were relegated to other levels of analysis, notably to a ‘morphophonemic’ level that mediated between sequences of morphemes and surface forms (consisting of sequences of phonemes).|3|morphology, morphological theory, modeling morphology, 4347|Blevins2016|The basic hierarchy, in which phrases consist of arrangements of words, and words consist of arrangements of morphemes, had again been set out earlier, in @Bloomfield<1926> (1926: §III), but it was only later that Bloomfield’s schematic postulates were elaborated into a uniform system of levels based on part-whole relations.|3|hierarchies, structure of language, double articulation 4348|Blevins2016|As noted with particular clarity by Lounsbury (1953), the attempt to impose an agglutinative analysis on the forms of a fusional (or ‘flectional’) language often led to a type of indeterminacy that the model could not resolve:|4|item and arrangement model, agglutinative languages, morphology, morphological analysis 4349|Lounsbury1953|In a fusional language, if one seeks to arrive at constant segments [...] conflicts arise in the placing of the cuts. One comparison of forms suggests one placement, while another comparison suggests another. Often, in fact, no constant segment can be isolated at all which corresponds to a given constant meaning. Situations of this kind often permit of more than one solution according to different manners of selecting and grouping environments. :comment:`Quoted after ` @Blevins2016 :comment:`(p 4)`|172|word segmentation, morpheme segmentation, morphological theory, methodology 4350|Blevins2016|The model that Matthews proposes is strikingly simple in its basic conception. 
The paradigm of an item is represented by sets of properties (what in other traditions are termed ‘features’), each corresponding to a cell of the paradigm. The lexical entry of the item specifies a root or stem form on which the forms of the paradigm are based. The realization rules of a grammar ‘interpret’ properties by applying operations to a form. A set of such rules realize or ‘spell out’ the inflected surface form that is associated with a paradigm cell of an item by interpreting the properties of the cell and successively modifying the base form of the item.|7|word and paradigm model, morphology, morphological analysis 4351|Blevins2016|The models that instantiate IA frameworks are typically ‘atomistic’ in that they operate with segmentally minimal units. However, these models could just as well concatenate larger units, such as stems, or even sequences of affixes that frequently collocate and pattern like units. The defining property is that word structure (and word meaning) arises through the concatenation of units.|11|word meaning, morphological model, morphological framework, word paradigm morphology 4352|Blevins2016|After acknowledging that these ‘items’ were really processes masquerading as forms, Hockett proceeded to outline a uniformly process-based alternative. Hockett termed this model the ‘item and process’ (IP) model, recognizing an intellectual [pb] debt to the process-based perspective of Sapir (1921), which he, like other Post-Bloomfieldians, had previously regarded with considerable suspicion.|4f|morphological model, item and process model, 4353|Blevins2016|The central innovation in the IP framework is the introduction of processes that apply operations other than concatenation to a base. Unlike realization rules, processes do not ‘spell out’ previously specified features. Rather they are feature-modifying or ‘incremental’ in the sense of Stump (2001). 
Just as the concatenation of a morpheme to a base yields a unit that augments the features and form of the base, the application of a process may alter the features and modify the form associated with an input to which it applies. However, models that instantiate IP frameworks are again free to specify different types of inputs (roots, stems or other units), along with different formal operations.|11|morphological model, morphological framework, item and process model 4354|Blevins2016|Recognizing WP models as specific instantiations of a more general implicational framework helps to clarify how item and pattern approaches can be applied to languages in which words and paradigms play a less significant role. Consider first isolating patterns and derivational formations. In an isolating language, standard inflectional paradigms will not guide deductions about novel forms. Yet other sets of word forms may still establish patterns that are of predictive value. ‘Morphological families’ consisting of sets of derivationally-related forms exhibit their own patterns of interpredictability, which in some cases are as reliable as the expectations generated by inflectional paradigms. Both in size and composition, these ‘families’ show far more item-specific variation than inflectional paradigms. Whereas inflectional paradigms are broadly uniform within a given word or inflection class, families of derivational formations can vary in size by orders of magnitude. The ‘lexical neighbours’ of an item may also be of deductive value, as may be other word classes. The idea that an inflectional paradigm is the extreme case of a predictive pattern is implicit in the way that the notion ‘paradigm’ is extended to broader classes of related forms in Robins (1959: 121) and Moscoso del Prado Martín (2003). 
|12|word and paradigm model, word family, morphological model 4355|Blevins2016|Very good book introducing morphological problems and probably amenable to machine-readable representations.|000|morphological theory, morphological model, word and paradigm model, introduction, analogy 4356|Kazartsev2018|This article is concerned with conceptualizing the stress patterns of poetic texts in languages which developed metrical and, most of all, syllabo-tonic versification. This entails accounting for the actually-felt stresses within a verse, as well as formulating the systematic rules for describing the rhythm-engendering metrical units that structure poetic speech. These rules aim towards a definite universality and can be applied to the comparative study of poetic rhythm in texts from different languages.|000|poetry, linguistic annotation, stress patterns, rhythmic pattern, 4357|Kazartsev2018|Very interesting article, presenting a weak-strong differentiation for certain stress patterns. This should be included in the annotation framework for poetry.|000|stress patterns, rhyme patterns, rhythmic pattern, poetry, linguistic annotation 4358|Alshehri2018|This paper presents a cross-linguistic investigation of a constraint on the use of intrinsic frames of reference proposed by Levelt (1984, 1996). This proposed constraint claims that use of intrinsic frames when the ground object is in non-canonical position is blocked due to conflict with gravitational-based reference frames. Regression models of the data from Arabic, K’iche’, Spanish, Yucatec, and Zapotec suggest that this constraint is valid across languages. However, the strength at which the constraint operates is predicted by the frequency of canonical intrinsic frames in the particular language. The ratio of the incidence of intrinsic usage with canonical vs. 
non-canonical orientation appears to be remarkably uniform across languages, which suggests the possibility of a strong cognitive universal.|000|orientation, cross-linguistic study, canonical orientation, 4359|Round2015|In this chapter, I wish to draw attention to a distinction between three species of morphome. A morphome (Aronoff 1994) is a category which figures prominently in the organization of a language’s morphological system, yet in its most intricate manifestations is anisomorphic with all syntactic, semantic and phonological categories that are active elsewhere in the grammar. Research into morphomes has intensified in recent years and it is possible now to formulate a more nuanced theory of this object of study. To that end, a distinction can be drawn between what I propose to term rhizomorphomes, meromorphomes, and metamorphomes. All three are equally morphomic categories, but of different kinds. A summary appears in Table 3.1.|29|morphology, terminology, introduction, syntax 4360|Round2015|The text is heavily syntax-loaded and not directly formalizable for computer applications, but maybe interesting because of the terms it introduces.|000|morphology, terminology, introduction 4361|Brewis2018|Community sanitation interventions increasingly leverage presumed innate human disgust emotions and desire for social acceptance to change hygiene norms. While often effective at reducing open defecation and encouraging handwashing, there are growing indications from ethnographic studies that this strategy might create collateral damage, such as reinforcing stigmatized identities in ways that can drive social or economic marginalization. To test fundamental ethnographic propositions regarding the connections between hygiene norm violations and stigmatized social identities, we conducted 267 interviews in four distinct global sites (in Guatemala, Fiji, New Zealand, USA) between May 2015 and March 2016. 
Based on 148 initial codes applied to 23,278 interview segments, text-based analyses show that stigmatizing labels and other indices of contempt readily and immediately attach to imagined hygiene violators in these diverse social settings. Moral concerns are much more salient at all sites than disease/contagion ones, and hygiene violators are extended little empathy. Contrary to statistical predictions, however, non-empathetic moral reactions to women hygiene violators are no harsher than those of male violators. This improved evidentiary base illuminates why disgust- and shame-based sanitation interventions can so easily create unintended social damage: hygiene norm violations and stigmatizing social devaluations are consistently cognitively connected.|000|hygiene norms, stigmatization, community norms, sociology 4362|Blevins2006|This paper examines two contrasting perspectives on morphological analysis, and considers inflectional patterns that bear on the choice between these alternatives. On what is termed an ABSTRACTIVE perspective, surface word forms are regarded as basic morphotactic units of a grammatical system, with roots, stems and exponents treated as abstractions over a lexicon of word forms. This traditional standpoint is contrasted with the more CONSTRUCTIVE perspective of post-Bloomfieldian models, in which surface word forms are 'built' from sub-word units. Part of the interest of this contrast is that it cuts across conventional divisions of morphological models. Thus, realization-based models are morphosyntactically 'word-based' in the sense that they regard words as the minimal meaningful units of a grammatical system. Yet morphotactically, these models tend to adopt a constructive 'root-based' or 'stem-based' perspective. 
An examination of some form-class patterns in Saami, Estonian and Georgian highlights advantages of an abstractive model, and suggests that these advantages derive from the fact that sets of words often predict other word forms and determine a morphotactic analysis of their parts, whereas sets of sub-word units are of limited predictive value and typically do not provide enough information to recover word forms.|000|morphological analysis, morphological theory, word and paradigm model, 4363|Blevins2006|The descriptive challenges raised by inflection classes, lexical classes and morphomic stems illustrate different facets of a single phenomenon. In each case, an analysis that takes a larger form as the basis for abstracting smaller forms avoids difficulties that arise if the smaller forms are taken as the basis for deriving the larger forms. Patterns of this sort lend a strong measure of support to Anderson's (1992: 369) suggestion that a word-based model which 'regards the grammar as a set of relations among full surface forms ... may merit more consideration than it has sometimes received'. The present paper presents a sustained argument for this position.|533|word and paradigm model, morphological analysis, 4364|Majid2018|Is there a universal hierarchy of the senses, such that some senses (e.g., vision) are more accessible to consciousness and linguistic description than others (e.g., smell)? The long-standing presumption in Western thought has been that vision and audition are more objective than the other senses, serving as the basis of knowledge and understanding, whereas touch, taste, and smell are crude and of little value. This predicts that humans ought to be better at communicating about sight and hearing than the other senses, and decades of work based on English and related languages certainly suggests this is true. However, how well does this reflect the diversity of languages and communities worldwide? 
To test whether there is a universal hierarchy of the senses, stimuli from the five basic senses were used to elicit descriptions in 20 diverse languages, including 3 unrelated sign languages. We found that languages differ fundamentally in which sensory domains they linguistically code systematically, and how they do so. The tendency for better coding in some domains can be explained in part by cultural preoccupations. Although languages seem free to elaborate specific sensory domains, some general tendencies emerge: for example, with some exceptions, smell is poorly coded. The surprise is that, despite the gradual phylogenetic accumulation of the senses, and the imbalances in the neural tissue dedicated to them, no single hierarchy of the senses imposes itself upon language.|000|perception, differential coding, cross-linguistic study, 4365|Shalal2018|Difficulties in classifying words as to their morphological source cause us to question whether such a classification should be implemented through a linear morphemic or a whole-word approach. The present paper presents an analysis of which of these approaches could be the most viable account for cases in which the derivative form reflects the following: 1) multiplicity of potential bases; 2) semantic / orthographic match with the base; and 3) heterogeneity of form / meaning correspondence. The morphemic approach seems acceptable when morphemes are organised in a linear arrangement, such as демократ /d j im 5" krat/ ‘democrat (m.)’ > демократка /d j im 5" kratk @ / ‘democrat (f.)’, etc. This facilitates identifying the base form, which is демократ. However, this approach cannot be generalised over other formations that show a mismatch of form and meaning between the derivative elements and their bases as found with, e.g., белый / " b j elıj/ ‘white’ > белка / " b j elk @ / ‘squirrel’. 
Hence, we argue that the word-based approach is possibly better utilised in these cases.|000|morphological analysis, word-based approach, morpheme-based approach, syntagmatic processes 4366|Shalal2018|Potentially contains interesting examples and further clarification of what we call paradigmatic vs. syntagmatic morphology.|000|syntagmatic processes, morphological analysis, 4367|Thieberger2017|I am pleased to offer this paper in tribute to Luise Hercus who has always been quick to adopt new approaches to working with older sources on Australia’s Indigenous languages (see also Nathan, this volume). In that spirit, I offer an example of using a novel method of working with a large set of material created by Daisy Bates (1859-1951) in the early 1900s. The masses of papers she produced over her lifetime have been an ongoing source of information for Aboriginal people and for researchers (e.g. White 1985; McGregor 2012; Bindon & Chadwick 1992). The collection at the National Library of Australia (NLA) takes up 51 boxes and 8.16m of shelf space and contains a range of material, but here I will focus on the vocabularies of Australian languages. Bates sent out a questionnaire in 1904 that was filled in by various people by hand, creating a set of manuscript pages. She then supervised the typing of these manuscripts. Over the past two years I have been working with the NLA to make digital images of some 23,000 pages of these vocabulary manuscripts, and to create digital text versions of the 4,368 typescripts, which can then be linked back to the page images of both the typescripts and handwritten questionnaire manuscripts.|000|Daisy Bates, biography, introduction, Australian languages 4368|Hayashi2018|Akha-Chicho (Burmish) wordlist presented in IPA, contains a couple of hundred entries.|000|wordlist, dataset, Sino-Tibetan, Burmish languages 4369|Felsenstein2001|Statistical inference of phylogenies almost didn’t happen. 
The story of the origin, growth, and spread of “statistical phylogenetics” needs to be told, because it is so strange. It is not the straightforward story of gradual spread that one might imagine.|000|phylogenetic reconstruction, likelihood, history of science, cladistics 4370|Bjoern2017|Typical example of regular etymological annotation that destroys data, as it lists the information in prose.|000|linguistic annotation, lexical borrowing, Indo-European, dataset 4371|Falush2007|Dominant markers such as amplified fragment length polymorphisms (AFLPs) provide an economical way of surveying variation at many loci. However, the uncertainty about the underlying genotypes presents a problem for statistical analysis. Similarly, the presence of null alleles and the limitations of genotype calling in polyploids mean that many conventional analysis methods are invalid for many organisms. Here we present a simple approach for accounting for genotypic ambiguity in studies of population structure and apply it to AFLP data from whitefish. The approach is implemented in the program STRUCTURE version 2.2, which is available from http://pritch.bsd.uchicago.edu/structure.html.|000|structure software, software, null alleles, dominant markers, population genetics 4372|Kunze2017|In this paper we present a draft vocabulary for making “persistence statements.” These are simple tools for pragmatically addressing the concern that anyone feels upon experiencing a broken web link. Scholars increasingly use scientific and cultural assets in digital form, but choosing which among many objects to cite for the long term can be difficult. There are few well-defined terms to describe the various kinds and qualities of persistence that object repositories and identifier resolvers do or don’t provide. 
Given an object’s identifier, one should be able to query a provider to retrieve human- and machine-readable information to help judge the level of service to expect and help gauge whether the identifier is durable enough, as a sort of long-term bet, to include in a citation. The vocabulary should enable providers to articulate persistence policies and set user expectations.|000|citation, best practice, 4373|Nathan1992|This paper discusses keyword finderlists for bilingual dictionaries, focussing on their structure and a system for computer-generating them. We examine the concept of finderlist and discuss techniques for specifying finderlist headwords. We develop a typology of finderlist format and argue for the merits of a particular type. A description is presented of an automated system which transforms dictionary glosses according to a set of associated rules to produce a keyword finderlist. Finally we address a number of computer-related lexicographical issues, including data representation and strategies for computer-aided lexicography.|000|concept list, dictionary, concept mapping, elicitation 4374|Nathan1992|Paper discusses some strategies for comparing dictionary entries. Should be mentioned if we start comparing automatic glossings in Concepticon for metadata, but is otherwise not very important or interesting.|000|concept mapping, elicitation, dictionary 4375|Hubisz2009|Genetic clustering algorithms require a certain amount of data to produce informative results. In the common situation that individuals are sampled at several locations, we show how sample group information can be used to achieve better results when the amount of data is limited. New models are developed for the STRUCTURE program, both for the cases of admixture and no admixture. These models work by modifying the prior distribution for each individual’s population assignment. 
The new prior distributions allow the proportion of individuals assigned to a particular cluster to vary by location. The models are tested on simulated data, and illustrated using microsatellite data from the CEPH Human Genome Diversity Panel. We demonstrate that the new models allow structure to be detected at lower levels of divergence, or with less data, than the original STRUCTURE models or principal components methods, and that they are not biased towards detecting structure when it is not present. These models are implemented in a new version of STRUCTURE which is freely available online at http://pritch.bsd.uchicago.edu/structure.html|000|population genetics, population structure, software, structure software 4376|Koryakov2017|Linguists have always seen the notion of “language” as inherently problematic, and the question of whether a particular form of speech should be classified as a separate language or a dialect of a language cannot be answered easily indeed. In this paper, criteria such as shared standard norm, common ethnic identity, and mutual intelligibility are shortly reviewed. Then another method of disentangling languages and dialects based on the number of shared cognates in the 100-word Swadesh list is proposed. The method is tested on various examples.|000|lexicostatistics, mutual intelligibility, dialect classification, language variation 4377|Koryakov2017|Should be quoted in any paper that deals with questions of mutual intelligibility.|000|lexicostatistics, mutual intelligibility, language variation, dialect classification 4378|Fraiberger2018|In areas of human activity where performance is difficult to quantify in an objective fashion, reputation and networks of influence play a key role in determining access to resources and rewards. To understand the role of these factors, we reconstructed the exhibition history of half a million artists, mapping out the coexhibition network that captures the movement of art between institutions. 
Centrality within this network captured institutional prestige, allowing us to explore the career trajectory of individual artists in terms of access to coveted institutions. Early access to prestigious central institutions offered life-long access to high-prestige venues and reduced dropout rate. By contrast, starting at the network periphery resulted in a high dropout rate, limiting access to central institutions. A Markov model predicts the career trajectory of individual artists and documents the strong path and history dependence of valuation in art.|000|network, graph theory, partitioning, art, reputation, cultural evolution 4379|Fraiberger2018|What is interesting potentially is the way they structure their "coexhibition network", maybe in the context of our work on borrowing detection.|000|network, partitioning, visualization, graph theory, 4380|Wright2016|The Mk model was developed for estimating phylogenetic trees from discrete morphological data, whether for living or fossil taxa. Like any model, the Mk model makes a number of assumptions. One assumption is that transitions between character states are symmetric (i.e., the probability of changing from 0 to 1 is the same as 1 to 0). However, some characters in a data matrix may not satisfy this assumption. Here, we test methods for relaxing this assumption in a Bayesian context. Using empirical data sets, we perform model fitting to illustrate cases in which modeling asymmetric transition rates among characters is preferable to the standard Mk model. 
We use simulated data sets to demonstrate that choosing the best-fit model of transition-state symmetry can improve model fit and phylogenetic estimation.|000|character change heterogeneity, morphological characters, phylogenetic reconstruction, priors, 4381|Wright2016|Potentially super interesting for our purpose, needs a thorough read.|000|priors, character change heterogeneity, MK model, phylogenetic reconstruction, morphological characters 4382|Drienko2018|We apply the largest-chunk segmentation algorithm to texts consisting of syllables as smallest units. The algorithm was proposed in Drienkó (2016, 2017a), where it was used for texts considered to have letters/characters as smallest units. The present study investigates whether the largest chunk segmentation strategy can result in higher precision of boundary inference when syllables are processed rather than characters. The algorithm looks for subsequent largest chunks that occur at least twice in the text, where text means a single sequence of characters, without punctuation or spaces. The results are quantified in terms of four precision metrics: Inference Precision, Alignment Precision, Redundancy, and Boundary Variability. We segment CHILDES texts in four languages: English, Hungarian, Mandarin, and Spanish. The data suggest that syllable-based segmentation enhances inference precision. Thus, our experiments (i) provide further support for the possible role of a cognitive largest-chunk segmentation strategy, and (ii) point to the syllable as a more optimal unit for segmentation than the letter/phoneme/character, (iii) in a cross-linguistic context.|000|morphological segmentation, algorithms, syllabification, 4383|Kuemmel2018|Bei der vergleichenden Betrachtung metrischer Systeme lässt sich feststellen, dass es eine gewisse Korrelation zwischen normalsprachlicher Prosodie und (einheimischer) Metrik gibt. 
So ist eine „akzentuierende“ Metrik, die sich nach dem Wortakzent richtet, häufig in Sprachen mit „dynamischem“ Wortakzent, wie z.B. in germanischen Sprachen. Eine quantitierende Metrik ist gewöhnlich mit einer phonologischen Quantitätskorrelation verknüpft, wie im Altgriechischen, Sanskrit und Arabischen; eine morenzählende Metrik ist typisch für Sprachen mit einer starken Bedeutung der More, wie im Mittelindoarischen. Diese Korrelationen können bei der Deutung unbekannter oder unklarer Systeme helfen. Erwägungen über die prosodische Phonologie sollen hier auf das Altavestische angewandt werden, dessen Metrik im indogermanischen Vergleich eine Sonderstellung einzunehmen scheint. Folgende Abkürzungen und Symbole werden verwendet.|000|metric, rhythm, prosody, Avestan, meter, poetry, annotation 4384|Wiens2011|In summary, all of these simulation and empirical studies seem to fit into this common framework, with highly incomplete taxa being potentially problematic [pb] when the overall number of characters is small and generally unproblematic when the number is large. This common framework seems to apply to all phylogenetic methods, not simply likelihood and Bayesian analysis.|720f|missing data, phylogenetic reconstruction, 4385|Yu2018|Several online dictionaries documenting the lexicon of a variety of sign languages (SLs) are now available. These are rich resources for comparative studies, but there are methodological issues that must be addressed regarding how these resources are used for research purposes. We created a web-based tool for annotating the articulatory features of signs (handshape, location, movement and orientation). Videos from online dictionaries may be embedded in the tool, providing a mechanism for large-scale theoretically-informed sign language annotation. Annotations are saved in a spreadsheet format ready for quantitative and qualitative analyses. Here, we provide proof of concept for the utility of this tool in linguistic analysis. 
We used the SL adaptation of the Swadesh list (Woodward, 2000) and applied lexicostatistic and phylogenetic methods to a sample of 23 SLs coded using the web-based tool; supplementary historic information was gathered from the Ethnologue of World Languages and other online sources. We report results from the comparison of all articulatory features for four Asian SLs (Chinese, Hong Kong, Taiwanese and Japanese SLs) and from the comparison of handshapes on the entire 23 language sample. Handshape analysis of the entire sample clusters all Asian SLs together, separated from the European, American, and Brazilian SLs in the sample, as historically expected. Within the Asian SL cluster, analyses also show, for example, marginal relatedness between Chinese and Hong Kong SLs.|000|sign language, lexicostatistics, concept list, missing data, missing code 4386|Ranacher2017|In linguistics, there is broad consensus that river networks have played an important role in the diffusion of languages in South America. However, the presence of a river alone does not imply language diffusion. Languages have spread along specific routes in the network. It is largely unknown where these routes are located, since evidence of language diffusion is often sparse and only spatially-implicit. In this paper we propose an approach to identify probable pathways of language diffusion along the Amazon River network, combining ideas from route planning and route inference. Route planning proposes possible routes of linguistic diffusion along the river network. Route inference tests these against evidence from linguistic data. We find significant evidence for language diffusion along few, specific branches of the Amazon. Our approach is not restricted to linguistic data alone. 
It is generally suitable to explore deep-time processes in space for which evidence is sparse and spatially-implicit.|000|pathways, language diffusion, South American languages, river systems, 4387|Bentz2018|There are more than 7,000 languages spoken in the world today. It has been argued that the natural and social environment of languages drives this diversity. However, a fundamental question is how strong are environmental pressures, and does neutral drift suffice as a mechanism to explain diversification? We estimate the phylogenetic signals of geographic dimensions, distance to water, climate and population size on more than 6,000 phylogenetic trees of 46 language families. Phylogenetic signals of environmental factors are generally stronger than expected under the null hypothesis of no relationship with the shape of family trees. Importantly, they are also—in most cases—not compatible with neutral drift models of constant-rate change across the family tree branches. Our results suggest that language diversification is driven by further adaptive and non-adaptive pressures. Language diversity cannot be understood without modelling the pressures that physical, ecological and social factors exert on language users in different environments across the globe.|000|language evolution, correlational studies, neutral evolution, 4388|Traag2018|Community detection is often used to understand the structure of large and complex networks. One of the most popular algorithms for uncovering community structure is the so-called Louvain algorithm. We show that this algorithm has a major defect that largely went unnoticed until now: the Louvain algorithm may yield arbitrarily badly connected communities. In the worst case, communities may even be disconnected, especially when running the algorithm iteratively. In our experimental analysis, we observe that up to 25% of the communities are badly connected and up to 16% are disconnected. 
This may present serious issues in subsequent analyses. To address this problem, we introduce the Leiden algorithm. We prove that the Leiden algorithm yields communities that are guaranteed to be connected. In addition, we prove that, when the Leiden algorithm is applied iteratively, it converges to a partition in which all subsets of all communities are locally optimally assigned. Furthermore, by relying on a fast local move approach, the Leiden algorithm runs faster than the Louvain algorithm. We demonstrate the performance of the Leiden algorithm for several benchmark and real-world networks. We find that the Leiden algorithm is faster than the Louvain algorithm and uncovers better partitions, in addition to providing explicit guarantees. Based on our results, we conclude that the Leiden algorithm is preferable to the Louvain algorithm.|000|community detection, Louvain algorithm, Leiden algorithm, graph theory, partitioning 4389|Villanea2018|Neanderthals and anatomically modern humans overlapped geographically for a period of over 30,000 years following human migration out of Africa. During this period, Neanderthals and humans interbred, as evidenced by Neanderthal portions of the genome carried by non-African individuals today. A key observation is that the proportion of Neanderthal ancestry is ~12–20% higher in East Asian individuals relative to European individuals. Here, we explore various demographic models that could explain this observation. These include distinguishing between a single admixture event and multiple Neanderthal contributions to either population, and the hypothesis that reduced Neanderthal ancestry in modern Europeans resulted from more recent admixture with a ghost population that lacked a Neanderthal ancestry component (the ‘dilution’ hypothesis). 
To summarize the asymmetric pattern of Neanderthal allele frequencies, we compiled the joint fragment frequency spectrum of European and East Asian Neanderthal fragments and compared it with both analytical theory and data simulated under various models of admixture. Using maximum-likelihood and machine learning, we found that a simple model of a single admixture did not fit the empirical data, and instead favour a model of multiple episodes of gene flow into both European and East Asian populations. These findings indicate a longer-term, more complex interaction between humans and Neanderthals than was previously appreciated.|000|Neandertal, interbreeding, human evolution, geographic models, human prehistory 4390|Lord2019|Explaining the evolution of animals requires ecological, developmental, paleontological and phylogenetic considerations, because organismal traits are affected by complex evolutionary processes. Modeling a plurality of processes, operating at distinct time-scales on potentially interdependent traits, requires complementary treatments to phylogenetic analyses. Consistently, we developed an inclusive network approach and analyzed the co-occurrence between traits during the evolution of Rhinocerotids. We identified stable, unstable and pivotal traits, as well as traits contributing to complexes, which may obey to a common developmental regulation, pointing to an early implementation of the postcranial Bauplan among rhinocerotids. However, strikingly, most traits are highly dissociable, used repeatedly in different combinations, in different taxa, which usually do not form clades. Therefore, the genes encoding these traits might be recruited into novel gene regulation networks during the course of evolution. 
Our evo-systemic framework, generalizable to other evolved organisations, supports a pluralistic modeling of organismal evolution, including trees and networks.|000|trait networks, trait evolution, biological evolution, graph theory 4391|Kortlandt2009|Article is a good example for the discussion of linguistic paleography, so inspiring for discussions of the method and how it should be enhanced.|000|linguistic palaeography, semantic change, Urheimat 4392|Xu2017|The study of mixed languages and language mixing has drawn interest in the linguistic community for the past two decades both inside and outside China. Work on this topic includes publications by Bakker and Mous (1994 eds.), Thomason (1995), Mous (2003), Matras and Bakker (2003 eds.) among others. These investigations focus on some of the world’s mixed languages including Michif (from Cree and French), Mednyj Aleut (from Aleut and Russian), Ma’a (from Bantu and Cushitic) and others. In China, detailed descriptions can be found of some languages which are apparently mixes of Han or Sinitic languages and non-Han languages. For example, Chen Naixiong (1982), Xi Yuanlin (1983) indicate important loans from Amdo into the Wutun language, Chen Yuanlong (1985) reveals impact from Dongxiang into the Tangwang language, and Chen Naixiong (1990a and 1990b) reports Amdo and Chinese influences upon the Bao’an language. Yixiweisa Acuo (2004) confirms that Daohua is a mixed language.|000|Tanwang language, mixed languages, Sino-Tibetan 4393|Xu2017|In previous sections, it is assumed that the Tangwang language is a Sinitic language variety and is influenced by the Dongxiang language. This means that the Tangwang language was inherited from Chinese and shares similar characteristics with other Northwestern Chinese dialects. With the definition by Thomason that in a mixed language, the parental source is no longer traceable, Tangwang cannot be classified as a mixed language. This has also demonstrated in previous chapters. 
The point of view that the Tangwang language is one of the Sinitic varieties will be tested by different means in this chapter.|1|mixed languages, Tangwang language, dataset, structural data 4394|Xu2017|Paper contains data on structural features in about four languages.|000|structural data, dataset, Sino-Tibetan 4395|Greenhill2018b|Change is coming to historical linguistics. Big, or at least “bigish data” (Gray and Watts 2017), are now becoming increasingly available in the form of large web accessible lexical, typological and phonological databases (e.g. ABVD (Greenhill et al 2008), Chirila (Bowern 2016), Phoible (Moran 2014), WALS (Haspelmath 2014), Autotyp (Bickel et al 2017) and the soon to be released Lexibank, Grambank, Parabank and Numeralbank http://www.shh.mpg.de/180672/glottobank). This deluge of data is way beyond the ability of any one person to process accurately in their head. The deluge will thus inevitably drive the demand for appropriate computational tools to process and analyze the vast wealth of freely available linguistic information. In this chapter we will briefly describe one such set of computational tools – Bayesian phylogenetic methods – and outline their utility for historical linguistics. We will focus on four main questions: what is Bayesian phylolinguistics, why does this approach typically focus on lexical data, how is it able to estimate divergence dates, and how reliable are the results?|000|introduction, Bayesian approaches, historical linguistics, phylogenetic reconstruction 4396|Yabu1981|Wordlist presented on the Taung'yo dialect of Burmese.|000|wordlist, Burmese, Burmish languages, Taung'yo dialect 4397|Xiao2008|Zipf's Law uncovers the relationship between word frequency and its rank. This paper addresses applicability of Zipf's Law in Chinese word frequency distribution. The previous studies on Zipf’s law in Chinese were primarily based on raw corpus, without word segmentation, hence there are obvious limitations. 
This study investigates the topic in several large-scale POS-tagged Chinese corpora. The results of these experiments prove that word frequency distribution in Chinese exhibits Zipf’s law. The paper further examined the distribution of low frequency word in Chinese corpus, which is estimated by Zipf’s law as the majority part of a corpus word list. The result also supports the argument since low frequency words constitute over half of the corpus word occurrences. It indicates that data sparse in statistical approaches could not be magnificently reduced by expanding the corpus scale.|000|Zipf's law, Chinese, Mandarin, corpus studies 4398|Raman1997|This paper addresses the problem of deriving distance measures between parent and daughter languages with specific relevance to historical Chinese phonology. The diachronic relationship between the languages is modelled as a Probabilistic Finite State Automaton. The Minimum Message Length principle is then employed to find the complexity of this structure. The idea is that this measure is representative of the amount of dissimilarity between the two languages.|000|mutual intelligibility, Chinese dialects, 4399|Lawson2018|Genetic clustering algorithms, implemented in programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans as a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups that do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. 
We have implemented an approach, badMIXTURE, to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER and apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history.|000|structure software, overinterpretation, tutorial, critics 4400|Pakendorf2014|The evolution of languages shares certain characteristics with that of genes, such as the predominantly vertical line of transmission and the retention of traces of past events such as contact. Thus, studies of language phylogenies and their correlations with genetic phylogenies can enrich our understanding of human prehistory, while insights gained from genetic studies of past population contact can help shed light on the processes underlying language contact and change. As demonstrated by recent research, these evolutionary processes are more complex than simple models of gene-language coevolution predict, with linguistic boundaries only occasionally functioning as barriers to gene flow. More frequently, admixture takes place irrespective of linguistic differences, but with a detectable impact of contact-induced changes in the languages concerned.|000|coevolution, languages and genes, population genetics, co-evolution, gene-language co-evolution, introduction 4401|LaBar2016|A major aim of evolutionary biology is to explain the respective roles of adaptive versus non-adaptive changes in the evolution of complexity. While selection is certainly responsible for the spread and maintenance of complex phenotypes, this does not automatically imply that strong selection enhances the chance for the emergence of novel traits, that is, the origination of complexity. 
Population size is one parameter that alters the relative importance of adaptive and non-adaptive processes: as population size decreases, selection weakens and genetic drift grows in importance. Because of this relationship, many theories invoke a role for population size in the evolution of complexity. Such theories are difficult to test empirically because of the time required for the evolution of complexity in biological populations. Here, we used digital experimental evolution to test whether large or small asexual populations tend to evolve greater complexity. We find that both small and large—but not intermediate-sized—populations are favored to evolve larger genomes, which provides the opportunity for subsequent increases in phenotypic complexity. However, small and large populations followed different evolutionary paths towards these novel traits. Small populations evolved larger genomes by fixing slightly deleterious insertions, while large populations fixed rare beneficial insertions that increased genome size. These results demonstrate that genetic drift can lead to the evolution of complexity in small populations and that purifying selection is not powerful enough to prevent the evolution of complexity in large populations.|000|complexity, evolution, simulation studies, 4402|Kirby2015|Language exhibits striking systematic structure. Words are composed of combinations of reusable sounds, and those words in turn are combined to form complex sentences. These properties make language unique among natural communication systems and enable our species to convey an open-ended set of messages. We provide a cultural evolutionary account of the origins of this structure. We show, using simulations of rational learners and laboratory experiments, that structure arises from a trade-off between pressures for compressibility (imposed during learning) and expressivity (imposed during communication). 
We further demonstrate that the relative strength of these two pressures can be varied in different social contexts, leading to novel predictions about the emergence of structured behaviour in the wild.|000|expressivity, language evolution, simulation studies 4403|Rockmoore2017|We explore how ideas from infectious disease and genetics can be used to uncover patterns of cultural inheritance and innovation in a corpus of 591 national constitutions spanning 1789–2008. Legal “ideas” are encoded as “topics”—words statistically linked in documents—derived from topic modeling the corpus of constitutions. Using these topics we derive a diffusion network for borrowing from ancestral constitutions back to the US Constitution of 1789 and reveal that constitutions are complex cultural recombinants. We find systematic variation in patterns of borrowing from ancestral texts and “biological”-like behavior in patterns of inheritance, with the distribution of “offspring” arising through a bounded preferential-attachment process. This process leads to a small number of highly innovative (influential) constitutions some of which have yet to have been identified as so in the current literature. Our findings thus shed new light on the critical nodes of the constitution-making network. The constitutional network structure reflects periods of intense constitution creation, and systematic patterns of variation in constitutional lifespan and temporal influence.|000|cultural evolution, network, graph theory, disease outbreak, national institutions 4404|Rockmoore2017|Potentially interesting for the purpose of network modeling of borrowing events.|000|lexical borrowing, network modeling, network, graph analysis 4405|Murawaki2018|Statistical phylogenetic models have allowed the quantitative analysis of the evolution of a single categorical feature and a pair of binary features, but correlated evolution involving multiple discrete features is yet to be explored. 
Here we propose latent representation-based analysis in which (1) a sequence of discrete surface features is projected to a sequence of independent binary variables and (2) phylogenetic inference is performed on the latent space. In the experiments, we analyze the features of linguistic typology, with a special focus on the order of subject, object and verb. Our analysis suggests that languages sharing the same word order are not necessarily a coherent group but exhibit varying degrees of diachronic stability depending on other features.|000|structural data, concerted evolution, correlated evolution, latent representations, evolutionary model 4406|Haspelmath2018c|This paper reasserts the fundamental conceptual distinction between language-particular categories of individual languages, defined within particular systems, and comparative concepts at the cross-linguistic level, defined in substantive terms. The paper argues that comparative concepts are also widely used in other sciences and that they are always distinct from social categories, of which linguistic categories are special instances. Some linguists (especially in the generative tradition) assume that linguistic categories are natural kinds (like biological species or chemical elements) and thus need not be defined but can be recognized by their symptoms, which may be different in different languages. I also note that category-like comparative concepts are sometimes very similar to categories and that different languages may sometimes be described in a unitary commensurable mode, thus blurring (but not questioning) the distinction. 
Finally, I note that cross-linguistic claims must be interpreted as being about the phenomena of languages, not about the incommensurable systems of languages.|000|comparative concept, language typology, linguistic typology, methodology, terminology 4407|Feldmann2018|By dividing the world into concepts and forcing speakers to encode specific aspects of the world in a particular way, language shapes speakers’ mental representation of reality. Linguistic categories can thus influence cognitive categories and processes. This can affect the way its speakers think about the world and behave. |p1|Sapir-Whorff hypothesis, correlational studies, economy, 4408|Feldmann2018|This paper empirically studies the human capital effects of grammatical rules that permit speakers to drop a personal pronoun when used as a subject of a sentence. By de‐emphasizing the significance of the individual, such languages may perpetuate ancient values and norms that give primacy to the collective, inducing governments and families to invest relatively little in education because education usually increases the individual's independence from both the state and the family and may thus reduce the individual's commitment to these institutions. Carrying out both an individual‐level and a country‐level analysis, the paper indeed finds negative effects of pronoun‐drop languages. The individual‐level analysis uses data on 114,894 individuals from 75 countries over 1999‐2014. It establishes that speakers of such languages have a lower probability of having completed secondary or tertiary education, compared with speakers of languages that do not allow pronoun drop. The country‐level analysis uses data from 101 countries over 1972‐2012. Consistent with the individual‐level analysis, it finds that countries where the dominant languages permit pronoun drop have lower secondary school enrollment rates. 
In both cases, the magnitude of the effect is substantial, particularly among females.|000|Sapir-Whorff hypothesis, economy, correlational studies 4409|Larrivee2018|Semantic maps, to which Johan van der Auwera has brought a major intellectual contribution, are a representation of implicational relations in the typological domain. They have increasingly been used to chart historical evolution. They are arranged as a series of contiguous cells that define pathways of variation and change. The questions raised concern the rationale for the contiguity arrangement. It is demonstrated on the basis of novel diachronic analyses that the cells making up a semantic map should be semantic functions and that the contiguous arrangement of these functions relates to the existence of bridging contexts. Because evolution from one function to the next is made possible by bridging contexts, a specific pathway of function pairs defines the evolution of items that can only proceed between cells that share bridging contexts.|000|bridging context, semantic map, linguistic typology 4410|Wagner1996|The problem of complex adaptations is studied in two largely disconnected research traditions: evolutionary biology and evolutionary computer science. This paper summarizes the results from both areas and compares their implications. In evolutionary computer science it was found that the Darwinian process of mutation, recombination and selection is not universally effective in improving complex systems like computer programs or chip designs. For adaptation to occur, these systems must possess "evolvability," i.e., the ability of random variations to sometimes produce improvement. It was found that evolvability critically depends on the way genetic variation maps onto phenotypic variation, an issue known as the representation problem. The genotype-phenotype map determines the variability of characters, which is the propensity to vary. 
Variability needs to be distinguished from variations, which are the actually realized differences between individuals. The genotype-phenotype map is the common theme underlying such varied biological phenomena as genetic canalization, developmental constraints, biological versatility, developmental dissociability, and morphological integration. For evolutionary biology the representation problem has important implications: how is it that extant species acquired a genotype-phenotype map which allows improvement by mutation and selection? Is the genotype-phenotype map able to change in evolution? What are the selective forces, if any, that shape the genotype-phenotype map? We propose that the genotype-phenotype map can evolve by two main routes: epistatic mutations, or the creation of new genes. A common result for organismic design is modularity. By modularity we mean a genotype-phenotype map in which there are few pleiotropic effects among characters serving different functions, with pleiotropic effects falling mainly among characters that are part of a single functional complex. Such a design is expected to improve evolvability by limiting the interference between the adaptation of different functions. Several population genetic models are reviewed that are intended to explain the evolutionary origin of a modular design. While our current knowledge is insufficient to assess the plausibility of these models, they form the beginning of a framework for understanding the evolution of the genotype-phenotype map.|000|evolvability, phenotype, genotype, methodology, simulation studies 4411|Bromham2018|A growing number of studies seek to identify predictors of broad-scale patterns in human cultural diversity, but three sources of non-independence in human cultural variables can bias the results of cross-cultural studies. First, related cultures tend to have many traits in common, regardless of whether those traits are functionally linked. 
Second, societies in geographical proximity will share many aspects of culture, environment and demography. Third, many cultural traits covary, leading to spurious relationships between traits. Here, we demonstrate tractable methods for dealing with all three sources of bias. We use cross-cultural analyses of proposed associations between human cultural traits and parasite load to illustrate the potential problems of failing to correct for these three forms of statistical non-independence. Associations between parasite stress and sociosexuality, authoritarianism, democracy and language diversity are weak or absent once relatedness and proximity are taken into account, and parasite load has no more power to explain variation in traditionalism, religiosity and collectivism than other measures of biodiversity, climate or population size do. Without correction for statistical non-independence and covariation in cross-cultural analyses, we risk misinterpreting associations between culture and environment.|000|genetic relationship, controll for relatedness, Galton's problem, correlational studies 4412|KoptjevskajaTamm2018|Our study aims to explore how much information about areal patterns of colexification we can gain from lexical databases such as CLICS and ASJP. We adopt a bottom-up (rather than hypothesis-driven) approach, identifying areal patterns in three steps: (i) determine spatial autocorrelations in the data, (ii) identify clusters as candidates for convergence areas and (iii) test the clusters resulting from the second step controlling for genealogical relatedness. Moreover, we identify a (genealogical) diversity index for each cluster. 
This approach yields promising results, which we regard as a proof of concept, but we also point out some drawbacks of the use of major lexical databases.|000|linguistic area, areal diffusion, colexification, computational approaches, 4413|Greenhill2018c|treemaker is a Python library to convert a text-based classification schema into a Newick file for use in phylogenetic and bioinformatic programs.|000|journal of open source software, software, examples, open source 4414|Wagner2008|Abstract | Neutralism and selectionism are extremes of an explanatory spectrum for understanding patterns of molecular evolution and the emergence of evolutionary innovation. Although recent genome-scale data from protein-coding genes argue against neutralism, molecular engineering and protein evolution data argue that neutral mutations and mutational robustness are important for evolutionary innovation. Here I propose a reconciliation in which neutral mutations prepare the ground for later evolutionary adaptation. Key to this perspective is an explicit understanding of molecular phenotypes that has only become accessible in recent years.|000|neutral evolution, network, graph theory, methodology, natural selection, debate 4415|Pathmanathan2017|Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). 
CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data.|000|compositionality, composite genes, algorithms, gene fusion 4416|Hessler2018|Interesting course findings: apparently, when giving students cookies, they give better course evaluations.|000|cookies, bribing, evaluation, Evaluierung, psychology 4417|Vejdemo2016|The rate of lexical replacement estimates the diachronic stability of word forms on the basis of how frequently a proto-language word is replaced or retained in its daughter languages. Lexical replacement rate has been shown to be highly related to word class and word frequency. In this paper, we argue that content words and function words behave differently with respect to lexical replacement rate, and we show that semantic factors predict the lexical replacement rate of content words. For the 167 content items in the Swadesh list, data was gathered on the features of lexical replacement rate, word class, frequency, age of acquisition, synonyms, arousal, imageability and average mutual information, either from published databases or gathered from corpora and lexica. A linear regression model shows that, in addition to frequency, synonyms, senses and imageability are significantly related to the lexical replacement rate of content words–in particular the number of synonyms that a word has. 
The model shows no differences in lexical replacement rate between word classes, and outperforms a model with word class and word frequency predictors only.|000|Swadesh list, lexical replacement, content words, concept list 4418|Vejdemo2016|Paper is interesting because it provides some insights into the replacement behavior of some words. It also contains norm data that was used for testing, so when dealing with lexical change, it may be worth consulting.|000|lexical replacement, lexical change, Swadesh list, concept list, 4419|Buchanan2018|This article presents the Linguistic Annotated Bibliography (LAB) as a searchable Web portal to quickly and easily access reliable database norms, related programs, and variable calculations. These publications were coded by language, number of stimuli, stimuli type (i.e., words, pictures, symbols), keywords (i.e., frequency, semantics, valence), and other useful information. This tool not only allows researchers to search for the specific type of stimuli needed for experiments but also permits the exploration of publication trends across 100 years of research. Details about the portal creation and use are outlined, as well as various analyses of change in publication rates and keywords. In general, advances in computational power have allowed for the increase in dataset size in the recent decades, in addition to an increase in the number of linguistic variables provided in each publication.|000|bibliography, interfaces 4420|Tuillard2018|The study of the narrative elements in tales and myths (motifs) belongs to a long tradition, initially aimed at finding the area of origin of early narratives (Urtexts). This objective, which has been much criticized, is generally abandoned today, but is it possible to establish the basis for an objectively verifiable mythogeography? 
Computer technology enables sophisticated mathematical computations on databases of an unprecedented scale, which makes it possible to base the comparative mythology on replicable calculation processes. In order to check for several subsets of motifs that could be specific to particular zones or continents, we test here two new methods on a corpus of 2264 motifs from ca. 40.000 myths recorded among 934 peoples around the globe, and we show that these motifs are best classified into two main groups.|000|mythology, cross-cultural study, world-wide study 4421|Tuillard2018|We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols’ contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the possibility of fully unsupervised comparative studies of text or speech corpora in low-resourced languages with no prior knowledge regarding their symbol sets.|000|word embeddings, parallel data, automatic approach 4422|Lee2016|Can the language we speak determine how we represent the world around us? To those familiar with the theory of linguistic relativity, this may seem like an age-old question about which everyone has their own answer. Although the evidence supporting linguistic relativity remains controversial, the long reach of language into our perception and behavior is nevertheless an intriguing possibility that deserves further investigation. 
Here I take a closer look at a case of linguistic relativity that had a particularly strong impact on cross-cultural research: the pronoun-drop effect. The theory of pronoun-drop effect posits that languages that allow their speakers to drop subject pronouns in verbal communication would lead their speakers to create collectivistic culture. It was argued that the absence of pronouns necessitates the speakers to embed their self-identities in the context of social interaction, so the linguistic practice of omitting pronouns would reduce the sense of individuality in the minds of speakers. After conducting a series of Bayesian multilevel analyses on the original dataset, however, the current study concludes that the pronoun-drop effect is unlikely to be a robust, universal phenomenon. The analyses revealed that the majority of statistical signal supporting the phenomenon comes from the Indo-European language family, and other families provided little or inconsistent evidence. It was also observed that the Indo-European languages alone made up 61 per cent of the original dataset, and dropping them from analysis completely nullified the pronoun-drop effect. These observations suggest that the pronoun-drop effect is a consequence of failing to account for (i) varying effects among language families and (ii) overrepresentation of the Indo-European languages. With these results, this article suggests that the theory of pronoun-drop effect should be thoroughly revised. Additionally, the article provides several suggestions for many similar cross-cultural studies that suffer from the same problems as the pronoun-drop effect study.|000|pro-drop, correlational studies, 4423|Kutsenkov2018|Despite the increased attention to the Dogon by anthropologists and ethnologists, there are many “white spots” in the history and ethnography of this people. 
For example, not so long ago it was believed that they speak six languages; then their number grew steadily, and now linguists number already thirty Dogon languages, conditionally united in the family of Dogon languages of the macro-family of Niger-Congo; it is possible that there are even more of them. The history of migrations on the Bandiagara Highlands and the adjoining plains also remains poorly understood. All existing hypotheses, one way or another, based on oral traditions (often without specifying the informant and/or source). Only to a small extent are they based on archaeological data. In addition to the “common Dogon” historical tradition, which states that this people came to the Plateau around the turn of the 16th century, there are historical legends of individual villages, their neighbourhoods and even families. They can be very different from the ‘general’ version. From this point of view, two oral histories of the village of Endé are of great interest. Based on the analysis of these legends, it is possible to draw with all possible caution a preliminary conclusion that the Dogon country was populated in two stages: the first one falling between the 10th and the 13th centuries, and the second between the 15th and the 19th centuries. In all examined villages exists the same model of relations between the local population and the aliens: the new group usurps political and military power and gives the old population its clan name, but itself adopts its language and culture. Such relations designed to prevent possible conflicts. The article based on an analysis of the Dogon oral history collected during field research between 2015–2018.|000|population history, Dogon languages, African languages 4424|Raja2014|New words are required not only to increase our vocabulary but also to create new sentences. New words are acquired by the process of word formation which can be done in several ways. 
One of the most commonly used ways to form new words is affixation either through prefixation or suffixation. Confixation or infixation is hardly ever used and is evidenced in the Indonesian Language. Other methods of word formation include coining, clipping, blending, acronym, and compounding. A difficulty arises when one has to decide which morpheme comes first, if he encounters a word with bound morphemes at both sides, since the two bound morphemes are not simultaneously attached to the root. Confixation occurs when morphemes are bounded both ends of the root simultaneously. Confixation can be seen in the Indonesian language.|000|word formation, affixation, morphology 4425|Pigliucci2008|In recent years, biologists have increasingly been asking whether the ability to evolve — the evolvability — of biological systems, itself evolves, and whether this phenomenon is the result of natural selection or a by-product of other evolutionary processes. The concept of evolvability, and the increasing theoretical and empirical literature that refers to it, may constitute one of several pillars on which an extended evolutionary synthesis will take shape during the next few years, although much work remains to be done on how evolvability comes about.|000|evolvability, evolutionary theory, universal evolutionary theory 4426|Rockmoore2017|We explore how ideas from infectious disease and genetics can be used to uncover patterns of cultural inheritance and innovation in a corpus of 591 national constitutions spanning 1789–2008. Legal “ideas” are encoded as “topics”—words statistically linked in documents—derived from topic modeling the corpus of constitutions. Using these topics we derive a diffusion network for borrowing from ancestral constitutions back to the US Constitution of 1789 and reveal that constitutions are complex cultural recombinants. 
We find systematic variation in patterns of borrowing from ancestral texts and “biological”-like behavior in patterns of inheritance, with the distribution of “offspring” arising through a bounded preferential-attachment process. This process leads to a small number of highly innovative (influential) constitutions some of which have yet to have been identified as so in the current literature. Our findings thus shed new light on the critical nodes of the constitution-making network. The constitutional network structure reflects periods of intense constitution creation, and systematic patterns of variation in constitutional lifespan and temporal influence.|000|network, network approaches, diffusion, constitution, cultural evolution, stemmatics 4427|Rockmoore2017|Article mentions a method for detecting an underlying diffusion network based on a method presented originally by @GomezRodriguez2012.|000|graph theory, network approaches, cultural evolution 4428|GomezRodriguez2012|Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or publish the information, observing individual transmissions (who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. 
We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.|000|network approaches, diffusion, information flow, 4429|GomezRodriguez2012|This approach may be very relevant for borrowing networks, as we also have an underlying contact network that we may want to infer from our data.|000|lexical borrowing, lexical diffusion, network approaches, idea 4430|Smith2018|There is still little agreement regarding the most important evidence for Old Chinese (OC) onset complexity—Middle Chinese (MC) mixed-onset phonetic series. This study explores a remarkable feature of this evidence first noticed by Sagart (1999). Within series such as those involving mixture of MC labials and velars with l-, x- with m-, velars with hj- (/y/), and d- with y- (/j/), MC onset and so-called A/B (syllable) Type fail to vary independently of one another. An unrecognized but inescapable implication of this association is that these MC onset results and A/B Type require a unified explanation in early Chinese. In light of the phonetic series material, I demonstrate that pre-OC Type involved two contrasting onset configurations. 
A number of phonetic specifications are conceivable; here, based on ideas of Ferlus (1998), I show how the data can be explained in terms of an early contrast between minor syllable forms **CǝR- (“Type A”) and tautosyllabic clusters **CR- (“Type B”) where R is a sonorant.|000|Old Chinese, A/B distinction, phonetic series, xiéshēng 4431|Adelaar2018|The Neogrammarians of the Leipzig School introduced the principle that sound changes are regular and that this regularity is without exceptions. At least as a working hypothesis, this principle has remained the basis of the comparative method up to this day. In the first part of this paper, I give a short account of how historical linguists have defended this principle and have dealt with apparent counter evidence. In the second part, I explore if a sound change can be regular if it is attested in one instance only. I conclude that it is, provided that the concomitant phonetic (and phonotactic) evidence supporting it is also based on regularity. If the single instance of a sound change is the result of developments which are all regular in themselves, it is still in line with the regularity principle.|000|regular sound change, regularity hypothesis, methodology, 4432|Adelaar2018|Interesting paper, as the author suggests that regularity does not require multiple examples to be proven, which also holds for cases where evidence is obvious from cross-linguistic considerations, or from systemic assumptions.|000|regular sound change, regularity hypothesis, methodology 4433|Adelung1806|In the introduction to the Mithridates, Adelung points directly to the importance of basic vocabulary and to the need to distinguish borrowing from inheritance (which is why he doesn't like numbers), and he also points to the importance of considering the syntax or the "general structure". 
|000|proof of relationship, history of science, historical language comparison, methodology, 4434|Arnold2012|The lack of a widely-accepted, objectively-defined standard list of ‘basic’ meanings for use in the initial stages of the comparative method is identified as a priority in the resolution of areas of unnecessary subjectivity in historical and comparative linguistics. A methodology is presented, capable of ranking meanings by a score fully representative of the four features identified as necessary for a meaning to be considered optimal for use in the initial stages of language comparison: maximal item stability, maximal resistance to replacement of form by borrowing, maximal conceptual simplicity, and maximal universality. The stability of 67 meanings is quantified using a procedure described, but not adequately implemented, by Dolgopolsky (1986) and Lohr (1999); the results are integrated with Tadmor et al.’s (2010) borrowed, analyzability, and representation scores, to form a composite score by which the meanings are ranked. The resultant ranking, while not representative of the definitive list of meanings optimal for use in the initial stages of the comparative method, owing to the limitation on the number of input meanings, demonstrates the viability of the methodology presented here. Statistical results are presented to support the hypothesis that there is a strongly significant relationship between item stability and variation in stability; however, contrary to expectations and conflations evident in the literature, no evidence is found to support the hypotheses that there are strongly significant relationships between item stability and item borrowability, analyzability of form, or item universality. 
Finally, the results are used to test the validity of the glottochronological hypothesis of a constant rate of replacement; no support whatsoever is found to support this hypothesis.|000|item stability, concept list, stability, basic vocabulary, 4435|Arnold2012|This contains a concept list of 67 meanings, whose stability is quantified using procedures by @Dolgopolsky1986 and by @Lohr1999.|000|concept list, dataset, stability 4436|Boc2010b|T-REX (Tree and reticulogram REConstruction) is a web server dedicated to the reconstruction of phylogenetic trees, reticulation networks and to the inference of horizontal gene transfer (HGT) events. T-REX includes several popular bioinformatics applications such as MUSCLE, MAFFT, Neighbor Joining, NINJA, BioNJ, PhyML, RAxML, random phylogenetic tree generator and some well-known sequence-to-distance transformation models. It also comprises fast and effective methods for inferring phylogenetic trees from complete and incomplete distance matrices as well as for reconstructing reticulograms and HGT networks, including the detection and validation of complete and partial gene transfers, inference of consensus HGT scenarios and interactive HGT identification, developed by the authors. The included methods allows for validating and visualizing phylogenetic trees and networks which can be built from distance or sequence data. The web server is available at: www.trex.uqam.ca.|000|web-based tool, reticulate evolution, phylogenetic network, gene tree reconciliation 4437|Bradley2018|The Sino-Tibetan (ST) languages, including Chinese, are the major family of languages in China and many areas surrounding China in Southeast and South Asia. There have been many proposals about the phylogeny of Proto-Sino-Tibetan (PST) and the possible connections between specific ancient cultures of China and specific subgroups of PST; the basic modern subgrouping followed here is set out in more detail in Bradley (1997, 2002). 
It has recently been proposed (van Driem 2014) that the entire family should be renamed Trans-Himalayan (TH); others have suggested that the entire family should be called TB, with Sinitic as just one subgroup of TB (DeLancey 2013); both proposals aim to question the central historical position of Sinitic within TB.|000|Sino-Tibetan, subgrouping, 4438|Brown2017|In an endeavor to objectify and provide uniformity to the comparative method of historical linguistics, this study describes the Beck-Wichmann-Brown (bwb) system for evaluating lexical sets assembled as evidence for proposals of language genealogical relationship. The approach quantitatively assesses the degree of support that collections of comparative sets provide for proposals, with regard to whether or not observed lexical similarity exceeds coincidental expectation. bwb is illustrated through application to an assemblage of 51 comparative sets compiled by Pache (2016) for the affiliation of Pumé and Chocoan languages of South America. This study presents and ranks bwb quantitative results for 65 language comparisons (of global distribution) and proposes a framework for interpreting ranked findings. Evaluations for the 65 comparisons are compared with those provided by three online classifications of the world’s languages.|000|proof of relationship, methodology, genetic relationship 4439|Cheshire2017|This paper provides the solution to understanding the hitherto unknown writing system used for the manuscript listed as MS 408 at the Beinecke Library, Yale University. The writing system uses symbols, punctuation, grammar and language that are each unique. The manuscript is not encrypted, in the sense that its author made an effort to conceal the contents of the manuscript, as has been presumed by some scholars. Instead, it is code only in the sense that the modern reader needs to be versed in the calligraphic and linguistic rules to be able to translate and read the texts. 
Furthermore, in discovering its writing system, it became apparent that the manuscript is of invaluable importance to the study of the evolution of the Romance languages and the scheme of Italic letters and associated punctuation marks now commonplace in those and other modern languages. In short; it is revealed to be the only known document both written in Vulgar Latin, or proto-Romance, and using proto-Italic symbols. The original title for the manuscript, given by its female author, is: What one needs to be sure to acquire for the evils set in one’s fate. It is a book offering homeopathic advice and instruction to women of court on matters of the heart, of sexual congress, of reproduction, of motherhood and of the physical and emotional complications that can arise along the way through life.|000|decipherment, Beinecke manuscript, Vulgar Latin, Romance 4440|Coloma2012|This article describes variation of the pronunciation of *s* in the Spanish varieties of Argentina.|000|phonetic variation, sound change, Spanish, Argentina, 4441|Gast2015|GraphAnno is a configurable tool for multi-level annotation which caters for the entire workflow from corpus import to data export and thus provides a suitable environment for the manual annotation of modals in their sentential contexts. Given its generic data model, it is particularly suitable for enriching existing corpora, e.g. by adding semantic annotations to syntactic ones. In this contribution, we present the functionalities of GraphAnno and make a concrete proposal for the treatment of modals in a corpus, with a focus on scope interactions. We have nothing to say about the specific categories to be annotated. Its generic design allows GraphAnno to be used with various annotation schemes, like those proposed by Hendrickx et al. (2012), Nissim et al. (2013) and Rubinstein et al. (2013). 
We will use generic category labels from theoretical linguistics for illustration purposes.|000|GraphAnno, graph theory, linguistic annotation 4442|Goddard2018|Following the seminal work of Wierzbicka (1985, 2013), this paper proposes and discusses a set of semantic analyses of words from three different levels of the English ethnozoological taxonomic hierarchy (Berlin 1992): creature (unique beginner), bird, fish, snake, and animal (life-form level), dog and kangaroo (generic level). The analytical framework is the Natural Semantic Metalanguage approach (Wierzbicka 1996, 2014, Goddard and Wierzbicka 2014). Though ultimately resting on the foundational elements of the NSM system, i.e. 65 semantic primes and their inherent grammar of combination, the analysis relies on the analytical concepts of semantic molecules and semantic templates (Goddard 2012, 2016). These provide mechanisms for encapsulating semantic complexity and for modelling relations between successive layers of the hierarchy. Other issues considered include the extent to which cultural components feature in the semantics of ethnozoological categories, and the extent to which semantic knowledge may vary across different speech communities.|000|natural semantic metalanguage, concept list, 4443|Goddard2018|Potentially interesting in the form of adding it to concepticon.|000|concept list, dataset 4444|Heine2007|This book is interesting as it shows some nice ideas regarding uniformitarianism, but also regarding the idea of using grammaticalization to push the borders of linguistic reconstruction, in order to arrive at the original form of language. 
The major idea sounds interesting, but it needs a closer read, and it seems that they do not handle their data in a statistically sound way, so the results may already be flawed by the data-handling procedure.|000|grammaticalization, language origin, uniformitarianism 4445|Kelly2018|West Africa is a fertile zone for the invention of new scripts. As many as twenty-seven have been devised since the 1830s (Dalby 1967, 1968, 1969; Rovenchak, Glavy 2011, inter alia) including one created as recently as 2010 (Ibekwe 2012, 2016). Talented individuals with no formal literacy are likely to have invented at least three of these scripts, suggesting that they had reverse-engineered the ‘idea of writing’ on the same pattern as the Cherokee script, i.e. with minimal external input. Influential scholars like E. B. Tylor, A. L. Kroeber and I. J. Gelb were to approach West African scripts as naturalistic experiments in which the variable of explicit literacy instruction was eliminated. Thus, writing systems such as Vai and Bamum were invoked as productive models for theorising the dynamics of cultural evolution (Tylor [1865] 1878, Crawford 1935, Gelb [1952] 1963), the diffusion of novel technologies (Crawford 1935, Kroeber 1940), the acquisition of literacy (Forbes 1850, Migeod 1911, Scribner and Cole 1981) the cognitive processing of language (Kroeber 1940, Gelb [1952] 1963), and the evolution of writing itself (Crawford 1935, Gelb [1952] 1963; Dalby 1967, 2). This paper revisits the three West African scripts that are known to have been devised by non-literates. By comparing the linguistic, semiotic and sociohistorical contexts of each known case I suggest various circumstances that may have favoured their invention, transmission and diffusion. 
I argue that while the originators of scripts drew inspiration from known systems such as Roman and Arabic, they are likely to have drawn on indigenous pictorial culture and annotation systems to develop their own scripts. Once established, their creations were used to circumscribe alternative politico-religious formations in direct opposition to the discourses of colonial administrations. The appeal of these scripts was thus tied more to their relative indexical power than their apparent technological or cognitive advantages. Just as earlier theorists imagined, I contend that West African scripts do have the potential to illuminate historical processes of creativity, transmission and evolution, but only when local particularities are given due consideration.|000|writing systems, African languages 4446|Wolfenden1937|THE purpose of the present somewhat desultory notes may be said to be twofold: firstly, to emphasize the necessity of comparing the word stock of one Indo-Chinese language with that of another by word families only, secondly, to make a preliminary investigation into certain variations of a particular type within such families, as there are here certain anomalies of which very careful note will have to be taken in any comparative work along these lines.|000|word family, word formation, Sino-Tibetan, final consonants 4447|Seifart2018a|This discussion note reviews responses of the linguistics profession to the grave issues of language endangerment identified a quarter of a century ago in the journal Language by Krauss, Hale, England, Craig, and others (Hale et al. 1992). Two and a half decades of worldwide research not only have given us a much more accurate picture of the number, phylogeny, and typological variety of the world’s languages, but they have also seen the development of a wide range of new approaches, conceptual and technological, to the problem of documenting them. 
We review these approaches and the manifold discoveries they have unearthed about the enormous variety of lin- guistic structures. The reach of our knowledge has increased by about 15% of the world’s lan- guages, especially in terms of digitally archived material, with about 500 languages now reasonably documented thanks to such major programs as DoBeS, ELDP, and DEL. But linguists are still falling behind in the race to document the planet’s rapidly dwindling linguistic diversity, with around 35–42% of the world’s languages still substantially undocumented, and in certain countries (such as the US) the call by Krauss (1992) for a significant professional realignment to- ward language documentation has only been heeded in a few institutions. Apart from the need for an intensified documentarist push in the face of accelerating language loss, we argue that existing language documentation efforts need to do much more to focus on crosslinguistically comparable data sets, sociolinguistic context, semantics, and interpretation of text material, and on methods for bridging the ‘transcription bottleneck’, which is creating a huge gap between the amount we can record and the amount in our transcribed corpora.|000|language documentation, discussion 4448|Schaarschmidt2019|Der mittlere Intelligenzquotient wuchs über Jahrzehnte kontinuierlich an. Doch nun schwächelt der so genannte Flynn- Effekt, und in einigen Ländern scheint der Durchschnitts-IQ sogar zu sinken. 
Woran liegt das?|000|intelligence test, IQ test, testing, validity, 4449|Schaarschmidt2019|The article emphasizes that one likely explanation is that we get better at IQ tests since we train them more, and that training IQ tests can help getting better results, so essentially, this means that an IQ test is not measuring intelligence, if intelligence is assumed to be impossible to train.|000|intelligence test, IQ test, validity 4450|Rama2018a|We present and evaluate two similarity dependent Chinese Restaurant Process (sd-CRP) algorithms at the task of automated cognate detection. The sd-CRP clustering algorithms do not require any predefined threshold for detecting cognate sets in a multilingual word list. We evaluate the performance of the algorithms on six language families (more than 750 languages) and find that both the sd-CRP variants performs as well as InfoMap and better than UPGMA at the task of inferring cognate clusters. The algorithms presented in this paper are family agnostic and can be applied to any linguistically under-studied language family.|000|chinese restaurant process, cognate detection, automatic approach 4451|Swarte2016|Very interesting thesis discussing different approaches to mutual intelligibility and intelligibility in general. It contains a list of lexical units that are tested for prediction of words. For any study on automatic approaches to measuring intelligibility, this thesis offers a wealth of information. |000|mutual intelligibility, intelligibility testing, prediction, Germanic 4452|Smith2019|This blog post discusses the problems of big data when people work without hypotheses, and also points to the famous *pizza studies* as an example for bad scientific practice.|000|big data, small data, data mining, statistical analysis 4453|Braun2017|This article discusses the problem of the pizza papers. 
|000|pizza papers, statistics, pi-hacking, 4454|Matsumae2019|Culture evolves in ways that are analogous to, but distinct from, genetic evolution. Previous studies have demonstrated correlations between genetic and cultural diversity at small scales within language families, but few studies have empirically investigated parallels between genetic and cultural evolution across multiple language families using a diverse range of cultural data. Here we report an analysis comparing cultural and genetic data from 13 populations from in and around Northeast Asia spanning 10 different language families/isolates. We construct distance matrices for language (grammar, phonology, lexicon), music (song structure, performance style), and genomes (genome-wide SNPs) and test for correlations among them. After controlling for spatial autocorrelation and recent contact, robust correlations emerge between genetic and grammatical distances. Our results suggest that grammatical structure might be one of the strongest cultural indicators of human population history, while also demonstrating differences among cultural and genetic relationships that highlight the complex nature of human cultural and genetic evolution.|000|population genetics, coevolution, gene-language co-evolution, 4455|Mattiello2018|Splinters, combining forms, and secreted affixes are three morpheme (or morpheme-like) elements which are often conflated in the literature on English word-formation. Scholars have differently focused on their morphological origin (i.e. blending, paradigmatic substitution, analogy) or on their semantics (i.e. secretion vs. mere abbreviation) (Warren 1990; Fradin 2000; Mattiello 2007; Bauer et al. 2013). This paper investigates these phenomena as part of paradigmatic morphology, or similarity among words. In particular, the investigation of five case studies (i.e. 
-(a)holic, docu-, -exit, -umentary, -zilla) shows that they are frequently used to create new words and even to produce series, through analogy via schema (cf. Köpcke 1993, 1998). In the paper, diachronic study combined with corpus-based analysis help us 1) categorise these phenomena as ‘marginal’ vs. ‘extra-grammatical’, and as ‘productive’ vs. ‘creative’, and 2) shed some light on their role in the development of morphological rules and in the expansion of the English lexicon.|000|paradigms, paradigmatic morphology, morphology, word formation 4456|Melcuk2018|The paper aims to demonstrate that the main contribution of Anna Wierzbicka to linguistics is the idea of semantic decomposition — that is, representing meaning in terms of structurally organized configurations of simpler meanings — and a huge amount of specific decompositions of lexical meanings from many languages. One of possible developments of this idea of Wierzbicka’s is the Meaning-Text linguistic approach, and in particular — the Meaning-Text model of natural language. To illustrate the importance and fruitfulness of semantic decomposition, two Meaning-Text mini-models are presented for English and Russian.|000|semantic decomposition, natural semantic metalanguage, 4457|Zee2017|We present the initial results of a reanalysis of four articles from the Cornell Food and Brand Lab based on data collected from diners at an Italian restaurant buffet. On a first glance at these articles, we immediately noticed a number of apparent inconsistencies in the summary statistics. A thorough reading of the articles and careful reanalysis of the results revealed additional problems. The sample sizes for the number of diners in each condition are incongruous both within and between the four articles. In some cases, the degrees of freedom of between-participant test statistics are larger than the sample size, which is impossible. 
Many of the computed F and t statistics are inconsistent with the reported means and standard deviations. In some cases, the number of possible inconsistencies for a single statistic was such that we were unable to determine which of the components of that statistic were incorrect. We contacted the authors of the four articles, but they have thus far not agreed to share their data. The attached Appendix reports approximately 150 inconsistencies in these four articles, which we were able to identify from the reported statistics alone. We hope that our analysis will encourage readers, using and extending the simple methods that we describe, to undertake their own efforts to verify published results, and that such initiatives will improve the accuracy and reproducibility of the scientific literature.|000|pizza papers, statistics, pi-hacking, p-hacking 4458|Kirkman2011|Researchers who are fortunate enough to collect large datasets sometimes wish to publish multiple papers using the same dataset. Unfortunately, there are few guidelines that authors can follow in managing these multiple papers. In this article, we address three main questions including: (i) how do authors know if they have a dataset truly worthy of multiple papers; (ii) what procedures do authors follow when they are ready to submit multiple papers from a single dataset to top tier journals; and (iii) what are the main issues when attempting to publish multiple papers from a single dataset? We provide a set of concrete recommendations for authors who wish to maximize their data collection efforts with multiple papers|000|data slicing, scientific practice, 4459|Alzahrani2016|There is increasing motivation to study bipartite complex networks as a separate category and, in particular, to investigate their community structure. We outline recent work in the area and focus on two high-performing algorithms for unipartite networks, the modularity-based Louvain and the flow-based Infomap. 
We survey modifications of modularity-based algorithms to adapt them to the bipartite case. As Infomap cannot be applied to bipartite networks for theoretical reasons, our solution is to work with the primary projected network. We apply both algorithms to four projected networks of increasing size and complexity. Our results support the conclusion that the clusters found by Infomap are meaningful and better represent ground truth in the bipartite network than those found by Louvain.|000|Infomap, community detection, algorithms, 4460|Alzahrani2016|Authors essentially find that Infomap works better on projected bipartite networks than Louvain.|000|community detection, bipartite network, Infomap, evaluation, 4461|Nichols2019|Very unsatisfying book-chapter showing very far-fetched claims that do not seem to be supported by the data. |000|zipping, linguistic complexity, morphology, bible corpus, peopling of South America 4462|Menninghaus2016|Ever since antiquity, it is a declared goal of the arts to emotionally move its audience (lat. movere). The studies reported here offer a scientific definition of feelings of being moved and prove its role in aesthetic evaluation. Confirming a "mixed" affective nature of these feelings, goosebumps – which accompany peak states of being moved – simultaneously activate the primary reward network and high levels of negative affect as measured by facial electromyography (EMG). Moreover, the distribution of goosebumps episodes across the trajectory of poems reveals secrets of artistic composition.|000|emotion, poetry, poetic function, 4463|Konnerth2018| This is a collection of 18 Karbi texts involving 15 different native speakers of Karbi from different regions that all speak Hills varieties of the language. 
The collection represents a variety of genres, including folk stories, general narratives, personal narratives, procedural texts, a narration of the pear story that was recorded as a commentary on the video clip, as well as one interview/conversation. This collection of texts represents the core corpus that formed the basis for A grammar of Karbi (Konnerth 2014a). All texts are fully analyzed and presented with morpheme-by-morpheme glosses, and have literal translations along with explanations where needed. The original audio and video recordings are currently being made available in the Endangered Languages Archive (ELAR) at SOAS at the University of London. |000|Karbi, Sino-Tibetan, Mikir, introduction, language archive 4464|Kwiatkowski2019|The article introduces a corpus for testing and training of natural questions and answers. It is exemplary for the importance of training data in certain tasks, without which methods cannot be developed.|000|corpus, natural questions and answers, NLP, testing, dataset, training data 4465|Branner2006|Diasystems are the result of the methodological comparison of linguistic forms, but they are not rare, nor are they the dominion of any one linguistic school. People who grow up in places where languages and dialects are in contact often develop informal diasystemic understanding of varieties related to their own, and dialect fieldworkers encounter such cases in the field. To an extent, one transcribed dialect can always be read by speakers of a related dialect. |215|diasystem, prediction, reflex retrodiction, Chinese, Chinese dialects 4466|Gabelentz2016|Manchmal, namentlich beim Beginn der vergleichenden Forschung, ist ein gewisser Leichtsinn recht heilsam. Man arbeitet eine Weile mit Fictionen, thut als wäre die Sprache, die des Alterthümlichen am Meisten zu bieten scheint, die Mutter oder doch der Urtypus der ganzen Familie. 
Über Schwierigkeiten, Unregelmässigkeiten, die sich ergeben, schlüpft man wohlgemuth hinweg und überlässt das Aufklären und Berichtigen der Zukunft. So wird schnell ein geräumiges, für den ersten Bedarf wohnliches Gebäude aufgeführt und die Ernte unter Dach und Fach gebracht, – wahrscheinlich viel Unkraut unter vielem Weizen. Wer sich dessen schämt, der wage sich nicht auf ein neues Gebiet; wem davor bangt, dass er in Einzelheiten irre, der verzichte darauf, im Grossen zu entdecken.|185|methodology, errors, fictions, comparative method, Gabelentz, nice quote 4467|Gabelentz2016|Sie wird also zunächst annehmen, dass jeder Laut der Ursprache in jeder Tochtersprache immer die gleiche Wandelung erlitten habe, dass also, wo die Tochtersprachen unterscheiden, auch die Ursprache unterschieden haben müsse. In diesem Sinne mag sie vorerst ganz unvorgreiflich schematische Zeichen einführen: a 1 a 2 a 3 u. s. w. Der Zukunft bleibt es dann überlassen, den Sinn dieser Ziffern zu bestimmen. Vielleicht war die Articulation der Ursprache unsicher, – dann muss sich eben die Forschung insolvent erklären. Oder es lag doch von Hause aus ein wirklicher Unterschied vor; und da sind nun zwei Fälle möglich. Der Unterschied mochte in den Lauten an sich beruhen, es waren wirklich von Hause aus ver||197||schiedene Laute. So [pb] haben die Indogermanisten neben dem ă der Schleicher’schen Ursprache ein ĕ und ein ŏ, statt deren einfacher Gutturalreihe eine doppelte nachgewiesen. Oder der Unterschied war nicht durch die Beschaffenheit des Lautes selbst, sondern durch den Einfluss seiner Nachbarn bedingt, er war euphonischer Natur. – Von anderen störenden Mächten haben wir jetzt noch nicht zu reden.|205|sound correspondences, correspondence patterns, linguistic reconstruction, methodology 4468|Gabelentz2016|Hier, freilich in sehr, sehr weiter Ferne, glaube ich den wichtigsten Gewinn zu erkennen, den die allgemeine Sprachwissenschaft von der genealogisch-historischen erhoffen mag. 
Es wäre zu beklagen, wenn so glänzender Scharfsinn und so rastloser Fleiss in alle Zukunft nichts |188| weiter zu Tage förderte, als die Erkenntniss, dass aus den und den Lauten hier die und dort jene geworden, dass in der einen Sprache diese, in der anderen Sprache jene Formen verloren gegangen [pb] und durch Neubildungen ersetzt seien, und, wenn es hoch kommt, dass es sich mit der Etymologie der Wörter und der Bildungselemente so oder so verhalte. Soweit, dass wir sagen könnten: In der Sprachgeschichte ist dies nothwendig, jenes unmöglich, – soweit sind wir noch lange nicht. Aber die Erfahrung lehrt schon jetzt, dass erstaunlich Vieles möglich ist, und dem sorgsamen Beobachter gelingt es oft, die Gründe dieser Möglichkeiten zu entdecken. Somit ||179|| lässt sich mindestens zum Theile feststellen, worauf der Erforscher der Sprachgeschichte gefasst sein muss. Und dabei erfordert allerdings die Methode, dass wir von dem einfacheren Falle, von der vereinzelten Sprache ausgehen, ehe wir die Wechselwirkungen der Sprachen und Mundarten aufeinander betrachten.|187f|methodology, Gabelentz, nice quote, historical linguistics, 4469|Yu2019|From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the “proteome grammar”—that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of “protein languages” in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. 
Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life. The results of this analysis bolster the analogy between genomes and natural language and show that a “quasi-universal grammar” underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.|000|protein structure, protein domain, biological parallels, n-gram model 4470|Gerber2017|The Dene-Yenisseian hypothesis (Vajda 2010a, 2013) linking the Yenisseian languages and the Na-Dene languages has gained some attention as the first substantial proposal of a linguistic connection across the Bering Strait. At the same time, morphological material has been interpreted as evidence for a genealogical relationship between Yenisseian, Burushaski and Kusunda (van Driem 2001, 2008, 2014). The two hypotheses have been linked under the name ‘Dene-Yenisseian’ by van Driem (2014: 80) but I hereby introduce the term ‘Dene-Kusunda’ to designate the hypothesis of a genealogical relationship between Kusunda, Burushaski, Yenisseian and Na-Dene. This paper aims to review the Dene-Kusunda hypothesis by presenting a critical evaluation of the morphological data amassed as evidence in van Driem (2001, 2008, 2014), Vajda (2010a, 2013) and Gerber (2013). The argumentation in favour of Dene-Kusunda looks promising at first sight, but much of it can be explained by chance or selective analysis. 
A more definite evaluation of this proposal must await more studious work on the individual languages, but it is in fact likely that the putative time depth inhibits an ultimate verification or falsification.|000|Na-Dene, Caucasian languages, Kusunda, long-range comparison, critics 4471|Muthukrishna2019|Can’t we solve this problem with Bayesian statistics? Frequentist and Bayesian approaches will tend to give the same answer with uninformative priors. The trouble is having a justifiable reason for one prior over another, opening new researcher degrees of freedom for Bayesian B-hacking [@Savalei2015]. But a Bayesian approach is ideal when we have an a priori theory tested by empirical data to inform our prior. Indeed, many cosmologists, dealing with one of the messier fields of physics, are only now moving from frequentist to Bayesian statistics, because Bayesian approaches offer more powerful tools for testing their now better-defined theories with less-than-ideal datasets.|6/9|p-hacking, multiple testing, replicability, replication crisis 4472|Muthukrishna2019|Can’t we solve this problem with Big Data? In the age of Big Data, we can perhaps be surer of our findings—gather solider stones—but lack of theory is just as concerning. Even when you can download and run your analysis on the world, prediction or even description does not necessarily mean explanation. The space of possible hypotheses and theories remains impossibly large even when your dataset grows. Even if we are now very sure two variables co-vary in the dataset, without knowing why, we have no way of knowing whether the relationship will hold in other populations or over time. None of this, of course, diminishes the importance of data or the value of Big Data approaches, especially for applied problems that are purely about prediction. 
But, if we want to understand the world, Big Data needs Big Theory.|6/9|big data, small data, replicability, replication crisis, nice quote 4473|Muthukrishna2019|The replication crisis facing the psychological sciences is widely regarded as rooted in methodological or statistical shortcomings. We argue that a large part of the problem is the lack of a cumulative theoretical framework or frameworks. Without an overarching theoretical framework that generates hypotheses across diverse domains, empirical programs spawn and grow from personal intuitions and culturally biased folk theories. By providing ways to develop clear predictions, including through the use of formal modelling, theoretical frameworks set expectations that determine whether a new finding is confirmatory, nicely integrating with existing lines of research, or surprising, and therefore requiring further replication and scrutiny. Such frameworks also prioritize certain research foci, motivate the use diverse empirical approaches and, often, provide a natural means to integrate across the sciences. Thus, overarching theoretical frameworks pave the way toward a more general theory of human behaviour. We illustrate one such a theoretical framework: dual inheritance theory.|000|big data, p-hacking, replication crisis, replicability, methodology 4474|Savalei2015|In a recent article, Cumming (2014) called for two major changes to how psychologists conduct research. The first suggested change—encouraging transparency and replication—is clearly worthwhile, but we question the wisdom of the second suggested change: abandoning p-values in favor of reporting confidence intervals (CIs) only in all psychological research reports. This article has three goals. First, we correct the false impression created by Cumming that the debate about the usefulness of NHST has been won by its critics. 
Second, we take issue with the implied connection between the use of NHST and the current crisis of replicability in psychology. Third, while we agree with other critics of Cumming (2014) that hypothesis testing is an important part of science (Morey et al., 2014), we express skepticism that alternative hypothesis testing frameworks, such as Bayes factors, are a solution to the replicability crisis. Poor methodological practices can compromise the validity of Bayesian and classic statistical analyses alike. When it comes to choosing between competing statistical approaches, we highlight the value of applying the same standards of evidence that psychologists demand in choosing between competing substantive hypotheses.|000|p-hacking, Bayesian approaches, multiple testing, replicability, replication crisis 4475|Laeubli2018|Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs.|000|machine translation, machine learning, human parity, 4476|Mukai2019|Recursion at word-level is productive in many languages across the world, just as it is at phrase-level (Roeper et al. 2002; Bisetto 2010). The standard assumption is that left-branching recursive compounds (e.g. [[student film] society]) are more productive than right-branching recursive compounds (e.g. 
[student [film society]]) (Mukai 2008, 2017; Tokizaki 2011; Pöll 2015). However, this assumption has hardly ever been tested empirically in more detail. Using Corpus of Contemporary American English and National Web Japanese Corpus for Japanese and native speaker’s judgements on the semantic interpretation of the data, we found that the prediction is borne out. In other words, recursive compounds are preferably interpreted as left-branching. After presenting representative data from each language I will propose that left-branching recursive compounds [[A B] C] are easier to parse, since a constituent can be formed earlier than in [A [B C]] structures (Pöll 2015).|000|recursion, productivity, compound words, recursive compounds 4477|Mukai2019|Recursion is said to be a fundamental property of human language that potentially differentiates it from both other human cognitive domains and known communication systems in animals (@Hauser<2002> et al. 2002; @Corballis<2001> 2011). So, to reveal the characteristics of recursion at word-formation can reveal some aspects of human language. Recursion has been questioned in one of the Amazonian languages, Pirahã (@Everett<2005> 2005). Everett (2005: 5) gives an example of embedded clauses in English where in Pirahã it is expressed without embedding. However, this question will not be discussed in this paper, as this is not the focus of the paper.|35|recursion, linguistic theory, human language 4478|Mukai2019|Before discussing the main objectives of the paper, let us define what recursion is. Summarising definitions of recursion by several authors (@Chomsky1965; @Hauser<2002> et al. 2002; @Roeper2007; @Bisetto2010; @Corballis2011; @Ralli2013), the author defines recursion as a phenomenon of cyclic fashion to create sentences, phrases or words, as complex or long as we like.|35|recursion, definition, linguistic theory 4479|SuttonSpence2005| 1. Some General Points about Sign Languages 2. What is sign language poetry? 3. 
Repetition in sign language poetry? 4. Symmetry and balance 5. Neologisms 6. Ambiguity 7. Themes in sign poetry 8. Metaphor and allusion 9. The poem and performance 10. Blended sign language and spoken language poetry 11. The Hang Glider 12. Trio 13. Five Senses and Three Queens 14. Afterword by Paddy Ladd |000|sign language, poetry, introduction, overview 4480|Allan2018|This corpus study compares lexical bundles found in the language input of a selection of historical and current English language teaching materials to see what insights they can give into changes in spoken language use. English teaching texts published between 1905 and 1917 were used to construct a historical corpus, and a collection of English language self-study texts published between 2004 and 2014 were used for comparison. Both groups of texts focused on spoken language. The most frequent three-word lexical bundles extracted from each corpus varied considerably. The contemporary texts showed both a greater use of formulaic language and more syntactic complexity within it, while the historical texts relied on simpler structures. An exploratory analysis of the lexical bundles in the historical texts suggests, however, that viewed in conjunction with other historical sources, they can assist in building a picture of spoken language use of the period|000|teaching, English, lexical bundles, spoken language, language change 4481|Allan2018|Interesting corpus study that compares teaching material for English across centuries and points to changes.|000|teaching, teaching material, English, language change, corpus studies 4482|Kroeger2019| 1. The meaning of meaning 2. Referring, denoting, and expressing 3. Truth and inference 4. The logic of truth 5. Word senses 6. Lexical sense relations 7. Components of lexical meaning 8. Grice's theory of implicature 9. Pragmatic inference after Grice 10. Indirect speech acts 11. Conventional implicature and use-conditional meaning 12. How meanings are composed 13. 
Modeling compositionality 14. Quantifiers 15. Intensional contexts 16. Modality 17. Evidentiality 18. Because 19. Conditionals 20. Aspect and Aktionsart 21. Tense 22. Varieties of the perfect|000|semantics, introduction, overview 4483|Schlenker2018|ASL (American Sign Language) can express plurals by repeating a noun, in an unpunctuated fashion, in different parts of signing space. We argue that this construction may come with a rich (and at-issue) iconic component: the geometric arrangement of the repetitions provides information about the arrangement of the denoted plurality; in addition, the number and speed of the repetitions provide information about the size of the denoted plurality. Interestingly, the shape of the repetitions may introduce a new singular discourse referent when a vertex can be inferred to denote a singular object. Thus one may point towards the first or last iteration of a horizontal repetition of BOOK to denote the left- or right-edge of the corresponding row. This yields a remarkable interaction between iconic semantics and standard logical semantics. We show that our analysis extends to ‘punctuated’ repetitions, which involve clearly individuated iterations of a singular noun. While these may initially look like coordinated indefinites, they are better handled by the same iconic framework as plural, unpunctuated repetitions. Some repetition-based mass terms also give rise to iconic effects, and to different readings depending on whether the repetition is continuous, unpunctuated, or punctuated. Our analysis highlights the need for a formal semantics with iconicity to study the integration of such iconic and logical conditions. It also raises a question: can similar facts be found in spoken language when gestures are taken into account? 
We suggest that several effects can be replicated, especially when one considers examples involving ‘pro-speech gestures’ (= gestures that fully replace some spoken expressions).|000|American Sign Language, sign language, semantics, iconicity, plural 4484|Konvicka2019|In this blog post, I will sketch the history of grammaticalisation clines. Hopper and Traugott (2003: 6) understand this concept as “a metaphor for the empirical observation that cross-linguistically forms tend to undergo the same kinds of changes”.|000|grammaticalization, grammaticalization cline, history of science 4485|Konvicka2019|A prototypical cline, following @Hopper<2003> and Traugott (2003: 7), looks as follows: (1) content item > grammatical word > clitic > inflectional affix A grammaticalisation cline entails a synchronic and a diachronic perspective. In the former case, a grammaticalisation cline represents the continuum of expressions within a single language ranging from indisputably lexical to indisputably grammatical. In the latter case, a grammaticalisation cline can represent a historical development in that it illustrates the stages through which an expression goes on its way from lexical to grammatical status. Moreover, both the synchronic and the diachronic interpretation of a grammaticalisation cline can be also understood typologically. Synchronically, a cline such as (1) does not represent the possibilities in a single language, but also across different languages. Diachronically, the development from a lexical to a grammatical expression within one language can also be understood as a cross-linguistic generalisation. |p1|grammaticalization, grammaticalization cline, definition 4486|Konvicka2019|A grammaticalisation cline such as the one given in (1) is, however, not without problems. 
If we look at it more closely, we find out that it in fact comprises two clines: first, one illustrating functional changes from a content item to a grammatical word, and a second one describing formal changes from a (free) word to a bound affix.|p2|pathways, grammaticalization, grammaticalization cline, form-meaning pairs 4487|Kurilowidz1965|Grammaticalization consists in the increase of the range of a morpheme advancing from a lexical to a grammatical or from less grammatical to a more grammatical status, e.g. from a derivative formant to an inflectional one. :comment:`Quoted after` @Konvicka2019|69|grammaticalization, definition, nice quote 4488|Heine2019|The paper is concerned with linguistic data suggesting that one and the same lexical source of grammaticalization can give rise to different morphological processes, leading not only to compounding and lexicalization but also to derivation, and even to inflection. Based on data from African languages for which little or no earlier written documents are available, the paper argues that even in the absence of historical records it is possible to reconstruct some features of earlier processes of word formation.|000|word formation, African languages, 4489|Heine2019|(3) Parameters of grammaticalization (@Heine<2007> & Kuteva 2007: 34-46): a. Extension: linguistic expressions are extended to new contexts that invite the rise of grammatical functions (context-induced reinterpretation), b. desemanticization (or semantic bleaching): loss (or generalization) of meaning content, c. decategorialization: loss of morphosyntactic properties characteristic of lexical or other less grammaticalized forms, and d. 
erosion (or phonetic reduction): loss of phonetic substance.|4|grammaticalization, grammaticalization parameters, model of grammaticalization 4490|Heine2019|Each of these parameters concerns a different aspect of language structure or language use; (3a) is pragmatic in nature, (3b) relates to semantics, (3c) to morphosyntax, and (3d) to phonetics. :comment:`But what is a parameter in this case anyway?`|4|grammaticalization, grammaticalization parameters, 4491|Heine2019|The ordering of these parameters reflects the diachronic sequence in which they typically apply: grammaticalization tends to start out with extension, which triggers desemanticization, and subsequently decategorialization, and finally erosion. Erosion is the last parameter to come in when grammaticalization takes place, and in many of the examples to be presented below it is not (or not yet) involved. Paradigm instances of grammaticalization involve all four parameters but, as we will see below, there are also cases where not all of the parameters play a role. :comment:`Again, it is a model for a process, not a parameter which is described here.`|5|grammaticalization, grammaticalization parameters, 4492|Heine2019|Since the paper is concerned with word formation, a note on the key concepts that will figure below seems in order. Compounding and derivation are commonly classified as word formation, that is, as the creation of new lexemes (e.g. @Lieber<2014> & Štekauer 2014: 3). The former is defined by @Bauer<2003> (2003: 40) as “the formation of a new lexeme by adjoining two or more lexemes”, cf. English football. Compounding may take on a number of quite divergent forms and, accordingly, has been used for a range of different kinds of meaning (e.g. 
@Bauer<1978> 1978; @Bisetto<2005> & Scalise 2005; @Wälchli<2005> 2005; @Lieber<2009> & Štekauer 2009).|6|definition, compounding 4493|Heine2019|Derivation shares with compounding that it belongs to word formation, and with inflection that it typically, though not necessarily, involves affixation. But in the same way as the distinction between compounding and derivation, that between derivation and inflection is complex, having been portrayed as being either problematic, essentially undefinable, or even as non-existent (see the discussions in @Bybee<1985> 1985; @Anderson<1992> 1992: 72ff.; @CarstairsMcCarthy<1992> 1992). This issue is immediately relevant to the subject matter of the present paper, but we will not be able to deal with it in as much detail as might be desirable.|6|word derivation, definition 4494|Heine2019|(4) Compounding > derivation > inflection (@Bybee<1985> 1985: 82; @Heine<1991> et al. 1991: 17–8; @Brinton<2005> & Traugott 2005: 85–7) The first part of the chain of grammaticalization sketched in (4) (compounding > derivation) is well established (see Heine et al. 2016) while the second part (derivation > inflection) is far from uncontroversial, since a number of counterexamples have been identified (e.g. @Norde<2009> 2009). As we will see in §3, the present paper nevertheless is in support of this pathway, even if it does not seem to be a canonical process of diachronic change.|7|grammaticalization, pathways, word derivation, compounding, inflection 4495|Schwarzwald2019|Two major word formation processes exist in Hebrew (in addition to minor compounds, blends and acronyms): (a) nonlinear formation: a combination of consonantal root with template, e.g. higdil ‘increased’ and migdal ‘tower’ are derived from the root √gdl using the templates hiCCiC and miCCaC; (b) linear formation: affixation to a stem, for example balšanut ‘linguistics’ from balšan ‘linguist’ + -ut, and xidon ‘quiz’ from xida ‘riddle’ + -on. 
The ending -on exhibits ambiguous cases of root and template construction as opposed to suffixed word formations. In many cases this ending is built using the nominal templates CiCaCon, CiCCon and CaCCon, the first of which usually create abstract nouns. In other cases -on is attached to various stems carrying the following connotations which are not always mutually exclusive, and sometimes share some of their meanings with words formed by the above templates: diminutive (e.g. suson ‘small horse’); collective (e.g. še'elon ‘questionnaire’); instrumental (e.g. 'ecba'on ‘thimble’); flora and fauna (e.g. zeron ‘harrier (bird)’); periodicals (e.g. šavu'on ‘weekly newspaper’); and division related words (e.g. 'axuzon ‘percentile’). Thus the ending -on creates opacity as part of a template and as a suffix for both derivational processes and meanings. One outcome of the findings is that syllabic structure is the most important factor in determining Hebrew word structure.|000|word formation, linear word formation, non-linear word formation, syntagmatic processes, paradigmatic processes 4496|Schwarzwald2019|Most Hebrew words are derived from one of two major processes, linear and nonlinear (in addition to minor compounds, blends and acronyms): (i) A combination of consonantal root with vocalic template which sometimes includes additional consonants, for example gadal ‘grew up’, higdil ‘increased’, gidel ‘raised’, and migdal ‘tower’, all of which are derived from the root √gdl using the following templates: CaCaC, hiCCiC, CiCeC, miCCaC. This formation is nonlinear because the root is interwoven into the template, neither of which can be pronounced without the other. (ii) Linear affixation to a stem, for example, du-mašma'i ‘ambiguous’ from du- ‘two’ + mašma ‘meaning’ + -i ( ADJECTIVAL suffix); balšanut ‘linguistics’ from balšan ‘linguist’, and + -ut ( ABSTRACT suffix); xidon ‘quiz’ from xida ‘riddle’ + -on. Most affixation in Hebrew is suffixal. 
Suffixes are added to various word classes for inflection and also to certain stems for derivation. The analysis of suffixes raises the issue of possible clashes between the two word formation processes when endings are involved.|109|Hebrew, word formation, non-linear word formation, linear word formation 4497|Schwarzwald2019|The article contains potentially interesting examples for derivation in Hebrew that should be explained in annotation frameworks.|000|non-linear word formation, linear word formation, Hebrew 4498|Rzepiela2019|This paper discusses terminology used to classify the groups of derivatives characterized by some common formal and semantic features and especially those coined with one specific affix. Special attention is given to the concept of word-formation by Czech scholar Miloš Dokulil and the phrase word-formation type introduced by him. One points out a strict hierarchical order of the terms relating to the products of word-formation in Miloš Dokulil’s framework and demonstrates how the phrase word-formation type was reinterpreted by other scholars regarding the exploitation of electronic corpora by František Štichá and to the onomasiological theory by Pavol Štekauer. The terms microstructure (lexicale) and semantische Nische, employed in French and German studies of word-formation, respectively, are comparatively recalled. Finally, attention is focused on the phrase lexico-semantic class and its use as usually encountered in computational linguistics.|000|word formation, word formation type, Czech, Slavic languages 4499|Rzepiela2019|For me, particularly interesting in his concept – let us remember, based on onomasiological presumptions and structuring lexical items according to the degree of meaning’s abstractness and generality – is the level at which formal (morphological) and semantic criteria meet. Dokulil (1962), as the most general term combining these criteria, proposes word-formation category. 
Such a category within nouns, for example, is constituted by names of professions. Lexemes such as decretista ‘expert in canon law’ (← decretum), forestarius ‘forest ranger’ (← foresta), mensator ‘carpenter’ (← mensa) may be enclosed into this category in medieval Latin. They represent the identity of semantics and WF bases (all are desubstantive) but are coined with different suffixes. However, when derivatives [pb] characterized by semantics and WF base are, in addition, coined with the same suffix, e.g. Lat. -tor; they are formed according to terminology used by Dokulil: WFT, as illustrated in the Latin deverbal names of professions: braxator ‘brewer’ (← braxare ‘to brew’), falcastrator ‘mower’ (← falcastrare ‘to mow’), impressator ‘printer’ (← impressare ‘to print’). Moreover, to mark any slight semantic differentiation between the lexemes belonging to the same type, it is possible to divide into further subtypes. For example, Latin nouns using -ista to designate musicians of very strict specializations (clavichordista ‘clavichordist’, lutnista ‘lutenist’, organista ‘organist’) can be interpreted as a subtype of the WFT of desubstantive names of professions using -ista. What is striking in this concept is certainly its hierarchical order, which goes from the most general to the most specific term. |130f|word formation, word formation type, terminology 4500|Rzepiela2019|What is interesting about this article is that the terminology used, which starts from broad semantic categories, can be used to define a set of tools to infer the morphology of a language as well. We could start to search for potential semantic categories among words, see if we find shared morphology, and later use this to further expand the search.|000|morphology, word formation, semantics, morpheme detection 4501|Olsen2019|The examples in (2) illustrate the wide range of referents that fall under the denotation of -er nominals as predicted by i). [...] 
(2) Range of referents of the external argument of a nominal in -er: a. *signer of the contract* agent b. *admirer of talent* experiencer c. *owner of the car* possessor d. *receiver of the package* goal [pb] e. *contributor of money* source f. *heater (`*`of the room)* instrument|20f|suffixation, word formation, English, examples 4502|Olsen2019|What the different examples in the table show is that we could in fact annotate our corpus for Concepticon in such a way that we indicate for nouns what categories they cover. We could then use this information to search for colexification pathways in partial colexification in our corpus.|000|word formation, partial colexification, suffixation, 4503|Combes2009|This article reviews the historical and present prospects of ethnohistorical and ethnographic work in the South American Gran Chaco. Geographically the Chaco is a semi-arid central South American plain, some one million square kilometers in size, encompassing portions of northern Argentina, eastern Bolivia, and western Paraguay. Average rainfall oscillates around 800 mm/yr, with the peripheries being wetter and the central Chaco drier. Some 250,000 indigenous people belonging to more than twenty ethnic groups live in the Chaco. Traditional ethnolinguistic categorization classifies them into six main linguistic groups: Mataco-maká (Wichí-Mataco, Chorote, Nivaclé-Chulupí, Maká), Guaycurú (Toba, Toba-Pilagá, Pilagá, Mocoví, Mbayá-Caduveo), Lule-Vilela (Chunupí), Lengua-Maskoi (Lengua, Sanapaná, Angaité, Enenlhet), Zamuco (Chamacoco-Ishir, Ayoreo) and Tupí-Guaraní (Ava-Chiriguano, Chané, Tapiete, Isoseño-Guaraní, and Guaraní Occidental). The last group is the largest, including nearly 100,000 people, of whom the majority live in Bolivia. 
Unlike their Amazonian and Andean counterparts, Chaco indigenous peoples have yet to establish transnational, pan-indigenous representative bodies of their own.|000|Gran Chaco, South American languages, introduction, 4504|ReyesCenteno2016|Genetic and fossil evidence has accumulated in support of an African origin for modern humans. Despite this consensus, several questions remain with regard to the mode and timing of dispersal out of the continent. Competing models differ primarily by the number of dispersals, their geographic route, and the extent to which expanding modern humans interacted with other hominins. Central in this debate is whether Southeast Asia was occupied significantly earlier than other parts of Eurasia and, if so, whether the population ancestral to extant Southeast Asians was notably different from the ancestors of extant Eurasians. Here, genetic and fossil evidence for the dispersal process out of Africa and into Asia is reviewed. A scenario that can resolve the current archaeological, genetic, and paleontological evidence is one which considers an initial expansion of anatomically modern humans into the Arabian Peninsula and the Levant during the terminal Middle Pleistocene, with continued exchange with Africans until the Late Pleistocene, when modern humans then dispersed into Eurasia in two waves. Advances in population genomics and methods applying evolutionary theory to the fossil record will serve to further clarify modern human origins and the out-of-Africa process.|000|Out-of-Africa, genetics, population genetics, 4505|Vihman2018|Linguistic animacy reflects a particular construal of biological distinctions encountered in the world, passed through cultural and cognitive filters. This study explores the process by which our construal of animacy becomes encoded in the grammars of human languages. We ran an iterated learning experiment investigating the effect of animacy on language transmission. 
Participants engaged in a simple artificial language learning task in which they were asked to learn which affix was assigned to each noun in the language. Though initially random, the language each participant produced at test became the language that the subsequent participant in a chain was trained on. Results of the experiment were analysed in terms of learnability, measured through the accuracy of responses, and structure, using an entropy measure. We found that the learnability of languages increased over generations, as expected, but entropy did not decrease. Languages did not become formally simpler over time. Instead, structure emerged through a reorganisation of noun classes around animacy-based categories. The use of semantic animacy distinctions allowed languages to retain morphological complexity while becoming more learnable. Our study shows that grammatical reflexes of animacy distinctions can arise out of learning alone, and that structuring grammar based on animacy can make languages more learnable.|000|language evolution, experimental study, animacy hierarchy 4506|Qiu2018|This paper introduces the ongoing ERC-funded project Chronologicon Hibernicum, which studies the diachronic developments of the Irish language between c. 550–950, and aims at refining the absolute chronology of these developments. It presents firstly the project organization, its subject matter and objective, then gives an overview of the potentials and challenges in studying the Early Irish language. The project combines historical linguistic analysis, corpus linguistic methods and Bayesian statistic tools. Finally the paper explains the impact of this project in preserving the Irish cultural heritage and the lessons learned in the first three years.|000|Irish, corpus studies, language change, 4507|Sennblad2007|Background: Evolutionary processes, such as gene family evolution or parasite-host co-speciation, can often be viewed as a tree evolving inside another tree. 
Relating two given trees under such a constraint is known as reconciling them. Adequate software tools for generating illustrations of tree reconciliations are instrumental for presenting and communicating results and ideas regarding these phenomena. Available visualization tools have been limited to illustrations of the most parsimonious reconciliation. However, there exists a plethora of biologically relevant non-parsimonious reconciliations. Illustrations of these general reconciliations may not be achieved without manual editing. Results: We have developed a new reconciliation viewer, primetv. It is a simple and compact visualization program that is the first automatic tool for illustrating general tree reconciliations. It reads reconciled trees in an extended Newick format and outputs them as tree-within-tree illustrations in a range of graphic formats. Output attributes, such as colors and layout, can easily be adjusted by the user. To enhance the construction of input to primetv, two helper programs, readReconciliation and reconcile, accompany primetv. Detailed examples of all programs' usage are provided in the text. For the casual user a web-service provides a simple user interface to all programs. Conclusion: With primetv, the first visualization tool for general reconciliations, illustrations of trees-within-trees are easy to produce. Because it clarifies and accentuates an underlying structure in a reconciled tree, e.g., the impact of a species tree on a gene-family phylogeny, it will enhance scientific presentations as well as pedagogic illustrations in an educational setting. primetv is available at http://prime.sbc.su.se/primetv, both as a standalone command-line tool and as a web service. 
The software is distributed under the GNU General Public License.|000|tree viewer, software, visualization, 4508|Tversky1985|Explanations and predictions of people's choices, in everyday life as well as in the social sciences, are often founded on the assumption of human rationality. The definition of rationality has been much debated, but there is general agreement that rational choices should satisfy some elementary requirements of consistency and coherence. In this chapter, we describe decision problems in which people systematically violate the requirements of consistency and coherence, and we trace these violations to the psychological principles that govern the perception of decision problems and the evaluation of options.|000|decision making, framing, psychology, behaviour, 4509|Klein1977|Paraguay is divided into two geographical zones: the Oriental or eastern part of the country, and the western part or Chaco. The latter, consisting of 60% of the land mass of Paraguay, is inhabited by 3% of the country's population. It is also the area which is home to the largest number of Paraguayan Indians, representing thirteen languages and five linguistic families.|000|Paraguayan Chaco, South American languages, Paraguay 4510|Koliska2017|As a journalistic norm transparency has gained institutional acceptance in the United States. However, comparatively little is known about the extent to which news organizations in other national contexts have adopted this norm. This paper explores how transparency, as an innovation in journalism, has been diffused, i.e. perceived and possibly implemented, in German newsrooms. 
Interviews with 17 journalists from leading German news organizations indicate that although certain forms of openness have been conceptually adopted, transparency is far from being embraced as an innovation nor institutionally implemented, indicating that the adoption of an innovation such as transparency remains contingent on national contextual factors.|000|transparency, media, journalism, 4511|Carvalho2018|Relations between Arawakan and Guaicuruan groups in the Chaco region of South America have been widely discussed in ethnohistorical and anthropological sources. This paper offers the first systematic investigation of the linguistic effects of these interactions, relying on, and contributing to, the historical study of both language families. I discuss a number of nominal lexemes in Terena, the extant descendant of Guaná, that lack plausible Arawakan etymologies, thus being candidates for items adopted from some non-Arawakan language. I show that the Terena items in question are loans from Northern Guaicuruan languages and can be traced to source forms with established Guaicuruan etymologies. I propose some linguistic markers that characterize the Guaicuruan stratum in the Terena lexicon, particularly well-represented in the domains of zoonyms and phytonyms. Consequences of these findings are discussed, including their relation to the independent claims of the ethnohistorical literature.|000|Gran Chaco, Guaicuruan, Arawakan, Southern American languages, 4512|Carvalho2018|This author also wrote in @Carvalho2017 on a similar topic, so this may be interesting in this context.|000|Arawakan, Guaycuruan, language contact, Southern American languages, Gran Chaco 4513|Bruening2018|The lexicalist hypothesis, which says that the component of grammar that produces words is distinct and strictly separate from the component that produces phrases, is both wrong and superfluous. 
It is wrong because (i) there are numerous instances where phrasal syntax feeds word formation; (ii) there are cases where phrasal syntax can access subword parts; and (iii) claims that word formation and phrasal syntax obey different principles are not correct. The lexicalist hypothesis is superfluous because where there are facts that it is supposed to account for, those facts have independent explanations. The model of grammar that we are led to is then the most parsimonious one: there is only one combinatorial component of grammar that puts together both words and phrases.|000|lexicalist hypothesis, morphosyntax, 4514|Bruening2018|The basic tenet of the lexicalist hypothesis is that the system of grammar that assembles words is separate from the system of grammar that assembles phrases out of words.|1|definition, lexicalist hypothesis, grammatical theory, 4515|Asimov1973|Book describes some «100 questions about science», which are intended to be rather basic, and are answered one by one.|000|scientific practice, science, popular science, 4516|Donohue2013|Kusunda has been described in sketch form (@Watters<2006> et al. 2006), but a number of morphemic and syntactic structures remain unclear. In particular, the Watters et al. description was based on elicited materials, resulting in confusion about some morphemes. 
Following the collection of a naturalistic corpus we discuss the function of two verbal suffixes with interesting uses.|000|Kusunda, language isolate, grammatical description, evidentiality, 4517|Everson2019|We present two pieces of interlocking technology in development to facilitate community-based, collaborative language description and documentation: (i) a mobile app where speakers submit text, voice recordings and/or videos, and (ii) a community language portal that organizes submitted data and provides question/answer boards whereby community members can evaluate/supplement submissions.|000|online, web-based tool, language documentation, 4518|Gelman2014|This book is intended to have three roles and to serve three associated audiences: an introductory text on Bayesian inference starting from first principles, a graduate text on effective current approaches to Bayesian modeling and computation in statistics and related fields, and a handbook of Bayesian methods in applied statistics for general users of and researchers in applied statistics. Although introductory in its early sections, the book is definitely not elementary in the sense of a first text in statistics. The mathematics used in our book is basic probability and statistics, elementary calculus, and linear algebra. A review of probability notation is given in Chapter 1 along with a more detailed list of topics assumed to have been studied. The practical orientation of the book means that the reader’s previous experience in probability, statistics, and linear algebra should ideally have included strong computational components.|000|Bayesian inference, Bayesian data analysis, methodology, machine learning, introduction, tutorial, 4519|Gelman2014|Book was recommended by Gerhard, and may be quite interesting to get started.|000|Bayesian data analysis, introduction, 4520|Kalfa2018|This article draws on the sociology of Bourdieu to explore how academics respond to managerialist imperatives. 
Bourdieu’s metaphor of the game is applied to a case study of a regional Australian university, which underwent significant changes in 2007, the most notable being the introduction of performance appraisals. In-depth interviews (N=20) reveal evidence of symbolic violence: staff compliance with and complicity in the changes. This is evident in the way that the interviewees, mostly early career academics, chose to play the game by concentrating their efforts on increasing their capital within the new order. To further support this argument, signs of resistance to the new regime were explored. Findings show that vocal resistance was sparse with silence, neglect and exit being the more realistic options. The article concludes that it is academics’ illusio, their unwavering commitment to the game, which neutralizes resistance by pitting colleagues against each other.|000|academia, science, scientific practice, sociology, 4521|Boerger2018|In this paper we describe single-event Rapid Word Collection (RWC) workshop results in 12 languages, and compare these results to fieldwork lexicons collected by other means. We show that this methodology of collecting words by semantic domain by community engagement leads to obtaining more words in less time than conventional collection methods. Factors contributing to high and low net word senses are summarized, addressed, and suggestions given for increasing effectiveness of the RWC procedures. Relevant points are illustrated in detail using a 2015 Natügu [ntu] RWC workshop in the Solomon Islands. We conclude that the advantages of the single-event RWC workshop strategy warrant recommending it as best practice in lexicographic fieldwork for minority languages.|000|data collection, workshop, 4522|Brunson2016|Domestic taurine cattle (Bos taurus) were introduced to China from Central Asia between 3600 and 2000 cal BCE. 
Most of the earliest domestic cattle remains in China come from sacrificial or ritual contexts, especially in the form of oracle bones used in divination rituals. These oracle bones became closely tied to royal authority and are the source of the earliest written inscriptions in ancient China. In this article, we use ancient DNA to identify uninscribed bovine oracle bones from the Longshan period archaeological sites of Taosi and Zhoujiazhuang (late third millennium BCE). We found that in addition to making oracle bones out of domestic cattle scapulae, people also used aurochs (wild cattle: Bos primigenius) scapulae for oracle bone divination. Wild water buffalo (Bubalus mephistopheles) were also exploited at Zhoujiazhuang, but we did not identify water buffalo oracle bones in our analysis. We propose some morphological criteria that may be useful for distinguishing between these animals, but conclude that it is not always possible to identify bovine scapulae based on morphology alone. Our results indicate that wild and domestic bovines were sometimes present at the same sites and their bones were used in similar ways to make oracle bones. This raises the possibility that these species interbred and that people in ancient China may have experimented with managing indigenous Chinese wild bovines.|000|Oracle Bone inscriptions, genetics, archaeology, 4523|Brzoza2018|Lexical frequency is one of the major variables involved in language processing. It constitutes a cornerstone of psycholinguistic, corpus linguistic as well as applied research. Linguists take frequency counts from corpora and they started to take them for granted. However, voices emerge that corpora may not always provide a comprehensive picture of how frequently lexical items appear in a language. In the present contribution I compare corpus frequency counts for English and Polish words to native speaker's perception of frequency. 
The analysis shows that, while generally objective and subjective values are related, there is a disparity between measures for frequent Polish words. The direction of the relationship, though positive, is also not as strong as in previous studies. I suggest linking objective with subjective frequency measures in research.|000|word frequency, corpus linguistics, Polish, English, problem, 4524|Adelaar2012|The extreme language diversity that was characteristic for South America must have been a challenge to native groups throughout the subcontinent, struggling to maintain commercial and political relations with each other. Due to the absence of phonetically based writing systems in pre-European times there is hardly any documentation about the way cross-linguistic communication was achieved. However, the outlines of a conscious linguistic policy can be assumed from the Incas’ success in imposing their language upon a millenary multilingual society. Second-language learning, often by users of typologically widely different languages, must have been an everyday concern to the subjects of the Inca empire. Sixteenth-century chroniclers often report in a matter-of-fact way on the ease and rapidity with which native Americans mastered the language of their conquerors, be it Quechua, Spanish or any other language. Apart from such cases of political necessity, there are indications that language played an essential role in many South American native societies and that it could be manipulated and modified in a deliberate way.|000|South American languages, overview, 4525|Campbell2012a|The purpose of this chapter is to present a general overview of the classification of the indigenous languages of South America. 
The aim is to present a classification which reflects as nearly as possible the current state of research, while making clear where disagreements may lie and pointing directions for future research.|000|South American languages, classification, 4526|Muysken2012|In this chapter I will try to describe a few aspects of language contact in the history of the languages of the American Indian communities of South America. The topic of contacts between the indigenous languages in South America is vast and almost intractable.|000|contact area, South American languages, language contact 4527|Campbell2012b|This chapter has two goals. One is to present an overview of languages of the Southern Cone, concentrating on their classification and on structural traits which characterize languages in the region. The second goal, related to the first, is to try to answer the question, is the Gran Chaco a linguistic area?|000|Gran Chaco, South American languages, introduction 4528|Campbell2012|Table of Contents (incomplete) @Adelaar2012: Historical overview @Campbell2012a: Classification of the indigenous languages of South America @Muysken2012: Contacts between indigenous languages in South America @Campbell2012c: Languages of the Chaco and Southern Cone|000|South American languages, introduction, overview, 4529|Gavrilets1999|The world as we perceive it is three dimensional. Physicists currently believe one needs on the order of a dozen dimensions to explain physical world. However, biological evolution occurs in a space with millions dimensions. Sewall Wright’s powerful metaphor of rugged adaptive landscapes with its emphasis on adaptive peaks and valleys is based on analogies coming from our three-dimensional experience. Because the properties of multidimensional adaptive landscapes are very different from those of low dimension, for many biological questions Wright’s metaphor is not useful or is even misleading. 
A new unifying framework that provides a plausible multidimensional alternative to the conventional view of rugged adaptive landscapes is emerging for deepening our understanding of evolution and speciation. The focus of this framework are percolating (nearly) neutral networks of well-fit genotypes which appear to be a common feature of genotype spaces of high dimensionality. A variety of important evolutionary questions have been approached using the new framework.|000|biological evolution, drift, selection, hyperspace 4530|Kanojia2019|Cognates are present in multiple variants of the same text across different languages. Computational Phylogenetics uses algorithms and techniques to analyze these variants and infer phylogenetic trees for a hypothesized accurate representation based on the output of the computational algorithm used. In our work, we detect cognates among a few Indian languages namely Hindi, Marathi, Punjabi, and Sanskrit for helping build cognate sets for phylogenetic inference. Cognate detection helps phylogenetic inference by helping isolate diachronic sound changes and thus detect the words of a common origin. A cognate set manually annotated with the help of a lexicographer is generally used to automatically infer phylogenetic trees. Our work creates cognate sets of each language pair and infers phylogenetic trees based on a bayesian framework using the Maximum likelihood method. [...] phylogenetic trees based on automatically detected cognate sets. The online interface helps create phylogenetic trees based on the textual data provided as an input. It helps a lexicographer provide manual input of data, edit the data based on their expert opinion and eventually create phylogenetic trees based on various algorithms including our work on automatically creating cognate sets. 
We go on to discuss the nuances in detecting cognates with respect to these Indian languages and also discuss the categorization of Cognate words i.e., “Tatasama” and “Tadbhava” words.|000|cognate detection, phylogenetic tree, Indian 4531|Ding2014|As an important word class in the system of vocabulary, common words is a hierarchical relative concept, not a single concept in the same level. The hierarchy of common words shows in multi-aspects. This paper discusses the main hierarchies and types of common words, from macro perspective of human language and covered language object, existing form, status in vocabulary system, time dimension, syllable structure, space dimension, style, emotional color and so on, to promote the research of common words and vocabulary history.|000|basic words, Swadesh list, 4532|Ding2014|Rather uninspired article dealing with questions on basic vocabulary.|000|basic vocabulary, Swadesh list, 4533|Dai2011|The comparison between the Chinese language and minority languages and among different minority languages is an important part of the typological studies of the Sino-Tibetan language family. The past twenty-year studies have proved that there should be clear distinction between related languages and unrelated languages while the comparison among the related languages should distinguish the cognate relations from the non-cognate relations and attach importance to the related parameters and language contact. The existing weaknesses include certain negligence of some language facts, and some typological rules are not very useful. Thus, there is much room for the further studies of some lexical and phonetic problems in the Sino-Tibetan language family.|000|Sino-Tibetan, typological features, overview, 4534|Fowler2017|There is significant friction in the acquisition, sharing, and reuse of research data. It is estimated that eighty percent of data analysis is invested in the cleaning and mapping of data (Dasu and Johnson, 2003). 
This friction hampers researchers not well versed in data preparation techniques from reusing an ever-increasing amount of data available within research data repositories. Frictionless Data is an ongoing project at Open Knowledge International focused on removing this friction. We are doing this by developing a set of tools, specifications, and best practices for describing, publishing, and validating data. The heart of this project is the “Data Package”, a containerization format for data based on existing practices for publishing open source software. This paper will report on current progress toward that goal.|000|research data, data, data quality, 4535|Paperno2019|Questionnaires constitute a crucial tool in linguistic typology and language description. By nature, a Questionnaire is both an instrument and a result of typological work: its purpose is to help the study of a particular phenomenon cross-linguistically or in a particular language, but the creation of a Questionnaire is in turn based on the analysis of cross-linguistic data. We attempt to alleviate linguists’ work by constructing lexical Questionnaires automatically prior to any manual analysis. A convenient Questionnaire format for revealing fine-grained semantic distinctions includes pairings of words with diagnostic contexts that trigger different lexicalizations across languages. Our method to construct this type of a Questionnaire relies on distributional vector representations of words and phrases which serve as input to a clustering algorithm. As an output, our system produces a compact prototype Questionnaire for cross-linguistic exploration of contextual equivalents of lexical items, with groups of three homogeneous contexts illustrating each usage. We provide examples of automatically generated Questionnaires based on 100 frequent adjectives of Russian, including veselyj ‘funny’, ploxoj ‘bad’, dobryj ‘kind’, bystryj ‘quick’, ogromnyj ‘huge’, krasnyj ‘red’, byvšij ‘former’ etc. 
Quantitative and qualitative evaluation of the Questionnaires confirms the viability of our method.|000|questionnaire, research questionnaire, lexical typology, 4536|Marneffe2019|Dependency grammar is a descriptive and theoretical tradition in linguistics that can be traced back to antiquity. It has long been influential in the European linguistics tradition and has more recently become a mainstream approach to representing syntactic and semantic structure in natural language processing. In this review, we introduce the basic theoretical assumptions of dependency grammar and review some key aspects in which different dependency frameworks agree or disagree. We also discuss advantages and disadvantages of dependency representations and introduce Universal Dependencies, a framework for multilingual dependency-based morphosyntactic annotation that has been applied to more than 60 languages.|000|dependency grammar, dependency parsing, introduction, 4537|Arrazola2019|Every language produces some type of verse in the form of songs, poems, or nursery rhymes, which can be analysed as a layer of words set to a template (e.g. a tune, a poetic metre). Verse templates typically consist of hierarchically organised sections: songs are made up of stanzas, divided into lines, containing bars, etc. We hypothesise that this kind of patterns may emerge in the process of cultural transmission; unstructured sound sequences impose a challenge to short-term memory, but chunking the input makes it easier to parse and reproduce the sequences accurately. In order to test this hypothesis, we have run an iterated learning experiment where random sequences of syllables are evolved across four transmission chains with ten generations of subjects each (all native Dutch speakers). The initial random sequences are generated by concatenating twelve tokens of the set {ban, bi, ta, tin}, as a way to materialise the abstract verse templates without using content-words. 
More precisely, the experiment aims to model the sequences of nonsense syllables used in many traditions to communicate the rhythmic patterns underlying songs (e.g. bols in Hindustani music, lalay patterns in Berber verse). Participants listened to the sequences of syllables, and tried to reproduce them using four computer keys, each mapped to one of the four syllables used in the input sequences. The relative timing of the participants’ responses were normalised so that the input always consisted of completely isochronous sequences. Overall, the results show that sequences become shorter, easier to recall and more structured in the transmission process. Some regularities can be related to a global tendency to chunk the input and increase the popularity of a handful of ngrams. Besides, sequences increasingly tend to be opened by a heavy syllable (e.g. ban) and closed by a light syllable (e.g. ta), which can derive from a Dutch-specific bias.|000|verse, poetry, transmission, iterative learning, experimental study, 4538|Chen2008|This article is a glottochronological approach to Fuzhou dialect. Based on English-Chinese Dictionary of the Foochow Dialect published in 1891, the writer concluded that the dialect parted from the mainstream of the Chinese language in the period from 767 A.D. to 1020 A.D.|000|glottochronology, dating, Fúzhōu, Mǐn, 4539|Chevenet2015|Reconciliation methods aim at recovering the evolutionary processes that shaped the history of a given gene family including events such as duplications, transfers and losses by comparing the discrepancies between the topologies of the associated gene and species trees. These methods are also used in the framework of host/parasite studies to recover co-diversification scenarios including co-speciation events, host-switches and extinctions. These evolutionary processes can be graphically represented as nested trees. 
These interconnected graphs can be visually messy and hard to interpret, and despite the fact that reconciliations are increasingly used, there is a shortage of tools dedicated to their graphical management. Here we present SylvX, a reconciliation viewer which implements classical phylogenetic graphic operators (swapping, highlighting, etc.) and new methods to ease interpretation and comparison of reconciliations (multiple maps, moving, shrinking sub-reconciliations).|000|tree viewer, software, visualization, 4540|Zhang2018a|In Rgyalrongic studies, it is believed that the most complex stem alternation system is found in Zbu, a Northern Rgyalrong language, whereas other languages, including Situ, have simpler systems. However, as a dialect of Situ Rgyalrong, Brag-bar presents a complex stem alternation system exhibiting several opaque features in comparison with other Situ dialects. This paper documents the stem alternations in the Brag-bar dialect of Situ Rgyalrong. It first describes the distribution and stem formation devices of different verb stems in Brag-bar, then explains the occurrence of the irregular stem I′/II′ in Brag-bar and investigates their synchronic status.|000|Rgyalrong, word formation, stem alternation, 4541|Chen2012|This research, based on a large database of sound correspondence among languages in China, aims at proposing an algorithm model to work out the importance of each basic word, and then adjust the basic word between the high-rank set and the low-rank set automatically. The result will be that when the languages in question are genetically related, the distribution of basic words in the two sets differs obviously from that when the languages in question are in contact relationship. That is, through the algorithm of adjusting the two sets of basic words, the obviousness of ranking will increase. 
This algorithm model can be divided into two interrelated parts: counting to what degree a word being basic, and adjusting the word between high-rank set and low-rank set.|000|basic vocabulary, ranking, algorithm, 4542|Cook2018|The question of how best to classify Modern Standard Chinese loanwords is rather a fraught one. Various principles of categorization have been proposed in the literature; however, previous classification systems have generally covered only a relatively small proportion of all loanwords currently in use. Even attempts to provide an exhaustive catalog of lexical borrowing strategies have often been characterized by non-transparent structure, internal inconsistency or even incompleteness. This has hindered meaningful cross-linguistic comparisons of language change in Modern Standard Chinese vis-à-vis other languages. The aim of this paper is to present a new and clearly structured, comprehensive inventory of the different types of lexical borrowing that have occurred in Modern Standard Chinese over the past 30 to 40 years. Systematic cross-linguistic comparisons reveal that examples of almost all of the categories of lexical borrowing noted in the literature on English language change can likewise be provided in relation to Modern Standard Chinese. In addition, Chinese offers several options for borrowing lexical items not available to speakers of English. Overall, this paper presents a picture of Modern Standard Chinese speakers as cultivating a flexible, creative, playful approach to their use of language. The explicit recognition of the fact that many so-called “alphabetic words” are established loanwords is found to have implications for the typological classification of Chinese script, as well as for other fields such as second language teaching. 
A secondary finding not anticipated in the research question is that Chinese orthography shows tentative early signs of potentially developing from a morpho-logographic to a phonetic writing system.|000|borrowing, Mandarin, typology, 4543|Jiang2000|In this paper, the author proposes his view about how to define a basic vocabulary list, which is very important for comparative linguistics. As principles of selecting core vocabulary items, the following steps should be considered: to deduce the times of common ancestry of Sino-Tibetan languages, to make philological studies of ancient text, to find out the general knowledge on primitive hordes, so as to get rid of cultural words, which are easy to spread around, and make a supplementary list which is used for confirming the sound correspondences.|000|basic vocabulary, Swadesh list, Sino-Tibetan, 4544|He2003|There is no consensus which branch Tujia language belongs to. Based on "300 Kernel Words in Tibetan and Myanmese Languages", which is enlarged to 400, this paper makes a research into the relationship between Tujia language and Tibetan and Myanmese Languages, and makes a comparison among the 54 language points in 39 languages of the 5 Tujia languages and Tibetan and Myanmese Languages branches, revealing the probability of cognate words and the phonetics change pattern. It is concluded that Tujia language belongs to the branch of Qiang language.|000|Tujia, Sino-Tibetan, genetic classification, 4545|Karlsson2010|The nature and origin of syntactic recursion in natural languages is a topical problem. Important recent contributions include those of Johansson (2005), Parker (2006), Tomalin (2006; 2007), and Heine and Kuteva (2007). Syntactic recursion will here be discussed especially in relation to its cognate concept of iteration. Their basic common feature is plain structural repetition: “keep on emitting instances of the current structure, or stop”. 
Their main difference is that recursion builds structure by increasing embedding depth whereas iteration yields flat output structures which do not increase depth. My focus here is on the types of recursion and iteration, and on what empirically determinable constraints there are on the number of recursive and iterative cycles of application. Recursion comes in two subtypes, nested recursion (= center-embedding) and tail-recursion, the latter covering left-recursion and right-recursion. There are six functionally different types of iteration: structural iteration, apposition, reduplication, repetition, listing and succession. It will be empirically shown that multiple nested syntactic recursion of degrees greater than 3 does not exist in written language, neither in sentences nor in noun phrases or prepositional phrases. In practice, even nesting of degree 2 is extremely rare in writing. In speech, nested recursion at depths greater than 1 is practically non-existing, thus partly confirming an early hypothesis of Reich (1969). Left-branching tail-recursion of clauses is strictly constrained to maximally two recursive cycles. Right-branching clausal tail-recursion rarely transcends three cycles in spoken language and five in written language. On constituent level both left- and right-branching is less constrained (especially in written language), but e.g. left-branching genitives rarely recurse more than two cycles ([[[Pam’s] mum’s] baggage]).|000|syntactic recursion, syntax, 4546|Jing2000|The Curve Theory of Ranks (CTR) proposed by Dr. Chen Baoya suggests two methods for the research in the field of historic relationships of languages: Probability Limit (PL) and the Curve of Ranks (CR). Both of CR and PL depend on a reasonable relative word list compiled by researcher. The author tests the ways introduced by Dr. Chen with a case in the paper. The result shows that English and Shanghai dialect are related linguistically. No doubt, this result is quite ridiculous. 
It is clear that Chen's theory won't be acceptable until a suitable standard of relative words is fixed.|000|ranked concept list, basic vocabulary, ranking, 4547|Willet2019|Article discusses the role of food in our current time, when people are threatened by climate change etc.|000|food, anthropocene, climate change, sustainability, 4548|Mesoudi2018|In recent years, the phenomenon of cumulative cultural evolution (CCE) has become the focus of major research interest in biology, psychology and anthropology. Some researchers argue that CCE is unique to humans and underlies our extraordinary evolutionary success as a species. Others claim to have found CCE in non-human species. Yet others remain sceptical that CCE is even important for explaining human behavioural diversity and complexity. These debates are hampered by multiple and often ambiguous definitions of CCE. Here, we review how researchers define, use and test CCE. We identify a core set of criteria for CCE which are both necessary and sufficient, and may be found in non-human species. We also identify a set of extended criteria that are observed in human CCE but not, to date, in other species. Different socio-cognitive mechanisms may underlie these different criteria. We reinterpret previous theoretical models and observational and experimental studies of both human and non-human species in light of these more fine-grained criteria. Finally, we discuss key issues surrounding information, fitness and cognition. We recommend that researchers are more explicit about what components of CCE they are testing and claiming to demonstrate.|000|cultural evolution, cumulative cultural evolution, introduction, overview 4549|Murawaki2019|We borrow the concept of representation learning from deep learning research, and we argue that the quest for Greenbergian implicational universals can be reformulated as the learning of good latent representations of languages, or sequences of surface typological features. 
By projecting languages into latent representations and performing inference in the latent space, we can handle complex dependencies among features in an implicit manner. The most challenging problem in turning the idea into a concrete computational model is the alarmingly large number of missing values in existing typological databases. To address this problem, we keep the number of model parameters relatively small to avoid overfitting, adopt the Bayesian learning framework for its robustness, and exploit phylogenetically and/or spatially related languages as additional clues. Experiments show that the proposed model recovers missing values more accurately than others and that some latent variables exhibit phylogenetic and spatial signals comparable to those of surface features.|000|structural data, machine learning, typology, 4550|Murawaki2018a|A major pursuit within the study of language evolution is to advance understanding of the historical behavior of typological features. Previous studies have identified at least three factors that determine the typological similarity of a pair of languages: (1) vertical stability, (2) horizontal diffusibility, and (3) universality. Of these factors, the first two are of particular interest. Although observed data are affected by all three factors to a greater or lesser degree, previous studies have not jointly modeled them in a straightforward manner. Here, we propose a solution that is derived from the field of cultural anthropology. We present a simple and extensible Bayesian autologistic model to jointly infer the three factors from observed data. Although a large number of missing values in the data set pose serious difficulties for statistical modeling, the proposed model can robustly estimate these parameters as well as missing values. Applying missing value imputation to indirectly evaluate the estimated parameters, we quantitatively demonstrated that they were meaningful. 
In conclusion, we briefly compare our findings with those of previous studies and discuss future directions.|000|typology, structural data, machine learning, 4551|Ehmer2018|Utterances usually convey more meaning than is expressed. This ‘surplus’ of meaning can be explained by the process of inferencing. A typical definition is given, for example, by Huang, who defines inference as the “process of accepting a statement or proposition (called the conclusion) on the basis of the (possibly provisional) acceptance of one or more other statements or propositions (called the premises)” (Huang 2011: 397). This definition rests on the basic distinction that there is an encoded meaning for linguistic signs from which further meaning may be arrived at by inferences. Two types of inference can be distinguished: semantic inference, i.e. logical entailment, and pragmatic inference. Entailment reflects logical connections between sentences; for instance the sentence All of my friends like reading inescapably entails Some of my friends like reading. In contrast, pragmatic inference is based on default logic, i.e. “reasoning on the basis of stereotypes and prototypes” (Eckardt 2006: 86). For instance, in the correct context and with the correct intonation the sentence ALL of my friends like reading might lead to the inference on part of the hearer that she is either not considered a friend or should pick up reading as a hobby. Given that pragmatic inferences are based on non-monotonic, i.e. probabilistic, logic, they can be canceled, whereas entailments cannot.|000|entailment, prototypes, probability, inference, pragmatic inference, pragmatics, 4552|Malzahn2016|One popular linguistic theory states that Tocharian – much like Anatolian – has a special status among the IE languages by having branched off from the common proto-language earlier than the remaining branches such as Indo-Iranian and Greek. Evidence for such an early split-off mainly comes from the Tocharian lexicon. 
In my paper I would like to reconsider the etymologies that have been put forth for such a claim.|000|subgrouping, Tocharian, Indo-European, 4553|Jiang2016|This paper targets at subjects of different ethnic groups with different vocabulary bases and examines their time of reaction to picture names under different conditions. The results show that the mental lexicon is unlikely to have hierarchical structures in terms of its semantic and ontological nature. It is argued that any particular set of words (including the Swadesh wordlist) are extracted from the mental lexicon with some purpose for application, which inevitably reflect hierarchical features of human cognition of multi-cultures and pragmatics. The findings of this paper will give new insights into the understanding of mental lexicon, into further application of the Swadesh wordlist, and into research in the building of core wordlists.|000|Sino-Tibetan, basic vocabulary, mental lexicon, 4554|Mi2018|Comparable corpus is the most important resource in several NLP tasks. However, it is very expensive to collect manually. Lexical borrowing happened in almost all languages. We can use the loanwords to detect useful bilingual knowledge and expand the size of donor-recipient / recipient-donor comparable corpora. In this paper, we propose a recurrent neural network (RNN) based framework to identify loanwords in Uyghur. Additionally, we suggest two features: inverse language model feature and collocation feature to improve the performance of our model. Experimental results show that our approach outperforms several sequence labeling baselines.|000|loan word, borrowing detection, neural network, Uyghur 4555|Nikitina2019|The study addresses the relationship between diachronic change and synchronic polysemy based on the use of diminutives in four closely related Southeastern Mande languages. 
It explores the synchronic patterns of use of cognate diminutive markers deriving from the word ‘child’, and accounts for differences between the languages in terms of a Radial Category network, which is designed to capture in one representation both mechanisms of diachronic change and mechanisms of regular meaning extension. The study argues that the same approach can be used to account for the ways diminutive markers acquire new meanings and for the ways an old diminutive category disintegrates, when new markers start replacing the old one in some of the core diminutive functions. The invasion and expansion of new markers may result in discontinuous semantic structures that can only be understood when the diachrony is taken into account (in this particular case study, the evidence for historical change comes from a synchronic comparison with closely related languages).|000|lexical typology, Mande, diminutive, 4556|Kastrisios2018|Voronoi tessellation, and its dual the Delaunay triangulation, provide a cohesive framework for the study and interpretation of phenomena of geographical space in two and three dimensions. The planar and spherical solutions introduce errors in the positional accuracy of both Voronoi vertices and Voronoi edges due to errors in distance computations and the path connecting two locations with planar lines or great circle arcs instead of geodesics. For most geospatial applications the introduction of the above errors is insignificant or tolerable. However, for applications where the accuracy is of utmost importance, the ellipsoidal model of the Earth must be used. Characteristically, the introduction of any positional error in the delimitation of maritime zones and boundaries results in increased maritime space for one state at the expense of another. This is a situation that may, among others, have a serious impact on the financial activities and the relations of the states concerned. 
In the context of previous work on maritime delimitation we show that the Voronoi diagram constitutes the ideal solution for the development of an automated methodology addressing the problem in its entirety. Due to lack of a vector methodology for the generation of Voronoi diagram on the ellipsoid, the aforementioned solution was constrained by the accuracy of existing approaches. In order to fill this gap, in this paper we deal with the inherent attributes of the ellipsoidal model of the Earth, e.g. the fact that geodesics are open lines, and we elaborate on a methodology for the generation of the Voronoi diagram on the ellipsoid for a set of points in vector format. The resulting Voronoi diagram consists of vertices with positional accuracy that is only bounded by the user needs and edges that are comprised of geodesics densified with vertices equidistant to their generators. Finally, we present the implementation of the proposed algorithm in the Python programming language and the results of two case studies, one on the formation of closest service areas and one on maritime boundaries delimitation, with the positional accuracy set to 1 cm.|000|Voronoi tesselation, geographic map, 4557|Mueller2015|The article offers a description of the historical development of German word-formation and the forms of change in word-formation which are thereby revealed. Following an overview of the history of research, general developmental tendencies are initially explained. Subsequently, a description is presented of the important processes of change for the areas of compounding, derivation and conversion which reaches from the beginnings to the present. A brief overview of further types of word-formation concludes the article.|000|word formation, diachronic linguistics, historical linguistics, German 4558|Pooth2018|This is a brief comment on Kiparsky’s theory of Vedic Sanskrit accent assignments. 
Kiparsky’s proposal will be rejected in its details and will be broadly modified.|000|accent, Vedic Sanskrit, 4559|Sagart1981|Paper deals with aspiration-conditioned tone lowering in Chinese dialects.|000|tone change, Chinese dialects, aspiration 4560|Pellard2019|The bestselling book by the archaeologist Jean-Paul Demoule, Mais où sont passés les Indo-Européens ? (Seuil, 2014), casts doubt on the existence of an ancestral language of the Indo-European family on the basis of criticisms levelled at Indo-European linguistics and at historical linguistics in general. We show here that these criticisms rest on biased documentation and that they contain numerous errors and misreadings, of which we present a selection. We examine the potential alternatives to the idea of an ancestral language (pidginization, creolization, interactions within a Sprachbund, formation of mixed languages through prolonged mutual contact) and show that all of them fail to account for the verbal, nominal and pronominal inflections shared by the various branches of the family. Finally, we reject the equation of Indo-European linguistics with racist ideologies. We reaffirm, if there were any need to, the scientific and non-ideological character of Indo-European historical linguistics.|000|Indo-European, methodology, 4561|Zlatev2011|Starting from emphasizing the richness of human experience, over twenty years ago, Cognitive Linguistics currently oscillates between a positivist and a subjectivist perspective both of which reveal an ontologically and methodologically limited understanding of language. I propose that E. Coseriu’s Integral Linguistics can substantially broaden this understanding, in distinguishing between three levels and three points of view, or perspectives, on linguistic (and cognitive) reality. Coseriu’s “matrix” of levels and perspectives is discussed, offering an interpretation along phenomenological lines. 
A key point is the emphasis on consciousness rather than “the cognitive unconscious”. Finally, I outline how the distinctions made within Integral Linguistics can help resolve debates within Cognitive Linguistics concerning the nature of “image schemas” and “conceptual metaphor”.|000|cognitive linguistics, empiricism, philosophy of science, 4562|Zlatev2011|Interesting thought: we can't really observe all aspects of language without introspection, since corpora only give us the surface, not what is being made with a language by a speaker.|000|introspection, cognitive linguistics, philosophy of science, 4563|Schleicher1859|The object of the morphology of language is the phonetic form of the word, its outer shape, i.e. the presence or absence of its parts and the position which these parts occupy; we leave out of consideration the material from which the word is formed, the sound of the phonetic elements used to build it. A complement to morphology, the study of the outer phonetic form of language, is the study of the function of the individual parts of the word and of the word itself; in particular the demonstration of how the opposition of noun and verb has developed in each language. I do not yet dare to enter the domain of this research, which penetrates into the innermost nature of language, since I still lack guiding principles and a method here. In what follows, the function of the phonetic elements is taken into account only insofar as this is indispensable for separating root and relational sound, and further stem-forming elements from declensional and conjugational additions.|000|August Schleicher, morphology, history of science, 4564|Deng2017|Four types of exceptions are identified of phonetic changes of inland Min dialects: exceptions of the diachronic evolution, exceptions induced by the linguistic contact, exceptions led by pragmatic needs and exceptions caused by Chinese characters. 
Each type contains some specific sorts of their own.|000|sound change, exceptionlessness hypothesis, pragmatics, Mǐn, Chinese dialects, 4565|Spencer2004|Biological evolution has parallels with the development of natural languages, man-made artifacts, and manuscript texts. As a result, phylogenetic methods developed for evolutionary biology are increasingly being used in linguistics, anthropology, archaeology, and textual criticism. Despite this popularity, there have been few critical tests of their suitability. Here, we apply phylogenetic methods to artificial manuscripts with a known true phylogeny, produced by modern ‘scribes’. Although the survival of ancestral forms and multiple descendants from a single ancestor are probably much more common in manuscript evolution than biological evolution, we were able to reconstruct most of the true phylogeny. This is important because phylogenetic methods are influencing the production of critical editions of major written works. We also show that the variation in rates of change at different locations in the text follows a gamma distribution, as is often the case in DNA sequences.|000|stemmatics, manuscript evolution, phylogenetic reconstruction, 4566|Sun2003a|Author discusses to which degree Baima is a subdialect of Tibetan.|000|Sino-Tibetan, Baima, Tibetan dialects, 4567|Makarova2018|The present paper regards the fourth version of the "Languages of the World" database developed in the Institute of Linguistics of Russian Academy of Sciences. We substantiate the necessity of innovations in the sphere of the data representation and explain the decision to shift from a binary tree to a list of paradigms. 
The article gives an insight into the new version of the database, which is now under development, and discusses the possibilities that the paradigmatic data representation will reveal.|000|database, languages of the world, typology, structural data, 4568|Cuneo2014|The purpose of this paper is to analyse the morphosyntactic and semantic aspects of the augmentative forms in the Toba language (Guaycuruan family), spoken in the Gran Chaco region (Argentina). The study of the forms linked to the notion of augmentative in this language comprises both derivational morphology and nominal composition, and embodies a great range of meanings from ‘big size’, ‘abundance’, ‘intensity’ or ‘affection’ to the (generally pejorative) notions of ‘excess’ or ‘mockery’. Augmentative forms also play a role as a source of lexical creation and as a nominal categorization device. Furthermore, in the ethnobiological lexicon, they constitute a preferred means in word formation for naming animals and plants, and they code meanings such as ‘hierarchy’ (‘more dangerous’ or ‘with outstanding qualities’) or ‘anomaly’ (‘stranger/ unknown/ unusual’).|000|Toba, South American languages, word formation, overview, 4569|Tacconi2014|This paper explores the formation of compounds in Maká and intends to characterize these constructions in formal, semantic and typological terms. To this end, we have considered elements that constitute compounds and the resulting lexemes, as well as the relationship between constituents. We also seek to contribute to the discussion about the parameters that should be taken into account for lexemes to be considered compounds, and whether they correspond to the features found in languages from the same family or to a typological universal. 
In order to do so, we also compare the findings to derivational strategies, and discuss the sometimes unclear boundaries between the phenomena of composition and derivation.|000|Maká, South American languages, compounding, word formation, 4570|Wotton1730|Apparently a book on the Tower of Babel, investigating language relatedness in pre-linguistic times.|000|history of science, Tower of Babel, language comparison, 4571|Mueller2017a|Assuming that it needs to be decided at some point whether a given Merge(α,β) operation is legitimate, there are two basic options. The first possibility is that one of the two categories is equipped with an intrinsic formal property (typically encoded as a feature) requiring the other one to combine with it. The second possibility is that Merge applies freely throughout, and that filters check the output representation and decide about the legitimacy of the operation. The two approaches are often extensionally equivalent. In this paper, I provide an argument for the first view that is based on the hypothesis that in addition to the Merge operation that builds structure, there is also a mirror image operation Remove that removes structure: If such an operation exists, the legitimacy of the original Merge operation cannot be checked by output filters anymore. Empirical evidence for an elementary syntactic operation Remove is drawn from four domains of German syntax: passive, applicative, restructuring, and complex prefields.|000|merge, Chomsky syntax, German, 4572|Olko2018|This paper is based on extensive team research focusing on the reconstruction of the history of contact-induced change in Nahuatl from the first encounter with Spanish until the present day, taking into account both peripheral and central varieties. 
We trace the long-term trajectories of several morphosyntactic features that mark typological change: animacy as a grammatical category; the relational word as a lexical category; the formal distinction between comitative and instrumental markers; existential predicative possession; and relatively free word order. We argue that key innovations in Nahuatl during the colonial period are either borrowed from Spanish or begin as minor internal patterns that gradually become dominant due to similarity with an element of Spanish structure, and that these two processes have driven typological change in the language.|000|convergence, typological change, grammatical change, Nahuatl 4573|Lipton2017|Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. 
Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not.|000|machine learning, theoretical problems, interpretability, neural network, 4574|Verhagen2020|In a usage-based framework, variation is part and parcel of our linguistic experiences, and therefore also of our mental representations of language. In this article, we bring attention to variation as a source of information. Instead of discarding variation as mere noise, we examine what it can reveal about the representation and use of linguistic knowledge. By means of metalinguistic judgment data, we demonstrate how to quantify and interpret four types of variation: variation across items, participants, time, and methods. The data concern familiarity ratings assigned by 91 native speakers of Dutch to 79 Dutch prepositional phrases such as in de tuin ‘in the garden’ and rond de ingang ‘around the entrance’. Participants performed the judgment task twice within a period of one to two weeks, using either a 7-point Likert scale or a Magnitude Estimation scale. We explicate the principles according to which the different types of variation can be considered information about mental representation, and we show how they can be used to test hypotheses regarding linguistic representations.|000|language variation, familiarity ratings, pragmatics, linguistic knowledge 4575|Wang2015c|Paper discusses the reasons for the stability of basic vocabulary, which is generally an interesting topic.|000|basic vocabulary, stability, 4576|Virpioja2009|We describe Allomorfessor, which extends the unsupervised morpheme segmentation method Morfessor to account for the linguistic phenomenon of allomorphy, where one morpheme has several different surface forms. The method discovers common base forms for allomorphs from an unannotated corpus by finding small modifications, called mutations, for them. 
Using Maximum a Posteriori estimation, the model is able to decide the amount and types of the mutations needed for the particular language. The method is evaluated in Morpho Challenge 2009.|000|morpheme detection, Morfessor, software, 4577|Monaghan2018|Languages change due to social, cultural, and cognitive influences. In this paper, we provide an assessment of these cognitive influences on diachronic change in the vocabulary. Previously, tests of stability and change of vocabulary items have been conducted on small sets of words where diachronic change is imputed from cladistics studies. Here, we show for a substantially larger set of words that stability and change in terms of documented borrowings of words into English and into Dutch can be predicted by psycholinguistic properties of words that reflect their representational fidelity. We found that grammatical category, word length, age of acquisition, and frequency predict borrowing rates, but frequency has a non-linear relationship. Frequency correlates negatively with probability of borrowing for high-frequency words, but positively for low-frequency words. This borrowing evidence documents recent, observable diachronic change in the vocabulary enabling us to distinguish between change associated with transmission during language acquisition and change due to innovations by proficient speakers.|000|borrowing, lexical interference, language contact, 4578|Beekes1995|Root, the morpheme of a word carrying the basic meaning, as opposed to affix(es) and ending. E.g. *run* in *running*, *runner*, a *run*.|282|root, definition, 4579|Wijnholds2018|We investigate the extent to which compositional vector space models can be used to account for scope ambiguity in quantified sentences (of the form Every man loves some woman). Such sentences containing two quantifiers introduce two readings, a direct scope reading and an inverse scope reading. 
This ambiguity has been treated in a vector space model using bialgebras by Hedges and Sadrzadeh (2016) and Sadrzadeh (2016), though without an explanation of the mechanism by which the ambiguity arises. We combine a polarised focussed sequent calculus for the non-associative Lambek calculus NL, as described in Moortgat and Moot (2011), with the vector-based approach to quantifier scope ambiguity. In particular, we establish a procedure for obtaining a vector space model for quantifier scope ambiguity in a derivational way.|000|semantic vectors, vector-space models, semantics, compositionality, Montague grammar 4580|Hudson2019|Language is thought to be a crucial element behind Pleistocene expansions of Homo sapiens but our understanding of language change over the very long term is still poor. There have been two main approaches to language dynamics in this context. One assumes a continual ebb and flow of local human populations and languages, leading to high levels of ‘patchiness’ in both genes and languages. Another approach argues that long-term equilibrium leads not to patchiness but to areal diffusion and convergence. Both of these approaches assume equilibrium to be the norm. However, research in ecology since the 1970s has found that ecosystems have multiple potential states rather than a single equilibrium point. Under the name of resilience theory, such thinking is being increasingly applied to coupled socio-ecological systems using the concept of the adaptive cycle. This article proposes a model of long-term language change based on the adaptive cycle of resilience theory.|000|resilience, ecology, resilience theory, language evolution, punctuated equilibrium model 4581|Hudson2019|A number of preliminary general conclusions can be suggested from the above: 1. Rather than a two-stage equilibrium/punctuation model as proposed by Dixon (1997), language change seems better represented by the four-stage adaptive cycle used in resilience theory. 2. 
The adaptive cycle reflects actual links between languages and socio-ecological systems. The more we know about change in the latter, the better we can understand the former. 3. Language often becomes an important element of building resilience within socio-ecological systems. Cases where languages change (‘reorganize’) slowly often reflect broader processes of resilience. Conversely, sudden language extinction or replacement can be assumed to be linked with low resilience. 4. Linguistic theories of social selection such as Labov (1963) and LePage (1968) propose that speakers will choose from traditional or newly available speech models in order to further their social position and alliances within a particular society. The resilience approach suggests that studies of linguistic selection also need to take account of how language may work to foster socio-ecological resilience. 5. Although the concept of ‘equilibrium’ is better understood as one phase in a dynamic cycle rather than as a static ‘baseline’, this does not necessarily rule out the concept of Sprachbund or areas of [pb] linguistic convergence. From a resilience perspective, we might expect that particular ways of doing things would diffuse outwards if they support resilience; yet very long-term resilience would also be supported by the persistence (modularity) of very different cultures, customs, and types of governance. Thus, even if they are to some extent influenced by neighbouring languages, the persistence of language isolates within a Sprachbund can normally be assumed to foster resilience. 6. Through human history there have been several periods where language diversity has declined quite rapidly. Nettle (1999) identifies the transition from foraging to farming and the expansion of modern states and empires as two such periods of decline. 
The adaptive cycle approach supports this conclusion, suggesting that these are periods when previously high resilience was dramatically reduced by outside social and economic impacts. However, hunter-gatherers were perhaps the most resilient human societies ever known and they were often able to absorb a range of systemic shocks and maintain their social and linguistic functions. After the Bronze Age until around AD 1600, many hunter-gatherers were able to reorganize themselves to take advantage of previously unavailable commercial opportunities (Scott 2017). Modernization marks another phase of reorganization which is characterized by extreme linguistic standardization and reduced diversity. Although the impact of modern national languages on minority languages and dialects has been widely discussed, Culiberg (2013) shows how modern national languages were themselves created by significant reductions in diversity and thus in resilience. 7. Finally, the resilience approach emphasizes that change is the norm and that humans are in control of only a small part of that change (Walker and Salt 2006: 28–31). A system that does not change becomes increasingly vulnerable. This is often a difficult concept to accept because ‘Most of us prefer the comfort of an accustomed life … to the adventure of dramatic change’ (Tainter 2006: 92). This question of how we view change has at least two implications here. Firstly, seeing change as normal and even desirable provides a further argument against the old idea that languages ‘decay’ from ‘purer’ prototypes. The change from Latin to Romance (Old French) has particularly been seen in such negative terms as a ‘degeneration’ involving ‘A weakening, through ignorance, through loss of tradition, and through chaotic conditions, of the norms of Classical Latin’ (Rickard 1989: 8). 
Although most linguists now accept that, even for Latin, change is the norm, linguists such as Banniard (2013: 64) and Culiberg (2013) have still found it necessary to critique such ideas with respect to research on Latin and Japanese, respectively. Resilience theory provides an explanation as to why some types of change do not represent a ‘degeneration’ but can in fact promote socio-ecological resilience. To use the quote from Rickard just cited, for example, a resilience approach would argue that the linguistic transition from Latin to Romance may have been a successful way to adapt to ‘chaotic conditions’ and cannot necessarily be seen as involving the simple ‘loss of tradition’. The second implication follows on from this but involves a much more difficult problem relating to the function of language. Resilience theory’s emphasis on diversity and innovation may seem inconsistent with the use of language as a highly standardized marker of social identity. Language use emphasizes conformity to the extent that ‘There is abundant experimental evidence from several societies that people are more disposed to cooperate with others who have the same dialect as themselves than those who have different dialects’ (Nettle 1999: 57). While this social marking function of language often makes it difficult for speakers to acknowledge change, the concept of the adaptive cycle provides a useful way of modelling this change, and this is an area where further research is warranted.|23f|resilience theory, punctuated equilibrium model, language evolution 4582|Collins2017|The Slavic (Sl) branch of Indo-European (IE) has three sub-branches: South (SSl), West (WSl), and East (ESl). SSl has eastern and western divisions. E-SSl comprises Bulgarian (Bg), Macedonian (Mc), and the Sl dialects of northern Greece; Old Church Slavonic (OCS, 10th–11th cc.) was E-SSl in its basis. 
W-SSl comprises Slovenian (Sln) and pluricentric Bosnian/Croatian/Serbian (BCS; separately Bo, Cr, Sb), also known as Serbo-Croatian. BCS has three markedly different dialects, each with an old written tradition: Štokavian (Što), the basis of standard Bo, Cr, and Sb; Kajkavian (Kaj) in Croatia, historically affiliated with ESln; and Čakavian (Čak) along the Adriatic coast and in the islands, in a continuum with WSln.|000|Slavic languages, Indo-European, Slavic phonology, examples, 4583|Collins2017|Very interesting overview, not least because it offers many tables of correspondence patterns.|000|Slavic languages, Slavic phonology, overview, Indo-European, 4584|Morrison2011|An alternative, but commonly used, approach to parsimony networks is a minimum-spanning network (also called molecular variance parsimony) (Excoffier and Smouse 1994; Bandelt et al. 1999). This is based on combining minimum-spanning trees into a single network. Minimum-spanning trees differ from the usual phylogenetic trees (which are called Steiner trees) in restricting all of the (internal) nodes to being from the sampled set of taxa. That is, Steiner trees allow hypothetical nodes, possibly representing unobserved ancestors, whereas minimum-spanning trees do not. The minimum-spanning network then has this same restriction, so that the total pathlength of a minimum-spanning network may be greater than for the other parsimony methods (see Figure 3.25), and it will only display the most-parsimonious trees if all of the internal nodes of the trees have been sampled. This seems to seriously limit the usefulness of the method, and it usually performs poorest in comparison studies (e.g. Cassens et al. 2005).|85|minimum spanning tree, minimum spanning network, definition, 4585|Blasi2019|INTRODUCTION Human speech manifests itself in spectacular diversity, ranging from ubiquitous sounds such as “m” and “a” to the rare click consonants in some languages of southern Africa. 
This range is generally thought to have been fixed by biological constraints since at least the emergence of Homo sapiens. At the same time, the abundance of each sound in the languages of the world is commonly taken to depend on how easy the sound is to produce, perceive, and learn. This dependency is also regarded as fixed at the species level. RATIONALE Given this dependency, we expect that any change in the human apparatus for production, perception, or learning affects the probability—or even the range—of the sounds that languages have. Paleoanthropological evidence suggests that the production apparatus has undergone a fundamental change of just this kind since the Neolithic. Although humans generally start out with vertical and horizontal overlap in their bite configuration (overbite and overjet, respectively), masticatory exertion in the Paleolithic gave rise to an edge-to-edge bite after adolescence. Preservation of overbite and overjet began to persist long into adulthood only with the softer diets that started to become prevalent in the wake of agriculture and intensified food processing. We hypothesize that this post-Neolithic decline of edge-to-edge bite enabled the innovation and spread of a new class of speech sounds that is now present in nearly half of the world’s languages: labiodentals, produced by positioning the lower lip against the upper teeth, such as in “f” or “v.” RESULTS Biomechanical models of the speech apparatus show that labiodentals incur about 30% less muscular effort in the overbite and overjet configuration than in the edge-to-edge bite configuration. This difference is not present in similar articulations that place the upper lip, instead of the teeth, against the lower lip (as in bilabial “m,” “w,” or “p”). Our models also show that the overbite and overjet configuration reduces the incidental tooth/lip distance in bilabial articulations to 24 to 70% of their original values, inviting accidental production of labiodentals. 
The joint effect of a decrease in muscular effort and an increase in accidental production predicts a higher probability of labiodentals in the language of populations where overbite and overjet persist into adulthood. When the persistence of overbite and overjet in a population is approximated by the prevalence of agriculturally produced food, we find that societies described as hunter-gatherers indeed have, on average, only about one-fourth the number of labiodentals exhibited by food-producing societies, after controlling for spatial and phylogenetic correlation. When the persistence is approximated by the increase in food-processing technology over the history of one well-researched language family, Indo-European, we likewise observe a steady increase of the reconstructed probability of labiodental sounds, from a median estimate of about 3% in the proto-language (6000 to 8000 years ago) to a presence of 76% in extant languages. CONCLUSION Our findings reveal that the transition from prehistoric foragers to contemporary societies has had an impact on the human speech apparatus, and therefore on our species’ main mode of communication and social differentiation: spoken language.|000|hunter gatherers, labiodentals, uniformitarianism, farming, 4586|Blasi2019|Causes quite a stir in the news. * https://www.theatlantic.com/science/archive/2019/03/farming-hunter-gatherers-labiodentals-linguistics/584950/ Touches upon principles of uniformitarianism, but the question is to what degree the presence of labiodentals is not rather due to our reconstruction practice? Or does our reconstruction practice reflect something deeper here with respect to earlier sounds? |000|labiodentals, farming, hunter gatherers, 4587|DeSmet2018|The relation between functionally similar forms is often described in terms of competition. 
This leads to the expectation that over time only one form can survive (substitution) or each form must find its unique niche in functional space (differentiation). However, competition cannot easily explain what causes functional overlap or how form-function mappings will be reorganized. It is argued here that the changes which competing forms undergo are steered by various analogical forces. As a result of analogy, competing forms often show attraction, becoming functionally more (instead of less) alike. Attraction can maintain and increase functional overlap in language. At the same time, competing forms are analogically anchored to a broader constructional network. Cases of differentiation typically follow from the relations in that network. Evidence is drawn from the literature and from three corpus-based case studies, addressing attraction and differentiation in English aspectual constructions, English secondary predicate constructions, and in a pair of Dutch degree modifiers. Evidence is provided of a phenomenon competition-based accounts could not predict (attraction), and a solution is offered for one they could not very well explain (differentiation). More generally it is shown that the development of competing forms must be understood against their broader grammatical context.|000|attraction, competition, lexical evolution, differentiation 4588|DeSmet2018|Potentially interesting paper on lexical evolution processes, or processes of phrase selection in general.|000|attraction, differentiation, lexical evolution, selection, 4589|Jardine2018|This paper extends a notion of local grammars in formal language theory to autosegmental representations, in order to develop a sufficiently expressive yet computationally restrictive theory of well-formedness in natural language tone patterns. 
More specifically, it shows how to define a class ASL g of stringsets using local grammars over autosegmental representations and a mapping g from strings to autosegmental structures. It then defines a particular class ASL g T using autosegmental representations specific to tone and compares its expressivity to established formal language grammars that have been successfully applied to other areas of phonology.|000|regular grammar, grammatical theory, Chomsky syntax, phonology, 4590|Jardine2018|The paper has an interesting summary of questions about the regularity of phonology and about what type of grammar would suffice to describe the phonology of a language.|000|phonology, phonological theory, grammatical theory, Chomsky syntax, 4591|Jardine2018|An interesting question in formal language theory is to what extent enriching representation increases the expressiveness of a class of grammars. For example, first-order logic describes exactly the locally threshold testable sets of strings when interpreted over string models with successor, but given string models with the full order over the positions first-order logic describes exactly the star-free stringsets, a strict [pb] superclass of the locally threshold testable sets (McNaughton and Papert 1971; Thomas 1982). Similarly, Rogers (1997) gives a local logic that, when interpreted over strings, describes exactly the strictly 2-local stringsets (a strict subclass of the locally threshold testable sets), but when interpreted over trees describes the context-free stringsets.|1f/46|representation, expressiveness, richness, syntax, formal grammar, 4592|Jardine2018|A classic example is the fact that English speakers, when presented with the two non-words blick and bnick, will invariably judge blick as well-formed but bnick as ill-formed (@Chomsky<1965> and Halle 1965). 
This is because in English, bn sequences are ill-formed syllable onsets.|3/46|pseudo word, illformedness, 4593|Jardine2018|It has been well established that phonological well-formedness patterns are at most regular (Johnson 1972; Kaplan and Kay 1994; Heinz and Idsardi 2011). However, a claim that “phonological well-formedness is regular” would be an incomplete characterization. There are many regular patterns that are not attested as phonological well-formedness patterns; for instance, “the number of ns in a word must be even” is regular, but no such pattern has so far been discovered in natural language. It has been argued, then, that more restrictive subregular (that is, star-free (SF) and weaker) classes of stringsets are a tighter fit to the range of attested phonological patterns and thus better characterize the nature of the computation of phonological well-formedness (Heinz 2010a; Heinz et al. 2011; Rogers et al. 2013; McMullin and Hansson 2016). These classes are, namely, the strictly local (SL) stringsets (McNaughton and Papert 1971; Rogers and Pullum 2011; Rogers et al. 2013), the tier-based strictly local (TSL) stringsets (Heinz et al. 2011), and the strictly piecewise (SP) stringsets (Rogers et al. 2010; Fu et al. 2011). These classes, which are all sub-SF, are depicted in Fig. 1. All of these classes can be characterized by forbidden k-factor grammars over strings, and all come with provable learning results (García et al. 1990; Heinz 2010b; Heinz and Rogers 2013; Jardine and Heinz 2016b; Jardine and McMullin 2017). To illustrate, the above restriction on bn in English is SL, as it can be (partially) modeled by the forbidden substring 3-factor grammar {⋊bn}, where ⋊ indicates the beginning of a string (this will be explained in more detail in Sect. 
3.2).|3/46|regular grammar, phonological rules, formal syntax 4594|Koplenig2019|Large-scale empirical evidence indicates a fascinating statistical relationship between the estimated number of language users and its linguistic and statistical structure. In this context, the linguistic niche hypothesis argues that this relationship reflects a negative selection against morphological paradigms that are hard to learn for adults, because languages with a large number of speakers are assumed to be typically spoken and learned by greater proportions of adults. In this paper, this conjecture is tested empirically for more than 2000 languages. The results question the idea of the impact of non-native speakers on the grammatical and statistical structure of languages, as it is demonstrated that the relative proportion of non-native speakers does not significantly correlate with either morphological or information-theoretic complexity. While it thus seems that large numbers of adult learners/speakers do not affect the (grammatical or statistical) structure of a language, the results suggest that there is indeed a relationship between the number of speakers and (especially) information-theoretic complexity, i.e. entropy rates. A potential explanation for the observed relationship is discussed.|000|correlational studies, population size, complexity, n-gram model, bible corpus, 4595|GoncalesVoyer2014|The questions addressed by macroevolutionary biologists are often impervious to experimental approaches, and alternative methods have to be adopted. The phylogenetic comparative approach is a very powerful one since it combines a large number of species and thus spans long periods of evolutionary change. However, there are limits to the inferences that can be drawn from the results, in part due to the limitations of the most commonly employed analytical methods. 
In this chapter, we show how confirmatory path analysis can be undertaken explicitly controlling for non-independence due to shared ancestry. The phylogenetic path analysis method we present allows researchers to move beyond the estimation of direct effects and analyze the relative importance of alternative causal models including direct and indirect paths of influence among variables. We begin the chapter with a general introduction to path analysis and then present a step-by-step guide to phylogenetic path analysis using the d-separation method. We also show how the known statistical problems associated with non-independence of data points due to shared ancestry become compounded in path analysis. We finish with a discussion about the potential effects of collinearity and measurement error, and a look toward possible future developments.|000|statistical dependencies, phylogenetic path analysis, statistics, correlational studies, 4596|Karrebaek2018|We interrogate the many ways that language and food intersect. Food and its uses provide setting and structure for language, just as language and its uses constrain and inform food activities. We illuminate where and how food and language co-occur and how they are dynamically co-constitutive, foregrounding the potential for food-and-language scholarship to contribute to understandings of political economic processes and structures. We organize our review around the mutual production, consumption, and circulation of food and language. We show that the richness of scholarship about consumption (especially around the family meal) has not been matched by research concerning the production of food and language, whereas the co-constituting circulation of food and language contributes to new meanings and values for both. 
More research is needed to clarify the surging attention to food, which may be motivated by the complex global food system and the speed and ease of mediatization and circulation of food images and ideologies.|000|food, language, review, 4597|Lee2019|One of the most persistent debates in anthropology and related disciplines has been over the relative weight of aggression and competition versus nonaggression and cooperation as drivers of human behavioral evolution. The literature on hunting and gathering societies—past and present—has played a prominent role in these debates. This review compares recent literature from both sides of the argument and evaluates how accurately various authors use or misuse the ethnographic and archaeological research on hunters and gatherers. Whereas some theories provide a very poor fit with the hunter-gatherer evidence, others build their arguments around a much fuller range of the available data. The latter make a convincing case for models of human evolution that place at their center cooperative breeding and child-rearing, as well as management of conflict, flexible land tenure, and balanced gender relations.|000|hunter gatherers, anthropology, review, human evolution, 4598|Ball2018|Language has long been at the center of kinship studies, where there has been a tendency to see the role of language in terms of nomenclature for labeling preexisting relations. Linguistic anthropologists have turned to the constitutive role of language in the formation of kin relations. People enact kin relations through behaviors that include, but are not limited to, the linguistic. Rather than static grids of terminology, linguistic anthropology finds its empirical object in the reflexive practices of speakers as they construct, reformulate, transform, and sometimes undercut cultural norms for being kin. 
Taking kinship behaviors that include language to be in dialectical relation to kinship structures, I review recent work that exemplifies linguistic anthropology's pragmatic approach to kinship, from the richness and diversity of kin relations to the possibility of the lack of kin relations as such.|000|kinship terms, anthropology, linguistics, review 4599|Yu2019|Individual variation is ubiquitous and empirically observable in most phonological behaviors, yet relatively few studies aim to capture the heterogeneity of language processing among individuals, as opposed to those focusing primarily on group-level patterns. The study of individual differences can shed light on the nature of the cognitive representations and mechanisms involved in phonological processing. To guide our review of individual variation in the processing of phonological information, we consider studies that can illuminate broader issues in the field, such as the nature of linguistic representations and processes. We also consider how the study of individual differences can provide insight into long-standing issues in linguistic variation and change. Since linguistic communities are made up of individuals, the questions raised by examining individual differences in linguistic processing are relevant to those who study all aspects of language.|000|language processing, phonology, individuals, heterogeneity 4600|Yu2019|Sounds very interesting, since heterogeneity in language processing is often simply disregarded in no matter what research.|000|heterogeneity, language processing, phonology, review 4601|Jarosz2019|Recent advances in computational modeling have led to significant discoveries about the representation and acquisition of phonological knowledge and the limits on language learning and variation. 
These discoveries are the result of applying computational learning models to increasingly rich and complex natural language data while making increasingly realistic assumptions about the learning task. This article reviews the recent developments in computational modeling that have made connections between fully explicit theories of learning, naturally occurring corpus data, and the richness of psycholinguistic and typological data possible. These advances fall into two broad research areas: (a) the development of models capable of learning the quantitative, noisy, and inconsistent patterns that are characteristic of naturalistic data and (b) the development of models with the capacity to learn hidden phonological structure from unlabeled data. After reviewing these advances, the article summarizes some of the most significant consequent discoveries.|000|phonological learning, phonological rules, language acquisition, computational modeling 4602|Jarosz2019|Given how learning is at the core of understanding language consistently, it is a very important question to investigate learning as part of phonological theory, but also as part of historical linguistics.|000|phonological theory, complexity, learning, computational modeling, revew 4603|Antoniou2019|Bilingualism was once thought to result in cognitive disadvantages, but research in recent decades has demonstrated that experience with two (or more) languages confers a bilingual advantage in executive functions and may delay the incidence of Alzheimer's disease. However, conflicting evidence has emerged leading to questions concerning the robustness of the bilingual advantage for both executive functions and dementia incidence. Some investigators have failed to find evidence of a bilingual advantage; others have suggested that bilingual advantages may be entirely spurious, while proponents of the advantage case have continued to defend it. A heated debate has ensued, and the field has now reached an impasse. 
This review critically examines evidence for and against the bilingual advantage in executive functions, cognitive aging, and brain plasticity, before outlining how future research could shed light on this debate and advance knowledge of how experience with multiple languages affects cognition and the brain.|000|bilingualism, advantage, debate, review, overview 4604|Antoniou2019|Very interesting with respect to actual topics in linguistics, given that people often claim, without even investigating the evidence, that bilingualism would generally have advantages.|000|bilingualism, advantage, debate, review, overview 4605|Liberman2019|Semiautomatic analysis of digital speech collections is transforming the science of phonetics. Convenient search and analysis of large published bodies of recordings, transcripts, metadata, and annotations—up to three or four orders of magnitude larger than a few decades ago—have created a trend towards “corpus phonetics,” whose benefits include greatly increased researcher productivity, better coverage of variation in speech patterns, and crucial support for reproducibility. The results of this work include insights into theoretical questions at all levels of linguistic analysis, along with applications in fields as diverse as psychology, medicine, and poetics, as well as within phonetics itself. Remaining challenges include still-limited access to the necessary skills and a lack of consistent standards. These changes coincide with the broader Open Data movement, but future solutions will also need to include more constrained forms of publication motivated by valid concerns for privacy, confidentiality, and intellectual property.|000|corpus phonetics, corpus studies, overview, review 4606|Meyerhoff2019|Research on language and gender encompasses a variety of methods and focuses on many aspects of linguistic structure. 
This review traces the historical development of the field, explicating some of the major debates, including the need to move from a reductive focus on difference and dichotomous views of gender to more performative notions of identity. It explains how the field has come to include language, gender, and sexuality and how queer theory and speaker agency have influenced research in the field.|000|gender studies, sexuality, feminism, overview, review 4607|Blust2019|The Austronesian language family is the second largest on Earth in number of languages, and was the largest in geographical extent before the European colonial expansions of the past five centuries. This alone makes the determination of its homeland a research question of the first order. There is now near-universal agreement among both linguists and archaeologists that the Austronesian expansion began from Taiwan, somewhat more than a millennium after it was settled by Neolithic rice and millet farmers from Southeast China. The first “long pause,” between the settlement of Taiwan and of the northern Philippines, may have been due to inadequate sailing technology, an obstacle that was overcome by the invention of the outrigger canoe complex. The second “long pause,” between the settlement of Fiji–Western Polynesia and of the rest of Triangle Polynesia, may also have been due to inadequate sailing technology, an obstacle that was overcome by the invention of the double-hulled canoe.|000|Austronesian, homeland, review, overview 4608|Nazzi2019|All languages instantiate a consonant/vowel contrast. This contrast has processing consequences at different levels of spoken-language recognition throughout the lifespan. 
In adulthood, lexical processing is more strongly associated with consonant than with vowel processing; this has been demonstrated across 13 languages from seven language families and in a variety of auditory lexical-level tasks (deciding whether a spoken input is a word, spotting a real word embedded in a minimal context, reconstructing a word minimally altered into a pseudoword, learning new words or the “words” of a made-up language), as well as in written-word tasks involving phonological processing. In infancy, a consonant advantage in word learning and recognition is found to emerge during development in some languages, though possibly not in others, revealing that the stronger lexicon–consonant association found in adulthood is learned. Current research is evaluating the relative contribution of the early acquisition of the acoustic/phonetic and lexical properties of the native language in the emergence of this association.|000|speech recognition, consonants, vowels, spoken language, pseudo word 4609|Harris1954|To see that there can be a distributional structure we note the following: First, the parts of a language do not occur arbitrarily relative to each other: each element occurs in certain positions relative to certain other elements. The perennial man in the street believes that when he speaks he freely puts together whatever elements have the meanings he intends; but he does so only by choosing members of those classes that regularly occur together, and in the order in which these classes occur.|146|distribution, structure, morphology 4610|Grossmann2019|It has repeatedly been observed that there is a worldwide preference for suffixes over prefixes. In this article, we argue that universally dispreferred – or rare – structures can and do arise as the result of regular processes of language change, given the right background structures. 
Specifically, we show that Ancient Egyptian-Coptic undergoes a long-term diachronic macro-change from exhibiting mixed suffixing-prefixing to showing an overwhelming preference for prefixing. The empirical basis for this study is a comparison of ten typologically significant parameters in which prefixing or affixing is potentially at stake, based on Dryer’s (2013a) 969-language sample. With its extremely high prefixing preference, Coptic belongs to the rare 6% or so of languages that are predominantly prefixing. We argue that each of the micro-changes implicated in this macro-change are better understood in terms of changes at the level of individual constructions, rather than in terms of a broad structural “drift.” Crucially, there is nothing unusual about the actual processes of change themselves.|000|affixation, Egypt, word formation, universals 4611|Morey2018|Tangsa is a very linguistically diverse group spoken on the India-Myanmar border, with around 80 distinct and named varieties, many of which are mutually intelligible but many of which are not. In Myanmar, the Tangsa sub-tribes are grouped under Tangshang Naga. This paper examines and exemplifies the phenomenon of verb stem alternation, whereby a single verb has two stems, a verbal stem, which is demonstrated to be the ‘base’ or underived form, and a nominal stem which has different form, most frequently a different tone category. As with other Tibeto-Burman languages, both the forms and the functions of the stem alternations in the Tangsa varieties show considerable diversity. In the Tangsa varieties treated here, one of the stems, which we term the verbal form, is clearly the underived root and the alternate stem is derived from it. The most frequent way of forming an alternate stem is a change of tone category, keeping the vowel and any final consonants the same. 
In multiple cases in one Tangsa variety there is an alternate stem carrying a different form, and in a related variety there is no alternation between the stems. The verbal and nominal stems for 151 roots in the Mueshaung and Ngaimong varieties will be compared, and the findings of this comparison will be enhanced by observations of stem alternation from other Tangsa varieties.|000|stem alternation, Tangsa, Sino-Tibetan, morphosyntax 4612|Morey2018|Text contains invaluable tables illustrating verb stem alternation in two varieties.|000|stem alternation, Tangsa, morphosyntax, 4613|Basumatary2018|This paper discusses the behavior of nominal structures in Bodo. It will demonstrate that Bodo has two nominalization processes, namely, derivational and clausal nominalization. In derivational nominalization, it will be shown that a lexical noun is derived mainly from verbs and to some extent from nouns and adjectives by suffixation of one of -thai, -thi, -sula/suli,-zalu/-zali, -giri, -gra, -ari, -ma, and -sa. In clausal nominalization, a noun phrase is derived from a clause, often by suffixation of -nai with the verb. In terms of distribution and function, it will be shown that clausal nominalizations are found in attributive phrases, complementation, relative clauses, adverbial clauses and independent clauses.|000|nominalization, affixation, Bodo, word formation, 4614|Basumatary2018|Nominalization is a prominent characteristic of Tibeto-Burman languages (Genetti 2011, @DeLancey2011). It refers to the process by which we derive nominal expressions (Comrie & Thompson 1985), for example, from verbs or adjectives. Two types of nominalization are attested in world’s languages – lexical or derivational and clausal. 
Derivational nominalization takes as its domain the verb root and works at the morphological level to derive lexical nouns, whereas clausal nominalization takes as its domain the clause or clause combination and works at the syntactic level to allow a grammatical clause to be treated as a noun phrase within a broader syntactic context (@Genetti<2010> 2010).|110|nominalization, definition, Sino-Tibetan, 4615|Kalyan2019|We find J&L’s use of the notion of “incomplete lineage sorting” (section 4.1) to be illuminating, and have learned much from their discussion of “undetectable borrowings” and loanword nativization (section 4.2). However, we believe they have misunderstood the aims of Historical Glottometry (François 2014, 2017; Kalyan & François 2018), the model of language diversification that it assumes, and our reasons for making certain methodological choices when applying it. By clarifying these points, we hope to show that our approach to language diversification—and that of other researchers who subscribe to a wave-based approach to language genealogy—is in fact largely compatible with that of J&L.|000|response, discussion, family tree, methodology, 4616|Kalyan2019|This is a response to @Jacques2019a, the article on family tree models in historical linguistics and alternative approaches.|000|family tree, response, methodology, 4617|Crumpacker1976|The choice of a proper measure of distance between birthplaces of individuals is basic to the application of Malécot’s “Isolation by Distance” theory [10, 11, 12] and also to the “Probability of Consanguineous Marriages” theory developed by Cavalli-Sforza, Kimura, and Barrai [5]. If these theories are to find important application, the distance measure should have biological relevance. 
In the case of man it should be reasonably representative of the way in which people traveled during the time period in which a particular population acquired its characteristic breeding structure.|000|geographic distance, population genetics, 4618|Kittlitz2018|Some people do not believe in God, but when they hear Bach, they become religious. It could be that this man invented the ideal music 300 years ago.|000|Johann-Sebastian Bach, portrait, newspaper article 4619|Norquest2015|This book offers a reconstruction of Proto-Hlai, but unfortunately, it offers little real evidence for the various findings, since it offers correspondence patterns only in a rudimentary fashion, without showing the original data.|000|Tai-Kadai, Proto-Hlai, reconstruction, correspondence patterns, 4620|Marlett2010|Very interesting article that makes exhaustive use of alignments to illustrate similarities in orthographies for the description of the same languages. This is very interesting, because it shows that alignments could also be used for different purposes; compare also @Urban2018a on what he calls reconstitution of another language based on sources.|000|alignment, orthography, Mexico, Mesoamerican languages, South American languages, 4621|Urban2018a|This article concerns the extinct and poorly described native speech of the Venezuelan Andes conventionally known as Timote-Cuica, in particular its phonetics and phonology. While the available prephonemic and unsystematically transcribed corpus of data (consisting of about 900 words and 300 phrases and sentences) has already been analyzed using the method of reconstitution of imperfect data, remarks by the transcribers on the sound of Timote-Cuica have not been taken into account so far. Here, it is shown that these provide valuable clues to the reconstitution of Timote-Cuica pronunciation. 
In particular, such observations in conjunction with a close analysis of the available data reinforce the idea of the presence of a high central vowel, but also suggest hitherto unrecognized properties, notably the presence of prenasalized stops that likely contrasted with their plain counterparts.|000|South American languages, Timote-Cuica, alignment, Mesoamerican languages, 4622|Urban2018a|Compare, for this use of alignments (indirect here), with @Marlett2010, who uses a similar approach.|000|alignment, Mesoamerican languages, orthography, 4623|Pasquini2019|More than sixty years ago Morris Swadesh proposed to measure the similarity between two languages by considering their overlap, i.e., the percentage of cognates. The overlap was estimated by comparing two ad hoc lists with words corresponding to same meanings for the two languages. The notable assumption of Swadesh was that replacements in a vocabulary occur at a universal constant rate so that the time distance from the eventual common ancestor can be determined. In his view only replacements are relevant while horizontal transfers (borrowings from other languages) are less important and their effect can be eventually taken into account assuming that the divergence is diminished by contact. Later, the mainstream of glottochronology adopted the point of view that the effect of loanwords could be eliminated by careful work devoted to their identification so that they would not affect any measure of distance between languages. The aim of this paper is to show by experimental evidence that horizontal transfers, on the contrary, are a primary aspect of languages evolution since their effect on the vocabulary is at least as important as that of spontaneous replacements. We finally show that this phenomenon severely and unavoidably limits the possibility to fully reconstruct a proto-language. 
This limitation is fundamental, i.e., it gives a bound which cannot be infringed, independently of the method used for the reconstruction.|000|Romance, glottochronology, statistics, borrowing, language contact 4624|Pasquini2019|Interesting article claiming that borrowing is so constitutive of language evolution that it cannot be excluded in lexicostatistical analyses.|000|Romance, borrowing, statistics, glottochronology, Global Lexicostatistical Database, lexicostatistics 4625|Aaley2017|Dictionary of the Kusunda language.|000|dictionary, Kusunda, language isolate, 4626|Amrhein2019|Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05[pb] or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.|304|significance, p-hacking, statistics, problems, critics 4627|Augst2008|Overview on classifications of the human lexicon according to word families. Provides a historical overview, and some theoretical introduction.|000|word family, overview, introduction 4628|Beutel2008|Summary on specifics of the Chinese lexicon.|000|lexicon, Wortschatz, word family, 4629|Busse2008|Summary on language contact and its impact on the English lexicon.|000|word family, lexicon, Wortschatz, English, 4630|Cordeiro2019|Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. 
For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results obtained reveal a high agreement between the models and human predictions, suggesting that they are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size in the ability of the model to predict compositionality, and of a uniform combination of the components for best results.|000|compounding, nominal compounds, unsupervised method, automatic approach, distributional semantics, 4631|Cordeiro2019|This study relates to semantics, not the morphological structure of a compound, as far as I understand from quickly reading the paper.|000|compositionality, unsupervised method, distributional semantics 4632|Su2019|The Late Paleogene surface height and paleoenvironment for the core area of the Qinghai-Tibetan Plateau (QTP) remain critically unresolved. Here, we report the discovery of the youngest well-preserved fossil palm leaves from Tibet. They were recovered from the Late Paleogene (Chattian), ca. 25.5 ± 0.5 million years, paleolake sediments within the Lunpola Basin (32.033°N, 89.767°E), central QTP at a present elevation of 4655 m. The anatomy of palms renders them intrinsically susceptible to freezing, imposing upper bounds on their latitudinal and altitudinal distribution. Combined with model-determined paleoterrestrial lapse rates, this shows that a high plateau cannot have existed in the core of Tibet in the Paleogene. Instead, a deep paleovalley, whose floor was <2.3 km above mean sea level bounded by (>4 km) high mountain systems, formed a topographically highly varied landscape. 
This finding challenges prevailing views on tectonic processes, monsoon dynamics, and the evolution of Asian biodiversity.|000|Sino-Tibetan, Tibet, population genetics, archaelogy, 4633|Jacquesson2005|Introduction to the Deuri language along with a dictionary.|000|dictionary, Sino-Tibetan, Deuri 4634|Hudson2010|This book consists of three parts, each of which is an introduction to a separate discipline: cognitive science, linguistics (a branch of cognitive science) and English grammar (a branch of linguistics). Part I, called ‘How the mind works’, is a very modest alternative to Steven Pinker’s bestseller of the same name (Pinker 1998a), and is a personal selection of rather commonplace psychological ideas about concepts and mental networks and the activation that flows round them, together with a few novelties such as default inheritance and node building. These ideas are selected so as to provide a foundation for the next part. In Part II, ‘How language works’, I make a theoretical point that’s exactly the opposite of the one made famous by Pinker, following the mainstream Chomskyan tradition (Pinker 1994). Where Pinker finds a ‘language instinct’, I find ordinary cognition. Like other ‘cognitive linguists’, I believe that language is very similar to other kinds of thinking. I also believe that the fine details that we linguists find when looking at language tell us a great deal not only about language, but also about how we think in general. Every single phenomenon that I know about, as a linguist, is just as you’d expect given the way in which (according to Part I) the mind works. Finally, Part III, ‘How English works’, gives a brief survey of English grammar. The chapter on syntax summarizes my little 1998 textbook English Grammar which supported my first-year undergraduate course on English grammar. 
The students seemed to enjoy learning to draw dependency arrows and appreciated the idea that this was a skill that they could apply to virtually any English sentence.|000|word grammar, Stephen Pinker, critic, grammatical theory, 4635|Hollenbach2008|Overview of vocabulary and lexicon in Middle American languages.|000|Mesoamerican languages, overview, lexicon, Wortschatz 4636|Geckeler2008|Introduces the beginnings of the idea of lexical fields.|000|overview, lexical field, semantic field, lexicon, Wortschatz 4637|Patacchiola2016|Blog post series on reinforcement learning, containing some criticism of machine learning.|000|machine learning, critics, overview, 4638|SchmidtWiegand2008|Overview of the onomasiological perspective on the lexicon.|000|onomosiological perspective, concept-based perspective, word formation, lexicon, Wortschatz 4639|Rayfield2008|Overview of characteristics of word families and lexicon of Caucasian languages.|000|Caucasian languages, word family, lexicon, Wortschatz 4640|Stanforth2008|Overview of language contact and the lexicon.|000|language contact, stratification, borrowing, layering, lexicon, Wortschatz 4641|Dingemanse2018|Ideophones (also known as expressives or mimetics, and including onomatopoeia) have been systematically studied in linguistics since the 1850s, when they were first described as a lexical class of vivid sensory words in West-African languages. This paper surveys the research history of ideophones, from its roots in African linguistics to its fruits in general linguistics and typology around the globe. It shows that despite a recurrent narrative of marginalisation, work on ideophones has made an impact in many areas of linguistics, from theories of phonological features to typologies of manner and motion, and from sound symbolism to sensory language. 
Due to their hybrid nature as gradient vocal gestures that grow roots in discrete linguistic systems, ideophones provide opportunities to reframe typological questions, reconsider the role of language ideology in linguistic scholarship, and rethink the margins of language. With ideophones increasingly being brought into the fold of the language sciences, this review synthesises past theoretical insights and empirical findings in order to enable future work to build on them.|000|ideophones, sound symbolism, 4642|Ebert2000|Glossary on Camling.|000|dictionary, Sino-Tibetan, Camling 4643|Goddard2008|Overview on lexical decomposition and its semantics.|000|natural semantic metalanguage, lexicon, Wortschatz 4644|Behr2015|Reply to article by @Sampson2015.|000|syllabification, Chinese, language history, 4645|Miller1998|THIS book has taken longer to write than we expected. It began as an account of the syntactic structures and discourse devices to be found in spontaneous spoken language. It gradually developed into a comparison of syntax and discourse in spontaneous spoken English, Russian, and German. One motive was to make linguists in English-speaking countries aware of the work that has been carried out elsewhere; another motive was to gather cross-language evidence to support the view that we were dealing with regular structures and not with performance errors. The book also developed into an attempt to demonstrate that the growing body of analyses of spontaneous spoken language was relevant to a number of areas in theoretical linguistics: received views of constructions in particular languages, to theories of the on-line processing of spoken language by humans, to theoretical work in typology, and, perhaps most crucially, to theories of first language acquisition. 
|000|spoken language, spontaneous spoken language, performance, syntax, Chomsky syntax, complexity, 4646|Miller1998|The central thesis of the book is that the syntactic structure of phrases and clauses in spontaneous spoken language is very different from the structure of phrases and clauses in written language. The differences reside in the complexity of such constructions—what is meant by complexity will be explained in Chapters 3 and 4—and in the types of constructions. There are many types of phrase and clause construction that occur frequently in writing but very rarely in speech and other types that occur frequently in speech but never in writing. A secondary thesis, which forms the subject matter of Chapters 5 and 6, is that the organization of spontaneous spoken discourse is very different from the organization of written discourse, and has its own discourse-organizing devices.|1|spontaneous spoken language, complexity, 4647|Miller1998|**Table of contents** * 1 Introduction * 2 Sentences and Clauses * 3 Clauses: Type, combination, and integration * 4 Noun Phrases: complexity and configuration * 5 Focus constructions * 6 Focus constructions: clefts and like * 7 Historical linguistics and typology * 8 Written language, first language acquisition, and education * Epilogue|000|spontaneous spoken language, spoken language, historical linguistics, theoretical problems, language variation 4648|Miller1998|Very important book discussing specifically to which degree spoken language differs from language in its written form, and what impact this has on historical linguistics, linguistic typology, and the study of syntax.|000|spontaneous spoken language, spoken language, historical linguistics, theoretical problems, language variation, 4649|Mensh2017|Good scientific writing is essential to career development and to the progress of science. 
A well-structured manuscript allows readers and reviewers to get excited about the subject matter, to understand and verify the paper’s contributions, and to integrate these contributions into a broader context. However, many scientists struggle with producing high-quality manuscripts and are typically untrained in paper writing. Focusing on how readers consume information, we present a set of ten simple rules to help you communicate the main idea of your paper. These rules are designed to make your paper more influential and the process of writing more efficient and pleasurable.|000|paper, writing, scientific practice, guidelines, 4650|Mensh2017|Useful study on how to write papers in a way that makes them easy to read and digest.|000|writing, paper, guidelines, scientific practice, 4651|Coelho2019|Although many hypotheses have been proposed to explain why humans speak so many languages and why languages are unevenly distributed across the globe, the factors that shape geographical patterns of cultural and linguistic diversity remain poorly understood. Prior research has tended to focus on identifying universal predictors of language diversity, without accounting for how local factors and multiple predictors interact. Here, we use a unique combination of path analysis, mechanistic simulation modelling, and geographically weighted regression to investigate the broadly described, but poorly understood, spatial pattern of language diversity in North America. We show that the ecological drivers of language diversity are not universal or entirely direct. The strongest associations imply a role for previously developed hypothesized drivers such as population density, resource diversity, and carrying capacity with group size limits. 
The predictive power of this web of factors varies over space from regions where our model predicts approximately 86% of the variation in diversity, to areas where less than 40% is explained.|000|linguistic diversity, model, modeling, simulation studies, correlational studies, 4652|Coelho2019|Interesting insofar as it shows that the ecological factors are not necessarily good predictors.|000|linguistic diversity, ecology, 4653|Abuarrah2018|Some traditional accounts view literal meaning (LM) as the central component in the process of meaning interpretation. This paper supports this view while adding that LM is the first but not the only piece of evidence available to the hearer of the speaker’s meaning. After critically evaluating examples from previous studies and my own examples, the study concludes that discourse comprehension is a sequential and graded process. To understand the significance of LM as evidence in the process of meaning understanding, the study has to reconsider the notion of evidence according to Relevance Theory (RT) and define the vigorously debated term of LM. The results from this study suggest that literal meaning is initial and context is subsequential; while both co-determine the speaker’s meaning in implicature, the latter enriches the speaker’s meaning into a higher order speech act in explicature.|000|semantics, literal meaning, 4654|Eberhard2005|Presents two wordlists of Northern Nambiquara languages and compares their cognate scores based on a Swadesh list plus additional items.|000|Northern Nambiquara, Southern American languages, comparative wordlist, 4655|Isphording2019|We study the effect of ethno-linguistic classroom composition in college on educational performance, educational choices and post-graduation migration in a setting of quasi-random assignment to undergraduate seminars at a British university. 
We focus on two core variables: the share of non-English-speaking students and the diversity within the group of non-English-speaking students with respect to their linguistic background. English-speaking students are largely unaffected by the ethno-linguistic classroom composition. Non-English-speaking students benefit from a larger diversity in their performance and increase their interaction with English-speaking students. Educational choices of non-English-speaking students become more similar to choices of English-speaking students in response to more diverse classes. Post-graduation, non-English students who have been assigned to higher shares of non-English students in the compulsory stage are more likely to leave the country. Our results imply that current levels of internationalisation do not impose a threat to native education. Avoiding segregation along ethnic lines is key in providing education for an internationalised studentship.|000|education, classroom, diversity, 4656|Holman2017|Since the early 1970s, biologists have debated whether evolution is punctuated by speciation events with bursts of cladogenetic changes, or whether evolution tends to be of a more gradual, anagenetic nature. A similar discussion among linguists has barely begun, but the present results suggest that there is also room for controversy over this issue in linguistics. The only previous study correlated the number of nodes in linguistic phylogenies with branch lengths and found support for punctuated equilibrium. We replicate this result for branch lengths, but find no support for punctuated equilibrium using a different, automated measure of linguistic divergence and a much larger data set. 
With the automated measure, segments of trees containing more nodes show no greater divergence from an outgroup than segments containing fewer nodes.|000|simulation studies, punctuational change, ASJP, 4657|Nguyen2019|The combination of pitch and glottalization (glottal constriction or lapse into creaky voice) as relevant phonetic/phonological dimensions of lexical tone is found in several language families in Asia. The Vietic subbranch of Austroasiatic stands out in that all its languages have at least one glottalized tone. Vietnamese is a well-documented example, but the others remain little-studied. The research reported here contributes experimental evidence on one of these languages: Muong (Mường). Excerpts from a database of audio and electroglottographic recordings of twenty speakers allow for a characterization of this dialect's glottalized tone, as contrasted with the four other tones of this five-tone system. The ultimate goal is to determine what (sub)types of glottalized tones exist in the world's languages, bringing out typological differences in terms of (i) phonetic realizations and (ii) degree of importance of glottalization as a feature of linguistic tones.|000|Muong, Vietic, Austro-Asiatic, tone, glottalization, creaky voice, phonetics, 4658|Pellard2018|The best-selling book by the archaeologist Jean-Paul Demoule, Mais où sont passés les Indo-Européens ? (Seuil, 2014), casts doubt on the existence of an ancestral language of the Indo-European family on the basis of criticisms levelled at Indo-European linguistics and at historical linguistics in general. We show here that these criticisms rest on biased documentation and that they contain numerous errors and misinterpretations, of which we present a selection. 
We examine the potential alternatives to the idea of an ancestral language (pidginization, creolization, interactions within a Sprachbund, formation of mixed languages through prolonged mutual contact) and show that all of them fail to account for the verbal, nominal, and pronominal inflections shared by the various branches of the family. Finally, we reject the equation of Indo-European linguistics with racist ideologies. We reaffirm, if there were any need, the scientific and non-ideological character of Indo-European historical linguistics.|000|Indo-European, critics, methodology, linguistic reconstruction, Indo-European, 4659|Kleinert2009|The article deals with two widespread mistakes concerning Galileo: A false quotation and a translation error. The quotation reads as follows: “Measure what is measurable, and make measurable what is not so.” Although Galileo is quoted with these words in a large number of publications, the authenticity of the sentence is highly dubious because no one has ever provided a precise bibliographical reference for where to find it in Galileo’s works. Galileo’s alleged rule about measurement can be traced back to the works of two nineteenth century French scholars. This phrase was subsequently picked up by some internationally renowned scientists, who were responsible for its dissemination in German and English books and articles. The two English versions of the measurement quotation published by Hermann Weyl in the late forties of the last century strongly contributed to its worldwide diffusion. The sentence was even re-translated into German and French, and in recent scientific textbooks it is frequently used in order to characterize the methods of modern science. Notwithstanding its increasing popularity, referring to this expression as a quotation from Galileo is a striking example of academic sloppiness. 
The translation error concerns the name of the Roman Academy of which Galileo was a member. Referring to its emblem, a lynx, Accademia dei lincei is often translated as “Academy of (the) Lynxes”. But the Italian noun for lynx is feminine (la lince), and the Italian translation of “Academy of the Lynxes” would be Accademia delle linci. The adjective linceo, however, means “lynx-eyed” in the sense of “sharp-sighted”, and therefore the correct translation of Accademia dei lincei is “Academy of the Lynx-Eyed”.|000|quotes, nice quote, Galileo Galilei, history of science, 4660|Filko2019|In this paper we present CroDeriV – a large morphological database for Croatian. Croatian is a Slavic language with rich morphology and numerous derivational processes. A derivational database consisting of morphologically analyzed lexemes which are connected into derivational families via shared roots is an essential language resource. So far, the derivational database of Croatian consisted solely of verbs. Here we will present its expansion with adjectives.|000|word family, Croatian, database, word formation, 4661|Filko2019|In this paper we present CroDeriV – a large morphological database for Croatian. Croatian is a Slavic language with rich morphology and numerous derivational processes. A derivational database consisting of morphologically analyzed lexemes which are connected into derivational families via shared roots is an essential language resource. So far, the derivational database of Croatian consisted solely of verbs. 
Here we will present its expansion with adjectives.|000|word family, Croatian, database, word formation, 4662|Filko2019|Contains a large derivation graph for words derived from «glas» (speak).|000|word derivation, Croatian, visualization, database 4663|Coblin2019|Article gives a very good overview on the traditional approach to stratification in historical linguistics.|000|Chinese, stratification, Chinese dialects, linguistic reconstruction, 4664|Coblin2019|Use of the comparative method of phonological reconstruction is standard in most of the world today. But in China it remains controversial as a tool for the comparative study of modern Chinese dialects, for various historical and cultural reasons and also because the Sinitic languages pose certain language-specific problems for the comparativist. The exact nature of this unusual state of affairs, and a suggested resolution thereof, are the primary topics of the present paper.|000|Chinese, stratification, Chinese dialects, linguistic reconstruction, 4665|Coblin2019|Two points are significant here, one major and the other minor. First, Bloomfield surely knew exactly how the various sets in each of these groups differed semantically and/or in other respects. And he of course also knew the correspondence patterns and concomitant reconstructive procedures he had formulated for dealing with such data in his 1925 paper. When he compiled these sets and posited his reconstructions for them, he probably drew on this mentally stored information virtually instantaneously as needed and in no set order, rather than proceeding mechanically, step by step, as a computer would do. But what is of primary interest to us here is that, had he for some reason been compelled to move forward in such a stepwise manner, it is likely that he could have first assembled his sets by gross inspection of the phonological forms and general semantic meanings of the comparanda. 
Then he could have posited his reconstructions without drawing on further information about the data. And, finally, when all this had been done he could have turned to the questions of exactly how the sets differed, whether semantically or otherwise, and what there was in the forms themselves, or elsewhere in the languages, that signaled these differences. In other words, had it been necessary, he could have performed the task of phonological reconstruction without reference to the precise semantic or other content of the sets. What was essential was gross semantic identity and systematic and patterned structural parallelism within each set. Determining any semantic differences between them, and how and why these existed, could have been deferred to a later stage of the investigation, had that been necessary or preferable.|8|Chinese, stratification, correspondence patterns, Chinese dialects, linguistic reconstruction, 4666|Stetsyuk1987|Very interesting early attempt at a network to illustrate genetic similarities among languages, here, Slavic languages, based on a large vocabulary (more than 1000 Slavic roots).|000|network approaches, phylogenetic reconstruction, Slavic languages, automatic approach, 4667|Chandra2019|In current generative terms, individual features trigger small-scale micro and nano-level differences among mutually intelligible varieties with shared geography (cf. Barbiers 2009, Kayne 2000, 2013). However, as we show in this paper, dialects may also exhibit macro-level differences such as in the domain of case alignment. Specifically, we employ novel data on ergativity from Braj, a Western Indo-Aryan language, to present two such instances. First, despite a rigid ergative system in the transitive domain, some Braj varieties have undergone a macro-level change in the unergative domain by opting for phi-triggering, unmarked/nominative subjects. 
Another instance of a macro-level difference is provided by the duality of grammars within two registers of the same Braj variety. The occurrence of such macro-level differences at the dialectal level is unexplained in the literature, which advocates a complete separation of big, structural differences from featural variation (Baker 2008). Our submission is that structural differences also define dialects and registers, though they are mostly restricted to specific domains, unlike those found in typologically distinct languages with typical cascading effects.|000|Indo-Aryan languages, dialects, linguistic diversity, 4668|Munoz2018|In this article we try to give an overview of the evolution of studies on, and the dating of, ancient languages and protolanguages. Glottochronology, initiated by M. Swadesh, has evolved enormously, despite the controversies and criticisms it provoked, thanks to the incorporation of statistical and etymological elements. The new methodologies use large databases and methods related to phylogenetics. Projects such as Ielex, ASJP, or GDL show the importance of lexicostatistics today and constitute a highly valuable resource for researchers as well as a field with enormous possibilities in the classroom.|000|overview, phylogenetic reconstruction, historical linguistics, glottochronology, 4669|Munoz2018|Useful article, since it provides Spanish terminology for questions of phylogenetic reconstruction, which helps when discussing these matters in Spanish.|000|phylogenetic reconstruction, terminology, Spanish 4670|Louwerse2009|Population counts and longitude and latitude coordinates were estimated for the 50 largest cities in the United States by computational linguistic techniques and by human participants. 
The mathematical technique Latent Semantic Analysis applied to newspaper texts produced similarity ratings between the 50 cities that allowed for a multidimensional scaling (MDS) of these cities. MDS coordinates correlated with the actual longitude and latitude of these cities, showing that cities that are located together share similar semantic contexts. This finding was replicated using a first-order co-occurrence algorithm. The computational estimates of geographical location as well as population were akin to human estimates. These findings show that language encodes geographical information that language users in turn may use in their understanding of language and the world.|000|geography, spatial cognition, Weltwissen, knowledge organization, 4671|Louwerse2009|Important article for the question of the degree to which languages encode geographic information. Can be used as some kind of a model for studies on orientation prefixes and similar aspects.|000|geography, spatial cognition, multi-dimensional scaling, MDS 4672|Hitchcock199X|Paper was never published but is now shared as a draft, dealing with the treatment of probabilities in @Ringe1995.|000|chance resemblance, statistics, proof of relationship, 4673|Anceaux1961|It would take us too far afield to give all the words from the lists which could be considered for comparative purposes, so I have curtailed the material by about half. The choice is somewhat arbitrary, preference having been given to words which occurred in the greatest number of lists. The Basic Vocabulary, i.e. the vocabulary for simple, everyday concepts, is relatively well represented in consequence. 
The collection of material was, in fact, based on this preference for words from the Basic Vocabulary, on the assumption that there would be the fewest cases of borrowing from other languages, the obscuring effects of mutual borrowings would thereby be eliminated as far as possible, and lexical similarities and dissimilarities would more clearly demonstrate the degree of relationship.|13|comparative wordlist, dataset, Swadesh list, concept list, Papua New Guinea, Trans-New-Guinean languages, 4674|Jacques2019|This paper proposes that the labial causative prefixes found in various Trans-Himalayan languages of North-Eastern India are not innovations as is generally assumed. Instead, it is argued that they are related to labial causative prefixes found in Rgyalrongic languages, whose traces are perhaps attested in other branches of the family, and a bilabial prefix that derived stative verbs into transitive verbs is potentially reconstructible to proto-Trans-Himalayan.|000|causative, Sino-Tibetan, labial prefix, prefix, word formation, 4675|Reesink2010|This paper offers some thoughts on the question what effect language has on the understanding and hence behavior of a human being. It reviews some issues of linguistic relativity, known as the “Sapir-Whorf hypothesis,” suggesting that the culture we grow up in is reflected in the language and that our cognition (and our worldview) is shaped or colored by the conventions developed by our ancestors and peers. This raises questions for the degree of translatability, illustrated by the comparison of two poems by a Dutch poet who spent most of his life in the USA. 
Mutual understanding, I claim, is possible because we have the cognitive apparatus that allows us to enter different emic systems.|000|Sapir-Whorf hypothesis, translation, Dutch poems, cognition, 4676|Waelchli2009|Confronting low data reduction typologies, as established by using data from parallel texts, with the high data reduction typologies of WALS reveals a systematic bias of WALS typologies toward highly bimodal distribution. Properties with a distribution supporting a discrete feature analysis in many languages are likelier to be represented in WALS and to be represented accurately. This bias has important consequences when WALS typologies are interpreted theoretically or further processed statistically.|000|WALS, critic, parallel corpus, typology, modeling, 4677|Sookias2019|The neglected concept of homoiology is discussed in the context of palaeontological phylogenetic methods. Homoiology is the case of homoplasy where a parallel or convergent phenotypic development is actually due to shared ancestry, whether a common inherited genetic mechanism or simply a shared initial phenotype. A number of parallelisms/convergences in vertebrate and linguistic evolution are discussed to illustrate the concept of homoiology. It is proposed that parallelisms/convergences, although probably not useful in initial tree inference from morphological data, must not be simply written off as phylogenetically uninformative, but can actually be potentially used to provide additional support values for clades and to help to choose between equally parsimonious or likely trees. An R function is provided to calculate two measures for a given tree and matrix: (a) the potential support for clades based on potential homoiologies; and (b) the fit of the tree to all states given the concept of homoiology. 
The relationship between homoiology and constraints and to underlying mechanisms is also discussed.|000|homoiology, homology, terminology, evolutionary biology, 4678|Voogt2019|The presence of clicks in Southern African languages has inspired theories of human language history since first encountered by European travelers. Catford (1997) points out their early notions about clicks were often rooted in historical bias rather than in empirical evidence. The following discussion highlights potentially problematic terms that continue to be used in the literature on clicks and language evolution, in particular, that clicks are exotic, not human or rare.|000|clicks, phoneme inventory, African languages, 4679|Eckart2012|To annotate a resource usually means to enhance it with various types of (linguistic) information [McEnery/Wilson 2001, page 32]. This is done by attaching some kind of information to parts of the resource and/or introduce relations between those parts which can again be annotated with information. Thereby the annotated information is always an interpretation of the data with respect to a particular understanding.|000|annotation, linguistic annotation, introduction, overview 4680|Eckart2012|Annotations have mostly been attached to the related parts of the resource as inline annotations or stand-off annotations.|31|inline annotation, stand-off annotation, annotation, definition 4681|Adams2009|This article systematically explores the concept of flow in rap music, with the goal of understanding how rappers’ uses of flow contribute both to the surface rhythmic vitality of a song and to deeper levels of musical meaning. I will explain the three most significant metrical techniques that constitute a rapper’s flow, and give examples of rap songs using each technique. 
The article concludes with some thoughts on how changes in flow as rap music evolved contributed not only to different style features, but also to value judgments by both rappers and audiences.|000|rap music, music, metrics, meter, rhymes, rhyme annotation, 4682|Adams2009|Interesting article with some notable visualizations in the online version of the journal.|000|rhyme annotation, rap music, 4683|Apel1961|A book illustrating early European styles of musical notation.|000|music, musical notation, introduction 4684|Bedford2018|Two recent papers, by Lipson et al. and @Posth<2018> et al., have challenged current interpretations of the initial settlement of Remote Oceania. We invited Stuart Bedford, who is an author on both papers, to outline their importance, and a number of scholars in various disciplines to comment on their findings.|000|Vanuatu, ancient DNA, review 4685|Ciobanu2019|Dissertation by the author summarizing her work on language relatedness, similarities among words, and related topics.|000|algorithms, phonetic alignment, cognate detection 4686|Collins2019|In this short paper, I elaborate on previous work by Givón (1971) and Aristar (1991) to argue that a substantial part of the well-known word-order correlations is best explained by grammaticalisation processes. Functional-adaptive accounts in terms of processing or learning constraints are currently weakly substantiated, and they suffer from the fact that they do not adequately control for language-internal inheritance patterns. 
More generally, historical relatedness between different types of phrases constitutes an important confound in typological research, one that needs to be taken seriously before word-order correlations are motivated by anything other than the diachronic patterns that link the word order pairs in question.|000|grammaticalization, word order, linguistic typology 4687|Haspelmath2019a|This paper addresses a recent trend in the study of language variation and universals, namely to attribute cross-linguistic patterns to diachrony, rather than to other causal factors. This is an interesting suggestion, and I try to make the basic concepts clearer, by distinguishing clearly between language-particular regularities, universal tendencies, and mere recurrent patterns, as well as three kinds of causal factors (preferences, constraints, restrictions). I make four claims: (i) Explanations may involve diachrony in different ways; (ii) for causal explanations of universal tendencies, one needs to invoke mutational constraints (change constraints); (iii) in addition to mutational constraints, we need functional-adaptive constraints as well, as is clear from cases of multi-convergence; and (iv) successful functional-adaptive explanations do not depend on understanding the precise pathways of change.|000|grammaticalization, universals, linguistic typology, 4688|Jackson2018|Literature and art have long depicted God as a stern and elderly white man, but do people actually see Him this way? We use reverse correlation to understand how a representative sample of American Christians visualize the face of God, which we argue is indicative of how believers think about God’s mind. In contrast to historical depictions, Americans generally see God as young, Caucasian, and loving, but perceptions vary by believers’ political ideology and physical appearance. 
Liberals see God as relatively more feminine, more African American, and more loving than conservatives, who see God as older, more intelligent, and more powerful. All participants see God as similar to themselves on attractiveness, age, and, to a lesser extent, race. These differences are consistent with past research showing that people’s views of God are shaped by their group-based motivations and cognitive biases. Our results also speak to the broad scope of religious differences: even people of the same nationality and the same faith appear to think differently about God’s appearance.|000|psychology, God, 4689|Jeszensky2018|Finding the boundaries of linguistic variants and studying transitions between variants are key interests in classical linguistic geography. However, the definition of boundaries in areal linguistics is vague, and a quantitative characterization of transitions at the interface between dialectal variants is missing. We conceptualize these transitions as gradients, aiming to quantitatively account for the transition patterns which are traditionally only implicitly inferred from visualizations. Fitting of logistic functions in different spatial scopes (profiles as well as surfaces) is proposed as an approach to model the transition at the interface between the dominant usage areas of dialectal variants. Logistic functions can accommodate the breadth of boundary concepts, ranging from sharp isoglosses to transitions with different gradualities. The parameters of the fitted logistic models as well as supplementary measures then allow for the quantitative characterization and comparison of transitions across variables. 
To demonstrate the proposed methodology, we use Swiss German syntactic data on dialectal variables with a single transition zone.|000|automatic approach, boundary detection, dialectology, 4690|Jeszensky2018|Method seems to be able to detect whether a transition of variables on geographic space is gradual or abrupt.|000|transition, boundary detection, dialectology, 4691|Kuijken2013|When reading most twentieth- or twenty-first-century scores, trained musicians can hear them quite precisely in their “mind’s ear.” The exact instrumentation is given; the characteristics of the instruments are familiar; standard modern pitch and equal temperament are presupposed; tempo is prescribed by metronome markings; rhythm, phrasing, articulation and dynamics are clearly indicated; the realization of the few ornament signs is obvious; even the playing techniques and sound colors are accurately notated. Except in pieces that include aleatoric composition techniques or improvisation, performers do not have much room for adding individual accents or textual changes. This adherence to the written text is exactly what many composers wanted. Consequently, this kind of traditionally notated composition can be studied quite accurately from the score.|1|score, musical notation, annotation, 4692|Gerou1996|This is a dictionary on musical notation which explains, as far as I understand, the state of the art in current musical notation systems.|000|musical notation, annotation, overview, introduction 4693|Pagel2019|A puzzle of language is how speakers come to use the same words for particular meanings, given that there are often many competing alternatives (e.g., “sofa,” “couch,” “settee”), and there is seldom a necessary connection between a word and its meaning. 
The well-known process of random drift (roughly corresponding in this context to “say what you hear”) can cause the frequencies of alternative words to fluctuate over time, and it is even possible for one of the words to replace all others, without any form of selection being involved. However, is drift alone an adequate explanation of a shared vocabulary? Darwin thought not. Here, we apply models of neutral drift, directional selection, and positive frequency-dependent selection to explain over 417,000 word-use choices for 418 meanings in two natural populations of speakers. We find that neutral drift does not in general explain word use. Instead, some form of selection governs word choice in over 91% of the meanings we studied. In cases where one word dominates all others for a particular meaning, as is typical of the words in the core lexicon of a language, word choice is guided by positive frequency-dependent selection, a bias that makes speakers disproportionately likely to use the words that most others use. This bias grants an increasing advantage to the common form as it becomes more popular and provides a mechanism to explain how a shared vocabulary can spontaneously self-organize and then be maintained for centuries or even millennia, despite new words continually entering the lexicon.|000|selection, language change, word choice, denotation, 4694|Pagel2019|The authors use the data of http://www.let.rug.nl/~kleiweg/lamsas/ (LAMSAS project) to assess the word choice for a given lexeme. This data is particularly interesting to assess differences in expression, as it lists variants given as responses.|000|dialectology, word choice, denotation, lexical change 4695|Cuskley2019|We report associations between vowel sounds, graphemes, and colors collected online from over 1,000 Dutch speakers. We also provide open materials, including a Python implementation of the structure measure and code for a single-page web application to run simple cross-modal tasks. 
We also provide a full dataset of color–vowel associations from 1,164 participants, including over 200 synesthetes identified using consistency measures. Our analysis reveals salient patterns in the cross-modal associations and introduces a novel measure of isomorphism in cross-modal mappings. We found that, while the acoustic features of vowels significantly predict certain mappings (replicating prior work), both vowel phoneme category and grapheme category are even better predictors of color choice. Phoneme category is the best predictor of color choice overall, pointing to the importance of phonological representations in addition to acoustic cues. Generally, high/front vowels are lighter, more green, and more yellow than low/back vowels. Synesthetes respond more strongly on some dimensions, choosing lighter and more yellow colors for high and mid front vowels than do nonsynesthetes. We also present a novel measure of cross-modal mappings adapted from ecology, which uses a simulated distribution of mappings to measure the extent to which participants’ actual mappings are structured isomorphically across modalities. Synesthetes have mappings that tend to be more structured than nonsynesthetes’, and more consistent color choices across trials correlate with higher structure scores. Nevertheless, the large majority (~70%) of participants produce structured mappings, indicating that the capacity to make isomorphically structured mappings across distinct modalities is shared to a large extent, even if the exact nature of the mappings varies across individuals. 
Overall, this novel structure measure suggests a distribution of structured cross-modal association in the population, with synesthetes at one extreme and participants with unstructured associations at the other.|000|color terms, synaesthesia, empirical study, 4696|Reiss1991|Book introduces some theory on translation, including the concept of adequacy, beyond which the study goes.|000|translation theory, adequacy, introduction, 4697|Feld2019|In this article, we suggest that lexicostatisticians should seek to quantify the magnitude of sampling error using the McNemar test, a version of the widely-used chi-square test. While none of the lexicostatistical studies we examined provide enough data to perform the test directly, reported cognate percentages nevertheless allow us to infer which McNemar test results are possible. We find that none of these studies is likely to be statistically significant at the conventional 5% significance level because their sample sizes are too small. To be more transparent about this shortcoming, we suggest that researchers include p-values with their lexicostatistical estimates. We also suggest that the branches of lexicostatistical dendrograms should show whether language differences are statistically significant.|000|lexicostatistics, sampling error, errors, sample size, Slavic, inter-annotator agreement, 4698|Ratliff2018|`*`mbl-/`*`mbr- (Ratliff 2010) and `*`m.l (ɣ) -/`*`m.r (ɣ) - (Ostapirat 2016) have been proposed as reconstructions for correspondence sets that include NCL-, CL-, N-, and C- onsets across the Hmong-Mien family. Ostapirat assumes that the stop arose by a regular rule of epenthesis in the protolanguage. I examine the arguments for these two reconstructions and conclude that epenthesis in an onset is not without cross-linguistic support, but it is not the better analysis in this case. 
The arguments against a regular epenthesis rule for Hmong-Mien are based primarily on laryngeal contrasts in stops occurring in this position and the relationship of NCL- onsets to Proto-Hmong-Mien prenasalized stops. Secondary arguments involve exceptions to an epenthesis rule, and a reconsideration of the loanword evidence.|000|epenthesis, Proto-Hmong-Mien, linguistic reconstruction, sound change, 4699|SchmidtkeBode2019|In this epilogue, we summarize and reflect on the major threads and arguments from the individual contributions to this volume (§1), and we also briefly outline some challenges and directions for future work on the topic (§2).|000|linguistic typology, universals, grammaticalization, discussion 4700|SchmidtkeBode2019|The article addresses work by @Haspelmath2019a and @Collins2019 and @Levshina2019 and @Serzant2019.|000|discussion, grammaticalization, universals, linguistic typology 4701|Levshina2019|The scarcity of diachronic data represents a serious problem when linguists try to explain a typological universal. To overcome this empirical bottleneck, one can simulate the process of language evolution in artificial language learning experiments. After a brief discussion of the main principles and findings of such experiments, this paper presents a case study of causative constructions showing that language users have a bias towards the efficient organisation of communication. They regularise their linguistic input such that more frequent causative situations are expressed by shorter forms, and less frequent situations are expressed by longer forms. This supports the economy-based explanation of the universal form-meaning mapping found in causative constructions of different languages.|000|causative, artificial language learning, 4702|Serzant2019|Standard typological methods are designed to test hypotheses on strong universals that broadly override all other competing universal and language-specific forces. 
In this paper, I argue that there exist also weak universal forces. Weak universal forces systematically operate in the course of development but then interact with, or are even subsequently overridden by, other processes such as analogical exten- sion, persistence effects from the source function, etc. This, in turn, means that there can be statistically significant evidence for violations at the synchronic level and, accordingly, only a weak positive statistical signal. But crucially, the absence of statistical prima-facie evidence for such forces does not amount to evidence for their absence. The assumption that there are also weak universal forces that affect language evolution goes in line with the view that human cognition in general and language acquisition in particular are constrained by probabilistic biases of differ- ent range, including weak ones (cf. Thompson et al. 2016). By way of example, the present paper claims that the discriminatory function of case in differential object marking (DOM) systems is a weak universal: It keeps appearing in historically, syn- chronically and typologically very divergent constellations but is often overridden by other processes in further developments and is, therefore, not significant at the synchronic level in a large sample.|000|universals, grammaticalization, linguistic typology 4703|Winters2019|Humans commit information to graphic symbols for three basic reasons: as a memory aid, as a tool for thinking, and as a means of communication. Yet, despite the benefits of transmit- ting information graphically, we still know very little about the biases and constraints acting on the emergence of stable, powerful, and accurate graphic codes (such as writing). 
Using a reference game, where participants play as Messengers and Recipients, we experimentally manipulate the function of the task (communicative or non-communicative) and investigate whether this shapes the emergence of stable, powerful, and accurate codes for both syn- chronous and asynchronous modes of information transfer. Only in the Dialogue condition, where Messenger and Recipient are two different persons communicating within the same time frame (i.e., synchronously), do we consistently observe the emergence of stable, powerful, and accurate graphic codes. Such codes are unnecessary for participants in Recall, where Messenger and Recipient are the same person transferring information within the same time frame, and they fail to emerge in Correspondence, where Messenger and Recipient are two different per- sons communicating across time frames (i.e., asynchronously). Lastly, in the Mnemonic condi- tion, where Messenger and Recipient are the same person at different points in time, participants achieve high accuracy but with codes that are suboptimal in terms of power and stability. Our results suggest that the rarity and late arrival of stable, powerful, and accurate graphic codes in human history largely stems from strong constraints on information transfer. In particular, we suggest that these constraints limit a code’s ability to reach an adequate trade- off between information that needs to be explicitly encoded and information that needs to be inferred from context.|000|graphic code, writing systems, artificial language learning, 4704|Suchard2019|For nearly a thousand years, the texts of the Hebrew Bible were transmitted both in writing, as consonantal texts lacking much of the information about their pronunciation, and orally, as an accompanying reading tradition which supplied this information. During this period of oral trans- mission, sound changes affected the reading tradition. 
This paper identifies a number of sound changes that took place in the reading tradition by comparing their effects on Biblical Hebrew to those on Biblical Aramaic, the related but distinct language of a small part of the biblical corpus. Sound changes that affect both languages equally probably took place in the reading tradition, while those that are limited to one language probably preceded this shared oral transmission. Drawing this distinction allows us to reconstruct the pronunciation of Biblical Aramaic as it was fixed in the reading tradition, highlighting several morphological discrepancies between the dialect underlying it and that of the consonantal texts.|000|Biblical Hebrew, Aramaic, sound change, linguistic reconstruction 4705|Suchard2019|Paper is a nice illustration of why alignments are so useful. If we want to assess sound change in real examples, we need to provide alignments.|000|example, sound change, phonetic alignment, 4706|Isphording2014|There are various degrees of similarity between the languages of different immigrants and the language of their destination country. This linguistic distance is an obstacle to the acquisition of a language, which leads to large differences in the attainments of the language skills necessary for economic and social integration in the destination country. This study aims at quantifying the influence of linguistic distance on the language acquisition of immigrants in the US and in Germany. Drawing from comparative linguistics, we derive a measure of linguistic distance based on the automatic comparison of pronunciations. We compare this measure with three other linguistic and non-linguistic approaches in explaining self-reported measures of language skills. 
We show that there is a strong initial disadvantage from the linguistic origin for language acquisition, while the effect on the steepness of assimilation patterns is ambiguous in Germany and the US.|000|linguistic distance, language learning, second language learning, 4707|Isphording2014|Interesting aspect since the authors discuss how languages can be learned more easily, using comparative linguistic tools to measure linguistic distance. This shows another area where comparative linguistics may become important, namely in assisting research on second language learning.|000|linguistic distance, second language learning, 4708|Sahle2018|The emerging evidence supports an African origin and out-of-Africa dispersal of modern humans. However, the details of these events are still largely unknown. Fossil and genetic evidence is converging on a consensus of deep roots for the sapiens lineage and a more recent evo- lution of populations with highly encephalized, globular crania, thus far first observed in the fossil record of eastern Africa. Determining how this hallmark anatomically modern trait is related to the demographic success of modern humans will require further work, but it seems that,|000|Out-of-Africa, origin of modern humans, human dispersal, population genetics, 4709|Whitehouse2019|The origins of religion and of complex societies represent evolutionary puzzles 1–8 . The ‘moralizing gods’ hypothesis offers a solution to both puzzles by proposing that belief in morally concerned supernatural agents culturally evolved to facilitate cooperation among strangers in large-scale societies 9–13 . Although previous research has suggested an association between the presence of moralizing gods and social complexity 3,6,7,9–18 , the relationship between the two is disputed 9–13,19–24 , and attempts to establish causality have been hampered by limitations in the availability of detailed global longitudinal data. 
To overcome these limitations, here we systematically coded records from 414 societies that span the past 10,000 years from 30 regions around the world, using 51 measures of social complexity and 4 measures of supernatural enforcement of morality. Our analyses not only confirm the association between moralizing gods and social complexity, but also reveal that moralizing gods follow—rather than precede— large increases in social complexity. Contrary to previous predictions 9,12,16,18 , powerful moralizing ‘big gods’ and prosocial supernatural punishment tend to appear only after the emergence of ‘megasocieties’ with populations of more than around one million people. Moralizing gods are not a prerequisite for the evolution of social complexity, but they may help to sustain and expand complex multi-ethnic empires after they have become established. By contrast, rituals that facilitate the standardization of religious traditions across large populations 25,26 generally precede the appearance of moralizing gods. This suggests that ritual practices were more important than the particular content of religious belief to the initial rise of social complexity.|000|moralizing gods, complex societies, anthropology, 4710|Gibson2019|A potential solution to this problem lies in the classic learning problem for language: how learners can acquire facility with an infinitely expressive system mapping forms to meanings from limited exposure? For the most frequently intended meanings, learners encounter suffi- cient learning instances to support a holistic, memorized relationship between utterance form and meaning, such as the English greeting ‘Hello’. But in light of the complete repertoire of utterance meanings that a speaker may need to convey or understand, the totality of any native speaker's linguistic experience is extraordinarily sparse. 
For our species, the solution to this problem is that language is compositional: smaller meaningful forms can be put together into a novel, larger form whose meaning is a predictable function of its parts [5]. :comment:`Hockett 1960` By mastering the basic units and composition functions of a compositional system through simplicity principles [128,133, @Smith2017a ] a learner can acquire an infinitely expressive set of form–meaning mappings. Humans might have a strong inductive bias constraining language learning to some set of compositional systems, which may have emerged as a side effect of multiple evolutionary events in our lineage [134] or which may have been selected for directly [135]. An alternative, usage-based possibility is that languages might have become compositional through diachronic selection from the transmission bottleneck in their cultural evolution [136–138]. Even if a noncompositional system were perfectly human-learnable from sufficient evidence, it could not survive over generations, due to input sparsity for any individual learner.|p12|compositionality, language learning, language acquisition, 4711|Gibson2019|Cognitive science applies diverse tools and perspectives to study human language. Recently, an exciting body of work has examined linguistic phenomena through the lens of efficiency in usage: what otherwise puzzling features of language find explanation in formal accounts of how language might be optimized for communication and learning? Here, we review studies that deploy formal tools from probability and information theory to understand how and why language works the way that it does, focusing on phenomena ranging from the lexicon through syntax. 
These studies show how a pervasive pressure for efficiency guides the forms of natural language and indicate that a rich future for language research lies in connecting linguistics to cognitive psychology and mathematical theories of communication and inference.|000|communication, communicative efficiency, linguistic diversity, linguistic complexity, review, overview, 4712|Gibson2019|Very useful review article, touches on important topics, such as word length, compositionality, number of words a speaker knows, tradeoff morphology and syntax. A very useful article to cite whenever one is less sure about a topic.|000|frequency, Zipf's law, communicative efficiency, linguistic complexity, cognition, review, overview, compositionality 4713|Smith2017a|Linguistic universals arise from the interaction between the processes of language learning and language use. A test case for the relationship between these factors is linguistic variation, which tends to be conditioned on linguistic or sociolinguistic criteria. How can we explain the scarcity of unpredictable variation in natural language, and to what extent is this property of language a straightforward reflection of biases in statistical learning? We review three strands of experimental work exploring these questions, and introduce a Bayesian model of the learning and transmission of linguistic variation along with a closely matched artificial language learning experiment with adult participants. Our results show that while the biases of language learners can potentially play a role in shaping linguistic systems, the relationship between biases of learners and the structure of languages is not straightforward. Weak biases can have strong effects on language structure as they accumulate over repeated transmission. But the opposite can also be true: strong biases can have weak or no effects. Furthermore, the use of language during interaction can reshape linguistic systems. 
Combining data and insights from studies of learning, transmission and use is therefore essential if we are to understand how biases in statistical learning interact with language transmission and language use to shape the structural properties of language.|000|linguistic variation, language learning, artificial language learning 4714|Cepelewicz2018|A controversial theory suggests that perception, motor control, memory and other brain functions all depend on comparisons between ongoing actual experiences and the brain’s modeled expectations.|000|prediction, human brain, neurology, machine learning 4715|Durand1990|In positing phonemes, we operate at a fair degree of abstraction from real sounds by thinking of contrasting units at a given position within words. Observation of speech-events reveals, however, that sounds which we consider as tokens of identical phonemes can in fact be noticeably different according to their position within words or sentences. Let us take a familiar example from English. If we consider two words such as train [pb] and rain, transcribed in this book as /trejn/ and /rejn/, we find that the /r/ in train is auditorily different from the /r/ in rain. Under the influence of /t/ which is voiceless and aspirated in this position, the /r/ is partially devoiced and some friction is audible. We can indicate the distinction between these two r's by phonetic transcriptions (within square brackets) such as [rejn] vs. [trejn]. The reason why speakers of English do not normally attend to this phonetic distinction, and cope well with a writing system which uses only one r symbol, is that no two words of English are differentiated by [r] vs. [r]. These two r's are contextual realizations, technically called allophones, of the phoneme /r/. Following classical assumptions, we can say that there are two fundamental levels of representation of sounds: the level of phonemes (phonemic level) and the level of allophones (phonetic level). 
|4f|phonology, abstraction, phonological rules, allophone, 4716|Durand1990|In classical phonemics, the question of the mapping between the two levels of representation was not usually considered a burning issue. Most structuralist works simply give lists of allophones beside each phoneme or non-symbolic descriptions of the realiz- ations of phonemes. Generative phonology, by contrast, is committed to a programme where the primitive terms and the rules for their combination and transformation must be couched in symbolic notation and formally defined. Thereby, an important step is taken towards making the system easier to fault, or to [pb] refute, and our descriptions should be highly valued according to a common view of what defines a scientific enterprise.|5f|phonological rules, introduction, description, 4717|Durand1990|One of the central tenets of generative phonology is that the rules of a language interact in complex ways. Many structuralist writings seem to assume that the mapping is simple and one-shot - that is, a whole level is translated into another level by rules applying simultaneously to the input. But this does not account for even simple examples of allophonic statements to be found in standard structuralist expositions.|6|phonological rules, linear process, regular grammar, complexity 4718|Durand1990|:comment:`The example given in this passage for the complexity of rules needed is wrong. Essentially, we can very well represent the example of l becoming to ł and vowel change in a rule that includes the context of the rule before. Essentially, there is no example by which we could not just assume all contexts in the original form and apply the rules with slightly more complex contexts.`|7|phonological rules, application, generative grammar, linear process, regular grammar 4719|Xu2017a|This is a very interesting book that introduces several almost mixed languages in North China, and accounts on the degree to which they share grammatical features. 
In addition, it discusses topics of genetics, and also makes the claim that genetic studies would help to resolve language relationships where there is no further evidence. |000|population genetics, mixed languages, Tangwang, Sino-Tibetan, Chinese dialects, North China, language contact, borrowing, 4720|Boltz2007|In addition to their use as sources for Middle Chinese phonological data, the Qieyun [Chiehyunn] and its Song [Sonq] recension, the Guangyun [Goang yunn], open an important window on the lexical nature of the Middle Chinese language. Examination of the individual entries in the dong rime, chosen more or less arbitrarily for the present experiment, shows that the salient question is how we map the data of the rime in question (or of any rime) to the lexicon of the actual language or languages represented. Or, phrased slightly differently, how many "real words" are represented in the sixty-six single-character entries of the dong rime?|000|rhyme books, lexicon, Middle Chinese 4721|Chirkova2018|This paper focuses on two types of voiceless nasal sounds in Xumi, a Tibeto-Burman language: (i) the voiceless aspirated nasals /m̥ / [m̥ h ̃ ] and /n̥ / [n̥ h ̃ ], and (ii) the voiceless nasal glottal fricative [h ̃ ]. We provide a synchronic description of these two types of sounds, and explore their similarities and differences. Xumi voiceless nasal consonants are described with reference to the voiceless nasal consonants /m̥ / and /n̥ / in Burmese and Kham Tibetan because Burmese voiceless nasals are the best described type of voiceless nasals, and are therefore used as a reference point for comparison; voiceless nasals in Kham Tibetan, which is in close contact with Xumi, represent a characteristic regional feature. The synchronic description is based on acoustic and aerodynamic measurements (the total duration of the target phonemes, the duration of the voiced period during the target phonemes, mean nasal and oral flow). 
Our study (i) contributes to a better understanding of voiceless nasals as a type of sound, (ii) provides a first-ever instrumental description (acoustic and aerodynamic) of the voiceless nasal glottal fricative [h ̃ ], as attested in a number of Tibeto-Burman languages of Southwest China, and (iii) suggests a possible phonetic basis for the observed dialectal and diachronic variation between voiceless nasals and [h ̃ ] in some Tibeto-Burman languages.|000|Sino-Tibetan, Tibeto-Burman, voiceless nasal, nasal sound, illustrations of the IPA, IPA, 4722|Frankovsky2016|Mandarin Chinese numeral classifiers receive considerable at- tention in linguistic research. The status of the general classifier 个 gè re- mains unresolved. Many linguists suggest that the use of 个 gè as a noun classifier is arbitrary. This view is challenged in the current study. Relying on the CCL-Corpus of Peking University and data from Google, we investigated which nouns for living beings are most likely classified by the general clas- sifier 个 gè. The results suggest that the use of the classifier 个 gè is motivated by an anthropocentric continuum as described by Köpcke and Zubin in the 1990s. We tested Köpcke and Zubin’s approach with Chinese native speakers. We examined 76 animal expressions to explore the semantic interdepen- dence of numeral classifiers and the nouns. Our study shows that nouns with the semantic feature [+ animate] are more likely to be classified by 个 gè if their denotatum is either very close to or very far located from the anthropo- centric center. 
In contrast animate nouns whose denotata are located at some intermediate distance from the anthropocentric center are less likely to be classified by 个 gè.|000|classifier system, onomasiological approach, Mandarin, 4723|Frankovsky2016|Interesting paper, as it employs an onomasiological, concept-based approach, which makes use of some 76 animal phrases to check which classifier is being used.|000|classifier system, Mandarin, concept-based perspective, onomasiological approach, 4724|Goddard2014|This paper argues that the cross-linguistic study of subjective experience as expressed, described and construed in language cannot be set on a sound footing without the aid of a systematic and non-Anglocentric approach to lexical semantic analysis. This conclusion follows from two facts, one theoretical and one empirical. The first is the crucial role of language in accessing and communicating about feelings. The second is the demonstrated existence of substantial, culture-related differences between the meanings of emotional expressions in the languages of the world. We contend that the NSM approach to semantic and cultural analysis (Wierzbicka 1996; Gladkova 2010; Levisen 2012; Goddard and Wierzbicka 2014a; Wong 2014; among other works) provides the necessary conceptual and analytical framework to come to grips with these facts. This is demonstrated in practice by the studies of “happiness-like” and “pain-like” expressions across eight languages, undertaken in the present volume. At the same time as probing the precise meanings of these expressions, the authors provide extensive cultural contextualisation, showing in some detail how the meanings they are analysing are truly “cultural meanings”. 
The project exemplified by the volume can also be read as a linguistically-anchored contribution to cultural psychology (Shweder 2004, 2003), the quest to understand and appreciate the mental life of others in a full spirit of psychological pluralism.|000|natural semantic metalanguage, word emotions, emotion concepts, 4725|Da2019|This essay works at the empirical level to isolate a series of technical problems, logical fallacies, and conceptual flaws in an increasingly popular subfield in literary studies variously known as cultural analytics, literary data mining, quantitative formalism, literary text mining, computational textual analysis, computational criticism, algorithmic literary studies, social computing for literary studies, and computational literary studies (the phrase I use here). In a nutshell the problem with computational literary analysis as it stands is that what is robust is obvious (in the empirical sense) and what is not obvious is not robust, a situation not easily overcome given the nature of literary data and the nature of statistical inquiry. There is a fundamental mismatch between the statistical tools that are used and the objects to which they are applied.|000|digital humanities, critics, computer-based approaches, data mining 4726|Chamberlain2019|This paper brings together a number of disciplines in order to demonstrate how historical, anthropological, ecological, zoogeographical, ethnobiological, and linguistic evidence relating to the physical distribution and linguistic representations of pythons in northern Southeast Asia and southern China can be brought to bear on Kra-Dai prehistory and intrafamilial as well as interethnic relationships. The normal and most recognized word for ‘python’ is confined to the Tai family proper, and even then there are some qualifications. 
Two species of python are found in much of the Tai linguistic area south of the Sino-Vietnamese border, but only one, the Burmese python, occurs in Guangxi, Guangdong, and Hainan. Some Central Tai dialects have acquired another name that seems to be Austroasiatic (AA) in origin, and yet no AA languages are found in those areas. It is suggested that these dialects received the word via Kra to the west. On the eastern side, yet another surprising correspondence is noted between Lung Ming in southern Guangxi and Hlai on Hainan. Sek, located far to the south, which usually preserves archaic forms of Be-Tai, has no words for ‘python’ that correspond to those in the rest of the family. Close examination of the linguistics of this particular member of the South- east Asian mega-fauna reveals a pattern of interaction between the families of the Kra-Dai stock, Austroasiatic, and southern Chinese that mirrors the phylogenetic tree.|000|python, snake, animal, Tai-Kadai, South-East Asia, 4727|Forker2019|Nakh-Daghestanian languages have encountered growing interest from typologists and linguists from other subdiscplines, and more and more languages from the Nakh-Daghestanian language family are being studied. This paper provides a grammatical overview of the hitherto undescribed Sanzhi Dargwa language, followed by a detailed analysis of the grammaticalized expression of spatial elevation in Sanzhi. Spatial elevation, a topic that has not received substantial attention in Caucasian linguistics, manifests itself across different parts of speech in Sanzhi Dargwa and related languages. In Sanzhi, elevation is a deictic category in partial opposition with participant- oriented deixis/horizontally-oriented directional deixis. This paper treats the spatial uses of demonstratives, spatial preverbs and spatial cases that express elevation as well as the semantic extension of this spatial category into other, non-spatial domains. 
It further compares the Sanzhi data to other Caucasian and non-Caucasian languages and makes suggestions for investigating elevation as a subcategory within a broader category of topographical deixis.|000|spatial cognition, Nakh-Daghestanian, Sanzhi Dargwa, Caucasian, grammaticality, 4728|Tabain2018|Lisu (ISO 639-2 lis) is spoken by just over a million members of the group of this name in south-western China, north-eastern Burma, northern Thailand and north-eastern India. It formerly also had other names used by outsiders, including Yeren (Chinese yeren ‘wild people’), and Yawyin in Burma and Yobin in India (both derived from the Chinese term). Other names included Lisaw from the Shan and Thai name for the group, also seen in the former Burmese name Lishaw. About two-thirds of the speakers live in China, especially in north-western Yunnan Province, but also scattered elsewhere in Yunnan and Sichuan. About a quarter live in the Kachin State and the northern Shan State in Burma, with a substantial number in Chiangmai, Chiangrai and other provinces of Thailand, and a few thousand in Arunachal Pradesh in India. It is also spoken as a second language by many speakers of Nusu, Anung, Rawang and others in north-western Yunnan and northern Burma. Lisu has almost completely replaced Anung in China and is replacing Lemei in China. The Lisu are one of the 55 national minorities recognised in China, one of 135 ethnic groups recognised in Burma, a scheduled (officially listed and recognised) tribe in India, and one of the recognised hill tribe groups of Thailand.|000|illustrations of the IPA, IPA, Central Lisu, Lisu, Tibeto-Burman, Sino-Tibetan, 4729|Schmitt2011|This study focused on the relationship between percentage of vocabulary known in a text and level of comprehension of the same text. 
Earlier studies have estimated the percentage of vocabulary necessary for second language learners to understand written texts as being between 95% (Laufer, 1989) and 98% (Hu & Nation, 2000). In this study, 661 participants from 8 countries completed a vocabulary measure based on words drawn from 2 texts, read the texts, and then completed a reading comprehension test for each text. The results revealed a relatively linear relationship between the percentage of vocabulary known and the degree of reading comprehension. There was no indication of a vocabulary “threshold,” where comprehension increased dramatically at a particular percentage of vocabulary knowledge. Results suggest that the 98% estimate is a more reasonable coverage target for readers of academic texts.|000|reading comprehension, vocabulary knowledge, 4730|Schmitt2011|Interesting study, since it seems to suggest that we cannot understand texts fully beyond a certain level of known vocabulary, yet from personal experience, it is clear that this need not always be the case, and that texts can be read without having full understanding of all details.|000|text comprehension, vocabulary knowledge, vocabulary 4731|Chen2019|Denisovans are members of a hominin group who are currently only known directly from fragmentary fossils, the genomes of which have been studied from a single site, Denisova Cave 1–3 in Siberia. They are also known indirectly from their genetic legacy through gene flow into several low-altitude East Asian populations 4,5 and high-altitude modern Tibetans 6 . The lack of morphologically informative Denisovan fossils hinders our ability to connect geographically and temporally dispersed fossil hominins from Asia and to understand in a coherent manner their relation to recent Asian populations. This includes understanding the genetic adaptation of humans to the high-altitude Tibetan Plateau 7,8 , which was inherited from the Denisovans. 
Here we report a Denisovan mandible, identified by ancient protein analysis 9,10 , found on the Tibetan Plateau in Baishiya Karst Cave, Xiahe, Gansu, China. We determine the mandible to be at least 160 thousand years old through U-series dating of an adhering carbonate matrix. The Xiahe specimen provides direct evidence of the Denisovans outside the Altai Mountains and its analysis unique insights into Denisovan mandibular and dental morphology. Our results indicate that archaic hominins occupied the Tibetan Plateau in the Middle Pleistocene epoch and successfully adapted to high-altitude hypoxic environments long before the regional arrival of modern Homo sapiens.|000|Denisovan, peopling of the Tibetan Plateau, 4732|Post2019|‘Topographical deixis’ refers to a variety of spatial-environmental deixis in which typically distal reference to entities is made in terms of a set of topographically-anchored referential planes: most often, upward, downward, or on the same level. Thus defined, topographical deixis is a pervasive feature of Trans-Himalayan (= Sino-Tibetan) languages. However, while there have been several descriptions of Trans-Himalayan topographical deixis at the language or subgroup level, there has been as yet no account of its overall status and distribution within the family. The primary goal of this paper is to provide an account of topographical deixis from a pan-Trans-Himalayan perspective, to the maximum extent possible on the basis of existing descriptions. It discusses its formal coding, functions, distribution within the family, and environmental correlations. 
In addition to providing a benchmark account of the nature and distribution of topographical deixis within the Trans-Himalayan family, this study thus contributes to cross-linguistic typologies of spatial deictic systems and their environmental-interactional motivations more generally.|000|topographical deixis, spatial cognition, Tibeto-Burman, Sino-Tibetan, spatial expressions 4733|Gao2019|Peer review is a core element of the scientific process, particularly in conference-centered fields such as ML and NLP. However, only few studies have evaluated its properties em- pirically. Aiming to fill this gap, we present a corpus that contains over 4k reviews and 1.2k author responses from ACL-2018. We quantitatively and qualitatively assess the cor- pus. This includes a pilot study on paper weaknesses given by reviewers and on qual- ity of author responses. We then focus on the role of the rebuttal phase, and propose a novel task to predict after-rebuttal (i.e., fi- nal) scores from initial reviews and author re- sponses. Although author responses do have a marginal (and statistically significant) influ- ence on the final scores, especially for bor- derline papers, our results suggest that a re- viewer’s final score is largely determined by her initial score and the distance to the other reviewers’ initial scores. In this context, we discuss the conformity bias inherent to peer reviewing, a bias that has largely been over- looked in previous research. 
We hope our analyses will help better assess the usefulness of the rebuttal phase in NLP conferences.|000|review practice, NLP, discussion, analysis 4734|Gao2019|Interesting methods mentioned in the supplement of the paper, where automatic text analysis is done in a way that could be interesting for language comparison as well.|000|text mining, word embeddings, data mining, 4735|Kaplan2018|Described as a ‘sort of Human Genome Project for historical linguistics’, the Evolution of Human Languages Project (EHL) is dedicated to promoting long-range genealogical research into linguistic prehistory. Toward that end, its architects have sought to collect and coordinate evidence of every known human language, roughly 6000 in all, fostering an interdisciplinary and internationally accessible environment for the study of historical universals and contemporary diversity. This article investigates the roots and branches of the Global Lexicostatistical Database – a component project of the EHL. It pays special attention to strategies for encoding epistemological pluralism in a web-based archive of global proportions.|000|Global Lexicostatistical Database, introduction, history of science, 4736|Bidese2019|Over the last three decades, the study of the origin and evolution of human language has attracted more and more scholars from different disciplines, and earned a place in several internationally renowned symposia, such as the 51st Annual Meeting of the Societas Linguistica Europea, held in 2018 in Tallinn, where a workshop with 13 contributions was dedicated to ‘New Directions in Language Evolution Research’. 
Furthermore, the question of the origin and evolution of language is a topic that attracts not only the scientific community but also the lay public.|000|Chomsky syntax, origin of language, language faculty, language evolution 4737|Bidese2019|Very interesting paper, illustrating how Chomsky syntacticians think about evolutionary linguistics and computational historical linguistics.|000|Chomsky syntax, evolution of language, language evolution, language origin, 4738|Nasr2019|Humans and animals have a “number sense,” an innate capability to intuitively assess the number of visual items in a set, its numerosity. This capability implies that mechanisms to extract numerosity indwell the brain’s visual system, which is primarily concerned with visual object recognition. Here, we show that network units tuned to abstract numerosity, and therefore reminiscent of real number neurons, spontaneously emerge in a biologically inspired deep neural network that was merely trained on visual object recognition. These numerosity-tuned units underlay the network’s number discrimination performance that showed all the characteristics of human and animal number discriminations as predicted by the Weber-Fechner law. These findings explain the spontaneous emergence of the number sense based on mechanisms inherent to the visual system|000|number sense, numerals, animal cognition, deep learning, neural network, machine learning, 4739|Li2019|Tianjin Mandarin is a member of the northern Mandarin Chinese family (ISO 639-3: [cmn]). It is spoken in the urban areas of the Tianjin Municipality (CN-12) in the People’s Republic of China, which is about 120 kilometers to the southeast of Beijing. Existing studies on Tianjin Mandarin have focused mainly on its tonal aspects, especially its intriguing tone sandhi system, with few studies examining the segmental aspects (on tone, see e.g. 
Li & Liu 1985, Shi 1986, Liu 1993, Lu 1997, Wang & Jiang 1997, Chen 2000, Liu & Gao 2003, Ma 2005, Ma & Jia 2006, Zhang & Liu 2011, Li & Chen 2016; on segmental aspects, see e.g. Han 1993a, b; Wee, Yan & Chen 2005). As also noted in Wee et al. (2005), this is probably due to the similarity in segmental structures between Tianjin Mandarin and Standard Chinese, especially among speakers of the younger generation, and what differentiates the two Mandarin varieties is most notably their tonal systems. The aim of the present description is therefore to provide a systematic phonetic description of both segmental and tonal aspects of Tianjin Mandarin, with main focus on the tonal aspects.|000|illustrations of the IPA, Tianjin Chinese, phoneme inventory, 4740|Wen2004b|Hmong-Mien (H-M) is a major language family in East Asia, and its speakers distribute primarily in southern China and Southeast Asia. To date, genetic studies on H-M speaking populations are virtually absent in the literature. In this report, we present the results of an analysis of genetic variations in the mitochondrial DNA (mtDNA) hypervariable segment 1 (HVS1) region and diagnostic variants in the coding regions in 537 individuals sampled from 17 H-M populations across East Asia. The analysis showed that the haplogroups that are predominant in southern East Asia, including B, R9, N9a, and M7, account for 63% (ranging from 45% to 90%) of mtDNAs in H-M populations. Furthermore, analysis of molecular variance (AMOVA), phylogenetic tree analysis, and principal component (PC) analysis demonstrate closer relatedness between H-M and other southern East Asians, suggesting a general southern origin of maternal lineages in the H-M populations. The estimated ages of the mtDNA lineages that are specific to H-M coincide with those based on archeological cultures that have been associated with H-M. 
Analysis of genetic distance and phylogenetic tree indicated some extent of difference between the Hmong and the Mien populations. Together with the higher frequency of north-dominating lineages observed in the Hmong people, our results indicate that the Hmong populations had experienced more contact with the northern East Asians, a finding consistent with historical evidence. Moreover, our data defined some new (sub-)haplogroups (A6, B4e, B4f, C5, F1a1, F1a1a, and R9c), which will direct further efforts to improve the phylogeny of East Asian mtDNAs.|000|population genetics, Hmong-Mien, 4741|IOS2018|The Sui script [le13 sui33], known in Chinese as shuǐshū 水书 “Sui writing”, is a logographic writing system for the Sui language of the Sui people in Guizhou province of China. The Sui script has traditionally been used by Sui ritual masters (Shuǐshū xiānshēng 水书先生) for ritual and divination purposes.|000|Sui writing system, writing systems, shuishu, Tai-Kadai, Sui languages, 4742|Pereira2018|Prior work decoding linguistic meaning from imaging data has been largely limited to concrete nouns, using similar stimuli for training and testing, from a relatively small number of semantic categories. Here we present a new approach for building a brain decoding system in which words and sentences are represented as vectors in a semantic space constructed from massive text corpora. By efficiently sampling this space to select training stimuli shown to subjects, we maximize the ability to generalize to new meanings from limited imaging data. To validate this approach, we train the system on imaging data of individual concepts, and show it can decode semantic vector representations from imaging data of sentences about a wide variety of both concrete and abstract topics from two separate datasets. 
These decoded representations are sufficiently detailed to distinguish even semantically similar sentences, and to capture the similarity structure of meaning relationships between sentences.|000|linguistic meaning, imaging data, neurolinguistics, neuroscience 4743|Seifart2018b|By force of nature, every bit of spoken language is produced at a particular speed. However, this speed is not constant—speakers regularly speed up and slow down. Variation in speech rate is influenced by a complex combination of factors, including the frequency and predictability of words, their information status, and their position within an utterance. Here, we use speech rate as an index of word-planning effort and focus on the time window during which speakers prepare the production of words from the two major lexical classes, nouns and verbs. We show that, when naturalistic speech is sampled from languages all over the world, there is a robust cross-linguistic tendency for slower speech before nouns compared with verbs, both in terms of slower articulation and more pauses. We attribute this slowdown effect to the increased amount of planning that nouns require compared with verbs. Unlike verbs, nouns can typically only be used when they represent new or unexpected information; otherwise, they have to be replaced by pronouns or be omitted. These conditions on noun use appear to outweigh potential advantages stemming from differences in internal complexity between nouns and verbs. Our findings suggest that, beneath the staggering diversity of grammatical structures and cultural settings, there are robust universals of language processing that are intimately tied to how speakers manage referential information when they communicate with one another.|000|speed, communication speed, communication, word-planning effort, language universals, language processing 4744|Seifart2018b|Very interesting study, given that it finds universals where they weren't discussed before. 
It is a good example of why work on digitizing the data available on human languages is needed, and why more effort should be invested.|000|speech speed, speed, language processing, corpus studies, nice illustration, linguistic universals 4745|Tennant2019|This document aims to agree on a broad, international strategy for the implementation of open scholarship that meets the needs of different national and regional communities but works globally.|000|open research, open science, data managment, 4746|Berkemer2018|Alignments, i.e., position-wise comparisons of two or more strings or ordered lists are of utmost practical importance in computational biology and a host of other fields, including historical linguistics and emerging areas of research in the Digital Humanities. The problem is well-known to be computationally hard as soon as the number of input strings is not bounded. Due to its practical importance, a huge number of heuristics have been devised, which have proved very successful in a wide range of applications. Alignments nevertheless have received hardly any attention as formal, mathematical structures. Here, we focus on the compositional aspects of alignments, which underlie most algorithmic approaches to computing alignments. We also show that the concepts naturally generalize to finite partially ordered sets and partial maps between them that in some sense preserve the partial orders.|000|sequence alignment, biology, partially ordered sets, partial maps, graph theory, 4747|Kraaljenbrink2014|The greater Himalayan region demarcates two of the most prominent linguistic phyla in Asia: Tibeto-Burman and Indo-European. Previous genetic surveys, mainly using Y-chromosome polymorphisms and/or mitochondrial DNA polymorphisms suggested a substantially reduced geneflow between populations belonging to these two phyla. 
These studies, however, have mainly focussed on populations residing far to the north and/or south of this mountain range, and have not been able to study geneflow patterns within the greater Himalayan region itself. We now report a detailed, linguistically informed, genetic survey of Tibeto-Burman and Indo-European speakers from the Himalayan countries Nepal and Bhutan based on autosomal microsatellite markers and compare these populations with surrounding regions. The genetic differentiation between populations within the Himalayas seems to be much higher than between populations in the neighbouring countries. We also observe a remarkable genetic differentiation between the Tibeto-Burman speaking populations on the one hand and Indo-European speaking populations on the other, suggesting that language and geography have played an equally large role in defining the genetic composition of present-day populations within the Himalayas.|000|Trans-Himalayan, population genetics, greater Himalayan region, 4748|Branner2014|This seventy-odd-page essay revises the author’s 2004 Master’s thesis at the University of British Columbia. It consists of twenty-three vignettes on common Chinese words, ancient and modern. The vignettes are compact and composed with welcome attention to readability, often a gross obstacle in historical linguistics. In all, some 285 morphemes are discussed with varying levels of depth, but the discussion always centers on the original subject of the vignette, a useful organizing strategy. |000|review, Old Chinese phonology, 4749|Goddard2019|We commend the target paper (henceforth S&N) for bringing reported speech to attention in the typological space, and for making a number of highly pertinent observations. We agree that reported speech deserves to be seen as a sui generis domain or topic, well deserving of typological attention and not reducible to an intersection of other phenomena. 
We would prefer to characterise it as a semantic or functional domain, rather than as a “syntactic” domain, given that key aspects of S&N’s definition hinge on semantic notions, but this is not our main concern in this commentary. Instead, we would like to take issue with the target paper on more important theoretical and methodological matters. The most significant concerns S&N’s reliance on complex, poorly-defined, English-bound terms, including both technical terms such as semiotic, ‘demonstratedness’, epistemic, modality, and representation, and ordinary, but equally English-bound, words such as report(ed), message, discourse, and utterance. In this commentary we aim to demonstrate, so far as possible in the space available, that the use of such opaque and/or English-bound terminology is unnecessary and to outline an alternate approach to the same phenomena.|000|reported speech, indirect speech, semantics, linguistic typology, comparative concept, 4750|GellMann2011|Recent work in comparative linguistics suggests that all, or almost all, attested human languages may derive from a single earlier language. If that is so, then this language—like nearly all extant languages—most likely had a basic ordering of the subject (S), verb (V), and object (O) in a declarative sentence of the type “the man (S) killed (V) the bear (O).” When one compares the distribution of the existing structural types with the putative phylogenetic tree of human languages, four conclusions may be drawn. (i) The word order in the ancestral language was SOV. (ii) Except for cases of diffusion, the direction of syntactic change, when it occurs, has been for the most part SOV > SVO and, beyond that, SVO > VSO/VOS with a subsequent reversion to SVO occurring occasionally. Reversion to SOV occurs only through diffusion. (iii) Diffusion, although important, is not the dominant process in the evolution of word order. 
(iv) The two extremely rare word orders (OVS and OSV) derive directly from SOV.|000|word order, universals, proto-world, 4751|Round2011|The notion of ‘erosion’, a universal diachronic process affecting the phonetic content of certain language forms, has held a place in historical linguistics for almost two centuries now. Recently it has been argued that the erosion of high frequency words can be derived as a consequence of normal language use within a theory of phonology based on exemplars. Focusing on discrete changes to function words, this paper argues that types of erosion exist which cannot be derived in this manner. Instead, erosion as well as other less celebrated, but well attested, irregular changes to function words can be accounted for by a species of paradigm levelling. Prosodic paradigm levelling (PPL) is much like its familiar morphological cousin only it plays out over paradigms whose cells contain word forms selected for by prosodic, not morphological, features. While PPL can account for data which exemplar models cannot, it is maintained nevertheless that exemplar models can offer a reasonable account of much of the data, provided that the model incorporates a discrete level of phonological representation, in addition to exemplars. Arguments presented have implications for phonological representation in general, as well as for the explanation of discrete, irregular change to function words.|000|incomplete lineage sorting, erosion, sound change 4752|Hua2019|Language diversity is distributed unevenly over the globe. Intriguingly, patterns of language diversity resemble biodiversity patterns, leading to suggestions that similar mechanisms may underlie both linguistic and biological diversification. Here we present the first global analysis of language diversity that compares the relative importance of two key ecological mechanisms – isolation and ecological risk – after correcting for spatial autocorrelation and phylogenetic non-independence. 
We find significant effects of climate on language diversity, consistent with the ecological risk hypothesis that areas of high year-round productivity lead to more languages by supporting human cultural groups with smaller distributions. Climate has a much stronger effect on language diversity than landscape features, such as altitudinal range and river density, which might contribute to isolation of cultural groups. The association between biodiversity and language diversity appears to be an incidental effect of their covariation with climate, rather than a causal link between the two.|000|linguistic diversity, computational approaches, climate, correlational studies, 4753|Kato2012|This paper presents a phonological analysis, conversational sample texts, and a basic vocabulary for the Myeik dialect of Burmese. The Myeik dialect has peculiar characteristics in terms of tonal contours, and voice quality in the tones and vowels. The tone of this dialect, which corresponds to the Standard Burmese creaky falling tone, has a rising contour and is pharyngealized. The vowels of the syllables corresponding to Standard Burmese stopped syllables are pronounced with a conspicuous creaky phonation. Previous studies have paid little attention to these facts. Tone sandhis peculiar to this dialect are also described in this paper. The texts are translations of the twenty dialogues in Kato’s (1998) Burmese primer. Since these dialogues cover as many as possible of the most basic grammatical items of Burmese, translating them into the Myeik dialect can be the basis for future studies of morphosyntactic phenomena of this dialect. The basic vocabulary contains about nine hundred items.|000|dataset, Myeik, Burmese, Sino-Tibetan, word list 4754|Baayen2019|This methodological study provides a step-by-step introduction to a computational implementation of word and paradigm morphology using linear mappings between vector spaces for form and meaning. 
Taking as starting point the linear regression model, the main concepts underlying linear mappings are introduced and illustrated with R code. It is then shown how vector spaces can be set up for Latin verb conjugations, using 672 inflected variants of two verbs each from the four main conjugation classes. It turns out that mappings from form to meaning (comprehension), and from meaning to form (production) can be carried out loss-free. This study concludes with a demonstration that when the graph of triphones, the units that underlie the form space, is mapped onto a 2-dimensional space with a self-organising algorithm from physics (graphopt), morphological functions show topological clustering, even though morphemic units do not play any role whatsoever in the model. It follows, first, that evidence for morphemes emerging from experimental studies using, for instance, fMRI, to localize morphemes in the brain, does not guarantee the existence of morphemes in the brain, and second, that potential topological organization of morphological form in the cortex may depend to a high degree on the morphological system of a language.|000|inflection, morphology, introduction, computational linguistics, 4755|Changizi2006|Are there empirical regularities in the shapes of letters and other human visual signs, and if so, what are the selection pressures underlying these regularities? To examine this, we determined a wide variety of topologically distinct contour configurations and examined the relative frequency of these configuration types across writing systems, Chinese writing, and nonlinguistic symbols. Our first result is that these three classes of human visual sign possess a similar signature in their configuration distribution, suggesting that there are underlying principles governing the shapes of human visual signs. Second, we provide evidence that the shapes of visual signs are selected to be easily seen at the expense of the motor system. 
Finally, we provide evidence to support an ecological hypothesis that visual signs have been culturally selected to match the kinds of conglomeration of contours found in natural scenes because that is what we have evolved to be good at visually processing.|000|writing systems, evolution of writing, iconicity, 4756|Reid2016|This article is a suggested explanation for the multiple variants of the forms of some Malayo-Polynesian pronouns that have been characterized as the result of drift. The explanation that is given is referred to here as paradigmatic instability, a phenomenon not previously discussed with reference to these problems. In the cases discussed in this article it is the avoidance of forms that are semantically or pragmatically inappropriate within the context of the paradigm in which they occur which renders the forms morphologically variable and the paradigms in which they occur unstable. In Malayo-Polynesian languages, it is the avoidance of a form that is reconstructed as a plural pronoun in Proto-Austronesian but which in all Malayo-Polynesian daughter languages is found as a singular pronoun. Where this form is retained as part of some other plural forms, it has been lost or modified in a wide range of variants in the daughter languages.|000|incomplete lineage sorting, variation, Malayo-Polynesian, drift, lexical change, 4757|Thornton2012|Overabundance occurs when two or more forms are available to realize the same cell in an inflectional paradigm, as in It. devo/debbo ‘must.1SG.PRS.IND’. Such multiple forms, called cell-mates, abound in Italian. This paper presents a case study of the diachronic evolution of the cell-mates realizing the 1SG and 3PL cells of the Present Indicative of the verbs DOVERE ‘must’, VEDERE ‘see’, CHIEDERE ‘ask’, SEDERE ‘sit’, POSSEDERE ‘possess’. 
Analysis of corpora of Italian texts dating from the 13th to the 20th century shows that some cases of overabundance were reduced over time, while others have been maintained till the present. This paper explores the factors responsible for these different outcomes. Previous observations that overabundance tends to be better preserved in low-frequency items and in forms learned relatively late appear confirmed by the Italian data; in addition, conscious normative interventions have been found to play a great role in the Italian situation. In conclusion, this study shows that overabundance is a genuine type of non-canonical phenomenon that can occur in paradigms; the idea often put forward in the literature, that overabundance will eventually inevitably be eliminated in all cases, is not fully supported by the data.|000|overabundance, incomplete lineage sorting, variation, morphology 4758|Kentner2019|Does linguistic rhythm matter to syntax, and if so, what kinds of syntactic decisions are susceptible to rhythm? By means of two recall-based sentence production experiments and two corpus studies – one on spoken and one on written language – we investigated whether linguistic rhythm affects the choice between introduced and un-introduced complement clauses in German. Apart from the presence or absence of the complementiser dass (‘that’), these two sentence types differ with respect to the position of the tensed verb (verb-final/verb-second). Against our predictions, that were based on previously reported rhythmic effects on the use of the optional complementiser that in English, the experiments fail to obtain compelling evidence for rhythmic/prosodic influences on the structure of complement clauses in German. An overview of pertinent studies showing rhythmic influences on syntactic encoding suggests these effects to be generally restricted to syntactic domains smaller than a clause. 
We assume that, in the course of language production, initially, clause level syntactic projections are specified; their specification is in fact the prerequisite for phonological encoding to start. Consequently, prosodic effects may only touch upon the lower level categories that are to be integrated into the clausal projection, but not upon the syntactic makeup of the higher order projection itself.|000|prosody, linguistic change, syntax, rhythm, pragmatic evolution, poetic function, 4759|Eckert2019|I present the extreme proposal that change spreads by virtue of its role in a system of social meaning. And since individuals cannot construct meaning on their own, they can play no elemental role in sound change. Based on ethnographic-variationist studies of sound change among preadolescents and adolescents, I challenge two common assumptions in the study of variation and change: (1) that sound change is autonomous, and (2) that change spreads from individual to individual, by imitation and in isolation. Whatever its origins, whether from linguistic pressures (“change from below”) or social pressures (“change from above”), sound change spreads by virtue of its incorporation into a semiotic landscape, as non-referential material is recruited into signs articulating social distinctions. Participation in this landscape connects the individual to the immediate community and to the larger social order and it is through participation in this landscape that speakers produce and perceive – and accelerate – changes in progress.|000|sound change, actuation problem, semiotics, social class, pragmatic evolution 4760|Dediu2019|This paper argues that inter-individual and inter-group variation in language acquisition, perception, processing and production, rooted in our biology, may play a largely neglected role in sound change. 
We begin by discussing the patterning of these differences, highlighting those related to vocal tract anatomy with a foundation in genetics and development. We use our ArtiVarK database, a large multi-ethnic sample comprising 3D intraoral optical scans, as well as structural, static and real-time MRI scans of vocal tract anatomy and speech articulation, to quantify the articulatory strategies used to produce the North American English /r/ and to statistically show that anatomical factors seem to influence these articulatory strategies. Building on work showing that these alternative articulatory strategies may have indirect coarticulatory effects, we propose two models for how biases due to variation in vocal tract anatomy may affect sound change. The first involves direct overt acoustic effects of such biases that are then reinterpreted by the hearers, while the second is based on indirect coarticulatory phenomena generated by acoustically covert biases that produce overt “at-a-distance” acoustic effects. This view implies that speaker communities might be “poised” for change because they always contain pools of “standing variation” of such biased speakers, and when factors such as the frequency of the biased speakers in the community, their positions in the communicative network or the topology of the network itself change, sound change may rapidly follow as a self-reinforcing network-level phenomenon, akin to a phase transition. Thus, inter-speaker variation in structured and dynamic communicative networks may couple the initiation and actuation of sound change.|000|pragmatic evolution, actuation problem, sound change, experimental phonetics, 4761|Stevens2019|The study explored whether an asymmetric phonetic overlap between speech sounds could be turned into sound change through propagation around a community of speakers. 
The focus was on the change of /s/ to /ʃ/ which is known to be more likely than a change in the other direction both synchronically and diachronically. An agent-based model was used to test the prediction that communication between agents would advance /s/-retraction in /str/ clusters (e.g. string). There was one agent per speaker and the probabilistic mapping between words, phonological classes, and speech signals could be updated during communication depending on whether an agent listener absorbed an incoming speech signal from an agent talker into memory. Following interaction, sibilants in /str/ clusters were less likely to share a phonological class with prevocalic /s/ and were acoustically closer to /ʃ/. The findings lend support to the idea that sound change is the outcome of a fortuitous combination of the relative size and orientation of phonetic distributions, their association to phonological classes, and how these types of information vary between speakers that happen to interact with each other.|000|sound change, artificial agents, pragmatic evolution 4762|Linzen2018|The reliability of acceptability judgments made by individual linguists has often been called into question. Recent large-scale replication studies conducted in response to this criticism have shown that the majority of published English acceptability judgments are robust. We make two observations about these replication studies. First, we raise the concern that English acceptability judgments may be more reliable than judgments in other languages. Second, we argue that it is unnecessary to replicate judgments that illustrate uncontroversial descriptive facts; rather, candidates for replication can emerge during formal or informal peer review. We present two experiments motivated by these arguments. Published Hebrew and Japanese acceptability contrasts considered questionable by the authors of the present paper were rated for acceptability by a large sample of naive participants. 
Approximately half of the contrasts did not replicate. We suggest that the reliability of acceptability judgments, especially in languages other than English, can be improved using a simple open review system, and that formal experiments are only necessary in controversial cases.|000|acceptability judgments, Chomsky syntax, 4763|Linzen2018|This is a remarkable paper, since it seems to be one of the rare cases where acceptability judgments are actually discussed.|000|acceptability judgments, Chomsky syntax, 4764|Tkachmann2018|This paper investigates how structure emerges in a young language, focusing on compounding in two young sign languages, Israeli Sign Language (ISL) and Al-Sayyid Bedouin Sign Language (ABSL). We focus on novel compounds (tokens invented on the spot) to ensure that we are studying a productive process and to avoid issues contingent with lexicalization. We found that both languages make use both of compounding and size-and-shape classifier constructions (SASS-constructions), but ISL and ABSL have conventionalized different structures and the structures they do use are conventionalized to different degrees. We discuss the similarities and differences of those constructions in ISL and ABSL in the context of structure emergence and language evolution.|000|compounding, sign language, word formation, language origin, 4765|OrtegaAndres2019|Many word forms in natural language are polysemous, but only some of them allow for co-predication, that is, they allow for simultaneous predications selecting for two different meanings or senses of a nominal in a sentence. In this paper, we try to explain (i) why some groups of senses allow co-predication and others do not, and (ii) how we interpret co-predicative sentences. The paper focuses on those groups of senses that allow co-predication in an especially robust and stable way. 
We argue, using these cases, but focusing particularly on the multiply polysemous word school, that the senses involved in co-predication form especially robust activation packages, which allow hearers and readers to access all the different senses in interpretation. |000|polysemy, co-predication, colexification, introduction, 4766|Langsford2018|Understanding and measuring sentence acceptability is of fundamental importance for linguists, but although many measures for doing so have been developed, relatively little is known about some of their psychometric properties. In this paper we evaluate within- and between-participant test-retest reliability on a wide range of measures of sentence acceptability. Doing so allows us to estimate how much of the variability within each measure is due to factors including participant-level individual differences, sample size, response styles, and item effects. The measures examined include Likert scales, two versions of forced-choice judgments, magnitude estimation, and a novel measure based on Thurstonian approaches in psychophysics. We reproduce previous findings of high between-participant reliability within and across measures, and extend these results to a generally high reliability within individual items and individual people. Our results indicate that Likert scales and the Thurstonian approach produce the most stable and reliable acceptability measures and do so with smaller sample sizes than the other measures. Moreover, their agreement with each other suggests that the limitation of a discrete Likert scale does not impose a significant degree of structure on the resulting acceptability judgments.|000|acceptability judgments, sentence acceptability, empirical study 4767|Katz2018|This paper presents evidence that spirantization, a cross-linguistically common lenition process, affects English listeners’ ease of segmenting novel “words” in an artificial language. 
The cross-linguistically common spirantization pattern of initial stops and medial continuants (e.g. [ɡuβa]) results in improved word segmentation compared to the inverse “anti-lenition” pattern of initial continuants and medial stops (e.g. [ɣuba]). The study also tests the effect of obstruent voicing, another common lenition pattern, but finds no significant differences in segmentation performance. There are several points of broader interest in these studies. Most of the phonetic factors influencing word segmentation in past studies have been language-specific and/or prosodic in nature: stress, intonation, final lengthening, etc. Spirantization, while often prosodically conditioned, is different from all of these patterns in that it concerns a segmental alternation. Moreover, the effects reported here are for speakers of a language, American English, that only sporadically displays spirantization, and not in the phonological contexts used in the experiment. This suggests that the results may reflect more general properties of speech perception and word boundary detection, rather than a perceptual processing strategy transferred directly from English. As such, the studies offer partial support for theories of lenition rooted in notions of perceptual-acoustic continuity and disruption.|000|spirantization, fricativization, sound change, artificial language learning, word boundary detection, morpheme segmentation, 4768|Lin2018|This paper examines the interpretation of unclassified nouns in Mandarin Chinese from the perspective of three theoretical approaches to the mass-count distinction in Mandarin: a lexico-syntactic approach (Doetjes 1997; Cheng & Sybesma 1998), a syntax-driven approach (Borer 2005), and a hybrid approach (Pelletier 2012). Employing a Quantity Judgment Task (Barner & Snedeker 2005), we examined the interpretation of unclassified nouns of different ontological types (count, mass, flexible, object-mass) in both adult and child Mandarin. 
In order to explain possible interpretational preferences, we also analysed the distributions of the tested nouns in the Chinese Internet Corpus (Sharoff 2006). The results of 27 adults and 55 children (2;11–5;09), together with the corpus data provide strong support for Pelletier. We therefore conclude that Mandarin nouns are semantically both count and mass, and receive a number-based or a volume-based interpretation according to the type of classifier they appear with. However, we argue for one exception in this respect: following Bale & Barner (2009) we assume that nouns of the object-mass type (e.g., furniture) are marked for individualization in the lexicon. Finally, the emergence of adultlike preferences for number-based or volume-based interpretations in child Mandarin is argued to be linked to the acquisition of the classifier system. |000|mass-count distinction, word classes, lexicon, mental lexicon, Mandarin, classifier system, 4769|Bale2018|We review advances in the experimental study of the mass-count distinction and highlight problems that have emerged. First, we lay out what we see to be the scientific enterprise of studying the syntax and semantics of the mass-count distinction, and the assumptions we believe must be made if additional progress is to occur, especially as the empirical facts continue to grow in number and complexity. Second, we discuss the new landscape of cross-linguistic results that has been created by widespread use of the quantity judgment task, and what these results tell us about the nature of the mass-count distinction. Finally, we discuss the relationship between the mass-count distinction and non-linguistic cognition, and in particular the object-substance distinction. |000|mass-count distinction, lexicon, cross-linguistic study, 4770|Luo2017|Johnson (1972) and Kaplan & Kay (1994) showed that phonological processes belong to the computational class of regular relations. 
This paper provides a computational analysis of long-distance consonant agreement and shows that it belongs to a more restricted computational class called subsequential. This paper further argues that subsequentiality is a desirable computational characterization of long-distance consonant agreement for the following reasons. First, it is sufficiently expressive. Second, it is restrictive as it accounts for the absence of pathological patterns like Majority Rules and Sour Grapes from the typology (Heinz & Lai 2013), standing in contrast to Agreement by Correspondence analysis in Optimality Theory (Rose & Walker 2004; Hansson 2007).|000|phonological rules, phonotactics, regular grammar, long-distance consonant agreement, complexity, 4771|Hall2017|Article is potentially interesting in the context of offering some idea on sound change processes. The prosodic aspect of the analysis is important for syllabification studies and the like.|000|syllable peak, prosody, prosodic template, glides, Middle High German, diphthongs, nucleus 4772|Bjorkmann2017|Singular they enjoys a curious notoriety in popular discussions of English grammar. Despite this, and though its use with quantificational, non-specific, and genuinely epicene antecedents dates back at least to the 1400s (Balhorn 2004), it has been little discussed in formal linguistics. This squib suggests an analysis of this longstanding use of they, while also describing a more recent change in they’s distribution, whereby many speakers now accept it with singular, definite, and specific antecedents of known binary gender. 
I argue that the distribution of they, in both conservative and innovative varieties, has implications for our understanding of the syntactic representation of gender in English, the structure of bound variable pronouns, and the regulation of coreference.|000|gender systems, gender studies, English, singular they, 4773|Horn1997|In everyday speech, we often use words more to do things (e.g., greet, make bets, accuse, ask, marry, etc.) than to make statements of fact. In the philosophy of language, viewing utterances as these sorts of speech acts (Austin 1962), or as moves in a language game (Wittgenstein 1958), challenged the view of communication as an exchange of true or false propositions. For ethologists, applying similar concepts to animal signals may help keep concepts like information, manipulation, and honesty in their proper perspective. This essay shows the many parallels between speech acts and animal signals, and touches on their implications. According to this perspective, the main function of signals is not to state facts, although facts (e.g., about honesty, intentions, and external referents) are crucial for the evolutionary stability of signals. Just as the speech acts in a marriage ceremony resist translation into facts outside of the social system of which they are a part, most animal signals will likely resist translation into general classes of messages or functions. Nonetheless, if the rules that govern the use of signals are sufficiently understood, the parts played by manipulation and information in the evolution of those rules can also be understood.|000|pragmatic evolution, speech act, animal signals 4774|Cepelewicz2019|Mit Ideen aus der Evolutionsbiologie suchen Informatiker unter astronomisch vielen möglichen Lösungen die optimale. 
Dahinter steht eine mathematische Theorie, laut der es einfacher ist, ein Ergebnis durch ein Programm zu erzeugen, als dieses Ergebnis direkt zu erzielen.|000|complexity, coincidence, evolutionary theory, optimalization, 4775|Cepelewicz2019|Fascinating summary of a theory stating that apes typing on a typewriter may not write pi, but they could write the C code needed to generate the first 1000 or more digits of pi. |000|typewriter, complexity, evolutionary theory, 4776|Tingley2019|Humans have been drinking fermented concoctions since the beginning of recorded time. But despite that long relationship with alcohol, we still don’t know what exactly the molecule does to our brains to create a feeling of intoxication. Likewise, though the health harms of heavy drinking are fairly obvious, scientists have struggled to identify what negative impacts lesser volumes may lead to. Last September, the prestigious peer-reviewed British medical journal The Lancet published a study [@GBGAlcohol2018] that is thought to be the most comprehensive global analysis of the risks of alcohol consumption. Its conclusion, which the media widely reported, sounded unequivocal: “The safest level of drinking is none.” |000|review, summary, alcohol, bias, informants, causation, 4777|Tingley2019|What is interesting about this study is that it shows how difficult it is to find causation when looking at data, and how careful one should be before drawing any conclusions.|000|causation, causal inference, bias, alcohol, medical study, media, 4778|GBGAlcohol2018|Background Alcohol use is a leading risk factor for death and disability, but its overall association with health remains complex given the possible protective effects of moderate alcohol consumption on some conditions. 
With our comprehensive approach to health accounting within the Global Burden of Diseases, Injuries, and Risk Factors Study 2016, we generated improved estimates of alcohol use and alcohol-attributable deaths and disability-adjusted life-years (DALYs) for 195 locations from 1990 to 2016, for both sexes and for 5-year age groups between the ages of 15 years and 95 years and older. Methods Using 694 data sources of individual and population-level alcohol consumption, along with 592 prospective and retrospective studies on the risk of alcohol use, we produced estimates of the prevalence of current drinking, abstention, the distribution of alcohol consumption among current drinkers in standard drinks daily (defined as 10 g of pure ethyl alcohol), and alcohol-attributable deaths and DALYs. We made several methodological improvements compared with previous estimates: first, we adjusted alcohol sales estimates to take into account tourist and unrecorded consumption; second, we did a new meta-analysis of relative risks for 23 health outcomes associated with alcohol use; and third, we developed a new method to quantify the level of alcohol consumption that minimises the overall risk to individual health. Findings Globally, alcohol use was the seventh leading risk factor for both deaths and DALYs in 2016, accounting for 2·2% (95% uncertainty interval [UI] 1·5–3·0) of age-standardised female deaths and 6·8% (5·8–8·0) of age-standardised male deaths. Among the population aged 15–49 years, alcohol use was the leading risk factor globally in 2016, with 3·8% (95% UI 3·2–4·3) of female deaths and 12·2% (10·8–13·6) of male deaths attributable to alcohol use. For the population aged 15–49 years, female attributable DALYs were 2·3% (95% UI 2·0–2·6) and male attributable DALYs were 8·9% (7·8–9·9). The three leading causes of attributable deaths in this age group were tuberculosis (1·4% [95% UI 1·0–1·7] of total deaths), road injuries (1·2% [0·7–1·9]), and self-harm (1·1% [0·6–1·5]). 
For populations aged 50 years and older, cancers accounted for a large proportion of total alcohol-attributable deaths in 2016, constituting 27·1% (95% UI 21·2–33·3) of total alcohol-attributable female deaths and 18·9% (15·3–22·6) of male deaths. The level of alcohol consumption that minimised harm across health outcomes was zero (95% UI 0·0–0·8) standard drinks per week. Interpretation Alcohol use is a leading risk factor for global disease burden and causes substantial health loss. We found that the risk of all-cause mortality, and of cancers specifically, rises with increasing levels of consumption, and the level of consumption that minimises health loss is zero. These results suggest that alcohol control policies might need to be revised worldwide, refocusing on efforts to lower overall population-level consumption.|000|alcohol, bias, medical study, causation, causal inference, 4779|Arapov1974|[Когда мы классифицируем языки,] то не существенно, что именно мы имеем в виду под языком, лишь бы это был дискретный объект x, который мы можем отличить от всех объектов этого рода, и можно было бы говорить об определенном интервале времени t = [t₁, t₂], в течение которого этот объект существует. :translation:`[...] it is not important what exactly we have in mind when using the term “language” as long as it is a discrete object x which we can distinguish from all other objects of this kind, and as long as one can give a certain period of time t = [t₁, t₂] during which the object exists.` |7|language model, definition, lexicostatistics, computational historical linguistics, modeling 4780|Bengtson2018|This paper explores a few features of the proposed reconstruction of Euskaro-Caucasian, the putative ancestor of Basque and the North Caucasian languages, as put forth in a recent monograph. 
Here some features of the consonantal system are discussed, namely (I) the development of proto-Euskaro-Caucasian *m in Basque, (II) the non-initial Basque reflexes of Euskaro-Caucasian laryngeals, and (III) the Basque noun stem allomorphs involving an alternation between /rc/ and /śt/. It is shown how these details of Euskaro-Caucasian comparative phonology illuminate important unsolved problems of historical phonology on both the Basque and North Caucasian sides.|000|Basque, Caucasian, long-range comparison, 4781|Bjorgvinsson2011|This essay examines J.L. Austin's theory regarding speech acts, or how we do things with words. It starts by reviewing the birth and foundation of speech act theory as it appeared in the 1955 William James Lectures at Harvard before going into what Austin's theory is and how it can be applied to the real world. The theory is explained and analysed both in regards to its faults and advantages. Proposals for the improvement of the theory are then developed, using the ideas of other scholars and theorists along with the ideas of the author. The taxonomy in this essay is vast and various concepts and conditions are introduced and applied to the theory in order for it to work. Those conditions range from being conditions of appropriateness through to general principles of communication. In this essay utterances are examined by their propositional content, the intention of the utterance, and its outcome. By studying how utterances are formed and issued, along with looking into utterance circumstances and sincerity, one can garner a clear glimpse into what constitutes a performative speech act and what does not. 
By applying the ideas of multiple thinkers in unison it becomes clear that a) any one single theory does not satisfyingly explain all the intricacies of the theory and b) most utterances which are not in the past tense can be considered to be either performative or as having some performative force.|000|speech act, history of science, overview, introduction, 4782|Butcher2018|The phonological systems of human languages are constrained by what are often assumed to be universal properties of human auditory perception. However, the atypical phonologies found in many hearing-impaired speakers (lack of voicing contrast, lack of fricatives) indicate that such constraints also operate at an individual level. Thus, if a large group of speakers in a speech community operates with an atypical auditory system over a number of generations, then it seems logical that the phonology of the language(s) spoken by such a community would also over time be influenced by the particular properties of that common auditory system. Over half of the Australian Aboriginal population develop chronic otitis media with effusion in infancy and 50-70% of Aboriginal children have a significant hearing loss at both the low and high ends of the frequency range. The majority of Australian languages have phonologies which are atypical in world terms, having no voicing distinction and no fricatives or affricates, but an unusually large number of places of articulation. 
Whilst there would seem to be no way of conclusively demonstrating a historical causal connection between atypical hearing profiles and atypical phonologies, this paper explores some of the minimal prerequisites for such a theory.|000|Australian languages, correlational studies, environmental factors, language change, phonology, phoneme inventory, 4783|Butcher2018|Interesting claim that a language might adapt to the faculties of its speakers; here, the relative simplicity of Australian languages is attributed to hearing problems caused by disease, if I understand this correctly. Reminiscent of studies on tone and absolute pitch (absolutes Gehör) in South-East Asian languages.|000|Australian languages, environmental factors, language change, phoneme inventory, 4784|TorresCacoullos2019|A key parameter in received classifications of language types is the expression of pronominal subjects. Here we compare variation patterns in conversational data of English – considered a non-null-subject language – and Spanish – a well-studied null-subject language. English has a patently lower rate of expression (approximately 3% unexpressed 1sg and 3sg human subjects vs. 60% in Spanish). Despite the stark difference in rate of expression, the same probabilistic constraints are at work in the two languages. Contrary to popular belief, VP coordination is neither a discrete nor a distinguishing category of English. Instead, a shared constraint is linking with the preceding subject, a refinement of accessibility to include, alongside coreferentiality, measures of structural connectedness – both prosodic and syntactic. Other shared constraints on unexpressed subjects are coreferential subject priming (a tendency to repeat the form of the previous mention) and lexical aspect (reflecting the contribution of a temporal relationship to subject expression). Where the languages most differ is in the envelope of variation. 
In English, besides coreferential-subject verbs conjoined with a coordinating conjunction, unexpressed subjects are limited to prosodic initial-position in declarative main clauses, a restriction that is absent in Spanish. We propose that the locus of cross-language comparisons is the variable structure of each language, defined by the set of probabilistic constraints but also the delimitation of the variable context within which these are operative.|000|null subject, pro-drop, language universals, English, Spanish 4785|Asoulin2016|I show that there are good arguments and evidence to boot that support the language as an instrument of thought hypothesis. The underlying mechanisms of language, comprising of expressions structured hierarchically and recursively, provide a perspective (in the form of a conceptual structure) on the world, for it is only via language that certain perspectives are available to us and to our thought processes. These mechanisms provide us with a uniquely human way of thinking and talking about the world that is different to the sort of thinking we share with other animals. If the primary function of language were communication then one would expect that the underlying mechanisms of language will be structured in a way that favours successful communication. I show that not only is this not the case, but that the underlying mechanisms of language are in fact structured in a way to maximise computational efficiency, even if it means causing communicative problems. Moreover, I discuss evidence from comparative, neuropathological, developmental, and neuroscientific evidence that supports the claim that language is an instrument of thought.|000|cognition, animal cognition, human cognition, communication, nature of human language 4786|Asoulin2016|I show that there are good arguments and evidence to boot that support the language as an instrument of thought hypothesis. 
The underlying mechanisms of language, comprising of expressions structured hierarchically and recursively, provide a perspective (in the form of a conceptual structure) on the world, for it is only via language that certain perspectives are available to us and to our thought processes. These mechanisms provide us with a uniquely human way of thinking and talking about the world that is different to the sort of thinking we share with other animals. If the primary function of language were communication then one would expect that the underlying mechanisms of language will be structured in a way that favours successful communication. I show that not only is this not the case, but that the underlying mechanisms of language are in fact structured in a way to maximise computational efficiency, even if it means causing communicative problems. Moreover, I discuss evidence from comparative, neuropathological, developmental, and neuroscientific evidence that supports the claim that language is an instrument of thought.|000|cognition, animal cognition, human cognition, communication, nature of human language 4787|Brown1980|This paper presents the core of a descriptive theory of indirect speech acts, i.e. utterances in which one speech act form is used to realize another, different, speech act. The proposed characterization of indirect speech acts is based on principles of goal formation, viewed in the context of a general structural model of action. The model of action is used to develop rules that characterize a large number of indirect speech act forms. 
Computational implications of the theory are discussed.|000|speech act, introduction, classification, modeling, 4788|Dedio2019|Approaches to linguistic areas have largely focused either on purely qualitative investigation of area formation processes, on quantitative and qualitative exploration of synchronic distributions of linguistic features without considering time, or on theoretical issues related to the definition of the notion “linguistic area”. What is still missing are approaches that supplement qualitative research on area formation processes with quantitative methods. Taking a bottom-up approach, we bypass notional issues and propose to quantify area formation processes by a) measuring the change in linguistic similarity given a geographical space, a socio-cultural setting, a time span, a language sample, and a set of linguistic data, and b) testing the tendency and magnitude of the process using Bayesian inference. Applying this approach to the expression of reflexivity in a dense sample of languages in north-western Europe from the early Middle Ages to the present, we show that the method yields robust quantitative evidence for a substantial gain in linguistic similarity that sets the languages of Britain and Ireland apart from languages spoken outside Britain and Ireland and cross-cuts lines of linguistic ancestry.|000|language contact, contact area, Britain, Ireland, English, Irish, typological study, contact zone, Bayesian inference 4789|DeSousa2012|Within the Mainland Southeast Asian (MSEA) linguistic area (e.g. Matisoff 2003; Bisang 2006; Enfield 2005, 2011), some languages are said to be in the core of the language area, while others are said to be periphery. In the core are Mon-Khmer languages like Vietnamese and Khmer, and Kra-Dai languages like Lao and Thai. 
The core languages generally have: – Lexical tonal and/or phonational contrasts (except that most Khmer dialects lost their phonational contrasts; languages which are primarily tonal often have five or more tonemes); – Analytic morphological profile with many sesquisyllabic or monosyllabic words; – Strong left-headedness, including prepositions and SVO word order. The Sino-Tibetan languages, like Burmese and Mandarin, are said to be periphery to the MSEA linguistic area. The periphery languages have fewer traits that are typical to MSEA. For instance, Burmese is SOV and right-headed in general, but it has some left-headed traits like post-nominal adjectives (‘stative verbs’) and numerals. Mandarin is SVO and has prepositions, but it is otherwise strongly right-headed. These two languages also have fewer lexical tones.|000|contact area, South-East Asia, Southern Chinese, Sinitic, Tai-Kadai, Hmong-Mien, Sinitic, 4790|Forker2019b|This paper investigates the impact of language contact on the Nakh-Daghestanian language Hinuq. Hinuq is a rather small language that has been in contact with larger languages for several centuries; among them the traces of Avar and Russian are particularly visible. The paper provides an overview about all observable influences on the phonology, morphology and syntax of Hinuq as well as on the lexicon. Avar is the main source for borrowed morphology and loan words. The influence of Russian on the Hinuq lexicon is growing, especially among the young speakers, but it is still smaller compared to Avar. With respect to the syntax no Avar impact can be detected since the languages belong to the same language family and large parts of the syntactic features and rules bear strong resemblances in the two languages. 
By contrast, Russian, which is genetically unrelated and typologically different from Hinuq, has some influence on the Hinuq constituent order.|000|Nakh-Daghestanian, Hinuq, language contact, Russian, Avar, lexical borrowing, 4791|Walkden2019|The notion of uniformitarianism, originally borrowed into linguistics from the earth sciences, is widely considered to be a foundational principle in modern historical linguistics. However, there are almost as many interpretations of uniformitarianism as there are historical linguists who take the time to define the notion. In this paper I argue, following Gould (@1965 ; @1987), that this confusion results from the fact that uniformitarianism as originally proposed in geology is not itself a uniform notion, and permits at least four readings. Only some of these readings involve substantive claims rather than methodological imperatives, and only some of these readings are useful for the study of language change. The weakest conclusion to be drawn is that these distinct notions need to be kept apart when invoked by historical linguists.|000|uniformitarianism, linguistics, historical linguistics, history of science, overview, 4792|OBrien1972|The sweet potato originated in northwestern South America, arising possibly as a hybrid cross or through karyotypic alterations from an unknown plant of the genus Ipomoea. This domestication is associated with the development of Tropical Forest agricultural villages by ca. 2500 B.C. The Spanish introduced it to Europe and spread it to China and Japan and Malaysia and the Moluccas region. The Portuguese carried it to India, Indonesia, and Africa. The plant has a pre-Magellan introduction into Polynesia by possibly A.D. 1 in the Samoa area and is dispersed from there to the rest of the Pacific. The plant was transferred either by birds carrying the seed or, more likely, through an accidental casting of a vessel carrying it upon an island of the Samoa region. 
The word kumara, alleged by many to show direct contact between Polynesians and Quechuan-speaking Indians, apparently reconstructs to Proto-Polynesian and was introduced into the Quechua dictionaries to reflect the educated Spaniard's knowledge of sweet potato terms.|000|sweet potato, origin, crop dispersal, South America, Pacific, 4793|Moret2019|Alexander von Humboldt’s Tableau Physique (1807) has been one of the most influential diagrams in the history of environmental sciences. In particular, detailed observations of the altitudinal distribution of plant species in the equatorial Andes, depicted on a cross-section of Mt. Chimborazo, allowed Humboldt to establish the concept of vegetation belt, thereby laying the foundations of biogeography. Surprisingly, Humboldt’s original data have never been critically revisited, probably due to the difficulty of gathering and interpreting dispersed archives. By unearthing and analyzing overlooked historical documents, we show that the top section of the Tableau Physique, above the tree line, is an intuitive construct based on unverified and therefore partly false field data that Humboldt constantly tried to revise in subsequent publications. This finding has implications for the documentation of climate change effects in the tropical Andes. We found that Humboldt’s primary plant data above tree line were mostly collected on Mt. Antisana, not Chimborazo, which allows a comparison with current records. Our resurvey at Mt. Antisana revealed a 215- to 266-m altitudinal shift over 215 y. This estimate is about twice lower than previous estimates for the region but is consistent with the 10- to 12-m/decade upslope range shift observed worldwide. Our results show the cautious approach needed to interpret historical data and to use them as a resource for documenting environmental changes. 
They also profoundly renew our understanding of Humboldt’s scientific thinking, methods, and modern relevance.|000|Alexander von Humboldt, Tableau Physique, visualization, ecology, history of science, 4794|Spengler2019|The apple (Malus domestica [Suckow] Borkh.) is one of the most economically and culturally significant fruits in the world today, and it is grown in all temperate zones. With over a thousand landraces recognized, the modern apple provides a unique case study for understanding plant evolution under human cultivation. Recent genomic and archaeobotanical studies have illuminated parts of the process of domestication in the Rosaceae family. Interestingly, these data seem to suggest that rosaceous arboreal crops did not follow the same pathway toward domestication as other domesticated, especially annual, plants. Unlike in cereal crops, tree domestication appears to have been rapid and driven by hybridization. Apple domestication also calls into question the concept of centers of domestication and human intentionality. Studies of arboreal domestication also illustrate the importance of fully understanding the seed dispersal processes in the wild progenitors when studying crop origins. Large fruits in Rosaceae evolved as a seed-dispersal adaptation recruiting megafaunal mammals of the late Miocene. Genetic studies illustrate that the increase in fruit size and changes in morphology during evolution in the wild resulted from hybridization events and were selected for by large seed dispersers. Humans over the past three millennia have fixed larger-fruiting hybrids through grafting and cloning. 
Ultimately, the process of evolution under human cultivation parallels the natural evolution of larger fruits in the clade as an adaptive strategy, which resulted in mutualism with large mammalian seed dispersers (disperser recruitment).|000|apple, domestication, plants, plant genetics, review, 4795|ScottPhillips|Recent years have witnessed an increased interest in the evolution of the human capacity for language. Such a project is necessarily interdisciplinary. However, that interdisciplinarity brings with it a risk: terms with a technical meaning in their own field are used wrongly or too loosely by those from other backgrounds. Unfortunately, this risk has been realized in the case of language evolution, where many of the terms of social evolution theory (reciprocal altruism, honest signaling, etc.) are incorrectly used in a way that suggests that certain key fundamentals have been misunderstood. In particular the distinction between proximate and ultimate explanations is often lost, with the result that several claims made by those interested in language evolution are epistemically incoherent. However, the correct application of social evolution theory provides simple, clear explanations of why language most likely evolved and how the signals used in language – words – remain cheap yet arbitrary.|000|evolution of language, origin of language, social evolution theory, interdisciplinary research, 4796|Wang2019|The salient facial feature discovery is one of the important research tasks in ethnical group face recognition. In this paper, we first construct an ethnical group face dataset including Chinese Uyghur, Tibetan, and Korean. Then, we show that the effective sparse sensing approach to general face recognition is not working anymore for ethnical group facial recognition if the features based on whole face image are used. This is partially due to a fact that each ethnical group may have its own characteristics manifesting only in specified face regions. 
Therefore, we will analyze the particularity of three ethnical groups and aim to find the common characterizations in some local regions for the three ethnical groups. For this purpose, we first use the facial landmark detector STASM to find some important landmarks in a face image, then, we use the well-known data mining technique, the mRMR algorithm, to select the salient geometric length features based on all possible lines connected by any two landmarks. Second, based on these selected salient features, we construct three “T” regions in a face image for ethnical feature representation and prove them to be effective areas for ethnicity recognition. Finally, some extensive experiments are conducted and the results reveal that the proposed “T” regions with extracted features are quite effective for ethnical group facial recognition when the L2-norm is adopted using the sparse sensing approach. In comparison to face recognition, the proposed three “T” regions are evaluated on the olivetti research laboratory face dataset, and the results show that the constructed “T” regions for ethnicity recognition are not suitable for general face recognition.|000|facial recognition, racism, artificial intelligence, 4797|Szeto2019|This paper examines the close parallels between the contact phenomena in Cantonese-English bilingual children and Southeast Asian creoles, especially in the domain of perfective aspect marking. ‘Already’ is a cross-linguistically common lexical source of perfective aspect markers given its conceptual link with the sense of perfectivity. In contact scenarios involving a European lexifier and Southeast Asian substrates, the development of ‘already’ into a perfective marker is further triggered by the incompatibility between the verbal morphology of the former and the isolating typology of the latter. 
Adopting an ecological approach to language transmission and creole genesis we discuss how the transient grammaticalization phenomena in the bilingual children can be compared to decreolization, and how the study of bilingual acquisition can contribute to contact linguistics. Despite the prevalence of unpredictable factors in contact scenarios, we argue that bilingual children can still serve as powerful “laboratories” for studying contact outcomes at the communal level.|000|Cantonese, English, bilingualism, language contact, grammaticalization, age of acquisition, 4798|Grafmiller2018|This special collection brings together research exploring and evaluating probabilistic variation patterns from a comparative perspective, thus highlighting current work situated at the crossroads of research on usage-based theoretical linguistics, variationist linguistics, and sociolinguistics. The contributions in the collection advance our understanding of the plasticity of syntactic knowledge on the part of language users with diverse regional and/or cultural backgrounds, and demonstrate how a probabilistic approach to grammatical variation can offer insight into the scope and limits of language variation. In this general introduction to the special collection, we provide some essential background for perspective, and subsequently summarize the contributions in the collection.|000|linguistic variation, probabilistic variation, probabilistic grammar, background, introduction 4799|Amiridze2019|This issue of the journal Language Typology and Universals deals with contact-induced changes in some of the languages of the Caucasus. The linguistically diverse area of the Caucasus (Comrie 2008) houses three autochthonous language families: Northwest Caucasian (or Abkhaz-Adyghean) (Hewitt 1989; Hewitt 2005), Northeast Caucasian (or Nakh-Daghestanian) (Smeets 1994; Job 2004; van den Berg 2005), and South Caucasian (or Kartvelian) (Harris 1991; Boeder 2005). 
Throughout the history, languages of the three families have been in contact with each other as well as with other languages of the Caucasus of the Indo-European, Turkic and other origin (Klimov 1994). The widespread multilingualism of most of the speech communities of the Caucasus (Dobrushina 2016) and the wealth and diversity of borrowed material on all levels of grammar (in the form of both matter and/or pattern borrowing (Matras and Sakel 2007a)) make the languages of the area valuable for the theories of language contact as well as for the typology of contact-induced changes.|000|contact area, Caucasian languages, Caucasus, Northwest Caucasian, Northeast Caucasian, South Caucasian, language contact, case study, introduction, 4800|DuninBarkowski2019|The dispersion of estimates of the time to achieve human level AI is discussed at length. Some of the reasons behind this diversity are exposed and thoroughly analyzed. A special role of human language in providing both natural human intelligence and AI is extensively discussed. The more straightforward and expectedly much faster than currently pursued way of proceeding to the goal is dotted and discussed in crude detail.|000|artificial intelligence, human-level artificial intelligence, human language, discussion 4801|Gehrmann2016|This thesis presents an exploration of the historical phonology of the West Katuic language family (< Katuic < Austroasiatic). West Katuic (WK) is divided into two sub-groups called Kuay and Bru (Ferlus 1974a, Diffloth 1982, Sidwell 2005). 
As no previous publication has concerned itself solely with the comparative phonology of WK, it was determined that this thesis should provide a review of previously published phonological descriptions of WK languages, a reconstruction of the segmental inventories of Proto-West Katuic (PWK), Proto-Kuay (PKuay) and Proto-Bru (PBru), and an isoglossic analysis of the phonological changes apparent in a representative sampling of modern Kuay and Bru languages. Additionally, a word list data collection tool was developed, which is aimed at eliciting etyma from other, previously undocumented WK varieties that will provide data pertinent towards the isoglossic analysis of those varieties.|000|Katuic, Austro-Asiatic, West-Katuic, historical phonology, linguistic reconstruction, comparative wordlist, 4802|Kim2019|How does first-person sensory experience contribute to knowledge? Contrary to the suppositions of early empiricist philosophers, people who are born blind know about phenomena that cannot be perceived directly, such as color and light. Exactly what is learned and how remains an open question. We compared knowledge of animal appearance across congenitally blind (n = 20) and sighted individuals (two groups, n = 20 and n = 35) using a battery of tasks, including ordering (size and height), sorting (shape, skin texture, and color), odd-one-out (shape), and feature choice (texture). On all tested dimensions apart from color, sighted and blind individuals showed substantial albeit imperfect agreement, suggesting that linguistic communication and visual perception convey partially redundant appearance information. To test the hypothesis that blind individuals learn about appearance primarily by remembering sighted people’s descriptions of what they see (e.g., “elephants are gray”), we measured verbalizability of animal shape, texture, and color in the sighted. 
Contrary to the learn-from-description hypothesis, blind and sighted groups disagreed most about the appearance dimension that was easiest for sighted people to verbalize: color. Analysis of disagreement patterns across all tasks suggest that blind individuals infer physical features from non-appearance properties of animals such as folk taxonomy and habitat (e.g., bats are textured like mammals but shaped like birds). These findings suggest that in the absence of sensory access, structured appearance knowledge is acquired through inference from ontological kind.|000|blind people, cognition, animal appearance, folk taxonomy, color terms, 4803|Willis2019|This paper introduces the pilot project for the Syntactic Atlas of Welsh Dialects, setting out the procedures for data collection and sketching a case study for one variable within the dataset, namely the patterns of negative concord found with the negative modal cau ‘won’t’. This item is a relatively recent innovation, and it is currently undergoing increasing integration into the negative concord system. The atlas fieldwork establishes current patterns of dialect variation, showing significant age variation indicative of change in progress and the rise of negative concord in this context. On the basis of this, it is argued that the diffuse geographical patterns attested are best interpreted as evidence of multiple innovation across a wide area, with new speakers re-implementing the innovation (“multiple reactuation”). A formal analysis is sketched out, treating the change as moving along a pathway of feature change with semantic features shifting to interpretable syntactic features and then to uninterpretable syntactic ones. 
This analysis is consistent with the dialect patterns and interspeaker implicational hierarchies found in the data.|000|dialectology, dialect syntax, dialect variation, case study, age variation, language change in progress, 4804|Hall2017|The present study investigates the phonology of glides in Middle High German. On the basis of surface contrasts between prevocalic nuclear glides in syllable-final position ([VG.V]) and postvocalic glides in onset position ([V.GV]), it is argued that the latter were underlying glides (e.g. the /w/ in [le.wə] ‘lion’) and that the former were glides derived from vowels (e.g. the offglide [o̯] in the diphthong [uo̯] from /uo/). Underlying glides are argued to be [+consonantal], while nuclear glides ‒ like the vowels from which they derive ‒ are [‒consonantal]. The analysis of Middle High German bears on several debates involving glides in the theoretical literature. First, a treatment with an underlying glide in /VGV/ cannot be reanalyzed by treating the vowels as peaks (e.g. Harris & Kaisse 1999 for Argentinian Spanish). Second, the treatment of underlying glides as [+consonantal] is to be preferred over alternatives which analyze those sounds as [‒vocalic] (e.g. Nevins & Chitoran 2008 for several languages). Third, an analysis of nuclear structure is adopted (from Harris & Kaisse 1999) which enables one to interpret which element in a complex nucleus is the peak and which is the nonpeak without stipulation. Fourth, the contrastive syllabification of surface glides (i.e. [VG.V] vs. [V.GV]) is shown to be a diagnostic of underlying glide languages that has not been discussed in the literature to date.|000|syllable peak, prosody, prosodic template, glides, Middle High German, diphthongs, nucleus 4805|Ling2018|This study compared the f0, duration and intensity patterns of disyllabic groups that undergo left- or right-dominant tone sandhi in Shanghai Chinese. 
We investigated the effects of contrastive focus and speech rate on these patterns, in order to further our understanding of the nature of left-/right-dominant sandhi and the phonetic realization mechanisms of focus encoding and speech rate. The results suggest that [Verb+Noun] phrases undergo right-dominant sandhi, which involves a phonetic reduction of the non-final tone as the f0 contour of σ1 is fully realized in focused or slow speech condition; [Adj.+Noun] compounds undergo left-dominant sandhi, which consists phonologically of rightward tone spread, since the f0 pattern of compounds is preserved in all conditions. The focus-induced adjustment patterns of f0, duration and intensity show that left-/right-dominant sandhi domains are composed of different prosodic structures and focus is indirectly encoded via prosodic structure. Although slow speech and focus both enhance the f0 realization, they are realized by way of different mechanisms. Focus is realized via a speaker's direct control of the intensity, while speech rate is realized via a speaker's intentional control of the duration, and the f0 adjustment is better regarded as the surface realization or byproduct of these adjustments.|000|tone sandhi, Shànghǎi dialect, Shanghainese, case study 4806|Chabot2019|The class of rhotics is subject to extensive variation, and a reliable phonetic correlate has not been found. This variation is also why identifying a segment as a rhotic in an unknown language is not a trivial matter. In contrast to other phonological classes whose membership is attributed based on principled criteria, the set of rhotics is arbitrary. 
This article identifies two properties independent of phonetics which characterize rhotics cross-linguistically: procedural stability—rhotics that are implicated in phonological processes can vary in a phonetically arbitrary manner without perturbing the process itself—and diachronic stability: the phonetics of rhotics can vary in diachronic evolution without impact on their phonotactics. On the empirical side the article establishes a cross-linguistic survey of the phonetic variability of rhotics. It is also argued that the phonetic realization of a rhotic may be unpredictable and divorced from its phonological identity and this shows that languages are happy to instantiate an arbitrary phonetics-phonology relationship. Finally, it is argued that rhotics show that the interface which maps phonological objects to their phonetic instantiations is capable of handling an arbitrary relationship. Further, there is no reason to assume that this property of the interface is specific to rhotics; in principle, all phonetic and phonological categories could enter into an arbitrary relationship. This has important implications for theories which seek to impose phonetic or naturalness based constraints on phonology: it is difficult to see how the relationship between a phonetic object which has no obvious articulatory connection to its phonological representation could be considered phonetically natural. Rhotics thus provide support for the view of substance-free phonology whereby phonological objects are devoid of any reference to phonetic categories.|000|rhotic sounds, definition, typology, sound classes, phonology, phonetics, overview 4807|Swadesh1952|One of the most significant recent trends in the field of prehistory has been the development of objective methods for measuring elapsed time. Where vague estimates and subjective judgments formerly had to serve, today we are often able to determine prehistoric time within a relatively narrow margin of accuracy. 
This development is important especially because it adds greatly to the possibility of interrelating the separate reconstructions.|452|Morris Swadesh, 4808|Figueira2018|A couple of months ago, I wrote here on Medium an article on mapping the UK’s traffic accidents hot spots. I was mostly concerned about illustrating the use of the DBSCAN clustering algorithm on geographical data. In the article, I used geographical information published by the UK government on reported traffic accidents. My purpose was to run a density-based clustering process to find the areas where traffic accidents are reported most frequently. The end result was the creation of a set of geo-fences representing these accident hot spots.|000|convex hull, concave hull, visualization, hypergraph, geography, geographic map 4809|Mayrhofer1980|Die Frage, welche Stellung die Etymologie in der Sprachwissenschaft von heute einnehme, und insbesondere die Frage, in welcher Form die etymologische Erforschung des gesamten Lexikons einer Sprache am besten dargeboten werde, finden wir in der neueren Fachliteratur mit unerwarteter Intensität und Ausführlichkeit gestellt. Dies wird durch die Existenz eines ganzen Buches über etymologische Wörterbücher, des «Versuches einer Typologie» dieser Buchgattung wohl am augenfälligsten, den der bedeutende Romanist Yakov Malkiel @1976 erscheinen ließ. :translation:`The question of what position etymology occupies in present-day linguistics, and in particular the question of the form in which the etymological investigation of a language's entire lexicon is best presented, is posed in the recent specialist literature with unexpected intensity and thoroughness. This is probably most evident from the existence of an entire book on etymological dictionaries, the "attempt at a typology" of this genre of book, which the eminent Romance scholar Yakov Malkiel published (@1976).`|000|data organization, etymological dictionary, Indo-European linguistics, data management, standardization, 4810|Nikolaev2019|This paper discusses the impact of linguistic contact on the make-up of consonantal inventories of the languages of Eurasia. New measures for studying the importance of language contact for the development of phonological inventories are proposed, and two empirical studies are reported. 
First, using two different measures of dissimilarity of phonemic inventories (the Jaccard dissimilarity measure and the novel Closest-Relative Cumulative Jaccard Dissimilarity measure), it is demonstrated that language contact—operationalized as languages being connected by an edge in a neighbor network—makes a significant contribution to between-inventory differences when phylogenetic variables are controlled for. Second, a novel measure of the exposure of a language to a particular segment—the Neighbor-Pressure Metric (NPM)—is proposed as a means of quantifying language contact with respect to phonological inventories. It is shown that addition of NPM helps achieve higher prediction accuracy than using bare phylogenetic data and that distributions of different consonants display a different degree of dependence on language-contact processes. Finally, more complex models for predicting consonant inventories are briefly explored, demonstrating the presence of complex non-linear relationships between inventories of neighboring languages.|000|geography, sound inventories, database, language in space, Delaunay triangulation, contact zone, 4811|Nikolaev2019|The author defines a *neighbor graph* as a concept to define languages in contact, by using a simplified Delaunay triangulation along with a threshold for geographic distance. This seems very sound and easy to apply to other approaches.|000|contact zone, contact area, neighbor graph, language contact, geography, language in space, 4812|Levin2019|The nature of the relationship between the head and modifier in English noun compounds has long posed a challenge to semantic theories. We argue that the type of head-modifier relation in an English endocentric noun-headed compound depends on how its referent is categorized: specifically, on whether the referent is conceptualized as an artifact, made by humans for a purpose; or as a natural kind, existing independently of humans. We propose the Events vs. 
Essences Hypothesis: the modifier in an artifact-headed compound typically refers to an event of use or creation associated with that artifact, while the modifier in a natural kind-headed compound typically makes reference to inherent properties reflective of an abstract essence associated with the kind, such as its perceptual properties or native habitat. We present three studies substantiating this hypothesis. First, in a corpus of almost 1,700 attested compounds in two conceptual domains (food/cooking and precious minerals/jewelry), we find that as predicted, compound names referring to artifacts tend to evoke events, whereas compound names referring to natural kinds tend to evoke essential properties. Next, in a production experiment involving compound creation and a comprehension experiment involving compound interpretation, we find that the same tendencies also extend to novel compounds.|000|compounding, partial colexification, semantics, morphology, word formation, conceptualization, 4813|Alhama2019|We present a critical review of computational models of generalization of simple grammar-like rules, such as ABA and ABB. In particular, we focus on models attempting to account for the empirical results of Marcus et al. (Science, 283(5398), 77–80 1999). In that study, evidence is reported of generalization behavior by 7-month-old infants, using an Artificial Language Learning paradigm. The authors fail to replicate this behavior in neural network simulations, and claim that this failure reveals inherent limitations of a whole class of neural networks: those that do not incorporate symbolic operations. A great number of computational models were proposed in follow-up studies, fuelling a heated debate about what is required for a model to generalize. Twenty years later, this debate is still not settled. In this paper, we review a large number of the proposed models. 
We present a critical analysis of those models, in terms of how they contribute to answer the most relevant questions raised by the experiment. After identifying which aspects require further research, we propose a list of desiderata for advancing our understanding on generalization.|000|rule learning, machine learning, artificial language learning, review, summary, 4814|Alhama2019|Generally interesting especially also in the context of rule induction, although the topic is a bit different.|000|rule induction, artificial language learning, rule learning, 4815|Korovina2019|В статье рассматривается методика для получения ранжированного списка базисной лексики С. А. Старостина. Показывается, что в некоторых случаях она может быть улучшена. Так, во-первых, кажется, что в случае неполноты данных более точный результат дает деление не на общее число языков, а на число языков, где это слово представлено. Во-вторых, при собственно ранжировании присвоение разных рангов словам с одинаковой устойчивостью в рамках одной семьи при небольшом числе языков в семье может существенно изменить порядок ранжирования. :translation:`The article discusses the method of obtaining a ranked list of the basic vocabulary elaborated by Sergei A. Starostin. Unfortunately, as it seems, the original procedure cannot be completely reproduced. This is mainly due to the fact that some languages do not preserve the original data in the form suggested in Starostin’s work. In particular, the Austro-Asiatic base has undergone significant changes. However, the reconstruction of Starostin’s procedure shows that in some cases it can be improved. So, firstly, it seems that a more accurate result is given not by the total number of languages, but by the number of languages where a given word is present, since otherwise, if the word is not attested in all the languages, significant distortions are possible. E.g. 
in the Austronesian database, the word warm has a very low stability index because it was attested only in 5 out of 94 languages. Secondly, if one follows the actual ranking method, assigning different ranks to words with the same stability within the same family, in case the latter comprises a small number of languages, can significantly change the ranking order. It seems that when using ranks it is better to assign the same ranks – in one way or another – to such words.` |000|ranked concept list, S. A. Starostin, Swadesh list, basic vocabulary, concept list, 4816|Mufwene2017|As linguists theorize about language endangerment and loss (LEL), we must understand the big picture: the coexistence of languages in particular polities and how the competition that sometimes arises is resolved. Many concerns have been voiced about LEL since the early 1990s, but theoretical developments regarding language vitality lag far behind linguists’ current investment in language advocacy. While discussing issues such as the failure to connect the subject matter to language evolution in general, the framing of LEL as deleterious almost exclusively to ‘indigenous peoples’, a lack of historical time depth, and the omission of the ecological factors in typical approaches to LEL, I argue that linguistics should theorize about language vitality more adequately than has been the case to date.|000|language vitality, language endangerment, discussion, 4817|Mufwene2017|I thus pose the following questions, among others: Can we say today that a similar research area has developed in linguistics, one that can inform our discourse on language vitality? Can we support empirically the claim that giving up an ancestral or ethnic language is as disadvantageous to the relevant population as damage done to our natural ecologies? 
In other words, is language shift as deleterious to the balance of human lives or to our social ecologies as, for instance, deforestation, poaching elephants, killing whales, and destroying corals in the ocean floor are to the equilibrium of our natural ecosystems? Is linguistic diversity as significant to our well-being as biological diversity?|e203|linguistic diversity, biodiversity, biological parallels, 4818|Mufwene2017|Very useful article, as it raises the important question of potential differences between the concept of linguistic diversity and biological diversity. While for linguists the loss of a language is relevant, it may be less relevant for its original speakers, while it seems that this is clearly not the case in the context of biology. Thus, a discussion of what diversity means for languages, and what it means for our cultures is in order and has long been neglected.|000|linguistic diversity, biological diversity, discussion, language endangerment, 4819|Nunez2019|More than a half-century ago, the ‘cognitive revolution’, with the influential tenet ‘cognition is computation’, launched the investigation of the mind through a multidisciplinary endeavour called cognitive science. Despite significant diversity of views regarding its definition and intended scope, this new science, explicitly named in the singular, was meant to have a cohesive subject matter, complementary methods and integrated theories. Multiple signs, however, suggest that over time the prospect of an integrated cohesive science has not materialized. Here we investigate the status of the field in a data-informed manner, focusing on four indicators, two bibliometric and two socio-institutional. These indicators consistently show that the devised multi-disciplinary program failed to transition to a mature inter-disciplinary coherent field. 
Bibliometrically, the field has been largely subsumed by (cognitive) psychology, and educationally, it exhibits a striking lack of curricular consensus, raising questions about the future of the cognitive science enterprise.|000|discussion, critics, review, cognitive science, interdisciplinary research, 4820|Nunez2019|Cognitive science is a product of the 1950s in North America, when psychology, linguistics and anthropology were redefining themselves and when computer science and neuroscience were emerging on the academic scene.|p2|cognitive science, history of science, origin, 4821|Nunez2019|Potentially very interesting review as it also discusses the origins of cognitive science as an interdisciplinary endeavor involving linguistics.|000|cognitive science, linguistics, discussion, review, critics, history of science, 4822|Chen2019|The primary goal of this article is to apply the comparative method to the reconstruction of Proto-Kampa consonants. Specifically, this entails creating cognate sets using the six Kampa varieties (Nomatsigenga, Ashéninka, Pajonal, Asháninka, Kakinte, and Matsigenka) in my data and generating correspondence sets for the consonants. These correspondence sets form the basis for my reconstruction of the Proto-Kampa consonant phonemic inventory and the sound changes that resulted in the diversification of consonant phonemes in the daughter varieties.|000|Kampa, phonological reconstruction, computer-assisted analysis, linguistic reconstruction, sound correspondences, correspondence patterns, 4823|Jaeger2011|@Atkinson<2011>’s article has received considerable public attention and sparked lively discussion among typologists. In this commentary, we focus on potential issues with the statistical procedures employed in the paper. In particular, we investigate to what extent the results are robust once genealogical and geographic relations between languages are taken into account. 
Such concerns about violations of independence due to the failure to account for relatedness between languages play a central role in quantitative research on typology (e.g., Bell 1978, Dryer 1989, Perkins 1989). We show that the statistical approach taken by Atkinson, linear mixed effect regression, provides a powerful way to control for both genealogical and areal dependencies between languages that has advantages over previous proposals, such as separate regressions by language family or by continent or limiting oneself to stratified samples. While Atkinson (2011) includes only controls for genetic dependencies in his model, we introduce two simple ways to extend mixed effect models to account for effects of language contact (“areal dependencies”). These approaches also provide an alternative way to account for genetic relations about which there is high uncertainty.|000|mixed models, mixed effect models, phoneme inventory size, genetic inheritance, areal diffusion, Galton's problem, 4824|Cathcart2019|This paper presents a new approach to disentangling inter-dialectal and intra-dialectal relationships within one such group, the Indo-Aryan subgroup of Indo-European. I draw upon admixture models and deep generative models to tease apart historic language contact and language-specific behavior in the overall patterns of sound change displayed by Indo-Aryan languages. 
I show that a “deep” model of Indo-Aryan dialectology sheds some light on questions regarding inter-relationships among the Indo-Aryan languages, and performs better than a “shallow” model in terms of certain qualities of the posterior distribution (e.g., entropy of posterior distributions), and outline future pathways for model development|000|Indo-Aryan, Indo-Aryan languages, dialectology, computational approaches, admixture, language contact, 4825|Schlenker2018b|While it is now accepted that sign languages should inform and constrain theories of ‘Universal Grammar’, their role in ‘Universal Semantics’ has been under-studied. We argue that they have a crucial role to play in the foundations of semantics, for two reasons. First, in some cases sign languages provide overt evidence on crucial aspects of the Logical Form of sentences, ones that are only inferred indirectly in spoken language. For instance, sign language ‘loci’ are positions in signing space that can arguably realize logical variables, and the fact that they are overt makes it possible to revisit foundational debates about the syntactic reality of variables, about mechanisms of temporal and modal anaphora, and about the existence of dynamic binding. Another example pertains to mechanisms of ‘context shift’, which were postulated on the basis of indirect evidence in spoken language, but which are arguably overt in sign language. Second, along one dimension sign languages are strictly more expressive than spoken languages because iconic phenomena can be found at their logical core. This applies to loci themselves, which may simultaneously function as logical variables and as schematic pictures of what they denote (context shift comes with some iconic requirements as well). As a result, the semantic system of spoken languages can in some respects be seen as a simplified version of the richer semantics found in sign languages. Two conclusions could be drawn from this observation. 
One is that the full extent of Universal Semantics can only be studied in sign languages. An alternative possibility is that spoken languages have comparable expressive mechanisms, but only when co-speech gestures are taken into account (as recently argued by Goldin-Meadow and Brentari). Either way, sign languages have a crucial role to play in investigations of the foundations of semantics.|000|sign language, universal grammar, semantics, linguistic typology, universals, 4826|SoederblomSaarela2019|This blog post will discuss some transnational aspects of the history of Mandarin Chinese, what in the twentieth century became codified as the national language of China. I will first briefly discuss what China’s national language is, then look at a few aspects of its history that shows its entanglement with Inner Asian empires, non-Chinese languages, and scholars and students from elsewhere in East Asia and even Europe. Finally, I will discuss some of my own recent and ongoing research in this area, and end on what I think is an exciting avenue for future work. Throughout my post, I will not make a strict separation of linguistic research on the history of the Mandarin language itself and historical research on the production and reception of the documents used to learn Mandarin in the past. I treat them as two aspects of the same story. The sources for the sounds of Mandarin are often the same documents that the historian uses to explore how it was studied.|000|Mandarin, Early Mandarin, Chinese, language history, pǔtōnghuà, history of science, 4827|Aguzzi2019|I'm passionately in favour of everyone having open access to the results of the scientific research that their taxes pay for. But I think there are deep problems with one of the current modes for delivering it. The author-pays model (which I call broken access) means journals increase their profits when they accept more papers and reject fewer. 
That makes it all too tempting to subordinate stringent acceptance criteria to the balance sheet. This conflict of interest has allowed the proliferation of predatory journals, which charge authors to publish papers but do not provide the expected services and offer no quality control.|000|open access, discussion, broken access 4828|Henderson2019|In addition to roots for familiar classes like verb, noun, and adjective, Mayan languages have a class of roots traditionally called “positional”. Positional roots are distinct from other roots most prominently in terms of requiring derivation into stems of one of the more familiar categories to be used. The goal of this work is to show that the behavior of positionals follows from semantic facts, in particular, the fact that they denote measure functions of type ⟨e,d⟩. This conclusion is supported through a series of novel arguments from the Mayan language Kaqchikel that positional roots have a scalar semantics. It then argues for the type ⟨e,d⟩ analysis by contrasting them with gradable root adjectives, which similarly make reference to ordered degrees on a scale, but which have a relational type, namely ⟨d,et⟩. I then show that a core function of positional morphology, and the morpheme that derives positional stative predicates in particular, is to take positional roots into stems of type ⟨d,et⟩, which will account for the fact that derived positionals behave semantically like root adjectives. In this way, this work not only presents a novel account of the Mayan data, but provides additional evidence for the proposal that even within languages there can be differences in the fine-grained compositional structure of degree-denoting expressions.|000|Mayan languages, root structure, root, derivation, word formation, 4829|Jiao2018|It is often assumed, explicitly or implicitly, that speakers generate special cues in whispered tone and intonation to make up for the absence of fundamental frequency. 
The present study examined this assumption with one production and three perception experiments. The production experiment compared duration, intensity, formants and spectral tilt of phonated and whispered Mandarin monosyllabic utterances with four lexical tones spoken as either statements or questions. For tones, no acoustic properties were found to occur only in whispered but not in phonated utterances. For intonation, some spectral tilt measurements differed between the two phonation types. The two tone perception experiments used phonated and whispered utterances as well as amplitude-modulated noise based on those utterances as stimuli. Results show that once turned into amplitude-modulated noise, phonated and whispered tones had similar identification patterns, indicating that the non-F0 tonal cues in whispers were already in phonated speech. The intonation perception experiment used original utterances as stimuli and showed a substantial drop in overall identification rate and an overwhelming bias towards statement. Thus the spectral tilt differences found in the acoustic analysis were not helpful for intonation perception. Possible reasons for the lack of effective enhancement in whispered speech were discussed.|000|whispered speech, Mandarin, phonetics, experimental phonetics, pragmatics 4830|Saldana2019|Compositional hierarchical structure is a prerequisite for productive languages; it allows language learners to express and understand an infinity of meanings from finite sources (i.e., a lexicon and a grammar). Understanding how such structure evolved is central to evolutionary linguistics. Previous work combining artificial language learning and iterated learning techniques has shown how basic compositional structure can evolve from the trade-off between learnability and expressivity pressures at play in language transmission. 
In the present study we show, across two experiments, how the same mechanisms involved in the evolution of basic compositionality can also lead to the evolution of compositional hierarchical structure. We thus provide experimental evidence showing that cultural transmission allows advantages of compositional hierarchical structure in language learning and use to permeate language as a system of behaviour.|000|compositionality, hierarchical structure, language origin, artificial language learning, 4831|Ventresca2019|The pace of transmission of domesticated cereals, including millet from China as well as wheat and barley from southwest Asia, throughout the vast pastoralist landscapes of the Eurasian Steppe (ES) is unclear. The rich monumental record of the ES preserves abundant human remains that provide a temporally deep and spatially broad record of pastoralist dietary intake. Calibration of human δ13C and δ15N values against isotope ratios derived from co-occurring livestock distinguish pastoralist consumption of millet from the products of livestock and, in some regions, identify a considerable reliance by pastoralists on C3 crops. We suggest that the adoption of millet was initially sporadic and consumed at low intensities during the Bronze Age, with the low-level consumption of millet possibly taking place in the Minusinsk Basin perhaps as early as the late third millennium cal BC. Starting in the mid-second millennium cal BC, millet consumption intensified dramatically throughout the ES with the exception of both the Mongolian steppe where millet uptake was strongly delayed until the end of first millennium cal BC and the Trans-Urals where instead barley or wheat gained dietary prominence. 
The emergence of complex, trans-regional political networks likely facilitated the rapid transfer of cultivars across the steppe during the transition to the Iron Age.|000|millet, pastoralism, Eurasia, domestication, cereals South-East Asia, 4832|Reid2018|This paper explores various problems in modeling the Philippine linguistic situation. Simple cladistic models are valuable in modeling proposed genetic relationships based on the results of the comparative-historical method, but are problematic when dealing with the languages of Negrito groups that adopted Austronesian languages. They are also problematic in dealing with networking as the result of dialect chaining, and widespread lexical borrowing from non-Austronesian languages, each of which creates special problems in modeling the Philippine linguistic situation.|000|Philippines, Austronesian, language contact, family tree, wave theory, dialect chain, 4833|FreckletonJetz2009|Variation in traits across species or populations is the outcome of both environmental and historical factors. Trait variation is therefore a function of both the phylogenetic and spatial context of species. Here we introduce a method that, within a single framework, estimates the relative roles of spatial and phylogenetic variations in comparative data. The approach requires traits measured across phylogenetic units, e.g. species, the spatial occurrences of those units and a phylogeny connecting them. The method modifies the expected variance of phylogenetically independent contrasts to include both spatial and phylogenetic effects. We illustrate this approach by analysing cross-species variation in body mass, geographical range size and species-typical environmental temperature in three orders of mammals (carnivores, artiodactyls and primates). 
These species attributes contain highly disparate levels of phylogenetic and spatial signals, with the strongest phylogenetic autocorrelation in body size and spatial dependence in environmental temperatures and geographical range size showing mixed effects. The proposed method successfully captures these differences and in its simplest form estimates a single parameter that quantifies the relative effects of space and phylogeny. We discuss how the method may be extended to explore a range of models of evolution and spatial dependence.|000|Galton's problem, contact, inheritance, phylogenetic reconstruction, computational approaches, 4834|FreckletonJetz2009|This approach is an example of how trees cannot be used efficiently to study contact: phylogenies usually show little contact, whereas in linguistics traits that reflect contact can be found more efficiently by searching directly for indicators of language contact, such as lexical borrowing.|000|historical similarities, phylogenetic reconstruction, trait similarity, geographic distance, genetic inheritance, 4835|Bodt2019|Although it is well‐known to most historical linguists that the comparative method could in principle be used to predict hitherto unobserved words in genetically related languages, the task of word prediction is rarely discussed in the linguistic literature. Here, we introduce 'reflex retrodiction' as a new task for historical linguistics and report on an ongoing experiment in which we use a computer‐assisted workflow to retrodict reflexes for so far unobserved words in eight varieties of Western Kho‐Bwa (a subgroup of Sino‐Tibetan). Since, at the time of writing this report, the experiment is still ongoing, we do not report concrete results, but instead provide an estimate of our expectations by testing the performance of the computational part of our workflow on existing language data. 
Our results suggest that reflex retrodiction has the potential of becoming a useful tool for historically oriented fieldwork.|000|prediction, comparative method, sound correspondences, correspondence patterns, Kho-Bwa, Sino-Tibetan 4836|Starostin2007|I have thought much about how we should explain this process of "aging of words", and in fact I think I know the general answer, which, I [pb] suspect is acquiring additional meanings. When a word, e.g., «head», starts being used with this meaning, its usage is basically restricted to it. With time it develops additional meanings («head of state», «head of team»; in some languages also «head of text», i.e., «chapter» etc. etc.). At some point the additional meanings start to «outweigh» the original meaning -- since too much polysemy for a basic word becomes dangerous for the communication process, and the word tends to be replaced by a less polysemantic synonym. |857f|word age, lexicostatistics, lexical replacement, hypothesis, word aging hypothesis, S. A. Starostin, 4837|Amery2000|The methods afforded by comparative/historical linguistics provide insights into aspects of the Kaurna language as it was recorded. Further, these methods allow some of the gaps in the language to be filled in sound and well-motivated ways, drawing on the available evidence from the Kaurna sources themselves and those of neighbouring languages. Where a lexical gap exists, for instance, in some cases a proto form can be reconstructed for the Yura subgroup of languages to which Kaurna belongs, and the expected reflex of this proto-form in Kaurna can be reconstructed. This reconstructed Kaurna reflex may or may not have actually existed in the Kaurna language. It may well have been part of the repertoire of speakers in the 1830s and 1840s, but for one reason or another observers simply failed to record it. 
Of course it is quite possible that Kaurna had an entirely different word for the particular concept when compared with neighbouring languages, there is simply no way of knowing.|36|language revival, language death, prediction, retrodiction, reflex retrodiction, Kaurna, example 4838|Knobloch2019|In this thesis, 9 Indo-Aryan languages which have previously been classified as Shina languages were analyzed. A cognate analysis of basic vocabulary was conducted, in order to explore the relatedness of the languages. Furthermore, a selection of phonological, morphological, syntactic, and lexical features was analyzed, in order to explore areal patterns among the languages. The data mainly consisted of first-hand data, which has been collected for the project “Language contact and relatedness in the Hindu Kush region”, but even previous descriptions of the languages were used. The results primarily confirmed hypotheses about the relatedness of the Shina languages, and showed interesting areal patterns. The data also suggested that the Shina languages share many typical features with other Hindu Kush Indo-Aryan languages, such as SOV word order, the use of postpositions, sex based grammatical gender, and moderately complex to complex syllable structures. Other features, such as aspiration, retroflexion, and case alignment in noun phrases showed more variation and could certainly be relevant for future studies on these languages.|000|Shina, Indo-Aryan languages, typological study, splits networks, feature data 4839|Anderson2018|Contrary to what non-practitioners might expect, the systems of phonetic notation used by linguists are highly idiosyncratic. Not only do various linguistic subfields disagree on the specific symbols they use to denote the speech sounds of languages, but also in large databases of sound inventories considerable variation can be found. 
Inspired by recent efforts to link cross-linguistic data with help of reference catalogues (Glottolog, Concepticon) across different resources, we present initial efforts to link different phonetic notation systems to a catalogue of speech sounds. This is achieved with the help of a database accompanied by a software framework that uses a limited but easily extendable set of non-binary feature values to allow for quick and convenient registration of different transcription systems, while at the same time linking to additional datasets with restricted inventories. Linking different transcription systems enables us to conveniently translate between different phonetic transcription systems, while linking sounds to databases allows users quick access to various kinds of metadata, including feature values, statistics on phoneme inventories, and information on prosody and sound classes. In order to prove the feasibility of this enterprise, we supplement an initial version of our cross-linguistic database of phonetic transcription systems (CLTS), which currently registers five transcription systems and links to fifteen datasets, as well as a web application, which permits users to conveniently test the power of the automatic translation across transcription systems.|000|dataset, IPA, transcription systems, 4840|Gerstenberg2019|Our introduction to the special collection gives an overview of the research projects which were originally presented at the third CLARe network conference. We group the research under four cross-sectional topics that unite the different contributions: the data used in the research, the theoretical frameworks, the languages and varieties which are represented and the situational contexts which are examined. These projects represent the current state of research in this field and allows the reader to orient themselves within this diverse field but also leaves many questions open and provides impetus for future lines of research. 
The interaction and collaboration between diverse disciplines is the central aspect which unites all contributions to the special collection.|000|language change, language variation, age, aging, 4841|Hakenbeck2019|Recent advances in archeogenetics have revived an interest in grand narratives in which ethnic groups are once again thought to be agents of historical change. New scientific developments are generating a sense of optimism that difficult questions in palaeodemography may at last be solved. However, genetic research often uncritically makes use of essentialist models of past populations, reifying genetic populations as ethnic groups. This paper explores how such views of the past may play into notions of racial purity and fears of non-European migrants stoked by adherents of far-right ideologies.|000|critics, archaeogenetics, ethnicity, 4842|Chaabouni2019|Despite renewed interest in emergent language simulations with neural networks, little is known about the basic properties of the induced code, and how they compare to human language. One fundamental characteristic of the latter, known as Zipf’s Law of Abbreviation (ZLA), is that more frequent words are efficiently associated to shorter strings. We study whether the same pattern emerges when two neural networks, a “speaker” and a “listener”, are trained to play a signaling game. Surprisingly, we find that networks develop an anti-efficient encoding scheme, in which the most frequent inputs are associated to the longest messages, and messages in general are skewed towards the maximum length threshold. This anti-efficient code appears easier to discriminate for the listener, and, unlike in human communication, the speaker does not impose a contrasting least-effort pressure towards brevity. Indeed, when the cost function includes a penalty for longer messages, the resulting message distribution starts respecting ZLA. 
Our analysis stresses the importance of studying the basic features of emergent communication in a highly controlled setup, to ensure the latter will not strand too far from human language. Moreover, we present a concrete illustration of how different functional pressures can lead to successful communication codes that lack basic properties of human language, thus highlighting the role such pressures play in the latter.|000|neural networks, language simulation, language emergence, 4843|Cavique2018|The comprehension of social network phenomena is closely related to data visualization. However, even with only hundreds of nodes, the visualization of dense networks is usually difficult. The strategy adopted in this work is data reduction using communities. Community detection in social network analysis is a very important issue and in particular detection of community overlapping. In this approach, the information extracted from social networks transcends cohesive groups, enabling the discovery of brokers that interact among communities. To find admissible solutions in hard problems, relaxed approaches are used. Quasi-cliques are generated, and partition is found using a partial set-covering heuristic. The proposed method allows the identification of communities and actors that link two or more groups. In the visualization process, the user can choose different dimension reduction approaches for the condensed graph. For each condensed structure, a hypergraph can be drawn, identifying communities and brokers.|000|graph theory, introduction, hypergraph, community detection, fuzzy clustering, visualization, 4844|Beakins2019|We report on the rapid birth of a new language in Australia, Gurindji Kriol, from the admixture of Gurindji and Kriol. This study is the first investigation of contact-induced change within a single speaker population that uses multiple variants. 
It also represents an innovative modification of the Wright-Fisher population genetics model to investigate temporal change in linguistic data. We track changes in lexicon and grammar over three generations of Gurindji people, using data from seventy-eight speakers coded for their use of Gurindji, Kriol, and innovative variants across 120 variables (with 292 variants). We show that the adoption of variants into Gurindji Kriol was not random, but biased toward Kriol variants and innovations. This bias is not explained by simplification, as is often claimed for contact-induced change. There is no preferential adoption of less complex variants, and, in fact, complex Kriol variants are more likely to be adopted over simpler Gurindji variants.|000|Gurindji Kriol, Australian languages, creole languages, creolisation, complexity, lexical borrowing, 4845|Beakins2019|This is in some sense an example of a study of borrowing in progress. |000|borrowing, lexical borrowing, language change in progress, 4846|Dahl2000|As the title suggests, this chapter is about the ways in which speakers of European languages talk about the future; more specifically, the grammatical devices that are used in doing so. At the centre of the investigation, we will necessarily find the things traditionally called future tenses. Since their theoretical status has been the object of considerable controversy, and since we want to be open for other potentially interesting phenomena, the delimitation of the area of study is kept deliberately vague.|000|future, future tense, grammar, European languages, 4847|Haspelmath2009a|Framework-free grammatical description/analysis and explanation is argued here to be superior to framework-bound analysis because all languages have different categories, and languages should be described in their own terms. Frameworks represent aprioristic assumptions that are likely to lead to a distorted description of a language. 
I argue against restrictive theoretical frameworks of the generative type, against frameworks of functional approaches such as Functional Grammar and Role and Reference Grammar, and against Basic Linguistic Theory.|000|grammatical description, g-linguistics, general linguistics, terminology, methodology, 4848|Kretzschmar2018|In the history of linguistics there have been crucial moments when those of us interested in language have essentially changed the way we study our subject. We stand now at such a moment. In this presentation I will review the history of linguistics in order to highlight some past important changes in the field, and then turn to where we stand now. Some things that we thought we knew have turned out not to be true, like the systematic, logical nature of languages. Other things that we had not suspected, like a universal underlying emergent pattern for all the features of a language, are now evident. This emergent pattern is fractal, that is, we can observe the same distributional pattern in frequency profiles for linguistic variants at every level of scale in our analysis. We also have hints that time, as the persistence of a preference for particular variants of features, is a much more important part of our language than we had previously believed. We need to explore the new realities of language as we now understand them, chief among them the idea that patterned variation, not logical system, is the central factor in human speech. In order to account for what we now understand, we need to get used to new methods of study and presentation, and place new emphasis on different communities and groups of speakers. Because the underlying pattern of language is fractal, we need to examine the habits of every group of speakers at every location for themselves, as opposed to our previous emphasis on overall grammars. We need to make our studies much more local, as opposed to global. 
We do still want to make grammars and to understand language in global terms, but such generalizations need to follow from what we can now see as the pattern of language as it is actually used.|000|p-linguistics, discussion, linguistic variation, 4849|Kretzschmar2018|A development in 2017 that has had a striking effect in America is the new idea of “alternative facts” or “fake news.” These two labels actually go together. We have all experienced the information explosion on the Internet, which is mostly a good thing but which also raises the problem of authority.|1|fake news, nice quote, alternative facts, scientific practice, 4850|Kaminsky2004|During speech acquisition, children form quick and rough hypotheses about the meaning of a new word after only a single exposure, a process dubbed “fast mapping.” Here we provide evidence that a border collie, Rico, is able to fast map. Rico knew the labels of over 200 different items. He inferred the names of novel items by exclusion learning and correctly retrieved those items right away as well as 4 weeks after the initial exposure. Fast mapping thus appears to be mediated by general learning and memory mechanisms also found in other animals and not by a language acquisition device that is special to humans.|000|dog language, animal cognition, dog, 4851|Zhang2019a|In the brain, the semantic system is thought to store concepts. However, little is known about how it connects different concepts and infers semantic relations. To address this question, we collected hours of functional magnetic resonance imaging (fMRI) data from human subjects listening to natural stories. We developed a predictive model of the voxel-wise response, and further applied it to thousands of new words. We found that both semantic categories and relations were represented by spatially overlapping cortical networks, instead of anatomically segregated regions. 
Importantly, many such semantic relations that reflected conceptual progression from concreteness to abstractness were represented by a similar cortical pattern of anti-correlation between the default mode network and the frontoparietal attention network. Our results suggest that the human brain represents a continuous semantic space and uses distributed networks to encode not only concepts but also relationships between concepts. In particular, the default mode network plays a central role in semantic processing for abstraction of concepts across various domains.|000|semantic relations, neurolinguistics, 4852|Alexiadou2019|The paper investigates two related questions that concern the realization of plural morphology on nouns across languages. The first question is whether markedness in the sense of complexity in form goes hand in hand with complexity in meaning. In other words, since plural nouns are formally more complex than singular nouns, does that mean that they differ in interpretation? On the basis of experimental and theoretical investigations the claim is supported that plurals, although morphologically more complex than singulars, are semantically unmarked across languages. The second question is what regulates the presence of plural morphology in numeral-noun constructions across languages, in light of the proposal that plural appears on nouns in such constructions only if it is semantically unmarked. The paper offers an explanation of this distribution by adopting a dual system of agreement, which distinguishes between CONCORD and INDEX features. By looking at these two questions, the paper makes a contribution to the discussion of the relationship between semantic and morphological markedness.|000|markedness, overt marking, iconicity, frequency, typology, morphology, semantics 4853|Fitch2009|Darwin's "Origin of Species" (Darwin, 1859) made little mention of human evolution. 
This initial avoidance of human evolution was no oversight, but rather a carefully calculated move: Darwin was well aware of the widespread resistance his theory would meet from scientists, clergymen, and the lay public, and mention of human evolution might have generated insuperable opposition. But Darwin's many opponents quickly seized on the human mind, and language in particular, as a potent weapon in the battle against Darwin's new way of thinking. Alfred Wallace, whose independent discovery of the principle of natural selection spurred Darwin into finally publishing his long-developing "outline" of the theory in 1859, didn't help by arguing that natural selection was unable to explain the origins of the human mind. Although Wallace had reservations about all evolutionary approaches to the mind, human language provided the most powerful argument, due to the respectable position of linguistics and philology in Victorian science.|000|ursprache, origin of language, 4854|Wade2011|Chaser, a border collie who lives in Spartanburg, S.C., has the largest vocabulary of any known dog. She knows 1,022 nouns, a record that displays unexpected depths of the canine mind and may help explain how children acquire language.|000|dog, animal cognition, dog language, animal communication 4855|Sabino1992|The main aim of this text is to present, clearly and completely, a basic guide for those setting out on the adventure of scientific research. Science, as the form of knowledge that predominates in the contemporary world, is created by multifaceted work carried out in centres and institutes, in universities, companies, and laboratories. 
This research work, whose product is the scientific and technological knowledge that has so profoundly changed our way of life, has the peculiarity of requiring, all at once, creativity, work discipline, and systematicity.|000|research practice, scientific practice, 4856|Tomasello1992|During the second year of his daughter's life, Michael Tomasello kept a detailed diary of her language, creating a rich database. He made a careful study of how she acquired her first verbs and analyzed the role that verbs played in her early grammatical development. Using a Cognitive Linguistics framework, the author argues persuasively that the child's earliest grammatical organization is verb-specific (the Verb Island hypothesis). He argues further that early language is acquired by means of very general cognitive and social-cognitive processes, especially event structures and cultural learning. The richness of the database and the analytical tools used make First verbs a particularly useful and important book for developmental psychologists, linguists, language development researchers, and speech pathologists.|000|language acquisition, corpus studies, concept list, 4857|Samaja1999|Book provides an introduction to epistemology and methodology for scientific practice.|000|handbook, introduction, methodology, scientific practice, epistemology, research 4858|Hocket1960|Some three or four thousand human languages are spoken in the world today. Each is a communicative system the conventions of which are shared more or less precisely by a group of human beings. The small number of languages on which we have fairly adequate information show wide variation in many respects, and as reports on other languages become available the range of known variation increases. 
Yet, in the face of the variety, we are confident that all languages share certain basic design features.|000|design feature, human language, animal communication, 4859|Hocket1960|Discusses 13 design features of spoken language.|000|spoken language, design feature, human language, animal communication, 4860|Vogel2019|This explorative study focuses on grammatical taboos in German, morphosyntactic constructions which are subject to stigmatisation, as they regularly occur in standard languages. They are subjected to systematic experimental testing in a questionnaire study with gradient rating scales on two salient and two non-salient grammatical taboo phenomena of German. The study is divided into three subexperiments with different judgement types, an aesthetic judgement, a norm-oriented judgement and the sort of possibility judgement that comes closest to linguists’ understanding of grammar. Included in the investigated material are also examples of ordinary gradient grammaticality: unmarked, marked and ungrammatical sentences. The empirical characteristics of grammatical taboos are compared to those ordinary cases with the finding that they are rated at the level of markedness, but differ from ordinary markedness in that they produce a different pattern of between-subject variance. In addition, we find that grammatical taboos have a particular disadvantage under the aesthetic judgement type. The paper also introduces the concept of empirical grammaticality as a necessary theoretical cornerstone for empirical linguistics. Methodically, the study applies a mix of parametric and non-parametric methods of statistical analysis.|000|taboo, German, morphosyntax, acceptability judgments, experimental study, 4861|Vogel2019|The initial motivation for the study presented in this paper is a problem that everyone is facing who carries out empirical studies on grammaticality: the unknown influence of prescription and ideological bias on the outcomes of such studies. 
The participants of elicitation experiments, typically linguistically naïve, rarely understand the difference between the “natural” rules and constraints of their [pb] language’s grammar (which linguists are interested in) and prescriptive constraints (which often are seen as uninteresting artefacts, not only by theoretical linguists).|37f|grammaticality, acceptability judgments, 4862|Vogel2019|The increasing use of experimental research methods in grammatical theory over the last twenty years has called into question a stance towards linguistic data which by and large is based on expert knowledge and expert consent. But the enthusiasm of the early revolutionary phase has somewhat cooled down recently, due mainly to the insight that informal expert judgements and results from grammaticality experiments on the same data usually converge, as has convincingly been shown by Sprouse et al. (2013) who report a convergence rate of about 95 % for a huge sample of grammaticality judgements from ten volumes of Linguistic Inquiry.|40|grammaticality, acceptability judgments, 4863|Wang1967|The strongest evidence in support of binary features comes from alternations that may be called 'flip-flops.' These are cases where, in certain linguistic environments, the high tones become low tones and the low tones become high tones. Such alternations have been reported for many Chinese dialects, as well as for other languages. In some cases these alternations are synchronic; in others, they are deducible only historically by comparing cognates.|102|definition, flip-flop, tone, terminology, tone alternation, tone analysis, 4864|Wang1967|Assuming that there are no other relevant factors, it is difficult to see how a historical flip-flop such as one between high tone and low tone can be brought about without the two tones merging with each other at some stage of change, if one views this type of change as occurring in small, cumulative phonetic increments. 
The possibility is always open, of course, for one of the tones to become something else during an intermediate stage to avoid the collision course.|102|flip-flop, tone change, 4865|Vogel2019|Very interesting article explaining experimental research into acceptability judgments of grammar and also the problem of grammatical taboo in German, which relates to cases where speakers think they should not speak in a certain manner, although they do in fact speak that way in certain cases.|000|grammatical taboo, taboo, morphosyntax, German, Standard German, acceptability judgments, 4866|Schroeder2018|Nahezu vergessen ist in der öffentlichen Diskussion um die Promotionsskandale der vergangenen Jahre, dass der Handel mit Doktordiplomen nicht allein in Deutschland, sondern in ganz Europa und Amerika auf eine lange Geschichte zurückblicken kann. Als an der Heidelberger Juristischen Fakultät im Jahr 1521 an einem einzigen Tag 15 Kandidaten „öffentlich und feierlich“ das Doktorat verliehen werden sollte, reagierte Kurfürst Ludwig V. als „patron“ seiner Universität äußerst ungehalten auf diese „Massengraduierung“. Insbesondere nahm er Anstoß daran, dass sich unter den künftigen „doctores“ manche befanden, „die noch viel wenigern dan dis stands ihres alters und lere halben wurdig seien.“ Nicht ohne Grund befürchtete er, dass durch ein solches Spektakel „nit allein uch und gantzer universitet, sonder auch uns und unsern furfaren ... 
auch dem gantzen furstenthumb zu merg- licher schmehe nachret verachtung und schmelerung reichende.“ Gefährdet sah er „ere, ruhm und preiß“ der Rupertina und drohte, das „hoher zu hert- zen zu faßen gegen der facultät oder den personen, so sollich dinge uben, zu straffen handeln und furzunemen, das unser mißfall gespurt werden soll.“ Mit dem Befehl, sich an die Statuten zu halten, untersagte er der Fakultät ihr Vorgehen|000|skandal, doctorate, PhD, plagiarism, history, scientific practice, 4867|Mischke2019|The Chinese empire experienced a large expansion to the arid regions in the west during the Han Dynasty (206 BCE–220 CE). The Hexi Corridor, the Yanqi Basin, the southeastern part of the Junggar Basin and the Tarim Basin became part of the empire. The expansion of the Han Dynasty was accompanied by the significant intensification of irrigation farming along rivers draining the Qilian, Tianshan and Kunlun Mountains. Sedimentological and geochemical analyses and dating of lake sediments and shorelines revealed that four large lakes in the region experienced falling levels, or were almost or completely desiccating. The level of Zhuyeze Lake was falling rapidly ca. 2100 years before present (a BP), and the accumulation of lake sediments was replaced by an alluvial fan setting in large parts of the basin. Lake Eastern Juyan desiccated ca. 1700 a BP. Lake Bosten experienced low levels and increasing salinities at ca. 2200 a BP. Lake sediments in the Lop Nur region were mostly replaced by aeolian sands during a period of near-desiccation at 1800 a BP. In contrast, records from fifteen lakes farther in the west, north or south of the Han Dynasty realm indicate relatively wet climate conditions ca. 2000 years ago. Thus, dramatic landscape changes including the near and complete desiccation of large lakes in the arid western part of today’s China probably resulted from the withdrawal of water from tributaries during the Han Dynasty. 
These changes likely represent the earliest man-made environmental disasters comparable to the recent Aral-Sea crisis.|000|Hàn time, climate, irrigation farming, Silk Road, history 4868|Pierrard2018|Cette thèse porte sur le quechua bolivien méridional et ses aspects linguistiques, historiques et sociolinguistiques. Au niveau historique, j’y défends un modèle de diffusion centrifuge hiérarchique urbaine relativement tardive (17 ème et 18 ème siècles) et d’une hispanisation précoce de cette variété de quechua 2 C , avec pour centre principal de diffusion la ville impériale minière de Potosí. L’articulation entre une approche émique (dialectologie perceptive) et étique (sociolinguistique variationniste) m’a par ailleurs mené à proposer une hiérarchisation sociolectale entre deux variétés de quechua bolivien reposant largement sur la perception d’une plus ou moins grande hispanisation. Deux variables linguistiques particulièrement saillantes de cette structuration ont été retenues pour l’étude de la région du Valle Alto de Cochabamba. Les variantes à voyelles basses du morphème du pluriel inclusif C HIK , [čeχ], [čaχ], autrefois prestigieuses et en passe de s’imposer sur la variante haute [čis], associée à la ruralité, connaissent aujourd’hui un fort recul suite aux bouleversements socioéconomiques et migratoires des 80 dernières années. Dans le même temps, en production, la distribution des variantes rurales [ʃa] et des variantes urbaines [sqa], [sa] du morphème du progressif C HKA , demeure globalement stable. 
L’interprétation proposée est le manque de saillance de la variable dû à l’absence d’opposition entre sibilantes alvéolaire et post-alvéolaire en quechua 2C et à un phénomène de quasi fusion des allomorphes en perception.|000|Quechua, diffusion, sociolinguistics, South American languages, 4869|Ratliff2010|Determination of how Hmong-Mien is connected to its neighbors through contact -- and how it may be related to its neighbors by descent from a common ancestor -- represents the greatest and most exciting challenge for future research. Work on strata of loanwords from different Chinese languages on the model of Downer 1973 will be best carried out by those who are specialists in the history of Chinese. And more evidence of common ground between Hmong-Mien and each of the other language families of the south -- Mon-Khmer, [pb] Tai-Kadai, Tibeto-Burman, and Austronesian -- needs to be added to what has been collected so far, and this evidence needs to be subjected to various tests designed to help discriminate between inheritance and contact (@Thomason1988, @Ratliff2005). 
Although the Hmong-Mien language family is comparatively small in size, its central geographic position ensures that this work will be crucial to our understanding of the prehistory of southern China and northern Southeast Asia.|8f|stratification, strata, borrowing, Hmong-Mien, language contact 4870|Ratliff2010|Book introduces a reconstruction of Hmong-Mien languages.|000|stratification, strata, borrowing, Hmong-Mien, language contact 4871|Ratliff2010|The purpose of this book is to present a new reconstruction of Proto Hmong-Mien, the ancestral language of modern-day speakers of the Hmongic (Miao) and Mienic (Yao) languages of southern China and Southeast Asia, and a set of discussions on topics relevant to the historical development of these languages.|8f|stratification, strata, borrowing, Hmong-Mien, language contact 4872|Browne2019|This report summarises the Digital Ludeme Project, a recently launched 5-year research project being conducted at Maastricht University. This computational study of the world’s traditional strategy games seeks to improve our understanding of early games, their development, and their role in the spread of related mathematical ideas throughout recorded human history.|000|annotation, games, database, horizontal influence map, 4873|Jacques2019a|Skepticism regarding the tree model has a long tradition in historical linguistics. Although scholars have emphasized that the tree model and its long-standing counterpart, the wave theory, are not necessarily incompatible, the opinion that family trees are unrealistic and should be completely abandoned in the field of historical linguistics has always enjoyed a certain popularity. This skepticism has further increased with the advent of recently proposed techniques for data visualization which seem to confirm that we can study language history without trees. In this article, we show that the concrete arguments that have been brought up in favor of achronistic wave models do not hold. 
By comparing the phenomenon of incomplete lineage sorting in biology with processes in linguistics, we show that data which do not seem as though they can be explained using trees can indeed be explained without turning to diffusion as an explanation. At the same time, methodological limits in historical reconstruction might easily lead to an overestimation of regularity, which may in turn appear as conflicting patterns when the researcher is trying to reconstruct a coherent phylogeny. We illustrate how, in several instances, trees can benefit language comparison, although we also discuss their shortcomings in modeling mixed languages. While acknowledging that not all aspects of language history are tree-like, and that integrated models which capture both vertical and lateral language relations may depict language history more realistically than trees do, we conclude that all models claiming that vertical language relations can be completely ignored are essentially wrong: either they still tacitly draw upon family trees or they only provide a static display of data and thus fail to model temporal aspects of language history. |000|incomplete lineage sorting, phylogenetic reconstruction, family tree, history of science, discussion, historical glottometry, 4874|Feldmann2019|The ancient Mediterranean port city of Ashkelon, identified as “Philistine” during the Iron Age, underwent a marked cultural change between the Late Bronze and the early Iron Age. It has been long debated whether this change was driven by a substantial movement of people, possibly linked to a larger migration of the so-called “Sea Peoples.” Here, we report genome-wide data of 10 Bronze and Iron Age individuals from Ashkelon. We find that the early Iron Age population was genetically distinct due to a European-related admixture. This genetic signal is no longer detectible in the later Iron Age population. 
Our results support that a migration event occurred during the Bronze to Iron Age transition in Ashkelon but did not leave a long-lasting genetic signature.|000|Philistines, Ashkelon, Iron Age, archaeogenetics, archaeology, 4875|Verkeerk2019|Recent applications of phylogenetic methods to historical linguistics have been criticized for assuming a tree structure in which ancestral languages differentiate and split up into daughter languages, while language evolution is inherently non-tree-like (@Francois2014; @Blench2015 : 32–33). This article attempts to contribute to this debate by discussing the use of the multiple topologies method (@Pagel<2006a> & Meade 2006a) implemented in BayesPhylogenies (Pagel & Meade 2004). This method is applied to lexical datasets from four different language families: Austronesian (Gray, Drummond & Greenhill 2009), Sinitic (Ben @Hamed<2006> & Wang 2006), Indo-European (Bouckaert et al. 2012), and Japonic (@Lee<2011> & Hasegawa 2011). Evidence for multiple topologies is found in all families except, surprisingly, Austronesian. It is suggested that reticulation may arise from a number of processes, including dialect chain break-up, borrowing (both shortly after language splits and later on), incomplete lineage sorting, and characteristics of lexical datasets. It is shown that the multiple topologies method is a useful tool to study the dynamics of language evolution.|000|lateral transfer, language contact, Bayesian analysis, Chinese, Japonic, Indo-European, 4876|Shilton2019|Recently, a growing number of studies have considered the role of language in the social transmission of tool-making skill during human evolution. In this article, I address this question in light of a new theory of language and its evolution, and review evidence from anthropology and experimental archaeology related to it. 
I argue that the specific function of language, the instruction of imagination, is not necessary for the social transmission of tool-making skill. Evidence from hunter-gatherer ethnographies suggests that social learning relies mainly on observation, participation, play, and experimentation. Ethnographies of traditional stone cultures likewise describe group activities with simple, context-bound interactions embedded in the here and now. Experiments comparing gestural and verbal teaching of tool-making skills also demonstrate that language is not necessary for that process. I conclude that there is no convincing evidence that language played an important role in the social transmission of lithic technology, although the possibility that linguistic instruction was involved as part of the social interactions accompanying tool-making cannot be excluded.|000|lithic technology, language, imitation, tool making, origin of language, 4877|Herce2019|Regularity and irregularity are among the most widely invoked notions in linguistics. The terms are backed up by a long and venerable tradition, and yet (or maybe therefore) different disciplines and authors seem to be using them for very different phenomena and in very different ways. The most frequent usage conflates or replaces other notions such as type frequency, productivity, (non-)concatenative morphology, storage vs. computation, predictability, etc. An assessment of these and other variables in Icelandic verbal inflection reveals that most of them are in practice strongly correlated. I conclude, however, that this is largely unsurprising by virtue of the definitional dependencies holding between those notions. It is empirically doubtful whether there exists a single underlying phenomenon or category which the terms designate. 
In addition, given their multiple and overlapping senses, and the existence of separate, unambiguous labels for the relevant underlying notions, I contend that the terms ‘regular’ and ‘irregular’ should be ideally abandoned in scientific literature in order to avoid ambiguity, sloppy reasoning and misunderstandings and to facilitate cross-linguistic comparison and interdisciplinary dialogue.|000|irregularity, irregular forms, discussion, theoretical problems, 4878|Braginsky2019|Why do children learn some words earlier than others? The order in which words are acquired can provide clues about the mechanisms of word learning. In a large-scale corpus analysis, we use parent-report data from over 32,000 children to estimate the acquisition trajectories of around 400 words in each of 10 languages, predicting them on the basis of independently derived properties of the words’ linguistic environment (from corpora) and meaning (from adult judgments). We examine the consistency and variability of these predictors across languages, by lexical category, and over development. The patterning of predictors across languages is quite similar, suggesting similar processes in operation. In contrast, the patterning of predictors across different lexical categories is distinct, in line with theories that posit different factors at play in the acquisition of content words and function words. By leveraging data at a significantly larger scale than previous work, our analyses identify candidate generalizations about the processes underlying word learning across languages.|000|language acquisition, acquisition, variation, 4879|Braginsky2019|Apparently a very nice dataset from 10 languages with speech norm data for 400 words. 
Yet another example of why it would be so useful to link this to Concepticon.|000|dataset, age of acquisition, language acquisition, concept list, speech norms, 4880|Trask2000|**contamination** Any unsystematic change in which the form of a linguistic item is irregularly influenced by the form of another item associated with it. For example, the former English *femelle* //fi:məl// became *female* under contamination with its unrelated opposite *male*; English *covert* //kvvət//, a variant of *covered*, has become //kəU'v3:t// by contamination from its unrelated opposite *overt*; Latin *gravis* 'heavy' became popular *grevis* by contamination from *levis* 'light'; Basque *bigira* 'watchfulness', a loan from Latin *vigilia*, and the derived verb *bigiratu* 'look at' have become *begira* and *begiratu* by contamination from native *begi* 'eye'; the expected Russian `*`*nevjat'* 'nine' has become instead *devjat'* by contamination from the following *desjat'* 'ten'. Mutual contamination is possible: Old French *citeien* and *denzein* have yielded English *citizen* and *denizen* by contaminating each other. Compare **cross**.|72f|definition, contamination, historical linguistics, terminology 4881|Trask2000|**cross** A word which is derived, not from a single source, but from two sources which have become confused in some way. For example, Basque *bilo* '(a single) hair' cannot derive from the synonymous Latin *pilum* because the form is wrong (we would expect `*`*biro*), while a phonologically perfect source is Latin *villum* 'tuft of hair', which, however, has the wrong meaning. It may well be, then, that the Basque word derives from a cross of the two Latin words, with the form of one but the sense of the other. Compare **contamination**.|77|terminology, definition, historical linguistics 4882|Trask2000|**analogy** (also **analogical change**) Any linguistic change which results from an attempt to make some linguistic forms more similar to other linguistic forms. 
See the entries preceding this one for the various types of analogy which are distinguished, and see also **four-part analogy**, **intraparadigmatic analogy**. See Anttila (@1977) for a study.|20|terminology, analogy, definition 4883|Coseriu1973|Pero es necesario distinguir entre tres problemas diversos del cambio lingüístico, que a menudo se confunden: a) el problema *racional* del cambio (¿por qué cambian las lenguas?, es decir, ¿por qué no son inmutables?); b) el problema *general* de los cambios, que, como se verá, no es un problema «causal» sino «condicional» (¿en qué condiciones suelen ocurrir cambios en las lenguas?); y c) el problema *histórico* de tal [pb] cambio determinado. Efectivamente, el segundo problema es un problema de lo que se llama «lingüística general»; y, puesto que no hay propiamente una «lingüística general», salvo como generalización de los resultados de la lingüística histórica, ese problema es una generalización de ciertos aspectos de los problemas del tercer tipo; asimismo, su solución es generalización de varias soluciones de problemas históricamente concretos y, a su vez, como acumulación de lo sabido acerca de los hechos históricos, ofrece hipótesis para la solución de nuevos problemas concretos.|65f|linguistic theory, methodology, philosophy of science, 4884|Coseriu1973|El sentido de esta distinción se aclarará mejor en lo que sigue. Por el momento, la diversidad de los tres problemas puede ilustrarse, hasta cierto punto, mediante una analogía: a) ¿por qué mueren los hombres? (es decir, ¿por qué no son inmortales?); b) ¿de qué mueren los hombres? (de vejez, de enfermedades, etc.); y c) ¿de qué ha muerto Fulano? El primero de estos problemas es el problema de la racionalidad de la muerte (o sea, de la mortalidad del hombre) y no puede reducirse al segundo.|66:88|example, philosophy of science 4885|Dixon2019|In recent years, a 'phylogenetic' method of language classification has been adapted from biology. 
This is an artefact, which manipulates a limited set of data in a playful way. It has no relevance with respect to the established discipline of comparative-historical linguistics. |11-7|nice quote, critics, historical linguistics, phylogenetics 4886|Gabelentz2016|Was ich von der einzelsprachlichen Forschung gesagt habe, gilt, wenn anders es richtig ist, nicht nur von dieser oder jener, sondern von allen Einzelsprachen. Und die Grundsätze der historisch-genealogischen Forschung wollen nicht nur für eine einzelne Sprachfamilie, sondern für alle gelten. Auch waren die Erkenntnisse, zu denen wir gelangten, nicht diesem oder jenem beschränkten Sprachgebiete abgewonnen, sondern sie beruhten entweder auf der Natur der Sache oder auf einem möglichst weiten Kreise von Erfahrungen. :translation:`What I said about the investigation of particular languages holds, not only for this or that, but for all particular languages. And the foundations of historical-genealogical investigations do not only want to apply to a particular language family, but to all [language families]. Indeed, the insights which we arrived at, were not derived from this or that limited area of linguistics, but they reflect the nature of the subject or a maximally broad circle of experiences.` |317|Gabelentz, general linguistics, methodology, 4887|Gabelentz2016|Diese Wissenschaft hat das menschliche Sprachvermögen selbst zum Gegenstande. Sie will dies Vermögen begreifen, nicht nur in Rücksicht auf die geistleiblichen Kräfte und Anlagen, aus denen es sich zusammensetzt, sondern auch, soweit dies erreichbar ist, dem ganzen Umfange seiner Entfaltungen. :translation:`This scientific discipline has the human language capacity itself as its research object. 
It wants to understand this capacity, not only with respect to the specific abilities of spirit and body, of which it is composed, but also, and as much as this is achievable, in the whole extent of its developments.`|317|Gabelentz, general linguistics, methodology, 4888|Gabelentz2016|Endlich wird immer und immer wieder das Bestreben auftauchen, sei es rückschliessend, sei es durch apriorische Speculation, ein Bild von dem Urzustande menschlicher Rede zu gewinnen. Der Sehnsucht nach einem Einblicke in die ersten Anfänge alles Seienden kann sich die Wissenschaft nirgends erwehren. :translation:`Finally, there will always be an attempt, be it by concluding from evidence, or by speculating in an a-priori fashion, to derive an image of the original state of human speech. Science can never resist the desire for these insights into the first beginnings of all beings.`|318|language origin, language evolution, nice quote, origin of language, 4889|Atkinson2006|When biologists model evolution, they lie: they lie about the independence of character state changes across 
sites; they lie about the homogeneity of substitution mechanisms; and they lie about the importance of selection pressure on substitution rates. But these are lies that lead us 
to the truth.|94|modeling, nice quote, 4890|Matisoff2019|This book is a welcome addition to the literature on individual East and Southeast Asian languages, as well as an important validation of the concept of linguistic area. The 13 articles treat languages belonging to the five great families of the re- gion (Sino-Tibetan, Mon-Khmer/Austroasiatic, Tai-Kadai [=Kradai], Hmong-Mien [=Miao-Yao], and Austronesian), with an explicit emphasis on the manifestation of particular areal features that have been discussed in the literature.|000|introduction, handbook, language union, Sprachbund, South-East Asia, 4891|Matisoff2019|The extent to which each of the languages in this volume exemplify these fea- tures is roughly indicated in the following charts: .. image:: static/img/Matisoff2019-XI.png :name: chinese_dialects :width: 500px :comment:`[Structural features in SEA languages]`|XI|structural data, South-East Asia, 4892|Ross2013|This paper addresses the questions, Do bilingually induced and shift-induced change have differ- ent outcomes? If they do, can these differences assist us in reconstructing the prehistoric past, specifically the linguistic prehistory of the (smallscale neolithic) societies of Melanesia. A key to better interpreting differences in the outputs of contact-induced change is to under- stand how such change in smallscale societies actually occurs. I argue that it is important to know the life-stage loci of change. I suggest that language shift has two life-stage loci, one in early childhood, where evidence of shift, if any, is restricted to specialist lexicon, and one in adult- hood. Adult language shift appears to have been rare in Melanesia. I also suggest that bilingually induced change, which entails the syntactic restructuring of one’s heritage language on the model of a second language, takes place among preadolescent children–a claim which is sup- ported by various kinds of evidence. 
This understanding helps us in turn to interpret the outcomes of contact-induced change and to infer prehistoric events, since adult second-language learning typically leads to simplification, whilst childhood language learning may lead to an increase in complexity.|000|borrowing, language contact, hypothesis, bilingualism, contact-induced language change, 4893|Alves2018|This paper evaluates Chinese lexical data in Shorto's @2006 Proto-Mon-Khmer reconstructions to prevent misapplication of his reconstructions, which in a few dozen instances are based on problematic data that affect or even refute his reconstructions. First, Shorto notes about 20 Chinese items to consider for their comparable semantic and phonological properties. While several are probable Chinese loanwords spread throughout the region, a majority of these are unlikely to be Chinese as they are either Wanderwörter seen in multiple language families with undetermined origins or, in most cases, simply partial chance similarities, and these latter items can thus be removed from consideration in Proto-Austroasiatic reconstructions. Second, Shorto also listed about 50 Vietnamese words as supporting data for proto-Austroasiatic etyma which are either (a) clearly Sino-Vietnamese readings of Chinese characters (about 20 instances) or (b) Early Sino-Vietnamese colloquial borrowings (about 30 instances). Many of those proposed proto-Austroasiatic reconstructions must be reconsidered due to the exclusion of these Sino- Vietnamese items. While excluding such Sino-Vietnamese or Early Sino-Vietnamese items in some cases has no impact on those reconstructions, other exclusions result in slight changes in the reconstructed forms, and in several cases, proposed reconstructions must be entirely excluded as only Vietnamese and one other branch of Austroasiatic are available as comparative evidence. 
Finally, both the exclusions of proposed attestations (and the clarification of their actual origin) and the hypotheses of regional spread of Chinese words must be considered not only for Proto-Austroasiatic but also in comparative historical linguistic studies in the region.|000|Wanderwort, Mon-Khmer, Vietnamese, Sino-Vietnamese borrowings, Proto-Mon-Khmer, Proto-Austroasiatic, Austro-Asiatic, 4894|Barthel2019|Speech planning is a sophisticated process. In dialog, it regularly starts in overlap with an incoming turn by a conversation partner. We show that planning spoken responses in overlap with incoming turns is associated with higher processing load than planning in silence. In a dialogic experiment, participants took turns with a confederate describing lists of objects. The confeder- ate’s utterances (to which participants responded) were pre-recorded and varied in whether they ended in a verb or an object noun and whether this ending was predictable or not. We found that response planning in overlap with sentence-final verbs evokes larger task-evoked pupillary responses, while end predictability had no effect. This finding indicates that planning in overlap leads to higher processing load for next speakers in dialog and that next speakers do not proac- tively modulate the time course of their response planning based on their predictions of turn end- ings. The turn-taking system exerts pressure on the language processing system by pushing speakers to plan in overlap despite the ensuing increase in processing load.|000|speech planning, dialog, turn taking, experimental study, 4895|BenNCir2015|Identifying non-disjoint clusters is an important issue in clustering re- ferred to as Overlapping Clustering. 
While traditional clustering methods ignore the possibility that an observation can be assigned to several groups and lead to k exhaustive and exclusive clusters representing the data, Overlapping Clustering methods offer a richer model for fitting existing structures in several applications requiring a non-disjoint partitioning. In fact, the issue of overlapping clustering has been studied since the last four decades leading to several methods in the litera- ture adopting many usual approaches such as hierarchical, generative, graphical and k-means based approach. We review in this paper the fundamental concepts of over- lapping clustering while we survey the widely known overlapping partitional clus- tering algorithms and the existing techniques to evaluate the quality of non-disjoint partitioning. Furthermore, a comparative theoretical and experimental study of used techniques to model overlaps is given over different multi-labeled benchmarks.|000|fuzzy clustering, overview, introduction, 4896|Blench2004|The concept of synthesising linguistics, archaeology and genetics in the reconstruction of the past is becoming a commonplace; but the reality is that each discipline largely pursues its own methods and what little interaction there is remains marginal. Hence many of the questions asked are internal to the discipline, addressed to colleagues, not the larger sphere of understanding the past. China and East Asia in general are a particularly difficult case because so much of the linguistics and archaeology is driven by an emphasis on high culture. Major archaeological texts refer neither to linguistics nor genetics and speculation about the identity of non–Chinese groups mentioned in the texts tends to be unanchored. In addition, ideology surrounding the definition of minorities in China has confused the analysis in genetics papers. 
This situation has begun to change and a review of the current situation may be useful 1 .|000|synthesis of historical sciences, linguistics, archaeology, genetics, Chinese, peopling of South-East Asia, 4897|Crow2010|Sewall Wright and R. A. Fisher often differed, including on the meaning of inbreeding and random gene frequency drift. Fisher regarded them as quite distinct processes, whereas Wright thought that because his inbreeding coefficient measured both they should be regarded as the same. Since the effective population numbers for inbreeding and random drift are different, this would argue for the Fisher view.|000|overview, Fisher-Writh model, evolutionary theory 4898|Karimi2005|In this thesis, we present a solution to the problem of discovering rules from sequential data. As part of the solution, the Temporal Investigation Method for Enregistered Record Sequences (TIMERS) and its implementation, the TimeSleuth software, are introduced. TIMERS uses the passage of time between attribute observations as justification for judging the causality of a rule set. Given a sorted sequence of input data records, and assuming that the effects take time to manifest themselves, we merge the input records to bring potential causes and effects together in the same record. Three tests are performed using three different assumptions on the nature of the relationship: instantaneous, causal, or acausal. The temporal reversibility of a relationship in time is used to judge the relationship as potentially acausal, while reversibility is considered as evidence for judging the relationship as potentially causal. To visualise the attributes’ influence on each other, the thesis introduces dependence diagrams, which are graphs that connect condition attributes to decision attributes. We performed a series of comparisons between TIMERS and other causality discoverers, and also experimented with both synthetic and real temporal data for the discovery of temporal rules. 
The results show an improvement in the quality of the rules discovered with TIMERS.|000|rule induction, decision tree, causal inference, 4899|Hubsz2019|The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present an extended version of the ARGweaver algorithm, ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topology and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300kya. We find no convincing evidence that negative selection acted against these regions. 
We also identify 1% of the Denisovan genome which was likely introgressed from an unsequenced hominin ancestor, and note that 15% of these regions have been passed on to modern humans through subsequent gene flow.|000|gene flow, character mapping, population genetics, introgression, incomplete lineage sorting, 4900|Hill2019|The discovery of sound laws by comparing attested languages is the method which has unlocked the history of European languages stretching back thousands of years before the appearance of written records, e.g. Latin p- corresponds to English f- (pes, foot; primus, first; plenus, full). Although Burmese, Chinese, and Tibetan have long been regarded as related, the systematic exploration of their shared history has never before been attempted. Tracing the history of these three languages using just such sound laws, this book sheds light on the prehistoric language from which they descend. Written for readers with little linguistic knowledge of these languages, but fully explicit and copiously indexed for the specialist, this work will serve as the bedrock for future progress in the study of these languages.|000|Sino-Tibetan, Chinese, Tibetan, Burmese, sound law, introduction, 4901|Jin2013|The Lei 雷 initial is unique in terms of sound correspondences among the Miao-Yao languages and dialects. Based on the modern reflexes of the reconstructed Lei initial, this paper, along with the traditional view, divides the Miao-Yao languages, altogether six of them, into two sub-branches. However, distinct from the traditional view, it argues that the Miao sub-branch consists of only one language, whereas the Yao sub-branch includes five languages. By virtue of the comparison of the related languages, this initial is more reasonably reconstructed as `*`sm- instead of `*`phs-. 
|000|Hmong-Mien, reconstruction, subgrouping, 4902|Gordon2016|Phonological typology is concerned with the study of the distribution and behavior of sounds found in human languages of the world. One thread of typological research in phonology involves defining the range of cross-linguistic variation and the relative frequency of phonological patterns. Another line of investigation attempts to couch these typological observations within theories designed to model and explain the human knowledge of and capacity to acquire phonological systems. Both of these research programs require a cross-linguistic database from which to draw generalizations. They often differ, however, in the ultimate purpose to which the typological data is put to use, a difference that has consequences for the methodology employed by the researcher. Because phonological theory dating back to work by Trubetzkoy (@1939), Hockett (@1955), Jakobson (@1962), Jakobson et al. (@1963) has characteristically been concerned with explaining and modeling cross-linguistic variation, typology has become largely inseparable from most research in phonology, a close bond that is obvious even in casual inspection of the phonology literature (Hyman 2007a). Most chapters in recent handbooks of phonological theory explore particular phonological phenomena, e.g. phoneme inventories, syllable structure, harmony processes, etc., providing an overview of the typology of the relevant phenomenon and a summary of theories designed to account for the range of patterns. 
One of the current dominant paradigms in phonological theory, Optimality Theory, is well suited to capturing typological variation since it employs a set of competing constraints on phonological well-formedness that can be prioritized differently in different languages (see Chapter 2 for discussion).|000|typology, phonological typology, cross-linguistic study, 4903|Zalizniak2018|The article summarizes the goals and the current state of the Catalogue of semantic shifts (CSSh), its primary notions being those of a semantic shift, which is understood as a relation of cognitive proximity between two linguistic meanings, and a realization of a semantic shift, i.e. one polysemic word or a pair of cognate words of the same language or different languages that act as “exponents” of this relation. The typology of semantic shifts occupies a position at the crossroad of semantic, lexical and grammatical typologies, overlapping each of these areas of study in terms of linguistic data and methods used; however, the domain of CSSh does not coincide with any of them. The framework of CSSh provides the theoretical foundation for identifying recurring cross-linguistic semantic shifts, and collecting them in the Database of Semantic Shifts for further analysis. The article demonstrates that the notion of semantic shift as defined in CSSh is just a formalization of an instrument of linguistic analysis that is already quite common in various areas of linguistics. Semantic shift provides a basis for the notion of semantic parallel used in the historical linguistics and etymology, for motivational models in word-formation, it is a central notion for grammaticalization theory; finally, semantic shift is one of various types of implicit meanings (along with presuppositions and connotations) that shape the “linguistic model of the world”. 
Linguistic data contained in the Database of Semantic Shifts can be used in all these areas, in order to provide semantic plausibility criteria for linguistic reconstruction, to act as empirical evidence for cognitive mechanisms of linguistic conceptualization, to aid in identifying specific features of the semantic system of a given language or group of languages.|000|semantic shift, semantic change, polysemy, cross-linguistic study, database 4904|Solnit1996|All initial consonants are assumed to have a three-way manner distinction. For obstruents these are voiceless unaspirate, aspirate, and voiced. The corresponding manners for sonorants are preglottalized, voiceless, and voiced. In addition, obstruents may be [+/- prenasalized]. |1|Hmong-Mien, Proto-Hmong-Mien, plosive initial, initial 4905|Delson2019|Analysis of two fossils from a Greek cave has shed light on early hominins in Eurasia. One fossil is the earliest known specimen of Homo sapiens found outside Africa; the other is a Neanderthal who lived 40,000 years later. :comment:`See paper by` @Harvati2019|000|homo sapiens, Out-of-Africa, archaelogy, 4906|Harvati2019|Two fossilized human crania (Apidima 1 and Apidima 2) from Apidima Cave, southern Greece, were discovered in the late 1970s but have remained enigmatic owing to their incomplete nature, taphonomic distortion and lack of archaeological context and chronology. Here we virtually reconstruct both crania, provide detailed comparative descriptions and analyses, and date them using U-series radiometric methods. Apidima 2 dates to more than 170 thousand years ago and has a Neanderthal-like morphological pattern. By contrast, Apidima 1 dates to more than 210 thousand years ago and presents a mixture of modern human and primitive features. These results suggest that two late Middle Pleistocene human groups were present at this site—an early Homo sapiens population, followed by a Neanderthal population. 
Our findings support multiple dispersals of early modern humans out of Africa, and highlight the complex demographic processes that characterized Pleistocene human evolution and modern human presence in southeast Europe.|000|homo sapiens, Out-of-Africa, archaelogy, 4907|Allan2012|Pragmatics is the study of human communication: the choices speakers make to express their intended meaning and the kinds of inferences that hearers draw from an utterance in the context of its use. This Handbook surveys pragmatics from different perspectives, presenting the main theories in pragmatic research, incorporating seminal research as well as cutting-edge solutions. It addresses questions of rational and empirical research methods, what counts as an adequate and successful pragmatic theory, and how to go about answering problems raised in pragmatic theory. In the fast-developing field of pragmatics, this Handbook fills the gap in the market for a one-stop resource on the wide scope of today’s research and the intricacy of the many theoretical debates. It is an authoritative guide for graduate students and researchers with its focus on the areas and theories that will mark progress in pragmatic research in the future.|000|human communication, pragmatics, speaker-listener-model, context 4908|Ariel2012|Two major obstacles blocked the attempts to come up with a coherent definition for pragmatics. First, there was the hope that a multiplicity of criteria simultaneously converge to distinguish between all of grammar (including semantics) and all of pragmatics, e.g., context sensitivity, non-truth-conditionality, implicitness, discourse scope (and many others), all characterizing pragmatics, and context invariability, truth-conditionality, explicitness, sentential scope (and many others), all characterizing grammar. 
Naturally, the more criteria we can mobilize for drawing the grammar/pragmatics division of labor, the more contentful each of the defined domains is made out to be, and the more significant the distinction between them. The second high hope was that complete topics, such as speech acts, implicatures, politeness, functional syntax, deixis, presupposition, agreement, and argument structure each, en bloc, belongs either in grammar or in pragmatics. Indeed, this would guarantee a very neat division of labor between grammar and pragmatics. Thus, speech acts, implicatures, politeness, and functional syntax should wholly belong on the pragmatics turf, agreement and argument structure should wholly belong on the grammatical turf, and semanticists and pragmatists should do battle over, e.g., presupposition and deixis. Unfortunately, neither one of these worthy goals can be achieved. Multiple criteria for distinguishing grammar and pragmatics resulted in multiple contradictions between the various criteria, which render the definitions of both grammar and pragmatics incoherent.|000|pragmatics, definition, overview 4909|Ariel2012|Finally, historical/typological pragmatics research too could benefit from integrating inferential pragmatics theories and questions. Proponents must ascertain that they only assume very small steps of grammaticization, each of which is analyzable as depending on a reasonable on-line pragmatic inference. Etymological analyses can serve as excellent pointers to potential paths [pb] of grammaticization, but they cannot replace detailed analyses of the actual inferential steps, leading from one stage to the next. |43f|pragmatics, semantic change, grammaticalization, pragmatic inference, 4910|Traugott2012|Language is always in flux. Over time new patterns can be observed that are either minor modifications to the linguistic system, as when the meaning of a lexical item changes, or major ones, as when word order changes occur. 
That language change occurs primarily as a result of acquisition is uncontroversial. There are, however, very different theories and discourses about how to interpret this observation. To simplify, one view assumes that change is internal or endogenous, in other words that grammars change [...]. A competing view is that usage changes and language acquisition occurs throughout life. Change is not only internal but also external, driven by social factors and language users who are active participants in negotiation of linguistic patterning, especially meaning: “languages don’t change: people change language” (@Croft2000: 4).|549|pragmatics, language change, 4911|Traugott2012|Granting that the concept of polysemy is problematic (see Tuggy 1993), without polysemy one cannot account for the fine-grained step-by-step developments that are attested by detailed study of texts and contexts over time.|551|polysemy, semantic change, 4912|Grimm1822|Aus dem verhältnis der consonanten geht also genügender beweis einer urverwandtschaft der verglichenen sprachen hervor. Sollte sich, auf es gestützt, nicht zugleich berührungen der *vocale* nachspüren laßen? die analogie zwischen hochd. und gothischem vocalstande nicht zu dem schluße leiten, daß auch latein. vocale mit goth. zusammenhängen müßen. Unsicherer und abgebrochener wird dieser zus.hang schon deßhalb seyn, weil wir in deutschen dialecten derselben consonantenstufe so schwankenden und manigfaltigen vocalen begegnen. Gleichwohl gibt es noch solche unverkennbare ähnlichkeit: [...]. |592|genetic relationship, proof, proof of relationship, correspondence patterns, 4913|Grimm1822|sollten unter den gegebenen beispielen einzelne noch bedenklich und unausgemacht scheinen, so darf die mehrzahl hauptsächlich wegen analogie der abstufung für streng erwiesen gelten, die richtigkeit der regel überhaupt ist unverkennbar. Wörter, in welchen zwei consonanten stimmen [...] sind doppelt sicher; solche in denen ein cons. 
stimmt, der andere abweicht, verdächtig; noch verdächtiger, deren consonanten unabgestuft in den drein sprachen wirkliche gleichheit zeigten. In diesem fall fehlt entw. alle verwandtschaft [...] oder die eine sprache hat aus der anderen entlehnt [...].|588|borrowing, cognacy, proof of cognacy, correspondence patterns, 4914|Grimm1822|Zur darstellung der laute in sämmtlichen deutschen sprachen bediene ich mich meistentheils der heutigen gangbaren buchstaben, deren unzulänglichkeit für alle fälle leicht einzusehen ist. Sie würden ausreichen, wenn es bloß auf die einfachen oder grundlaute ankäme; aber in der mischung und zusammenfügung pflegt sich gerade die mannigfaltigkeit der mundarten zu erweisen. |3|language variation, transcription, letters, 4915|Grimm1822|In unserm worte : schrift z. b. drücken wir acht laute mit sieben zeichen aus, f. nämlich stehet für ph. Das sch würde der Russe ebenfalls mit einem einzigen zeichen, folglich jenes wort mit fünf buchstaben schreiben können. Dergleichen eigene buchstaben zu sp. st. und andern lieblingslauten unserer sprache wären ihr so dienlich als es dem Griechen sein ψ für ps. ist. :translation:`In our word schrift, for example, we express eight sounds with seven signs, since f stands for ph.`|3|letter, orthography, transcription, phonology, nice quote, 4916|Lass1997|Neither phonetic similarity nor semantic (near-) equivalence hold against regularity of correspondence.|129|proof of cognacy, correspondence patterns, regularity, 4917|Szemerenyi1970|Der Abweichung sind keine Schranken gesetzt, solange sie als regelmäßig erwiesen werden kann.|14|sound correspondences, regularity, semantic similarity, proof of cognacy, 4918|Newman1970|The proof of genetic relationship does not depend on the demonstration of historical sound laws. 
Rather, the discovery of sound laws and the reconstruction of linguistic history normally emerge from the careful comparison of languages already presumed to be related.|39|proof of relationship, genetic relationship, sound laws, sound correspondences 4919|Yakhontov1965|[Но есть] некоторые области лексики, где заимствования почти невозможны, например: местоимения, названия частей тела, родственных отношений, важнейших явлений природы, некоторые наиболее употребительные глаголы и прилагательные и т. п.». :translation:`es [...] einige Bereiche der Lexik [gibt], in denen Entlehnung fast unmöglich ist, bspw. Pronomen, Bezeichnungen für Körperteile, Verwandtschaftsbeziehungen, wichtiger Naturerscheinungen, einige häufig verwendete Verben und Adjektive u.s.w.`|14|borrowing, borrowability, 4920|Meiser1998|[Das] Konzept der genetischen Sprachverwandtschaft, d.h. die Annahme, derzufolge mehrere Sprachen ungeachtet ihres in historischer Zeit mehr oder minder voneinander verschiedenen Erscheinungsbildes von einer gemeinsamen ‚Ursprache’ abstammen können, wie die Mitglieder einer weitverzweigten Familie von gemeinsamen Vorfahren|22|genetic relationship, definition, historical linguistics 4921|Grimm1822|wörter, welche die eine oder die andere sprache nicht besitzt, ließen sich für die neun cons. verhältnisse leicht herstellen, nicht aber in den elementen der vocale, liquiden und spiranten. Alles rathen bleibt also unfruchtbar; wir dürften höchstens behaupten, daß z. b. δάφνη im goth. t-b, im hochd. z-p; φυτὸν goth. b-þ, hochd. p-d haben müßte. Jene neun regeln sind nur prüfstein für vorhandene wörter. |589|word prediction, prediction, retrodiction, history of science, 4922|Grimm1822|die lautverschiebung erfolgt in der masse, thut sich aber im einzelnen niemahls rein ab; es bleiben wörter in dem verhältnisse der alten einrichtung stehn, der strom der neuerung ist an ihnen vorbeigestoßen. 
Schutz gewährten ihnen zumahl (nicht immer) die verbindung mit den unwandelbaren liquiden und spiranten. :translation:`The sound shift proceeds in the majority, but will never be pure in particular cases. Words remain in their ancient shape, the wind of change has gone past them. They received shelter (not always) from the union with the immutable liquids and spirants.`|590|regularity, regular sound change, Grimm's Law, 4923|Verner1877|Indogerm. *k*, *t*, *p* gingen erst überall in *h*, *þ*, *f* über; die so entstandenen tonlosen fricativae nebst der vom indogermanischen ererbten tonlosen fricativa *s* wurden weiter inlautend bei tönender nachbarschaft selbst tönend, erhielten sich aber als tonlose im nachlaute betonter silben. :translation:`Indo-European k, t, p first changed in all places to h, þ, f; these voiceless fricatives along with the voiceless fricative s, inherited from Indo-European, became then voiced inside a word, when being in voiced neighborhood, but stayed voiceless when following after a stressed syllable.`|114|Verner's Law, nice quote, 4924|Ohala1989|By «hidden» I mean rather that speakers exhibit variations in their pronunciation which they and listeners usually do not recognize as variation. When pronunciation is transmitted, however, the existence of this variation can create ambiguity and lead to the listener's misapprehension of the intended pronunciation norm. A misapprehended pro[pb]nunciation is a changed pronunciation, i.e., sound change.|175f|linguistics, variation, change, reasons for sound change, 4925|Ringe2002|However, the attempt to apply the traditional criteria for subgrouping rigorously encounters severe practical problems which are too often overlooked or downplayed. Examples from phonology, inflectional morphology and the lexicon can be adduced to illustrate these problems. Traditional subgrouping tends to rely on phonology because phonemic mergers are clearly innovations. 
But though the set of sound changes in each line of descent is unique, the individual changes are usually so 'natural' that they can easily be repeated in different lines of descent; that is, they are the products of phonetic pressures that operate in all languages and frequently give the same results in cases widely separated in space and time. That is true whether one states the changes in phonetic or phonemic terms.|66|subgrouping, problem, cladistics, homoplasy 4926|Ringe2002|The probability of parallel development is thus relatively high for most apparently shared sound changes, and the probability of historically shared development is correspondingly low. Of course [pb] not all sound changes are equally likely to recur repeatedly in historically unconnected cases; some, at least, seem rare enough that it might be worth trying to use them as potential indicators of shared history. Changes that give rise to unusual segment types come immediately to mind, but experience seems to show that changes with unusual constraints on their conditioning environments are much more common and potentially very useful (since odd conditioning environments are not very likely to recur by chance).|67f|homoplasy, parallel evolution, sound change, cladistics, subgrouping 4927|Ringe2002|But if one chooses to use such individual sound changes for subgrouping, one must do so with a clear appreciation of the risks involved, not only because the possibility of parallel development can never be absolutely excluded, but also because our estimates of the probabilities involved must remain very approximate until we have a fairly complete catalogue of sound changes for at least a few language families and linguistic areas.|68|sound change patterns, regular sound change, sound change typology 4928|Ringe2002|given that backmutation is easily excluded, if all loanwords are coded with unique states and all characters exhibiting parallel development are shelved (temporarily), every state 
of each remaining character will be convex on the true evolutionary tree.|73|cladistics, innovation, subgrouping, circularity, 4929|Nakhleh2005|Ringe and Taylor attempted to find sound changes and sets of sound changes unlikely to be repeated that are shared by more than one major subgroup of the family; they were able to discover only three plausible candidates, which are our first three phonological characters. The remaining phonological characters define various uncontroversial subgroups of the family. It would have been possible to find many more such phonological characters and/or to use even larger sets of sound changes for some of them, but nothing would have been gained. Because so few probably unrepeatable sound changes are shared by more than one uncontroversial subgroup, the question of whether the characters chosen might favor or disfavor construction of a phylogenetic tree did not arise.|394|sound change, regular sound change, cladistics, character coding, 4930|BerezKroeker2018|Second, we realize that some linguists may be reluctant to share data for personal (as opposed to ethical) reasons, and such an attitude is hardly surprising given that data sharing may not previously have been standard practice in the subfields many of us work in. We can only encourage such researchers to carefully evaluate the reasons for their reticence in the light of the discussion in this paper, to potentially reconsider whether their concerns are valid, and to bring any concerns into public light so that future policies and public debate on data sharing issues can take them into account.|15|reproducibility, data sharing, 4931|List2014d|Both the UPGMA algorithm and the Neighbor-Joining algorithm produce evolutionary trees from distance data and their original purpose is to explain observed distances between a set of taxonomic units (species, genomes) as an evolutionary process of split and divergence. 
The amount of divergence is displayed by the branch lengths of the evolutionary tree. Conceptually, both algorithms are quite different. UPGMA assumes that the distances are ultrametric. This means that evolutionary change should be constant along all branches of the tree. Divergence (as represented by the branch lengths) receives a direct temporal interpretation: UPGMA trees have a definite root, and all leaves (terminal nodes) have the same distance to it (@Peer<2009> 2009: 144). If the observed distances between the taxonomic units can be interpreted in such a way, UPGMA will produce a tree which directly reflects the distance [pb] matrix: The sum of the branch lengths connecting any two taxa will be the same as their pairwise distance in the distance matrix.|101f|UPGMA, ultrametricity, algorithm, definition, 4932|List2014d|Neighbor-Joining, on the other hand, allows for varying divergence rates. A direct temporal interpretation of the results is therefore not possible. Branch lengths only indicate the degree to which a node has diverged from its ancestral node. Neighbor-Joining assumes that the observed pairwise distances between the taxonomic units are additive. In simple terms, a distance matrix is additive, if there is an unrooted tree which directly reflects the pairwise distances between all taxonomic units (Peer 2009: 148).|102|Neighbor-Joining, algorithm, additivity, definition 4933|Bybee2019|Given the common intuition that consonant lenition occurs more often than fortition, we formulate this as a hypothesis, defining these sound change types in terms of decrease or increase in oral constriction. We then test the hypothesis on allophonic processes in a diverse sample of 81 languages. 
With the hypothesis confirmed, we examine the input and output of such sound changes in terms of manner and place of articulation and find that while decrease in oral constriction (weakening) affects most consonant types, increase in oral constriction (strengthening) is largely restricted to palatal and labial glides. We conclude that strengthening does not appear to be the simple inverse of weakening. In conclusion we suggest some possible avenues for explaining how glide strengthening may result from articulatory production pressures and speculate that strengthening and weakening can be encompassed under a single theory of sound change resulting from the automatization of production.|000|missing code, missing data, phoneme inventory, lenition, fortition, sound change, 4934|Raviv2019|Understanding worldwide patterns of language diversity has long been a goal for evolutionary scientists, linguists and philosophers. Research over the past decade has suggested that linguistic diversity may result from differences in the social environments in which languages evolve. Specifically, recent work found that languages spoken in larger communities typically have more systematic grammatical structures. However, in the real world, community size is confounded with other social factors such as network structure and the number of second languages learners in the community, and it is often assumed that linguistic simplification is driven by these factors instead. Here, we show that in contrast to previous assumptions, community size has a unique and important influence on linguistic structure. We experimentally examine the live formation of new languages created in the laboratory by small and larger groups, and find that larger groups of interacting participants develop more systematic languages over time, and do so faster and more consistently than small groups. Small groups also vary more in their linguistic behaviours, suggesting that small communities are more vulnerable to drift. 
These results show that community size predicts patterns of language diversity, and suggest that an increase in community size might have contributed to language evolution.|000|community size, community structure, linguistic complexity, artificial language learning, experimental study, 4935|Wilkins1996|In general, the criteria of formal reconstruction can be strict because they stem from precise rules that cannot be set aside unless one is in a position to substitute more exact rules for them. The whole apparatus of phonetics and morphology enters in to sustain or refute these endeavours. But when it is a matter of meaning, one has as a guide only a certain probability based on common sense, on the personal evaluation of the linguist, and on the parallels that he can cite. The problem is always, at all levels of analysis, within just one language or at different stages of a comparative reconstruction, to determine if and how two morphemes which are formally identical or similar can be shown to coincide in meaning. (@Benveniste1971: 249)|000|semantic shift, semantic reconstruction, linguistic reconstruction, 4936|Kraemer2019|Human mobility is an important driver of geographic spread of infectious pathogens. Detailed information about human movements during outbreaks are, however, difficult to obtain and may not be available during future epidemics. The Ebola virus disease (EVD) outbreak in West Africa between 2014–16 demonstrated how quickly pathogens can spread to large urban centers following one cross-species transmission event. Here we describe a flexible transmission model to test the utility of generalised human movement models in estimating EVD cases and spatial spread over the course of the outbreak. A transmission model that includes a general model of human mobility significantly improves prediction of EVD’s incidence compared to models without this component. 
Human movement plays an important role not only to ignite the epidemic in locations previously disease free, but over the course of the entire epidemic. We also demonstrate important differences between countries in population mixing and the improved prediction attributable to movement metrics. Given their relative rareness, locally derived mobility data are unlikely to exist in advance of future epidemics or pandemics. Our findings show that transmission patterns derived from general human movement models can improve forecasts of spatio-temporal transmission patterns in places where local mobility data is unavailable.|000|disease spread, human mobility, spatial modeling, geographic models, 4937|Trubeckoj1923|Таким образом, языки есть непрерывная цепь говоров, постепенно и незаметно переходящих один в другой. Языки в свою очередь объединяются друг с другом в „семейства”, внутри которых можно различать „ветви”, „подветви” и.т.д. В пределах каждой такой единицы деления, отдельные языки располагаются так же как говоры в пределах языка, т.е. каждый язык данной ветви, кроме черт характерных для него одного и черт характерных для всей ветви, имеет и черты, сближающие его специально с одним из других языков этой ветви, другие черты, сближающие его с другим языком той же ветви и.т.д. при чем очень часто между родственными языками существуют переходные говоры. :translation:`Daher stellen Sprachen eine ununterbrochene Kette von Dialekten dar, die allmählich und unbemerkt ineinander übergehen. Sprachen hingegen vereinigen sich zu “Familien”, innerhalb derer man “Zweige”, “Unterzweige”, u.s.w. identifizieren kann. Innerhalb der Grenzen jeder derartigen Unterteilung verteilen sich die unterschiedlichen Sprachen genauso wie die Dialekte innerhalb der Grenzen der Sprache, d. h. 
jede Sprache eines bestimmten Zweiges besitzt, abgesehen von den Eigenschaften, die charakteristisch für sie selbst und denen, die charakteristisch für den gesamten Sprachzweig sind, auch Eigenschaften die sie mit einer der Sprachen dieses Zweiges verbinden, und andere, die sie an eine andere Sprache desselben Zweiges annähern, u.s.w., wobei es häufig unter genetisch verwandten Sprachen auch Übergangsdialekte gibt.`|115|dialect boundaries, language union, Sprachbund, 4938|Trubeckoj1923|Так складываются отношения языковых единиц, объединяющихся генетически, т.е. восходящих к диалектам некогда единого „праязыка” данной генетической группы (семейства, ветви, подветви и.т.д). Но кроме такой генетической группировки, географически соседящие друг с другом языки часто группируются и независимо от своего происхождения. Случается, что несколько языков одной и той же географической и культурно-исторической области обнаруживают черты специального сходства, несмотря на то, что сходство это не обусловлено общим происхождением, а только продолжительным соседством и параллельным развитием. Для таких групп, основанных не на генетическом принципе, мы предлагаем название „языковых союзов”. Такие „языковые союзы” существуют не только между отдельными языками, но и между языковыми семействами, т.е. случается, что несколько семейств генетически друг с другом не родственных, но распространенных в одной географической и культурно-исторической зоне, целым рядом общих черт объединяются в „союз языковых семейств”. :translation:`So bilden sich Beziehungen sprachlicher Einheiten heraus, die sich genetisch vereinigen, d.h. dass sie entstanden sind aus den Dialekten einer einheitlichen “Ursprache” einer bestimmten Gruppe (Familien, Zweige, Unterzweige, u.s.w.). Aber außer einer solchen genetischen Gruppierung, lassen sich auch geographisch benachbarte Sprachen untereinander unabhängig von ihrer Herkunft gruppieren. 
Es kann vorkommen, dass einige Sprachen derselben geographischen und kulturhistorischen Sphäre Eigenschaften von spezieller Ähnlichkeit entwickeln, ungeachtet dessen, dass diese Ähnlichkeiten nicht auf gemeinsamen Ursprung zurückzuführen sind, sondern lediglich auf länger andauernde Nachbarschaft und parallele Entwicklungen. Für solche Gruppen, die nicht durch das genetische Prinzip begründet werden, schlagen wir den Begriff “Sprachbund” vor. Derartige “Sprachbünde” bestehen nicht nur zwischen bestimmten Sprachen, sondern auch zwischen Sprachfamilien, d.h. es kommt vor, dass sich einige Familien, die genetisch miteinander nicht verwandt sind, aber in einer geographischen und kulturhistorischen Zone gelegen sind, aufgrund einer ganzen Reihe gemeinsamer Eigenschaften zu einem “Sprachfamilienbund” vereinigen lassen.`|115f|classification, areal linguistics, language union, Sprachbund, 4939|Ross2013|The title of this section is an allusion to a section in Lass (@1997: 209–214) with the title ‘Etymologia ex silentio: contact with lost languages’. Here Lass discusses lexical items in European languages which cannot be reconstructed back to Proto Indo-European and which, according to a plausible inference, were borrowed into a reconstructed interstage from a source for which we no longer have direct evidence.|11|language contact, evidence, lost language, loss 4940|Ross2013|The heat generated by these proposals suggests that some scholars find even strong circumstantial evidence of contact unacceptable, apparently because of a sense that internal sources of change should be exhausted before contact is considered. They are unwilling to accept toponymic, lexical or morphosyntactic evidence of contact unless data from the original contact languages is available for comparison. 
But, as Lass insists, the idea that internal sources of change should always be preferred over contact has no evidentiary basis.|12|contact, evidence, methodology, 4941|Ross2013|This brings me to a major issue. The usual approach of most contact linguists, myself included, has been to try to relate each contact situation type to certain outcomes, without considering in much detail the processes whereby [pb] these outcomes result from the relevant situation, i.e. from bilingually induced change or from language shift. The result has been some disagreement about what outcomes are triggered by what kinds of contact. It seems to me that progress can be made in this regard if we attempt to model what actually happens in language change situations rather than treating them as something of a black box.|12d|modeling, language contact, bilingualism, types of language contact 4942|Aitchison1981|Babies do not form influential social groups. Changes begin within social groups, when group members unconsciously imitate those around them. Differences in the speech forms of parents and children probably begin at a time when the two generations identify with different social sets. [cited after @Ross2013 13]|180|actuation problem, sociolinguistics, 4943|Ross2013|One of very few variationist studies of smallscale dialect contact is James Stanford’s work on Sui, a Kadai language of Guizhou Province in SE China. Each Sui village has its own clanlect, which is spoken by men and unmarried women. Marriage normally involves partners from different villages, and the wife relocates to her husband’s village. As a result, husband and wife often speak different clanlects. @Stanford<2008> (2008b) investigated whether the wives accommodated to the clanlect of their new village, and found that they don’t: they retain their native clanlect. The Sui place strong value on the clanlect and on its retention, and women who accommodate to their husband’s clanlect are objects of ridicule. 
This contradicts the standard claim that speakers always accommodate to the dialect spoken around them. But the interesting thing is what happens to Sui children. Young children inevitably acquire features of Mum’s clanlect, but this gives way quite quickly to Dad’s lect—the lect of the village—as Mum’s clanlect invites ridicule. Children as young as three already speak Dad’s lect, but with interferences from Mum’s lect, and even adolescents of 15-16 may still produce Mum’s version of a lexical item (Stanford 2008a, 2008b:469–471). But by adulthood their speech only very rarely betrays their mother’s clanlect.|16|sociolinguistics, language change, language contact, language interference, 4944|Ross2013|Prototypical sequence of events in bilingual copying: a. lexical calquing (loan translation) b. grammatical calquing c. syntactic restructuring = metatypy |23|bilingual copying, bilingualism, language contact, 4945|Ross2013|Lexical calquing has been much discussed in the literature since Weinreich (1953). It is also obvious from listening to, say, conversations in German between German immigrants in Australia, that lexical calquing is something that adults also do.|23|definition, lexical calquing, loan translation, 4946|Ross2013|Grammatical calquing apparently occurs by two routes. The first mimics a construction in the model language by translating its morphemes more or less one-for-one into the recipient language. It begins as lexical calquing that includes copying the valency of a model-language item onto its perceived correspondent in the recipient language.|23|grammatical calquing, loan translation, grammatical interference, language contact, definition 4947|Ross2013|[...] adults innovate on a one-lexical-item-at-a-time basis, but they do not innovate new constructional patterns. 
These are new patterns which probably arose on the lips of preadolescent speakers [...].|26|life stage, language contact, adults, preadolescence, 4948|Ross2013|Syntactic restructuring, on the other hand, also entails copying syntax from the model language. Syntactic restructuring is considerably less common than the stages that precede it. One reason for this may be that where the languages in contact have radically different grammatical systems, syntactic restructuring is obstructed. But how different must the systems be for there to be obstruction? And under what sociolinguistic conditions might obstruction be overcome? These are questions about which the contact literature at present says little.|26|syntactic restructuring, bilingual calquing, bilingual interference 4949|Ross2013|To summarise, whilst adults are involved in the first stage of bilingual copying, namely lexical calquing, the evidence strongly indicates that grammatical calquing and restructuring are innovations made during preadolescence and propagated in adolescence.|27|adulthood, life stage, lexical interference, borrowing, calquing, loan translation, model 4950|Schmidt1872|Die ursprache bleibt demnach bis auf weiteres, wenn wir sie als ganzes betrachten, eine wissenschaftliche fiction. Die forschung wird durch diese fiction allerdings wesentlich erleichtert, aber ein historisches individuum ist das, was wir heute ursprache nennen dürfen, nicht.|31|nature of the proto-language 4951|Nicholls2008|Binary trait data record the presence or absence of distinguishing traits in individuals. We treat the problem of estimating ancestral trees with time depth from binary trait data. Simple analysis of such data is problematic. Each homology class of traits has a unique birth event on the tree, and the birth event of a trait that is visible at the leaves is biased towards the leaves. 
We propose a model-based analysis of such data and present a Markov chain Monte Carlo algorithm that can sample from the resulting posterior distribution. Our model is based on using a birth–death process for the evolution of the elements of sets of traits. Our analysis correctly accounts for the removal of singleton traits, which are commonly discarded in real data sets. We illustrate Bayesian inference for two binary trait data sets which arise in historical linguistics. The Bayesian approach allows for the incorporation of information from ancestral languages. The marginal prior distribution of the root time is uniform. We present a thorough analysis of the robustness of our results to model misspecification, through analysis of predictive distributions for external data, and fitting data that are simulated under alternative observation models. The reconstructed ages of tree nodes are relatively robust, whereas posterior probabilities for topology are not reliable.|000|binary state models, cognate coding, cognate sets, gain-loss models, 4952|Ryder2011|Nicholls and Gray have described a phylogenetic model for trait data. They used their model to estimate branching times on Indo-European language trees from lexical data. Alekseyenko and co-workers extended the model and gave applications in genetics. We extend the inference to handle data missing at random. When trait data are gathered, traits are thinned in a way that depends on both the trait and the missing data content. Nicholls and Gray treated missing records as absent traits. Hittite has 12% missing trait records. Its age is poorly predicted in their cross-validation. Our prediction is consistent with the historical record. Nicholls and Gray dropped seven languages with too much missing data. We fit all 24 languages in the lexical data of Ringe and co-workers. To model spatiotemporal rate heterogeneity we add a catastrophe process to the model. 
When a language passes through a catastrophe, many traits change at the same time. We fit the full model in a Bayesian setting, via Markov chain Monte Carlo sampling. We validate our fit by using Bayes factors to test known age constraints. We reject three of 30 historically attested constraints. Our main result is a unimodal posterior distribution for the age of Proto-Indo-European centred at 8400 years before Present with 95% highest posterior density interval equal to 7100–9800 years before Present.|000|binary state models, binary trait data, cognate sets, cognate coding, phylogenetic reconstruction, gain-loss models, 4953|Rama2019|We present a fully automated workflow for phylogenetic reconstruction on large datasets, consisting of two novel methods, one for fast detection of cognates and one for fast Bayesian phylogenetic inference. Our results show that the methods take less than a few minutes to process language families that have so far required large amounts of time and computational power. Moreover, the cognates and the trees inferred from the method are quite close, both to gold standard cognate judgments and to expert language family trees. Given its speed and ease of application, our framework is specifically useful for the exploration of very large datasets in historical linguistics.|000|cognate detection, Bayesian inference, performance, 4954|Searle1976|There are at least a dozen linguistically significant dimensions of differences between illocutionary acts. Of these, the most important are illocutionary point, direction of fit, and expressed psychological state. These three form the basis of a taxonomy of the fundamental classes of illocutionary acts. The five basic kinds of illocutionary acts are: representatives (or assertives), directives, commissives, expressives, and declarations. Each of these notions is defined. 
An earlier attempt at constructing a taxonomy by Austin is defective for several reasons, especially in its lack of clear criteria for distinguishing one kind of illocutionary force from another. Paradigm performative verbs in each of the five categories exhibit different syntactical properties. These are explained.|000|speech act, illocutionary act, classification, taxonomy 4955|Aikhenvald2010|Second person imperatives—the prototypical commands—can be expressed with a variety of means. These include affixes, clitics, particles, special forms of pronouns, and even periphrastic constructions. Quite frequently, a bare root or stem of a verb marks a command. Synthetic languages tend to mark imperatives with inflectional means. And isolating and highly analytic languages will employ particles (short independent function words) as command markers.|18|imperative, command, typology, linguistic typology, typological study, 4956|Camiciotti2008|The purpose of this study is to shed light on the history of two speech acts: directives and commissives in a specific domain: the epistolary discourse of nineteenth century international traders. These two speech acts are closely related as they share the same direction of fit: world-to-words. The illocutionary point of directives lies in the fact that they are attempts by the speaker to get the hearer to do something, while commissives are those illocutionary acts the point of which is to commit the speaker to some course of action (@Searle<1976> 1976).|000|speech act, historical linguistics, pragmatics, language change, English, p-linguistics 4957|Pinker2007|This paper proposes a new analysis of indirect speech in the framework of game theory, social psychology, and evolutionary psychology. 
It builds on the theory of Grice, which tries to ground indirect speech in pure rationality (the demands of efficient communication between two cooperating agents) and on the Politeness Theory of Brown and Levinson, who proposed that people cooperate not just in exchanging data but in saving face (both the speaker’s and the hearer’s). I suggest that these theories need to be supplemented because they assume that people in conversation always cooperate. A reflection on how a pair of talkers may have goals that conflict as well as coincide requires an examination of the game-theoretic logic of plausible denial, both in legal contexts, where people’s words may be held against them, and in everyday life, where the sanctions are social rather than judicial. This in turn requires a theory of the distinct kinds of relationships that make up human social life, a consideration of a new role for common knowledge in the use of indirect speech, and ultimately the paradox of rational ignorance, where we choose not to know something relevant to our interests.|000|indirect speech, game theory, evolutionary psychology, Grice, 4958|Halliday1978|For much of the past twenty years linguistics has been dominated by an individualistic ideology, having as one of its articles of faith the astonishing dictum, first enunciated by Katz and Fodor, in a treatise on semantics which explicitly banished all reference to the social context of language, that «nearly every sentence is uttered for the first time». Only in a very special kind of social context could such a claim be taken seriously -- that of a highly intellectual and individual conception of language in which the object of study was the idealized sentence of an equally idealized speaker. :comment:`Quoted after` @Pratt1986 :comment:`p. 
59`|4|ideal speaker, ideal hearer, Chomsky syntax, ideology, nice quote 4959|Pratt1986|Speech-act theory has been seen as a corrective to the abstract, individualized concept of language criticized by Halliday. It has offered many people, in linguistics, philosophy, criticism, psy-[pb]chology, even law, a way to move out of the realm of language as autonomous, self-contained grammatical system into the realm of language as social practice. This was the move that followed from Austin's original insight that not all utterances could be accounted for by truth-conditional logic.|59f|speech act, Chomsky syntax, pragmatics, history of science, 4960|Pereltsvaig2015|Over the past decade, a group of prolific and innovative evolutionary biologists has sought to reinvent historical linguistics through the use of phylogenetic and phylogeographical analysis, treating cognates like genes and conceptualizing the spread of languages in terms of the diffusion of viruses. Using these techniques, researchers claim to have located the origin of the Indo-European language family in Neolithic Anatolia, challenging the near-consensus view that it emerged in the grasslands north of the Black Sea thousands of years later. But despite its widespread celebration in the global media, this new approach fails to withstand scrutiny. As languages do not evolve like biological species and do not spread like viruses, the model produces incoherent results, contradicted by the empirical record at every turn. This book asserts that the origin and spread of languages must be examined primarily through the time-tested techniques of linguistic analysis, rather than those of evolutionary biology.|000|Indo-European, discussion, methodology, phylogenetic reconstruction 4961|Koch2011|Le DECOLAR est un dictionnaire onomasiologique et diachronique dont l’objet est de décrire l’origine des dénominations des parties du corps humain en quatorze langues et idiomes romans et d’examiner leur genèse. 
|000|dictionary, concept-based perspective, onomasiological approach, French, Romance, morphology, 4962|Koch2011|This manual is very important in the context of the work of @Gevaudan2007, whose model of affiliation, based on displaying three aspects of the sign, namely the stratic, the semantic, and the morphological dimension is crucial for historical linguistics.|000|concept-based perspective, dictionary, Romance, manual, 4963|Zhang2019b|Gyalrongic languages, a subgroup of the Burmo-Qiangic branch of the Sino-Tibetan family, are spoken in the Western Sichuan Province of China. They are polysynthetic languages, and present rich verbal morphology. Although they are not closely related to Chinese, they are of particular interest for Sino-Tibetan/Trans-Himalayan comparative linguistics with regards to their conservative phonology and morphology. Based on previous studies on Old Chinese phonology, combining with recent fieldwork data, this paper aims to show how Gyalrong languages could shed light on Old Chinese morphology and thus contribute to the Old Chinese reconstruction. It also proposes a list of possible cognates between Old Chinese, Gyalrong languages, indicating also Tibetan cognates when available|000|Rgyalrong, Old Chinese, cognate detection, etymology 4964|Nouri2016b|This paper presents a method for linking models for aligning linguistic etymological data with models for phylogenetic inference from population genetics. We begin with a large database of genetically related words—sets of cognates—from languages in a language family. We process the cognate sets to obtain a complete alignment of the data. We use the alignments as input to a model developed for phylogenetic reconstruction in population genetics. This is achieved via a natural novel projection of the linguistic data onto genetic primitives. As a result, we induce phylogenies based on aligned linguistic data. 
We place the method in the context of those reported in the literature, and illustrate its operation on data from the Uralic language family, which results in family trees that are very close to the “true” (expected) phylogenies.|000|phonetic alignment, etymological cognacy, phylogenetic inference, distance-based methods, 4965|Sloos2019|The low-mid unrounded front vowel /ɛː/ in German (as in Bären) has been subject to change since Old High German. It slowly merged with the high-mid unrounded front vowel /eː/, but a reversal seems to have emerged recently. This paper investigates both historical and current change of the BÄREN vowel. Historical change is investigated through literature-based research; current change is examined through corpus-based research. This paper takes the approach of studying both grammatical context and frequency of use. The two major insights of this study are (i) that the BÄREN vowel has been subject to change for a long time and is still variable, and (ii) that frequency effects interact with grammar in an unexpected way. This interaction shows us how to proceed with hybrid grammar-lexicon modelling and I advocate a combined model of Optimality Theory and Exemplar Theory to account for this type of grammar-frequency interactions.|000|German, vowels, sound change, lexical diffusion 4966|Sloos2019|Paper is interesting because it deals with the well-known phenomenon of the pronunciation of long /ä/ in German. |000|vowels, German, lexical diffusion, pronunciation, language variation, sound change 4967|Nikulin2019|In this work, I examine the sound correspondences between Proto-Cerrado (Nikulin 2017) and Proto-Southern Jê (Jolkesky 2010) and offer a phonological reconstruction of Proto-Jê, the proto-language of the most diverse subgrouping within the Macro-Jê language stock. I reconstruct 11 consonants and 19 vowels for this proto-language. 
I also claim that */CrVC/ was the maximal syllable structure in Proto-Jê with some further restrictions on its complex onsets (only */pr, mr, kr, ŋr/ were allowed). I reconstruct a shielding allophony pattern to Proto-Jê, according to which nasal onsets would have had post-oralized allophones before oral nuclei. The discussion on Proto-Jê phonology is followed by a sample of Proto-Jê lexicon.|000|Proto-Yê, South American languages, linguistic reconstruction, 4968|Orlandi2019|The Ng Yap (formerly Sze Yap) dialects are routinely considered a branch of the Yue subfamily. This paper seeks to demonstrate that, contrary to this widespread opinion, these dialects show a wide range of distinctive features which, for formal purposes of language/dialect classification, may warrant their separation from the Yue subfamily. This paper also discusses the criteria which are often at the basis of language subgrouping in the field of Chinese linguistics. Nevertheless, this work should be regarded only as an attempt of stimulating a further discussion into a topic which has been overlooked for far too long.|000|structural data, missing data, Sinitic, Chinese dialectology, dialect classification, 4969|Starostin2019|In this paper, I attempt to compare the relative rates of replacement of basic vocabulary items (from the 100-item Swadesh list) over four specific checkpoints in the history of the Chinese language: Early Old Chinese (as represented by documents such as The Book of Songs), Classic Old Chinese, Late Middle Chinese (represented by the language of The Record of Linji), and Modern Chinese. 
After a concise explication of the applied methodology and a detailed presentation of the data, it is shown that the average rates of replacement between each of these checkpoints do not significantly deviate from each other and are generally compatible with the classic «Swadesh constant» of 0.14 loss per millennium; furthermore, these results correlate with other similar observed situations, e.g. for the Greek language, though not with others (Icelandic). It is hoped that future similar studies on the lexical evolution of languages with attested written histories will allow to place these observations into a more significant context.|000|Chinese, Middle Chinese, basic words, compounding, methodology, 4970|Starostin2019|All four wordlists have been published online as part of the Sinitic 100-item wordlist database, included in the Global Lexicostatistical Database framework ( http://starling.rinet.ru/new100 ); in addition to the words themselves, the database includes plenty of annotations and comments, such as precise references to sources, quotations of contexts from which the items have been elicited, and (sometimes highly detailed) explanations on why certain synonyms were pre-[pb] ferred over others. This section of the paper represents a seriously condensed, but also partially reworked variant of that part of the database, with all the words rearranged in order of their relative historical stability.|161f|data statement, fairness, Old Chinese, dataset 4971|Lass1997|To say that OE fearh ‘(young) pig’ is a descendant of IE */pork-o-s/ is not the same as saying that it is a descendant of */♥♦♣♠-♦-◙/.|271|abstractionist-realist debate, nice quote, realism, linguistic reconstruction 4972|Dacrema2019|This is in particular problematic when source code and data are not shared. While we observe an increasing trend that researchers publish the source code of their algorithms, this is not the common rule today even for top-level publication outlets. 
And even in cases when the code is published, it is sometimes incomplete and, for instance, does not include the code for data pre-processing, parameter tuning, or the exact evaluation procedures [...].|2|reproducibility, data sharing, code sharing, missing code, missing data, 4973|Dacrema2019|Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today’s research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.|000|deep learning, performance, missing data, missing code, reproducibility, neural networks, machine learning, 4974|Dacrema2019|Different factors contribute to such phenomena, including (i) weak baselines; (ii) establishment of weak methods as new baselines; and (iii) difficulties in comparing or reproducing results across papers. 
One first problem lies in the choice of the baselines that are used in the comparisons. Sometimes, baselines are chosen that are too weak in general for the given task and dataset, and sometimes the baselines are not properly fine-tuned. Other times, baselines are chosen from the same family as the newly proposed algorithm, e.g., when a new deep learning algorithm is compared only against other deep learning baselines. This behaviour enforces the propagation of weak baselines. When previous deep learning algorithms were evaluated against too weak baselines, the new deep learning algorithm will not necessarily improve over strong non-neural baselines. Furthermore, with the constant flow of papers being published in recent years, keeping track of what represents a state-of-the-art baseline becomes increasingly challenging.|1|baseline, methodology, evaluation, neural networks, machine learning, 4975|Dacrema2019|Very important paper showing on the one hand that deep learning algorithms may not actually be as good as people claim, while at the same time also demonstrating that scholars usually do not share data and code.|000|data sharing, code sharing, baseline, evaluation, deep learning, performance, reproducibility, 4976|Daniels2019|Historical Glottometry is a method, recently proposed by Kalyan and François (@Francois<2014> 2014; @Kalyan<2018> & François 2018), for analyzing and representing the relationships among sister languages in a language family. We present a glottometric analysis of the Sogeram language family of Papua New Guinea and, in the process, provide an evaluation of the method. We focus on three topics that we regard as problematic: how to handle the higher incidence of cross-cutting isoglosses in the Sogeram data; how best to handle lexical innovations; and what to do when the data do not allow the analyst to be sure whether a given language underwent a given innovation or not. 
For each topic we compare different ways of coding and calculating the data and suggest the best way forward. We conclude by proposing changes to the way glottometric data are coded and calculated and the way glottometric results are visualized. We also discuss how to incorporate Historical Glottometry into an effective historical-linguistic research workflow.|000|historical glottometry, critics, Sogeram languages, Papua New Guinea, evaluation 4977|Xia2019|The early history of the Hmong-Mien language family and its speakers is elusive. A good variety of Hmong-Mien-speaking groups distribute in Central China. Here, we report 903 high-resolution Y-chromosomal, 624 full-sequencing mitochondrial, and 415 autosomal samples from 20 populations in Central China, mainly Húnán Province. We identify an autosomal component which is commonly seen in all the Hmong-Mien-speaking populations, with nearly unmixed composition in Pahng. In contrast, Hmong and Mien respectively demonstrate additional genomic affinity to Tibeto-Burman and Kra-Dai speakers. We also discover two prevalent uniparental lineages of Hmong-Mien speakers. Y-chromosomal haplogroup O2a2a1b1a1b-N5 diverged ~2,330 years before present (BP), approximately coinciding with the estimated time of Proto-Hmong-Mien (~2,500 BP), whereas mitochondrial haplogroup B5a1c1a significantly correlates with Pahng and Mien. All the evidence indicates a founding population substantially contributing to present-day Hmong-Mien speakers. Consistent with the two distinct routes of agricultural expansion from southern China, this Hmong-Mien founding ancestry is phylogenetically closer to the founding ancestry of Neolithic Mainland Southeast Asians and present-day isolated Austroasiatic-speaking populations than Austronesians. The spatial and temporal distribution of the southern East Asian lineage is also compatible with the scenario of out-of-southern-China farming dispersal. 
Thus, our finding reveals an inland-coastal genetic discrepancy related to the farming pioneers in southern China and supports an inland southern China origin of an ancestral meta-population contributing to both Hmong-Mien and Austroasiatic speakers.|000|Hmong-Mien, population genetics, homeland, 4978|Meillet1925|[La] restitution d’une « langue commune » dont le chinois, le tibétain, etc., par exemple, seraient des formes postérieures, se heurte à des obstacles quasi invincibles. :translation:`The reconstruction of a proto-language of which Chinese, Tibetan, etc., are the descendants, encounters almost unsurmountable obstacles.`|26f|Sino-Tibetan, nice quote, Antoine Meillet, 4979|Zimmermann2019|Word2Vec models are used to study the semantic chain shift FOOD>MEAT>FLESH in the history of English, c. 1475-1925. The development stretches out over a long time, starting before 1500, and may possibly be continuing to this day. The semantic changes likely proceeded as a push chain.|000|Word2Vec, word embeddings, semantic shift, 4980|Hauer2019|We propose cognate projection as a method of crosslingual transfer for inflection generation in the context of the SIGMORPHON 2019 Shared Task. The results on four language pairs show the method is effective when no low-resource training data is available.|000|cognate detection, inflection, computational linguistics, 4981|Givon1999| 1. Language as biological adaptation 2. The bound of generativity and the adaptive basis of variation 3. The demise of competence 4. Human language as an evolutionary product 5. An evolutionary account of language processing rates 6. The diachronic foundations of language universals 7. The neuro-cognitive interpretation of ‘context’: Anticipating other minds 8. The grammar of the narrator’s perspective in narrative fiction 9. The society of intimates 10. On the ontology of academic negativity 11. 
Epilogue: Joseph Greenberg as a theorist|000|evolution, language evolution, language history, biolinguistics, typology, 4982|Bondy1976| 1. Graphs and subgraphs 2. Trees 3. Connectivity 4. Euler tours and Hamilton cycles 5. Matchings 6. Edge colourings 7. Independent sets and cliques 8. Vertex colourings 9. Planar graphs 10. Directed graphs 11. Networks 12. The cycle space and bond space|000|graph theory, introduction, 4983|Schapper2019|This paper investigates smell/kiss colexification, the lexical semantic association of transitive verbs of smelling with verbs expressing certain types of conventionalised gestures of greeting and/or affection (i.e., kissing). Whilst found sporadically in the languages of the world, smell/kiss colexification is common in languages of all families of Southeast Asia. The prevalence of the lexical association reflects an ancient, endemic Southeast Asian practice in which kissing involves the nose, rather than the mouth, as the primary organ. This study demonstrates the potential of lexical semantic typology to contribute to identifying linguistic areas and cultural practices shared across them.|000|colexification, CLICS, South-East Asia, 4984|Schlenker2016|We argue that rich data gathered in experimental primatology in the last 40 years can benefit from analytical methods used in contemporary linguistics. Focusing on the syntactic and especially semantic side, we suggest that these methods could help clarify five questions: (i) what morphology and syntax, if any, do monkey calls have? (ii) what is the ‘lexical meaning’ of individual calls? (iii) how are the meanings of individual calls combined? (iv) how do calls or call sequences compete with each other when several are appropriate in a given situation? (v) how did the form and meaning of calls evolve?
We address these questions in five case studies pertaining to cercopithecines (Putty-nosed monkeys, Blue monkeys, and Campbell’s monkeys), colobinae (Guereza monkeys and King Colobus monkeys), and New World monkeys (Titi monkeys). The morphology mostly involves simple calls, but in at least one case (Campbell’s -oo) we find a root-suffix structure, possibly with a compositional semantics. The syntax is in all clear cases simple and finite-state. With respect to meaning, nearly all cases of call concatenation can be analyzed as conjunction. But a key question concerns the division of labor between semantics, pragmatics and the environmental context (‘world’ knowledge and context change). An apparent case of dialectal variation in the semantics (Campbell’s krak) can arguably be analyzed away if one posits sufficiently powerful mechanisms of competition among calls, akin to scalar implicatures. An apparent case of non-compositionality (Putty-nosed pyow-hack sequences) can be analyzed away if one further posits a pragmatic principle of ‘urgency’, whereby threat-related calls must come early in sequences (another potential case of non-compositionality – Colobus snort-roar sequences – might justify assigning non-compositional meanings to complex calls, but results are tentative). Finally, rich Titi sequences in which two calls are re-arranged in complex ways so as to reflect information about both predator identity and location are argued not to involve a complex syntax/semantics interface, but rather a fine-grained interaction between simple call meanings and the environmental context. With respect to call evolution, we suggest that the remarkable preservation of call form and function over millions of years should make it possible to lay the groundwork for an evolutionary monkey linguistics, which we illustrate with cercopithecine booms, and with a comparative analysis of Blue monkey and Putty-nosed monkey repertoires.
Throughout, we aim to compare possible theories rather than to fully adjudicate between them, and our claims are correspondingly modest. But we hope that our methods could lay the groundwork for a formal monkey linguistics combining data from primatology with formal techniques from linguistics (from which it does not follow that the calls under study share non-trivial properties, let alone an evolutionary history, with human language).|000|animal linguistics, animal language, language evolution, origin of language, animal calls, monkey calls, monkeys 4985|Schlenker2016a|A field of primate linguistics is gradually emerging. It combines general questions and tools from theoretical linguistics with rich data gathered in experimental primatology. Analyses of several monkey systems have uncovered very simple morphological and syntactic rules and have led to the development of a primate semantics that asks new questions about the division of semantic labor between the literal meaning of monkey calls, additional mechanisms of pragmatic enrichment, and the environmental context. We show that comparative studies across species may validate this program and may in some cases help in reconstructing the evolution of monkey communication over millions of years.|000|monkey calls, monkey linguistics, animal linguistics, monkeys, animal calls, language evolution, 4986|Evans2001|It is increasingly common for primary linguistic fieldwork to be conducted with "last speakers," as swingeing language extinction brings a belated attention to the need to document endangered languages. Data from "last speakers" must, however, be treated with caution, given that the variety they speak may have been simplified through various processes of language death [...] -- though this is by no means always the case -- and/or heavily influenced by interference from whatever other language(s) they use in day-to-day communication.
|000|fieldwork, language death, language description, 4987|Ning2019|Recent studies of early Bronze Age human genomes revealed a massive population expansion by individuals related to the Yamnaya culture, from the Pontic Caspian steppe into Western and Eastern Eurasia, likely accompanied by the spread of Indo-European languages [1–5]. The south eastern extent of this migration is currently not known. Modern-day human populations from the Xinjiang region in northwestern China show a complex population history, with genetic links to both Eastern and Western Eurasia [6–10]. However, due to the lack of ancient genomic data, it remains unclear which source populations contributed to the Xinjiang population and what was the timing and the number of admixture events. Here, we report the first genome-wide data of 10 ancient individuals from northeastern Xinjiang. They are dated to around 2,200 years ago and were found at the Iron Age Shirenzigou site. We find them to be already genetically admixed between Eastern and Western Eurasians. We also find that the majority of the East Eurasian ancestry in the Shirenzigou individuals is related to northeastern Asian populations, while the West Eurasian ancestry is best presented by 20% to 80% Yamnaya-like ancestry. Our data thus suggest a Western Eurasian steppe origin for at least part of the ancient Xinjiang population. Our findings furthermore support a Yamnaya-related origin for the now extinct Tocharian languages in the Tarim Basin, in southern Xinjiang.|000|Yamnaya culture, Xinjiang, Indo-European, archaeogenetics, population genetics, 4988|LewisKraus2019|Geneticists have begun using old bones to make sweeping claims about the distant past.
But their revisions to the human story are making some scholars of prehistory uneasy.|000|genetics, ancient DNA, population genetics, scientific practice, 4989|Luniewska2019|We present a new set of subjective Age of Acquisition (AoA) ratings for 299 words (158 nouns, 141 verbs) in seven languages from various language families and cultural settings: American English, Czech, Scottish Gaelic, Lebanese Arabic, Malaysian Malay, Persian, and Western Armenian. The ratings were collected from a total of 173 participants and were highly reliable in each language. We applied the same method of data collection as used in a previous study on 25 languages which allowed us to create a database of fully comparable AoA ratings of 299 words in 32 languages. We found that in the seven languages not included in the previous study, the words are estimated to be acquired at roughly the same age as in the previously reported languages, i.e. mostly between the ages of 1 and 7 years. We also found that the order of word acquisition is moderately to highly correlated across all 32 languages, which extends our previous conclusion that early words are acquired in similar order across a wide range of languages and cultures.|000|age of acquisition, cross-linguistic study, concept list, 4990|Yang2019|Endangered tone languages are not often studied within quantitative variationist approaches, but such approaches can provide valuable insights for language description and documentation in the Tibeto-Burman area. This study examines tone variation within Yangliu Lalo (Central Ngwi), a minority language community in China that is currently shifting to Southwestern Mandarin. Yangliu Lalo’s Tone 4, the rising-falling High tone, is lowering and flattening among young people, especially females, who also tend to use Lalo less frequently. Tonal range in elicited speech is shown to be decreasing as use of Lalo decreases.
Concurrently, the standard deviation of the pitch of individual tones also decreases, while at the same time speakers with a narrow tonal range also show greater articulatory precision for each tone. Tonal range and standard deviation of pitch are both parameters of tonal space, the arrangement of, and relationship between, tones within the tonal system. The results from our apparent-time study suggest that tonal space provides a new avenue of sociolinguistic inquiry for tone languages.|000|Yangliu Lalo, Lalo, Sino-Tibetan, tone, tonal variation 4991|Heggarty2019|Sound Comparisons hosts over 90,000 individual word recordings and 50,000 narrow phonetic transcriptions from 600 language varieties from eleven language families around the world. This resource is designed to serve researchers in phonetics, phonology and related fields. Transcriptions follow new initiatives for standardisation in usage of the IPA and Unicode. At soundcomparisons.com, users can explore the transcription datasets by phonetically-informed search and filtering, customise selections of languages and words, download any targeted data subset (sound files and transcriptions) and cite it through a custom URL. We present sample research applications based on our extensive coverage of regional and sociolinguistic variation within major languages, and also of endangered languages, for which Sound Comparisons provides a rapid first documentation of their diversity in phonetics. The multilingual interface and user-friendly, ‘hover-to-hear’ maps likewise constitute an outreach tool, where speakers can instantaneously hear and compare the phonetic diversity and relationships of their native languages.|000|database, phonetic transcription, cross-linguistic study, 4992|Kawahara2019|Sound symbolism refers to stochastic and systematic associations between sounds and meanings.
Sound symbolism has not received much serious attention in the generative phonology literature, perhaps because most if not all sound symbolic patterns are probabilistic. Building on the recent proposal to analyze sound symbolic patterns within a formal phonological framework (Alderete and Kochetov 2017), this paper shows that MaxEnt grammars allow us to model stochastic sound symbolic patterns in a very natural way. The analyses presented in the paper show that sound symbolic relationships can be modeled in the same way that we model phonological patterns. We suggest that there is nothing fundamental that prohibits formal phonologists from analyzing sound symbolic patterns, and that studying sound symbolism using a formal framework may open up a new, interesting research domain. The current study also reports two hitherto unnoticed cases of sound symbolism, thereby expanding the empirical scope of sound symbolic patterns in natural languages.|000|sound symbolism, entropy, missing code, missing data, 4993|DFG2019| 1. Vorwort 2. Präambel 3. Standards guter wissenschaftlicher Praxis 4. Nichtbeachtung guter wissenschaftlicher Praxis, Verfahren 5. Umsetzung der Leitlinien|000|scientific practice, guidelines, 4994|Ponti2019|Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance.
We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.|000|language variation, universals, computational linguistics, missing code, missing data 4995|Lillicrap2019|We can define a neural network that can learn to recognize objects in less than 100 lines of code. However, after training, it is characterized by millions of weights that contain the knowledge about many object types across visual scenes. Such networks are thus dramatically easier to understand in terms of the code that makes them than the resulting properties, such as tuning or connections. In analogy, we conjecture that rules for development and learning in brains may be far easier to understand than their resulting properties. The analogy suggests that neuroscience would benefit from a focus on learning and development.|000|code, neural network, evolution, evolution of code, evolution of complexity, 4996|Lillicrap2019|This is highly similar to the discussion of @Cepelewicz2019, where she points to the evolution of complexity as being based on the evolution of code instead of the evolution of the phenotype.|000|evolutionary theory, evolution of complexity, evolution of code 4997|LaPolla2019|This short note discusses the origin and development of the use of the term “pronomenalisation” (pronominalization) in Sino-Tibetan linguistics, pointing out that the concept was originally a typological one, and that the phenomenon was seen as the result of grammaticalization, i.e.
the free pronouns being copied onto the verb.|000|pronominalization, Sino-Tibetan, typology, terminology, 4998|Nyima2019|This article presents information regarding newly recognised non-Tibetic Tibeto-Burman languages spoken in three counties, Dzogang, Markham, and Drag-yab, of Chamdo Municipality and the adjacent Dzayul County in the Tibet Autonomous Region. First, we introduce four languages – Lamo, Larong sMar, Drag-yab sMar, and gSerkhu – identifying the location of each language on the Chinese administrative map as well as the numbers of speakers of the languages. Second, we provide a brief historical background on these languages, which suggests a relationship between them and Qiangic groups. Third, we display lexical evidence that shows not only their non-Tibetic features but also their closeness to Qiangic languages. Finally, the article focuses on Lamo, an endangered language spoken in Dzogang County, and provides a linguistic analysis of an annotated Lamo historical narrative in the Appendix.|000|language detection, fieldwork, Chamdo, Sino-Tibetan, Qiang, inter-linear-glossed text 4999|Bayanati2019|Animacy influences the patterns of subject-verb agreement marking in many languages, including Persian and Inari Saami. In Persian, animate plural subjects trigger plural agreement on the verb, whereas inanimate subjects may or may not trigger agreement. The variation is governed by factors such as personification, agency and distributivity. In Inari Saami, verbs fully agree with human subjects and verbs partially agree with inanimate subjects. Verbs may or may not agree with subjects referring to animals.
We argue that the intricate interaction between biological animacy and grammatical agreement in these two languages warrants careful consideration of the tripartite distinction between biological animacy in the world, our conceptualization of animacy and formal animacy features in the grammar.|000|animacy, Persian, Inari Saami, agreement 5000|Givon1989|Pragmatics is an approach to description, to information processing, thus to the construction, interpretation and communication of experience. At its core lies the notion of context, and the axiom that reality and/or experience are not absolute fixed entities, but rather frame-dependent, contingent upon the observer's perspective. Pragmatics traces its illustrious ancestry to the pre-Socratic Greek dialecticians, then via Aristotle to Locke, Kant and Peirce, eventually to 19th Century phenomenologists, and--last but not least--to Ludwig Wittgenstein. In cognitive psychology, pragmatics underlies figure-ground perception, primed storage and malleable recall, attended ('context-scanning') information processing, and flexible ('prototype') categorization. In linguistics, pragmatics animates the study of contextual meaning and metaphoric extension, frame semantics and the semeiotics of grammar-in-discourse, the sociology of language, and the acquisition of communicative competence. In anthropology, pragmatics is reflected in the exploration of cultural relativity, ethnomethodology and cross-cultural cognition. In spite of such exalted lineage and wide applicability, the academic study of pragmatics remains narrow, insular and fractious. On the one hand, various formal schools have undertaken to keep pragmatics firmly attached to the very discipline which it purported to overthrow--formal deductive logic.
On the other hand, a plethora of informal schools have taken the intoxicating freedom of contextual relativity as license for extreme methodological nihilism, unfettered intuitionism, and an anything goes rejection of sensible empirical constraints. What unites these extreme interpretations is, paradoxically, an antipragmatic faith in the Platonic excluded middle: The lack of total order means a total lack of order; the lack of total understanding is a total lack of understanding. In this way, the very essence of pragmatics is subverted by its most impassioned proponents. Pragmatics, at its somewhat unadorned middle-ground best, closely reflects the evolutionary compromise practiced by biological organisms. In adapting to life in a less-than-ideal environment, bio-organisms have invariably opted for the proposition that half a loaf is infinitely better than none; that life is precariously suspended mid-way between absolute order and unmitigated chaos; that while full determinism is a dangerous evolutionary trap, unbounded freedom is an unrealistic evolutionary mirage. In their humble travail to adapt and survive, bio-organisms have recognized what contentious academics all too often ignore--that Goedel's observed limits on systems--neither fully consistent, nor ever complete--sum up rather well the pragmatic predicament of life and mind in a real environment.|000|pragmatics, code, context, linguistics, introduction, essay, 5001|Morey2019|This chapter briefly introduces the languages of the Tangsa-Nocte ‘group’ within the Northern Naga languages.
This group is the subject of detailed studies of Hakhun (Boro 2019), Muklom (Mulder 2019), and Phong (Dutta 2019), as well as an overview of agreement in the Pangwa group (Morey 2019).|000|Tangsa-Nocte, Naga, Sino-Tibetan, overview, 5002|Pyysalo2019|Since both the orthodox (Møller, Benveniste, Puhvel) and the revisionist (Kuryłowicz, Eichner, Melchert/Rix, Kortlandt) models of the laryngeal theory (LT) have failed to solve the problem of the IE vowels (Pyysalo & Janhunen 2018a, 2018b), revisions in the theory are necessary. A comparison of the models of Puhvel, Eichner, and Melchert-Rix with regard to the criterion of economy shows that, although they are mutually contradictory, each of them has contributed at least one correct solution absent in the other models. By combining these correct solutions into a single model we can arrive at what may be termed the “Optimized Laryngeal Theory” (OLT), which, then, can be tested against monolaryngealism, as formulated by Szemerényi.|000|laryngeal theory, computational modeling, 5003|Schlenker2016b|We explain why general techniques from formal linguistics can and should be applied to the analysis of monkey communication – in the areas of syntax and especially semantics. An informed look at our recent proposals shows that such techniques needn’t rely excessively on categories of human language: syntax and semantics provide versatile formal tools that go beyond the specificities of human linguistics. We argue that “formal monkey linguistics” can yield new insights into monkey morphology, syntax, and semantics, as well as raise provocative new questions about the existence of a pragmatic, competition-based component in these communication systems.
Finally, we argue that evolutionary questions, which are highly speculative in human language, can be addressed in an empirically satisfying fashion in primate linguistics, and we lay out problems that should be addressed at the interface between evolutionary primate linguistics and formal analyses of language evolution.|000|syntax, primate linguistics, animal linguistics, semantics, syntax, language evolution, origin of language, 5004|Maliet2019|Understanding how and why diversification rates vary through time and space and across species groups is key to understanding the emergence of today’s biodiversity. Phylogenetic approaches aimed at identifying variations in diversification rates during the evolutionary history of clades have focused on exceptional shifts subtending evolutionary radiations. While such shifts have undoubtedly affected the history of life, identifying smaller but more frequent changes is important as well. We developed ClaDS—a new Bayesian approach for estimating branch-specific diversification rates on a phylogeny that relies on a model with changes in diversification rates at each speciation event. We show, using Monte Carlo simulations, that the approach performs well at inferring both small and large changes in diversification. Applying our approach to bird phylogenies covering the entire avian radiation, we find that diversification rates are remarkably heterogeneous within evolutionarily restricted species groups. Some groups such as Accipitridae (hawks and allies) cover almost the full range of speciation rates found across the entire bird radiation.
As much as 76% of the variation in branch-specific rates across this radiation is due to intraclade variation, suggesting that a large part of the variation in diversification rates is due to many small, rather than few large, shifts.|000|diversification, species, modeling, evolutionary model, bird phylogenies, bird evolution, 5005|Manni2019|Gabon is an African country located very close to the homeland of Bantu languages (Cameroun). Starting about 5,000 years ago, Bantu-speaking populations diffused into almost all sub-Saharan Africa. By processing with computational linguistic methods (Levenshtein distance) two independently-collected lexical datasets recording the pronunciation of 88 and 158 words in more than 50 linguistic varieties spoken in Gabon, we obtain a numerical classification of the major linguistic groups. We compare this classification to available ones based on historical linguistics methods (cognate-sharing defined by experts), and find them overlapping, which indicates that the two methods capture the same signal of linguistic difference (and relatedness). To focus on the historical relatedness between major linguistic clusters, we control for the linguistic similarity related to contact, proportional to geographic vicinity, and suggest that the first Bantu-speaking groups to people Gabon were those speaking KOTA-KELE (B20) languages. The other varieties concern five different immigration waves (B10; B30; B40; B50-B60-B70 – Guthrie nomenclature) that penetrated Gabon later in history. To conclude, we suggest a peopling scenario that incorporates available paleoclimatic, archaeological and population genetic evidence.|000|Gabon, African languages, dialectology, dialectometry, missing data, missing code 5006|Karimi2010|We present a solution to the problem of understanding a system that produces a sequence of temporally ordered observations. Our solution is based on generating and interpreting a set of temporal decision rules.
A temporal decision rule is a decision rule that can be used to predict or retrodict the value of a decision attribute, using condition attributes that are observed at times other than the decision attribute’s time of observation. A rule set, consisting of a set of temporal decision rules with the same decision attribute, can be interpreted by our Temporal Investigation Method for Enregistered Record Sequences (TIMERS) to signify an instantaneous, an acausal or a possibly causal relationship between the condition attributes and the decision attribute. We show the effectiveness of our method, by describing a number of experiments with both synthetic and real temporal data.|000|temporal decision rules, decision tree, sound change rules, 5007|Karimi2010|This may be useful for modeling sound change. |000|sound change, sound change model, temporal decision rules, decision tree, 5008|BunzaDonzo2015|This PhD thesis consists of the documentation, reconstruction and classification of ten Bantu languages (bolondó, bonyange, ebudzá, ebwela, libóbi, ling mb , mondóngó, mony ng , mosángé, pága éte) spoken in the geographical area between the Congo and Ubangi Rivers in the northwestern part of the Democratic Republic of the Congo. The study examines the interaction between these languages and seven neighboring Ubangian languages (gbánzírí, g bú, ma ó, mb nz , monz mb , ngbandi, ngbaka-m n gend ). By means of a lexicostatistical study which determines the degree of lexical similarity between the languages under study, a phylogenetic classification has been established which integrates these languages in the larger sample of 401 Bantu languages used by Grollemund et al. (2015).
This quantitative approach has generated Neighbor-Net and Neighbor-Joining networks as well as Bayesian trees, which indicate the internal sub-groups of the Bantu family in general, and more specifically of the Bantu languages of the central Congo basin to which the Bantu languages spoken between the Congo and Ubangi Rivers belong. Subsequently, we have undertaken a descriptive and comparative study of those languages as well as a study of regular sound correspondences with regard to Proto-Bantu. They possess certain foreign phonemes that have not been reconstructed to Proto-Bantu, such as implosives and labiovelar stops, which have the status of distinct phonemes. The study of these specific sounds suggests that they were borrowed from the neighboring Ubangian languages. The lexical comparison also revealed an interaction between Bantu and Ubangian languages. Certain lexical borrowings were transferred from Bantu to Ubangian, while others moved in the opposite direction. Through the comparative method, we have obtained a phonological reconstruction of the hypothetical ancestor language of these languages. This Proto-Congo-Ubangi Bantu split into two sub-branches, i.e. Proto-Congo Bantu and Proto-Ubangi Bantu.|000|Bantu languages, genetic classification, linguistic reconstruction 5009|Nikulin2019a|This study has a two-fold purpose: to provide the reader with a panorama of the state of the art in diachronic studies of the Indigenous languages of Brazil and to promote a rigorous application of known, accepted methods employed by historical linguistics to these languages. We discuss at some length issues such as the proof of language relationship, the internal classification, the phonological and syntactic reconstruction, as well as the philological studies that aim at detecting diachronic changes.
Special attention will be given to the comparative method.|000|Southern American languages, Brazil, overview, historical linguistics, genetic classification, 5010|Kirov2017|Can advances in NLP help advance cognitive modeling? We examine the role of artificial neural networks, the current state of the art in many common NLP tasks, by returning to a classic case study. In 1986, Rumelhart and McClelland famously introduced a neural architecture that learned to transduce English verb stems to their past tense forms. Shortly thereafter, Pinker and Prince (1988) presented a comprehensive rebuttal of many of Rumelhart and McClelland’s claims. Much of the force of their attack centered on the empirical inadequacy of the Rumelhart and McClelland (1986) model. Today, however, that model is severely outmoded. We show that the Encoder-Decoder network architectures used in modern NLP systems obviate most of Pinker and Prince’s criticisms without requiring any simplification of the past tense mapping problem. We suggest that the empirical performance of modern networks warrants a reëxamination of their utility in linguistic and cognitive modeling.|000|Steven Pinker, cognitive modeling, past tense, English 5011|Shi1997|There are many loan-words from Chinese in Sui as well as in other languages of the same group. In the aspect of phonological correspondence, the earlier loan-words are quite different from the later ones. The former can be traced to the Ancient Chinese, whereas the latter are based on the neighboring Southwest Chinese dialect. The loan-words with mixed features of the earlier and the later periods revealed a continuum between historical layers. A semantic diversity of the paired loan-words of earlier and later periods is very common in the Sui language.
Some words with related meaning are derived from loan-words by substitution of the tone, or initial, or final, just like the original native ones.|000|Sui languages, loan word, borrowing, language contact, 5012|Ma2019|Tibetan pig is native to the Qinghai-Tibet Plateau and has adapted to the high-altitude environmental condition such as hypoxia. However, its origin and genetic mechanisms underlying high-altitude adaptation still remain controversial and enigmatic. Herein, we analyze 229 genomes of wild and domestic pigs from Eurasia, including 63 Tibetan pigs, and detect 49.6 million high-quality variants. Phylogenomic and structure analyses show that Tibetan pigs have a close relationship with low-land domestic pigs in China, implying a common domestication origin. Positively selected genes in Tibetan pigs involved in high-altitude physiology, such as hypoxia, cardiovascular systems, UV damage, DNA repair. Three of loci with strong signals of selection are associated with EPAS1, CYP4F2, and THSD7A genes, related to hypoxia and circulation. We validated four non-coding mutations nearby EPAS1 and CYP4F2 showing reduced transcriptional activity in Tibetan pigs. A high-frequency missense mutation is found in THSD7A (Lys561Arg) in Tibetan pigs. The selective sweeps in Tibetan pigs was found in association with selection against non-coding variants, indicating an important role of regulatory mutations in Tibetan pig evolution. This study is important in understanding the evolution of Tibetan pigs and advancing our knowledge on animal adaptation to high-altitude environments.|000|population genetics, altitude, adaptation, Tibet, pigs, animals, Sino-Tibetan 5013|DeLancey2019|This paper surveys the forms of dual and plural pronouns across the Tibeto-Burman languages, and offers a reconstruction of the non-singular pronouns, and a general account of how various branches and languages have diverged from this original system. 
We can certainly reconstruct two, perhaps three, person-number portmanteaus: #i 1pl, or perhaps 1pl.inc, #ni 2pl, and, less certainly, #ka 1pl.exc. We also reconstruct #tsi dual which combined with singular pronouns to make dual forms. This construction was the model on which most daughter languages have innovated an analytic system of person and number marking, with distinct person and dual and/or plural morphemes combining to make the morphologically complex but semantically transparent compositional forms found in the majority of languages.|000|pronoun, singular, non-singular, plural, Tibeto-Burman, Sino-Tibetan, typology, genetic classification 5014|Campos2019|The objective of this work is to set a corpus-driven methodology to quantify automatically diachronic language distance between chronological periods of several languages. We apply a perplexity-based measure to written text representing different historical periods of three languages: European English, European Portuguese, and European Spanish. For this purpose, we have built historical corpora for each period, which have been compiled from different open corpus sources containing texts as close as possible to its original spelling. The results of our experiments show that a diachronic language distance based on perplexity detects the linguistic evolution that had already been explained by the historians of the three languages. It is remarkable to underline that it is an unsupervised multilingual method which only needs a raw corpora organized by periods.|000|perplexity, language classification, computational approaches, chronology, corpus studies 5015|Rodero2012|Speech rate is one of the most important elements in a news presentation, especially on radio, a sound medium. Accordingly, this study seeks to compare broadcasters’ speech rate and the number of pauses in 40 news bulletins from the BBC (United Kingdom), Radio France (France), RAI (Italy), and RNE (Spain).
Most authors addressing the medium of radio recommend a speech rate of between 160 and 180 words per minute (wpm). If this rate is considered, only one radio station, BBC, would be within the suitable limits. Instead, higher speeds and fewer pauses have been identified in the RAI and RNE bulletins. The second part of this study attempts to analyze whether perception in the news can be affected by different speech rates. The findings indicate that the extent to which the individuals surveyed experience subjective assessment varies according to the speech rate.|000|speech rate, communication speed, speech speed, radio bulletin, 5016|Meloni2019|Historical linguists have identified regularities in the process of historic sound change. The comparative method utilizes those regularities to reconstruct proto-words based on observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages, and has to predict the proto word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform conventional methods applied to this task so far. Error analysis reveals a variability in the ability of neural model to capture different phonological changes, correlating with the complexity of the changes. Analysis of learned embeddings reveals the models learn phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented by historical linguistics.|000|automatic linguistic reconstruction, missing code, missing data, 5017|Filho2019|Despite a widespread agreement on the importance of transparency in science, a growing body of evidence suggests that both the natural and the social sciences are facing a reproducibility crisis.
In this paper, we present seven reasons why journals and authors should implement transparent guidelines. We argue that sharing replication materials, which include full disclosure of the methods used to collect and analyze data, the public availability of raw and manipulated data, in addition to computational scripts, may generate the following positive outcomes: 01. production of trustworthy empirical results, by preventing intentional frauds and avoiding honest mistakes; 02. making the writing and publishing of papers more efficient; 03. enhancing the reviewers’ ability to provide better evaluations; 04. enabling the continuity of academic work; 05. developing scientific reputation; 06. helping to learn data analysis; and 07. increasing the impact of scholarly work. In addition, we review the most recent computational tools to work reproducibly. With this paper, we hope to foster transparency within the political science scholarly community.|000|scientific practice, code sharing, data sharing, open science, 5018|Hartmann2019|Traditional historical linguistics lacks the possibility to empirically assess its assumptions regarding the phonetic systems of past languages and language stages beyond traditional methods such as comparative tools to gain insights into phonetic features of sounds in proto- or ancestor languages. The paper at hand presents a computational method based on deep neural networks to predict phonetic features of historical sounds where the exact quality is unknown and to test the overall coherence of reconstructed historical phonetic features. The method utilizes the principles of coarticulation, local predictability and statistical phonological constraints to predict phonetic features by the features of their immediate phonetic environment.
The validity of this method will be assessed using New High German phonetic data and its specific application to diachronic linguistics will be demonstrated in a case study of the phonetic system Proto-Indo-European.|000|distinctive features, automatic approach, Wiktionary, missing code, missing data, 5019|Condit-Schultz2017|This paper describes a new digital corpus of rap transcriptions known as the Musical Corpus of Flow (MCFlow). MCFlow currently contains transcriptions of verses from 124 popular rap songs, performed by 86 different rappers, containing a total of 374 verses, and consisting of 5,803 measures of music. MCFlow transcriptions contain rhythmic information, encoded in musical durations, as well as prosodic information, syntactic information, and phonetic information, including the identification of rhymes. In the second part of the paper, preliminary analyses of the corpus are presented, describing the “norms” of several important features of rap deliveries. These features include speed, rhyme density, metric position of stressed syllables, metric position of rhymes, phrase length, and the metric position of phrases. Several historical trends are identified, including an increase in rhyme density and phrase variability between 1980 and 2000. In each analysis, variance between different performers is compared to variance between songs. It is found that there is generally more variability between songs than between performers.|000|corpus, rap music, transcription, 5020|Condit-Schultz2017|The following section presents a brief overview of the contents and encoding scheme of the Musical Corpus of Flow. Complete details of the transcription process and the rationale for encoding and sampling decisions can be found in the author’s dissertation (Condit-Schultz, 2016).
However, this is an ongoing project, so for the most up to date information regarding the current version of corpus, please visit www.rapscience.net; the dataset itself is also available for download at this site.|126|dataset, URL, data availability statement 5021|Batsuren2019|This paper introduces CogNet, a new, large-scale lexical database that provides cognates—words of common origin and meaning—across languages. The database currently contains 3.1 million cognate pairs across 338 languages using 35 writing systems. The paper also describes the automated method by which cognates were computed from publicly available wordnets, with an accuracy evaluated to 94%. Finally, statistics and early insights about the cognate data are presented, hinting at a possible future exploitation of the resource by various fields of linguistics|000|cognate sets, automatic cognate detection, database, 5022|Batsuren2019|Database available from http://cognet.ukc.disi.unitn.it|000|cognate sets, database, automatic cognate detection, 5023|Luangthongkum2019|The reconstruction of Proto-Karen (PK) has previously been attempted and presented in different ways by scholars, leading to some serious disagreements on some major points. To offer another new look at PK based only on fresh data collected by myself (except Bwe), the PK phonology and lexicon with 341 entries were reconstructed. Deliberately, available documented materials on the Karenic languages since 1799 onwards were not used for this reconstruction although they were consulted. The reconstruction is based on a 2,000-item word list with English and Thai glosses of ten selected Karenic varieties spoken in Thailand, i.e. Northern Pa-O and Southern Pa-O (Northern branch, NK); Kayan, Kayah, Bwe (from Henderson 1997) and Kayaw (Central branch, CK); Northern Sgaw, Southern Sgaw, Northern Pwo and Southern Pwo (Southern branch, SK).
For comparative purposes, only the obvious cognates found in at least two of the three branches were used. In following this method, most of the items in my field notes had to be eliminated. The correspondence patterns of onsets, rhymes and tones were investigated and, then, the protoforms were reconstructed and compared with the previous PK reconstructions and with the PTB forms reconstructed by Benedict (1972) and Matisoff (2003).|000|Proto-Karen, Karen, Sino-Tibetan, Karenic, database, etymological data 5024|Madagain2019|Pointing gestures play a foundational role in human language, but up to now, we have not known where these gestures come from. Here, we investigated the hypothesis that pointing originates in touch. We found, first, that when pointing at a target, children and adults oriented their fingers not as though trying to create an “arrow” that picks out the target but instead as though they were aiming to touch it; second, that when pointing at a target at an angle, participants rotated their wrists to match that angle as they would if they were trying to touch the target; and last, that young children interpret pointing gestures as if they were attempts to touch things, not as arrows. These results provide the first substantial evidence that pointing originates in touch.|000|pointing, origin, language evolution, origin of language, gesture, 5025|Velde2018|n and propagation. Yet what has generally been lacking is a principled way of analyzing their interaction. Research into innovation focuses on the role of individual language users and tends to take a more qualitative approach, while propagation is typically studied in terms of the community grammar and tends to be more statistically driven. We propose an approach that bridges the two. 
Drawing on a much larger historical data set than is commonly done, our study shows how a high-resolution analysis of semantic and morphosyntactic behavior can be married to statistics, resulting in a method that measures the degree of grammaticalization at the level of single attestations. We apply this method to the early grammaticalization of be going to inf, showing how a communal increase breaks down into different rates of change in the run-up to, the middle of, and right after conventionalization. Additionally, we trace lifespan change of individual authors longitudinally. While not robustly in evidence, there are hints of postadolescence reanalysis in the run-up generation, and of increased realization of innovative features in the middle generation.|000|grammaticalization, real time, dynamics, language change, corpus studies, 5026|Soisalon-Soininen2019|In our on-going work, we are addressing the problem of identifying cognates across lexica of any pair of languages. In particular, we assume that the languages of interest are low-resource to the extent that no training data whatsoever, even in closely related languages, are available for the task. Instead, we investigate the performance of transfer learning approaches utilising training data from a completely unrelated language family. Cognate identification is a core task in the comparative method, a collection of techniques used in historical linguistics, a field closely tied with linguistic typology (Shields, 2011). Cognate information is also useful for applications such as machine translation (Grönroos et al., 2018). In addition, knowledge of cognates is useful for second-language learning (Beinborn et al., 2014).|000|low resource languages, cognate detection, automatic cognate detection, Uralic languages, Saami languages, missing code, missing data 5027|Uban2019|Semantic divergence in related languages is a key concern of historical linguistics.
Intra-lingual semantic shift has been previously studied in computational linguistics, but it can only provide a limited picture of the evolution of word meanings, which often develop in a multilingual environment. In this paper we investigate semantic change across languages by measuring the semantic distance of cognate words in multiple languages. By comparing current meanings of cognates in different languages, we hope to uncover information about their previous meanings, and about how they diverged within their respective languages from their common original etymon. We further study the properties of the semantic divergence of cognates, by analyzing how features of the words, such as frequency and polysemy, are related to their shift in meaning, and thus take the first steps towards formulating laws of cross-lingual semantic change.|000|semantic divergence, cognate sets, semantic shift, colexification, automatic approach, missing data, missing code 5028|Baumann2018|Language acquisition and change are thought to be causally connected. We demonstrate a method for quantifying the strength of this connection in terms of the ‘basic reproductive ratio’ of linguistic constituents. It represents a standardized measure of reproductive success, which can be derived both from diachronic and from acquisition data. By analyzing phonotactic English data, we show that the results of both types of derivation correlate, so that phonotactic acquisition indeed predicts phonotactic change, and vice versa. After drawing that general conclusion, we discuss the role of utterance frequency and show that the latter exhibits destabilizing effects only on late acquired items, which belong to phonotactic periphery.
We conclude that – at least in the evolution of English phonotactics – acquisition serves conservation, while innovation is more likely to occur in adult speech and affects items that are less entrenched but comparably frequent.|000|language acquisition, language change, basic reproductive ratio, reasons for language change, 5029|Baumann2018|Potentially important paper, linking acquisition with language change.|000|language change, language acquisition, mechanism of language change, 5030|Vittrang2019|Thomason and Kaufman’s 1988 book Language contact, creolization, and genetic linguistics had a stimulating effect on the fields of comparative and descriptive linguistics and inspired a number of studies on various topics related to language contact: the relationship between typology and language contact; the effect of language contact on a language’s genetically inherited characteristics, and work on mixed and endangered languages. More generally speaking, the increased availability of data relating to language contact has enabled wider-ranging discussion on the nature of language contact and its consequences (see Hickey 2010 for a more detailed account of these subjects).|000|language contact, South-East Asian languages, Sino-Tibetan, areal linguistics, areal diffusion, structural data, 5031|Wright1955|It is not surprising that up to 1900, discussion of the causes of evolution largely took the form of ardent advocacy of one or another single principle. This is not wholly extinct today. There are mutationists, selectionists, hybridizationists, and still some orthogenesists and Lamarckians. Evidence that a particular major evolutionary step has required a macromutation is likely to be urged against the view that major steps may arise from an accumulation of micromutations, or vice versa.
Those who find evidence of a selective difference in a case in which this has not been obvious, are likely to present this as evidence that random drift is thereby excluded as a significant factor in this or any other case. I also sometimes find references to an author who masquerades under my name who seems to maintain the opposite.|000|evolutionary theory, important paper, nice paper, biological evolution 5032|Zoller1983|Book introduces the language of the Rang Pa Garhwal, offering a dictionary, grammar, and texts.|000|Rang Pa, Garhwal, Sino-Tibetan, 5033|Bona2006| 1. Basic Methods 2. Enumerative Combinatorics 3. Graph Theory 4. Horizons |000|handbook, introduction, graph theory, combinatorics, enumeration, mathematics 5034|List2019d|While language contact has so far been predominantly studied on the basis of detailed case studies, the emergence of methods for phylogenetic reconstruction and automated word comparison—as a result of the recent quantitative turn in historical linguistics—has also resulted in new proposals to study language contact situations by means of automated approaches. This study provides a concise introduction to the most important approaches, which have been proposed in the past, presenting methods that use (A) phylogenetic networks to detect reticulation events during language history, (B) sequence comparison methods in order to identify borrowings in multilingual datasets, and (C) arguments for the borrowability of shared traits to decide if traits have been borrowed or inherited. 
While the overview focuses on approaches dealing with lexical borrowing, questions of general contact inference will also be discussed where applicable.|000|language contact, computational methods, automatic approach, lexical borrowing, review 5035|Prince2012|The purpose of this article is to show that long-established insights into the close relation between predicate structure and information structure in Mandarin Chinese can account for a number of concrete observations once they are formalized. In the course of the discussion, I will develop formal definitions of the principle I refer to as the Predicate-Comment Mapping Hypothesis and of the copula and comment marker shi. After discussing how they apply to simple assertive clauses, I will show that these definitions allow us to derive the correct predictions about the differences between three different types of polarity questions—the so-called ma questions, shi-bu-shi questions and A-neg-A questions.|000|Chinese, Mandarin, grammar, information structure, predicate structure 5036|Campbell2011|The Dene–Yeniseian Connection (Henceforth DYC) has 18 articles and two appendixes, based on papers most of which were presented at the Dene–Yeniseian Symposium held February 26–27, 2008, at the University of Alaska, Fairbanks. The cornerstone of DYC is Edward Vajda’s “A Siberian Link with Na-Dene Languages” (pp. 33–99) in which he proposes a connection between the Yeniseian language family of central Siberia and “Na-Dene” (Athabaskan-Eyak-Tlingit, minus Haida of the traditional Na-Dene hypothesis). The other articles deal mostly with Vajda’s hypothesis, presenting a range of opinions about it and matters related to it. Given both the distance and time depth separating Yeniseian and Na-Dene, this would seem an implausible relationship. Nevertheless, several well-known linguists have declared their support, though often with caution, for example, in DYC, Hamp, Comrie, Fortescue, Kari, and Nichols.
This being the case, this proposal merits careful attention. There are noteworthy ideas, as well as a few dramatic errors, in the other papers in DYC well worth discussion. However, most of the limited space for this review is dedicated to an evaluation of Vajda’s paper, given the importance of the hypothesis it proposes, though the other papers of the volume are also referenced here in connection with Vajda’s.|000|Dene-Caucasian hypothesis, Na-Dene, Dene-Yeniseian, review 5037|Dediu2019a|Linguistic diversity is affected by multiple factors, but it is usually assumed that variation in the anatomy of our speech organs plays no explanatory role. Here we use realistic computer models of the human speech organs to test whether inter-individual and inter-group variation in the shape of the hard palate (the bony roof of the mouth) affects acoustics of speech sounds. Based on 107 midsagittal MRI scans of the hard palate of human participants, we modelled with high accuracy the articulation of a set of five cross-linguistically representative vowels by agents learning to produce speech sounds. We found that different hard palate shapes result in subtle differences in the acoustics and articulatory strategies of the produced vowels, and that these individual-level speech idiosyncrasies are amplified by the repeated transmission of language across generations. 
Therefore, we suggest that, besides culture and environment, quantitative biological variation can be amplified, also influencing language.|000|vocal tract, language evolution, sound change, reasons for sound change, 5038|DeLancey1995|The term "bipartite stem" (@Jacobsen1980) refers to a pattern of compound stem construction found in northern California and Oregon, which crosses genetic boundaries, occurring in Hokan and Penutian languages, but also seems to roughly correlate with plausible genetic subunits (Northern Hokan, Plateau Penutian + Maiduan).|37|bipartite stem, bipartite verb, definition, 5039|Ryzhova2016|In this paper, we present an application for formal concept analysis (FCA) by showing how it can help construct a semantic map for a lexical typological study. We show that FCA captures typological regularities, so that concept lattices automatically built from linguistic data appear to be even more informative than traditional semantic maps. While sometimes this informativeness causes unreadability of a map, in other cases, it opens up new perspectives in the field, such as the opportunity to analyze the relationship between direct and figurative lexical meanings.|000|semantic map, formal concept analysis, colexification 5040|Oestling2012|Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training.
In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages.|000|continuous vector, NLP, natural language processing, bible corpus, 5041|Starostin2019a|I would therefore like to take this opportunity to offer a postscriptum of sorts to a large paper that I have recently coauthored with two of my colleagues from the Moscow School of Comparative Linguistics (@Kassian2015b, Zhivlov, and Starostin 2015; although my own part in that paper was more technical than substantial, I still fully endorse its contents and conclusions). The paper offered a relatively simple probabilistic procedure (the so-called “permutation test,” devised by the American linguists William Baxter and Alexis Manaster Ramer, see Baxter and Manaster Ramer ) that was applied to reconstructions of basic roots for Proto-Indo-European (PIE) and Proto-Uralic (PU) and helped to determine that, at least within the framework of the chosen model, the number of phonetically similar matches between PIE and PU (7 out of 50) exceeded chance expectations (the latter were defined by the model as falling within the range of 1 to 4 matches).
The paper was open to panel discussion and received a certain amount of criticism from such qualified scholars as Don Ringe, Brett Kessler, and Petri Kallio; but even though the authors were given plenty of space to [pb] answer the most significant critical objections, I still think that certain important aspects and implications of this research remained without sufficient clarification, and this Festschrift is an excellent opportunity to try to offer some additional insights on the issue.|327f|Uralic, Indo-European, Indo-Uralic hypothesis, long-range comparison, macro families, proof of relationship, 5042|Steels2015|Mapping insights and frameworks from one scientific domain to another is often useful because it encourages communication between different scientific fields and acts as a conduit for the exchange of mathematical and computational tools. This paper introduces analogies between concepts and mechanisms from molecular biology and language processing. The main purpose is to find ways for understanding language as a ‘living’, dynamically evolving, self-organizing system. The analogies have been the main source of inspiration for a computational implementation of construction grammar, called Fluid Construction Grammar (FCG). The paper describes briefly the biological analogies underlying FCG and discusses the opportunities for further research that these analogies open up.|000|construction grammar, fluid construction grammar, biological parallels, 5043|Zhang2019c|In the brain, the semantic system is thought to store concepts. However, little is known about how it connects different concepts and infers semantic relations. To address this question, we collected hours of functional magnetic resonance imaging (fMRI) data from human subjects listening to natural stories. We developed a predictive model of the voxel-wise response, and further applied it to thousands of new words.
We found that both semantic categories and relations were represented by spatially overlapping cortical networks, instead of anatomically segregated regions. Importantly, many such semantic relations that reflected conceptual progression from concreteness to abstractness were represented by a similar cortical pattern of anti-correlation between the default mode network and the frontoparietal attention network. Our results suggest that the human brain represents a continuous semantic space and uses distributed networks to encode not only concepts but also relationships between concepts. In particular, the default mode network plays a central role in semantic processing for abstraction of concepts across various domains.|000|human brain, semantic space, concepts, cortical representations, neurolinguistics, 5044|Campbell1999|Words which violate the typical phonological patterns (canonical forms, morpheme structure, syllable structure, phonotactics) of a language are likely to be loans.|64|borrowing detection, qualitative approach 5045|Campbell1999|Words containing sounds which are not normally expected in native words are candidates for loans.|64|borrowing detection, qualitative criteria 5046|Campbell1999|In some cases where the phonological history of the languages of a family is known, information concerning the sound changes that they have undergone can be helpful for determining loans, the direction of borrowing, and what the donor language was.|65|sound change, borrowing detection, qualitative criteria 5047|Campbell1999|The morphological make-up of words can help determine the direction of borrowing.
In cases of borrowing, when the form in question in one language is morphologically complex (composed of two or more morphemes) or has an etymology which is morphologically complex, but the form in the other languages has no morphological analysis, then usually the donor language is the one with the morphologically complex form and the borrower is the one with the monomorphemic form.|65|borrowing detection, morphology, qualitative criteria 5048|Campbell1999|When a word in two (or more) languages is suspected of being borrowed, if it has legitimate cognates (with regular sound correspondences) across sister languages of one family, but is found in only one language (or a few languages) of another family, then the donor language is usually one of the languages for which the form in question has cognates in the related languages.|67|borrowing detection, qualitative criteria, cognacy, 5049|Campbell1999|The geographical and ecological associations of words suspected of being loans can often provide information helpful to determining whether they are borrowed and what the identity of the donor language is.|68|borrowing detection, qualitative criteria 5050|Campbell1999|A still weaker kind of inference, related to the last criterion, can sometimes be obtained from the semantic domain of a suspected loan.|69|borrowing detection, qualitative criteria, semantics 5051|Levinson2010|This paper argues that the language sciences are on the brink of major changes in primary data, methods and theory. Reactions to ‘The myth of language universals’ (Evans and Levinson, 2009a,b) divide in response to these new challenges. Chomskyan-inspired ‘C-linguists’ defend a status quo, based on intuitive data and disparate universalizing abstract frameworks, reflecting 30 years of changing models. Linguists driven by interests in richer data and linguistic diversity, ‘D-linguists’, though more responsive to the new developments, have tended to lack an integrating framework.
Here we outline such an integrative framework of the kind we were presupposing in ‘Myth’, namely a coevolutionary model of the interaction between mind and cultural linguistic traditions which puts variation central at all levels – a model that offers the right kind of response to the new challenges. In doing so we traverse the fundamental questions raised by the commentary in this special issue: What constitutes the data, what is the place of formal representations, how should linguistic comparison be done, what counts as explanation, what is the source of design in language? Radical changes in data, methods and theory are upon us. The future of the discipline will depend on responses to these changes: either the field turns in on itself and atrophies, or it modernizes, and tries to capitalize on the way language lies at the intersection of all the disciplines interested in human nature.|000|quantitative turn, linguistics, opinion paper, 5052|Levinson2010|We believe that linguistics is on the brink of major changes in data, methods and theory. This was the message we tried to get across in the BBS paper and perhaps especially in our response to the 23 earlier comments in BBS – a response, we suspect, that many of the commentators here have failed to study.|2733|nice quote, big data, quantitative turn, linguistics, 5053|Trask1996|At present there are some 6000 different languages spoken on our planet, and every one of these languages has a vocabulary containing many thousands of words. Moreover, speakers of every one of these languages are in contact with [pb] neighbors who speak different languages; this is true today even for people living on remote Pacific islands on which they had previously been isolated for centuries. Consequently, everybody is in a position to learn some of the words used by their neighbours, and very frequently people take a liking to some of their neighbours' words and take those words over into their own language. 
|17f|borrowing, phenomenon, description, nice quote 5054|Trask1996|This process is somewhat curiously called **borrowing** -- 'curiously', because, of course, the lending language does not lose the use of the word, nor does the borrowing language intend to give it back. A better term might be 'copying', but 'borrowing' has long been established in this sense. And words which are borrowed are called **loan words**.|18|terminology, nice quote, borrowing, 5055|Trask1996|Such borrowing is one of the most frequent ways of acquiring new words, and speakers of all languages do it.|18|lexical change, borrowing, neologism, 5056|Trask1996|Why should people be so eager to borrow somebody else's word? There are several reasons, but the simplest is that the word is the name of something genuinely new to speakers of the borrowing language. |18|borrowing, reasons for borrowing, methodology, 5057|Trask1996|Why should English-speakers go to the trouble of trying to borrow a French word for something when English already had a perfectly good word with the same meaning [...]? The reason is a simple one: prestige.|19|borrowing, prestige, reasons for borrowing 5058|Trask1996|All languages borrow words, but it is notable that some types of words are borrowed more readily than others. For one thing, nouns are borrowed more often than verbs or adjectives. This occurs partly because nouns are far more numerous than other classes of words to begin with, partly because new things are more likely to be denoted by nouns than by other words, and partly because new nouns are often easier to accommodate within the grammatical system of the borrowing language.|23|borrowability, hypothesis, reasons for borrowing, borrowing 5059|Trask1996|Further, there is clear evidence that certain semantic classes of words are much less likely to be borrowed than other words. 
These are chiefly the items of very high frequency which we would expect to find in every language: pronouns, lower numerals, kinship terms, names of body parts, simple verbs like *go*, *be*, *have*, *want*, *see*, *eat*, and *die*, widespread colour terms like *black*, *white*, and *red*, simple adjectives like *big*, *small*, *good*, *bad*, and *old*, names of natural phenomena like *sun*, *moon*, *star*, *fire*, *rain*, *river*, *snow*, *day*, and *night*, grammatical words like *when*, *here*, *and*, *if*, and *this*, and a few others. Such words are often called the basic vocabulary, and the fact that they are rarely borrowed makes them of considerable importance in historical linguistics, as we will see later in the book.|23|borrowing, borrowability, basic vocabulary, 5060|Trask1996|Broadly speaking, there are two ways of dealing with this problem. First, if you have some idea how the word is pronounced in the donor language, you can try your best to reproduce that pronunciation in your own language, producing as a result something which is conspicuously foreign. Second, you can abandon such efforts and just pronounce the loan word as though it were a native word, following the ordinary phonological patterns of your language, and as a result changing the original pronunciation of the word, perhaps greatly. Both these approaches are widely used.|24|loan word, loanword integration, loan adaptation, borrowing 5061|Trask1996|Without introducing any new phonemes, lexical borrowing can also affect the phonotactics of the borrowing language. |26|phonotactics, borrowing, 5062|Trask1996|Even nouns may produce morphological complications for the borrowing language, however. In the majority of languages, nouns are inflected for number, and in many languages they are also marked for case and/or grammatical gender. 
Borrowed nouns must be fitted into all this morphology in one way or another, and the result may be disturbances to the borrowing language's morphology.|27|borrowing, loanword integration, loan adaptation 5063|Trask1996|Borrowing is very far from being the only way of obtaining new words. Languages can use their own resources to create new words, without appealing to other languages. |30|word formation, neologism, word creation 5064|Trask1996|Since the Norman Conquest, English has lost at least 60 per cent of the Old English vocabulary in favour of loans from French and Latin, and most of that loss took place in the several centuries after the Conquest. In less than 2000 years Basque has borrowed so many words from the neighbouring Latin and Romance that these loan words now outnumber the indigenous words in the language, and hundreds or thousands of indigenous words have undoubtedly been lost in the process. The Romance language Romanian has borrowed so many Slavic words that scholars for a while believed it was a Slavic language. Albanian seems to have lost more than 90 per cent of its original vocabulary in favour of loans from Latin, Greek, Hungarian, Slavic, Italian, and Turkish. The Arabic spoken in Malta has borrowed so many words from Italian, French, English, and other languages that Maltese is no longer considered by anyone to be a variety of Arabic. |309|borrowing, examples, nice quote, Romanian, English, Norman Conquest 5065|Trask1996|The linguist Edith Moravcsik (@1978) has proposed some universal principles applying to grammatical borrowing. [...] *Universal 1: Grammatical morphemes cannot be borrowed until after some lexical items have been borrowed.* [...] *Universal 2: Bound morphemes can be borrowed only as parts of complete words.* *Universal 3: Verbs cannot be borrowed directly.* [...] Nevertheless, this claim appears to be simply false. 
*Universal 4: Inflectional morphemes cannot be borrowed until after some derivational (word-forming) morphemes have been borrowed.* *Universal 5: A preposed grammatical item may not be borrowed as a postposed one, and vice versa.*|314|universals, grammatical borrowing, borrowability, 5066|Trask1996|In some cases, centuries of contact between languages can lead to a particularly striking result: several neighbouring but unrelated languages can come to share a number of structural properties with one another, properties which they do not share with their closest genetic relatives elsewhere. A group of languages in which this situation obtains is called a linguistic area, or, using the German term, a *Sprachbund*. A number of such linguistic areas have been identified: the Balkans, the Indian subcontinent, southern Africa, the north-west coast of North America, southeast Asia, and several others. |315|Sprachbund, linguistic area, examples 5067|Trask1996|Southeast Asia is a case in point. Such languages as Chinese, Vietnamese, Thai, Burmese, and the Miao-Yao languages all have tones, and they all have monosyllabic morphemes (and often monosyllabic words). Their closest relatives elsewhere, such as Tibetan, a fairly close relative of Burmese, generally lack these characteristics (though some dialects of Tibetan have acquired tones very recently). Indeed, so distinctive are these languages that it was formerly thought they must all be related, a view now known to be false, and identifying the true relatives of these languages has proved to be an exceedingly difficult problem, because all of these languages look far more like one another than they do like their relatives, the more so since Chinese loan words have penetrated deeply into most of the neighbouring languages. 
In this particular case, it is often thought that the convergence among these unrelated languages is chiefly the result of heavy influence from the prestigious Chinese, but no one really knows.|315|South-East Asia, linguistic area, 5068|Trask1996|One of the most famous linguistic areas is the Balkans, where the languages participating most strongly in the *Sprachbund* are Bulgarian (Slavic), and the very closely related Macedonian, Romanian (Romance), Greek, and Albanian (the last two both belonging to independent branches of Indo-European); the Slavic language Serbian and the non-IE Turkish are marginal members of the group. :comment:`Lists basic features in the remainder.`|315|Sprachbund, linguistic area, Balkan, structural data, 5069|Bensalem2019|When a shift in writing style is noticed in a document, doubts arise about its originality. Based on this clue to plagiarism, the intrinsic approach to plagiarism detection identifies the stolen passages by analysing the writing style of the suspicious document without comparing it to textual resources that may serve as sources for the plagiarist. Character n-grams are recognised as a successful approach to modelling text for writing style analysis. Although prior studies have investigated the best practice of using character n-grams in authorship attribution and other problems, there is still a need for such investigations in the context of intrinsic plagiarism detection. Moreover, it has been assumed in previous works that the ways of using character n-grams in authorship attribution remain the same for intrinsic plagiarism detection. In this paper, we study the effect of character n-grams frequency and length on the performance of intrinsic plagiarism detection. Our experiments utilise two state-of-the-art methods and five large document collections of PAN labs written in English and Arabic. 
We demonstrate empirically that the low- and the high-frequency n-grams are not equally relevant for intrinsic plagiarism detection, but their performance depends on the way they are exploited.|000|plagiarism, n-gram model, plagiarism detection, automatic approach, 5070|Chacon2019|Este trabajo analiza de forma comparativa lenguas y variedades arawak del Alto Río Negro (Brasil), documentadas en la década de 1950 por el sacerdote salesiano Alcionílio Brüzzi Alves da Silva. A partir de un análisis inicial para reinterpretar y actualizar los metadatos y las transcripciones de Brüzzi, exploramos cerca de 220 conceptos y determinamos las palabras cognadas entre las diferentes lenguas y variedades. Tuvimos en cuenta la presencia o ausencia de determinado conjunto de cognados como variables lexicales y la realización alofónica de ciertos fonemas como variable fonética. El análisis resultó en un cuadro general de la diversidad lingüística arawak en los ríos Isana y Vaupés en la década de 1950, lo que nos permitió indagar en las relaciones genéticas y dialectológicas entre las lenguas y las variedades documentadas en aquel entonces, así como expandir el análisis en diálogo con investigaciones comparativas y dialectológicas recientes. :translation:`The article carries out a comparative analysis of Arawak languages and varieties from the Upper Rio Negro (Brazil), documented in the 1950s by Salesian priest Alcionílio Brüzzi Alves da Silva. On the basis of an initial analysis aimed at reinterpreting and updating Brüzzi’s metadata and transcriptions, we explored nearly 220 concepts and established the cognate words in the different languages and varieties. We took into account the presence or absence of a determined group of cognates as lexical variables, and the allophonic realization of certain phonemes, as a phonetic variable. 
The analysis resulted in a general picture of Arawak linguistic diversity in the region of the Isana and Vaupes rivers during the 1950s, which allowed us to inquire into genetic and dialectological relations among the languages and varieties, as well as to expand the analysis in dialogue with recent comparative and dialectological research.`|000|Baniwa, Arawakan, computer-assisted analysis, 5071|Chacon2019|Este trabajo analiza de forma comparativa lenguas y variedades arawak del Alto Río Negro (Brasil), documentadas en la década de 1950 por el sacerdote salesiano Alcionílio Brüzzi Alves da Silva. A partir de un análisis inicial para reinterpretar y actualizar los metadatos y las transcripciones de Brüzzi, exploramos cerca de 220 conceptos y determinamos las palabras cognadas entre las diferentes lenguas y variedades. Tuvimos en cuenta la presencia o ausencia de determinado conjunto de cognados como variables lexicales y la realización alofónica de ciertos fonemas como variable fonética. El análisis resultó en un cuadro general de la diversidad lingüística arawak en los ríos Isana y Vaupés en la década de 1950, lo que nos permitió indagar en las relaciones genéticas y dialectológicas entre las lenguas y las variedades documentadas en aquel entonces, así como expandir el análisis en diálogo con investigaciones comparativas y dialectológicas recientes. :translation:`The article carries out a comparative analysis of Arawak languages and varieties from the Upper Rio Negro (Brazil), documented in the 1950s by Salesian priest Alcionílio Brüzzi Alves da Silva. On the basis of an initial analysis aimed at reinterpreting and updating Brüzzi’s metadata and transcriptions, we explored nearly 220 concepts and established the cognate words in the different languages and varieties. We took into account the presence or absence of a determined group of cognates as lexical variables, and the allophonic realization of certain phonemes, as a phonetic variable. 
The analysis resulted in a general picture of Arawak linguistic diversity in the region of the Isana and Vaupes rivers during the 1950s, which allowed us to inquire into genetic and dialectological relations among the languages and varieties, as well as to expand the analysis in dialogue with recent comparative and dialectological research.`|000|Baniwa, Arawakan, computer-assisted analysis, 5072|Blench2019|The Sino-Tibetan [Trans-Himalayan] language phylum consists of large number of independent branches with no agreed internal structure. It is usually characterised as sesquisyllabic, i.e. typical word forms, especially nominals, have one or more presyllables and a root, the presyllable consisting of a single consonant. Such a structure, which is globally rare, also characterises almost all branches of Austroasiatic, the phylum with which it is intertwined across the central part of its range. However, the incidence of sesquisyllabism is sporadic in Sino-Tibetan, characterising east-central languages and being absent in the diverse languages in the west of its range, typically in Nepal. In the easternmost Sino-Tibetan languages the trend towards monosyllabism has all but eliminated sesquisyllabism. The paper argues that sesquisyllabic structures are a consequence of the interaction with Austroasiatic and not an underlying characteristic of the phylum. The presence or absence of these structures is tabulated and mapped in each potential branch of Sino-Tibetan, to demonstrate their geography and the overlap with Austroasiatic. 
It is underlined that if this argument is accepted, then the whole process of reconstruction of PST, which in many cases hangs on the citation of a very few languages, is unreliable and the pathways to the proto-language must be radically rethought.|000|sesqui-syllabic structure, Sino-Tibetan, critic, linguistic reconstruction, language contact, 5073|Daland2019|Loanword adaptation has been claimed to provide a unique window onto the relation between speech perception and the phonological grammar. This paper focuses on whether the ‘illusory vowel’ effect—in which the presence/absence of a vowel is poorly discriminated within an illicit cluster—is sufficient to explain why vowel epenthesis is the preferred repair for medial clusters in Korean loanword adaptation. A cross-linguistic discrimination experiment revealed a causative role of the stop release burst (or other audible frication noise) in the perception of an illusory vowel; in some cases, perception alone explains vowel epenthesis in loanword adaptation. A follow-up, identification experiment showed that Koreans’ perceptual similarity judgements do not match up with the adaptation pattern for stop-nasal clusters (e.g. pakna), although they do for fricative-stop and stop-stop clusters (e.g. paska, pakta). This finding is problematic for a purely perceptual account of loanword adaptation. The paper sketches a Bayesian account of Korean speech perception that integrates top-down phonotactic likelihood and bottom-up acoustic match and is able to explain the experimental results. 
It closes with some speculation on the role of the Preservation Principle versus perception in loanword adaptation.|000|loanword integration, Korean, p-linguistics, 5074|Egorova2019|This paper presents the results of theoretical analysis and computer modeling, which suggest that two main linguistic populations characterized today as the division of Indo-European languages into the so-called “satem-centum” language ranges could emerge in the model Indo-European language community approximately 3500–4000 years ago. The results of computer modeling show that among the two main hypotheses of the formation of the Proto-Indo-Europeans (the Anatolian and Kurgan hypotheses), the latter corresponds to the time estimates we obtained to a greater extent. Some of the problems of the search for the ancestral homeland of the peoples that were carriers of the Proto-Indo-European language are analyzed.|000|homeland, Indo-European homeland, Indo-European, statistical analysis, 5075|Heggarty2017|When looking to language data as a source of information on human (pre)history, linguistic areas have long been the very poor relation of language families. Both within linguistics, and in conjunction with archaeology and genetics, far less attention has been paid to convergence areas than to diverging families. Yet human populations have inevitably interacted in complexes of both convergent and divergent processes. This holds in linguistics no less than in culture and genetics: witness Matisoff’s (1990: 113) “Sinosphere” vs. “Indosphere”, two contrasting areal convergence zones, but within the same diverging Tibeto-Burman family. This imbalance between families and areas distorts and diminishes what we can learn from comparative linguistics, both historical and typological. It also means that we have much to gain if we can rebalance, to look much more seriously at the real-world contexts through (pre)history in which linguistic areas arose. 
Archaeologists and geneticists, when faced with signals of convergence between human populations on the sociocultural and demographic levels, often still think only in terms of divergent families as the linguistic parallel — rather than the more natural fit with convergent areas, so little known outside linguistics. Linguistics, meanwhile, labours under its own misconceptions and outdated visions of other disciplines, in the balance between migratory and diffusionist interpretations of the human past. Explaining linguistic areas requires one to think in terms of demographic and socio-cultural processes radically different to those traditionally invoked to account for language family expansions. Or indeed, to rethink whether certain contexts and processes — trade, mobility, and so on — are good explanations for divergent families at all, when in fact they can be more plausible shapers of linguistic convergence areas instead. This contribution aims to set out some general first principles for a prehistory of language areas. These principles will be illustrated by cases drawn from a range of (pre)historic contexts from across the globe: Meso-America, the Andes and Amazonia, the Balkans, mainland south-east Asia, and the ‘Altaic’ zone of north-eastern Asia.|000|language contact, convergence, linguistic area, Sprachbund, language union, South-East Asia, Meso-America, Andes, Amazonia, Altaic, Balkan 5076|Heggarty2017|From such patterns in individual language structures, it is a short step to the next dimension of the global linguistic panorama, informative of (pre)history, and our theme here: the worldwide patterning of linguistic convergence areas. For those areas are effectively defined as aggregations of what has just been discussed: multiple structural features that overlap in their geographical distributions. 
The presence of highly complex tonal systems, for instance, is one of a set of key shared characteristics that together define the linguistic area of (mainland) South-East Asia. (Others include tendencies also towards isolating morphology, monosyllabic morphemes, SVO word order, and various phonological commonalities in relatively restricted syllable structures.)|149|linguistic area, language union, South-East Asia, criteria, 5077|Heggarty2017|At first sight, there may seem little contextual similarity with tropical Mainland South-East Asia, but there are nonetheless certain parallels. The most widely-held view sees the Austro-Asiatic family as ‘here first’, followed by a succession of other lineages moving in from the north, whether ‘pulled’ by the attraction of farming lands particularly suited to wet-rice agriculture, or ‘pushed’ by the southward spread of Chinese. The result is a linguistic patchwork not so dissimilar to the Balkans, of cheek by jowl fragmentation.|160|South-East Asia, linguistic area, 5078|Orlandi2019|The ‘discovery’ of early Chinese, and its subsequent reconstruction, have allowed the modern linguist to reach a wide range of firm conclusions about the Chinese language and its position within the Tibeto-Burman family. Reverend Joseph Edkins should be credited with initial work on early Chinese as the ancestor language of the various Sinitic languages, and with its first partial reconstruction. This article is an attempt to supply at least a first historical guide for those interested in obtaining a better understanding of the implicit discovery of Sinitic and the first reconstructions of early Chinese.|000|Joseph Edkins, biography, linguistic reconstruction, history of science, 5079|Fischer2019|From the beginning, the idea of human races and their existence has been linked to an evaluation of these supposed races. Indeed, the notion that different groups of people differ in value preceded supposedly scientific work on the subject. 
The primarily biological justification for defining groups of humans as races – for example based on the colour of their skin or eyes, or the shape of their skulls – has led to the persecution, enslavement and slaughter of millions of people. Even today, the term ‘race’ is still frequently used in connection with human groups. However, there is no biological basis for races, and there has never been one. The concept of race is the result of racism, not its prerequisite.|000|declaration, racism, genetics, science, 5080|Roth2019|The term ‘digital humanities’ may be understood in three different ways: as ‘digitized humanities’, by dealing essentially with the constitution, management, and processing of digitized archives; as ‘numerical humanities’, by putting the emphasis on mathematical abstraction and the development of numerical and formal models; and as ‘humanities of the digital’, by focusing on the study of computer-mediated interactions and online communities. Discussing their methods and actors, we show how these three potential acceptations cover markedly distinct epistemological endeavors and, eventually, non-overlapping scientific communities.|000|digital humanities, definition, terminology 5081|Rizvi2019|Microattribution is the name of a method which has recently started to be used in the attribution of parts of early modern plays. The method seeks to make authorship attributions by using samples of writing consisting of less than two hundred words. This article argues that the method should not be used, fundamentally because it flouts the well-founded scientific insistence on the sufficiency of sample sizes. The article considers two recent applications of the method, showing that huge amounts of evidence were overlooked which would have invalidated the conclusions drawn. 
Moreover, the article demonstrates that the method is biased in favour of authors with large surviving canons, such as Shakespeare, and it cannot therefore be relied upon.|000|micro-attribution, author recognition, critics 5082|Rau2019|On the basis of historical linguistic and language geographic evidence, the authors advance the novel hypothesis that the Munda languages originated on the east coast of India after their Austroasiatic precursor arrived via a maritime route from Southeast Asia, 3,500 to 4,000 years ago. Based on the linguistic evidence, we argue that pre-Proto-Munda arose in Mainland Southeast Asia after the spread of rice agriculture in the late Neolithic period, sometime after 4,500 years ago. A small Austroasiatic population then brought pre-Proto-Munda by means of a maritime route across the Bay of Bengal to the Mahanadi Delta region – an important hub location for maritime trade in historic and pre-historic times. The interaction with a local South Asian population gave rise to proto-Munda and the Munda branch of Austroasiatic. The Maritime Hypothesis accounts for the linguistic evidence better than other scenarios such as an Indian origin of Austroasiatic or a migration from Southeast Asia through the Brahmaputra basin. The available evidence from archaeology and genetics further supports the hypothesis of a small founder population of Austroasiatic speakers arriving in Odisha from Southeast Asia before the Aryan conquest in the Iron-Age.|000|Munda, Austroasiatic, homeland, dating 5083|Schwab2017|The perception of stress is highly influenced by listeners’ native language. In this research, the authors examined the effect of intonation and talker variability (here: phonetic variability) in the discrimination of Spanish lexical stress contrasts by native Spanish (N=17), German (N=21), and French (N=27) listeners. 
Participants listened to 216 trials containing three Spanish disyllabic words, where one word carried a different lexical stress to the others. The listeners’ task was to identify the deviant word in each trial (Odd-One-Out task). The words in the trials were produced by either the same talker or by two different talkers, and carried the same or varying intonation patterns. The German listeners’ performance was lower compared to the Spanish listeners but higher than that of the French listeners. French listeners performed above chance level with and without talker variability, and performed at chance level when intonation variability was introduced. Results are discussed in the context of the stress “deafness” hypothesis.|000|stress deafness, stress patterns, inter-speaker-variation, perception, foreign language, French, Spanish, German, empirical study, 5084|Bickel2017|This shift in emphasis started over twenty years ago with Dryer (1989), who drew attention to large-scale diffusion as an important possible confounding factor in the statistics of universals, and with Nichols (@1992), who set out to test universals but instead discovered an intriguing set of large-scale areal patterns.|40|areal diffusion, universals, distinguishability, 5085|Bickel2017|A second problem is similar in kind to the possible fallacies when interpreting worldwide frequencies as universal preferences: if a pattern is more frequent in an area than outside it, this can be attributed to diffusion in contact only to the extent that we can be sure that the pattern did not arise many times independently because of some universal principle.|41|areal diffusion, universals, distinguishability, 5086|Bickel2017|A second problem is similar in kind to the possible fallacies when interpreting worldwide frequencies as universal preferences: if a pattern is more frequent in an area than outside it, this can be attributed to diffusion in contact only to the extent that we can be sure that the pattern 
did not arise many times independently because of some universal principle.|41|areal trait, universal trait, universals, areal diffusion, distinguishability, 5087|Bickel2017|In the following, I begin the discussion by exploring the kinds of processes that lead to area formation and universal patterns (Section 3.2). This leads me to suggest ways of distinguishing the statistical footprint of the relevant processes (Section 3.3). Section 3.4 illustrates the methods via recent case studies, and Section 3.5 concludes the chapter.|41f|methodology, evolutionary processes, universals, areal traits, distinguishability, 5088|Bjerva2019|A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just like it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships—a convenient benchmark used for evaluation in previous work—appears to be a confounding factor. 
Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.|000|structural similarity, genetic similarity, language representations, neural network, 5089|Legendre2015|1. The Mantel test is widely used in biology, including landscape ecology and genetics, to detect spatial structures in data or control for spatial correlation in the relationship between two data sets, for example community composition and environment. The study demonstrates that this is an incorrect use of that test. 2. The null hypothesis of the Mantel test differs from that of correlation analysis; the statistics computed in the two types of analyses differ. We examined the basic assumptions of the Mantel test in spatial analysis and showed that they are not verified in most studies. We showed the consequences, in terms of power, of the mismatch between these assumptions and the Mantel testing procedure. 3. The Mantel test H₀ is the absence of relationship between values in two dissimilarity matrices, not the independence between two random variables or data tables. The Mantel R² differs from the R² of correlation, regression and canonical analysis; these two statistics cannot be reduced to one another. Using simulated data, we show that in spatial analysis, the assumptions of linearity and homoscedasticity of the Mantel test (H₁: small values of D₁ correspond to small values of D₂ and large values of D₁ to large values of D₂) do not hold in most cases, except when spatial correlation extends over the whole study area. Using extensive simulations of spatially correlated data involving different representations of geographic relationships, we show that the power of the Mantel test is always lower than that of distance-based Moran’s eigenvector map (dbMEM) analysis and that the Mantel R² is always smaller than in dbMEM analysis, and uninterpretable. 
These simulation results are novel contributions to the Mantel debate. We also show that regression on a geographic distance matrix does not remove the spatial structure from response data and does not produce spatially uncorrelated residuals. 4. Our main conclusion is that Mantel tests should be restricted to questions that, in the domain of application, only concern dissimilarity matrices, and are not derived from questions that can be formulated as the analysis of the vectors and matrices from which one can compute dissimilarity matrices.|000|Mantel test, spatial analysis, spatial modeling, correlational studies, 5090|Bender2011|Language independence is commonly presented as one of the advantages of modern, machine-learning approaches to NLP, and it is an important type of scalability. If technology developed for one language can be ported to another merely by amassing appropriate training data in the second language, then the effort put into the development of the technology in the first language can be leveraged to more efficiently create technology for other languages. In cases where the collection of training data represents minimal effort (compared to the algorithm development), this can be very efficient indeed.|000|language independance, NLP, Bender rule 5091|Bender2019|Progress in the field of Natural Language Processing (NLP) depends on the existence of language resources: digitized collections of written, spoken or signed language, often with gold standard labels or annotations reflecting the intended output of the NLP system for the task at hand (e.g. the gold standard text for a speech recognition system or gold standard user intent labels in a dialogue system such as Siri, Alexa or Google Home). 
Unsupervised, weakly supervised, semi-supervised, or distantly supervised machine learning techniques reduce the overall dependence on labeled data, but even with such approaches, there is a need for both sufficient labeled data to evaluate system performance and typically much larger collections of unlabeled data to support the very data-hungry machine learning techniques.|000|Bender rule, NLP, critic 5092|Dabrowska2019|Universal Grammar (UG) is a suspect concept. There is little agreement on what exactly is in it; and the empirical evidence for it is very weak. This paper critically examines a variety of arguments that have been put forward as evidence for UG, focussing on the three most powerful ones: universality (all human languages share a number of properties), convergence (all language learners converge on the same grammar in spite of the fact that they are exposed to different input), and poverty of the stimulus (children know things about language which they could not have learned from the input available to them). I argue that these arguments are based on premises which are either false or unsubstantiated. Languages differ from each other in profound ways, and there are very few true universals, so the fundamental crosslinguistic fact that needs explaining is diversity, not universality. A number of recent studies have demonstrated the existence of considerable differences in adult native speakers’ knowledge of the grammar of their language, including aspects of inflectional morphology, passives, quantifiers, and a variety of more complex constructions, so learners do not in fact converge on the same grammar. Finally, the poverty of the stimulus argument presupposes that children acquire linguistic representations of the kind postulated by generative grammarians; constructionist grammars such as those proposed by Tomasello, Goldberg and others can be learned from the input. 
We are the only species that has language, so there must be something unique about humans that makes language learning possible. The extent of crosslinguistic diversity and the considerable individual differences in the rate, style and outcome of acquisition suggest that it is more promising to think in terms of a language-making capacity, i.e., a set of domain-general abilities, rather than an innate body of knowledge about the structural properties of the target system.|000|universal grammar, UG, critic, Chomsky syntax, 5093|Wood2019|The English language is a direct offshoot of Mandarin Chinese, a group of academics from the country who believe Europe had no history before the 15th century has claimed. Scholars from the World Civilisation Research Association, a Chinese scholarly group, argued that all European languages derived from a Mandarin root while speaking at the first China International Frontier Education Summit in Beijing. Vice president and secretary-general of the group Zhai Guiyun told a reporter from Sina Online that words such as yellow proved a prime example - arguing the word was based on autumnal leaves, and that it was phonetically similar to the Mandarin word for ‘leaf drop’.|000|English, Chinese, dialect, fun article, blogpost, 5094|Serva2008|The evolution of languages closely resembles the evolution of haploid organisms. This similarity has been recently exploited (Gray R. D. and Atkinson Q. D., Nature, 426 (2003) 435; Gray R. D. and Jordan F. M., Nature, 405 (2000) 1052) to construct language trees. The key point is the definition of a distance among all pairs of languages which is the analogous of a genetic distance. Many methods have been proposed to define these distances; one of these, used by glottochronology, computes the distance from the percentage of shared “cognates”. Cognates are words inferred to have a common historical origin, and subjective judgment plays a relevant role in the identification process.
Here we push closer the analogy with evolutionary biology and we introduce a genetic distance among language pairs by considering a renormalized Levenshtein distance among words with same meaning and averaging on all words contained in a Swadesh list (Swadesh M., Proc. Am. Philos. Soc., 96 (1952) 452). The subjectivity of process is consistently reduced and the reproducibility is highly facilitated. We test our method against the Indo-European group considering fifty different languages and the two hundred words of the Swadesh list for any of them. We find out a tree which closely resembles the one published in Gray and Atkinson (2003), with some significant differences.|000|phylogeny, Indo-European, edit distance, Levenshtein distance, 5095|Schiborr2019|Multi-CAST, the Multilingual Corpus of Annotated Spoken Texts (Haig & Schnell 2015), is a collection of annotated texts from a typologically diverse set of languages. The texts in the collection are chiefly non-elicited and monologic narratives. Multi-CAST has been designed to enable cross-linguistic inquiries into referentiality and discourse structure by providing common ground for quantitative analyses, in an effort to address questions posed by notions such as preferred argument structure (Du Bois 1987; 2003; 2017), referential density (Bickel 2003; Noonan 2003), and accessibility theory (Ariel 1988; 1990; 2004), among many others.|000|corpus, typological diversity, language documentation, spoken language, 5096|Ziemann2016|The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers.
A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.|000|errors, annotation, genetics, Excel, data format, problems 5097|Faust2019|This special issue includes twelve papers that consider roots to be a unit of lexical structure that is distinct from the stem, word, lexeme or morpheme. The assumption common to all of the papers is that roots are terminals in morphosyntactic structure. Given these assumptions, the overarching topic of this volume is the way in which roots interact with their morphosyntactic environment. We have grouped the papers in this collection into four thematic sections: (1) roots and selection; (2) roots and allosemy; (3) roots and allomorphy; and (4) the form of roots.|000|root, Chomsky syntax, formal syntax, morphology, word formation, 5098|Vulic2019|Recent efforts in cross-lingual word embedding (CLWE) learning have predominantly focused on fully unsupervised approaches that project monolingual embeddings into a shared cross-lingual space without any cross-lingual signal. The lack of any supervision makes such approaches conceptually attractive. Yet, their only core difference from (weakly) supervised projection-based CLWE methods is in the way they obtain a seed dictionary used to initialize an iterative self-learning procedure. The fully unsupervised methods have arguably become more robust, and their primary use case is CLWE induction for pairs of resource-poor and distant languages. In this paper, we question the ability of even the most robust unsupervised CLWE approaches to induce meaningful CLWEs in these more challenging settings. A series of bilingual lexicon induction (BLI) experiments with 15 diverse languages (210 language pairs) show that fully unsupervised CLWE methods still fail for a large number of language pairs (e.g., they yield zero BLI performance for 87/210 pairs).
Even when they succeed, they never surpass the performance of weakly supervised methods (seeded with 500-1,000 translation pairs) using the same self-learning procedure in any BLI setup, and the gaps are often substantial. These findings call for revisiting the main motivations behind fully unsupervised CLWE methods.|000|cross-lingual embeddings, word embeddings, NLP, critic, evaluation, 5099|Mallick2016|We report the Simons Genome Diversity Project (SGDP) dataset: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioral modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that in other non-Africans.|000|Out-of-Africa, genetics, population genetics, genome 5100|Borsley2019|This chapter compares work done in Head-Driven Phrase Structure Grammar with work done under the heading Minimalist Program. We discuss differences in the respective approaches and the outlook of theories. We have a look at the procedural/constraint-based views on grammar and discuss the differences in complexity of the structures that are assumed.
We also address psycholinguistic issues like processing and language acquisition.|000|Chomsky syntax, minimalism, HPSG, Head-Driven Phrase Structure Grammar, computational linguistics, Minimalist Program, automatic approach, 5101|Clematide2018|This article describes a new word alignment gold standard for German nominal compounds and their multiword translation equivalents in English, French, Italian, and Spanish. The gold standard contains alignments for each of the ten language pairs, resulting in a total of 8,229 bidirectional alignments. It covers 362 occurrences of 137 different German compounds randomly selected from the corpus of European Parliament plenary sessions, sampled according to the criteria of frequency and morphological complexity. The standard serves for the evaluation and optimisation of automatic word alignments in the context of spotting translations of German compounds. The study also shows that in this text genre, around 80% of German noun types are morphological compounds indicating potential multiword units in their parallel equivalents.|000|gold standard, multiword expressions, compounding, German, Spanish, French, Italian, English, morphology 5102|Thomason2003|It must be emphasized that there is no sharp boundary between «mixed language» and «unmixed language». All languages have undergone at least some contact-induced changes [...].|21|mixed languages, nice quote 5103|Schuchardt1884|Ich habe behauptet dass unter allen Fragen mit welchen die heutige Sprachwissenschaft zu thun hat, keine von grösserer Wichtigkeit ist als die der Sprachmischung, und ich will zunächst darlegen was mich zu dieser Meinung geführt hat.|3|language mixture, historical linguistics, nice quote, 5104|Gao2017|The study of language contact epitomizes the dynamics of language as a system of human communication.
The competing linguistic forces at work when speakers of different language varieties come into contact can be narrowed down to two basic concepts––convergence and divergence. Looking at linguistic areas using a macro approach, languages in contact tend to show convergence across all structural levels through diffusion and borrowing, but nevertheless, linguistic diversity persists in regions of high interethnic language contact. Ethnicity often plays a significant role in constructing identity, therefore a speaker’s linguistic choices can reflect ethnic identity and intergroup relations. Because these processes occur in and as a result of complex societies, “studies of interethnic language contact must begin by understanding the context in which speakers in a community construct their own ethnicity, as well as the ideologies that affect how they view other groups” (Fought 2013: 395). Southwest China is a particularly interesting region for language contact research because high levels of ethnolinguistic diversity in remote areas perpetuates traditional interethnic contact relations while these same groups are also currently under social and economic pressure to assimilate to mainstream Chinese society.|000|South-East Asia, language contact, dataset, wordlist 5105|Rudnicka2019|Though the interest in use of wordnets for lexicography is (gradually) growing, no research has been conducted so far on equivalence between lexical units (or senses) in inter-linked wordnets. In this paper, we present and validate a procedure of sense-linking between plWordNet and Princeton WordNet. The proposed procedure employs a continuum of three equivalence types: strong, regular and weak, distinguished by a custom-designed set of formal, semantic and translational features. To validate the procedure, three independent samples of 120 sense pairs were manually analysed with respect to the features.
The results show that synsets from the two wordnets linked by interlingual synonymy relation have a greater number of equivalents than those linked through interlingual partial synonymy or interlingual hyponymy relations. Even synsets linked via interlingual synonymy may have pairs of lexical units which are only weak equivalents. More fine-grained sense linking enhances the usefulness of the mapped wordnets as a bilingual lexicon for translators or researchers.|000|wordnet, sense equivalence, meaning, NLP 5106|Pasternak2019|Certain measurement-related constructions impose a requirement that the measure function used track the part-whole structure of the domain of measurement, so that a given entity or eventuality must have a larger measurement in the chosen dimension than any of its salient proper parts. I provide evidence from English and Chinese that these constructions can be used to measure the intensity of mental states like hatred and love, indicating that in the natural language ontology of such states, intensity correlates with part-whole structure. A natural language metaphysics of psychological intensity meeting this requirement is then developed and integrated into the semantics. Further complications arise when looking at attitudes like want, wish, and regret, which also permit measurements of intensity in the relevant constructions. To account for such attitudes, the ontology and semantics are then enriched in a way that integrates ordering and quantification over possible worlds into the part-whole structure of attitude states, so that even in these more complicated cases, the constructions at hand have a unified compositional semantics.|000|meaning, sense, emotion, Chinese, English, particular linguistics 5107|Franco2019|The use of loanwords is generally attributed to a social feature, like social prestige, and to semantic features, like the need to fill a lexical gap.
However, few studies take into account variation in the use of loanwords within a speech community, and directly compare the frequency of loanwords from more than one source language. This paper contributes to research on lexical borrowing by comparing the distribution of loanwords from three different source languages in two large databases of dialect data. We take an onomasiological perspective, which allows us to gauge the frequency of borrowed lexical items vis-à-vis alternative expressions. Using Generalized Additive Mixed Modeling, we show that the usage of loanwords can only be explained by taking into account the interaction between semantics and geographical diffusion. Our analysis confirms that the patterns that occur almost exclusively reflect changes in socio-cultural history.|000|loan word, loan transfer, lexical borrowing, prestige, sociolinguistics, computational study, concept-based 5108|Parkvall2019|Almost all creolists see creole formation as a case of (failed) second language acquisition. I argue that there are good reasons to distinguish between second language acquisition and pidginisation/creolisation, and that little is gained by equating the two. While learners have an extant language as their target, pidginisers typically aim to communicate (in any which way) rather than to acquire a specific language. In this sense, pidginisation represents, if not “conscious language change”, at least “conscious language creation”.|000|creolisation, pidgin, discussion 5109|Urban2019|In this article, I reconsider the evidence for a Central Andean linguistic area. I suggest that there is no evidence for a clear-cut linguistic area comprising the entire Central Andes narrowly defined, and that perceived homogeneity is partially due to an overemphasis on the largest and surviving Central Andean language families, Quechuan and Aymaran.
I show that none of the other Central Andean languages known sufficiently well match their typological profile to a high degree. I make a contribution to a more adequate picture by discussing some typological aspects tentatively recoverable for the extinct and poorly documented languages of the North-Central Andes. These suggest that the North was the site of linguistic traits contrasting with those of Quechuan and Aymaran.|000|linguistic area, Andes, South American languages, 5110|Torner2019|The digital turn in lexicography has paved the way to new techniques for presenting information in dictionaries. This paper explores the potential of collocational networks (Phillips 1983, Williams 1998) as key lexicographical tools to represent information about lexical combinatorics. Collocational networks, which had initially been used as a means of identifying collocations in large corpora, must now be rethought in order to assist non-expert dictionary users in text production by allowing them to find the exact collocates they are looking for, as well as by offering the grammatical information needed to use collocations accurately. We advocate improving networks by incorporating information visualization techniques (Ware 2008, Pham 2012). Specifically, we suggest a number of measures which may be taken to both simplify access to the information provided by raw output from corpora (much of which may be noise for the dictionary user) and to enrich such collocational data by means of visually-explained relevant grammatical information.|000|digital turn, digital humanities, dictionary, visualization, interactive visualization, graph, network 5111|Ebling2012|We propose an approach to semi-automatically obtaining semantic relations in Swiss German Sign Language (Deutschschweizerische Gebärdensprache, DSGS). We use a set of keywords including the gloss to represent each sign.
We apply GermaNet, a lexicographic reference database for German annotated with semantic relations. The results show that approximately 60% of the semantic relations found for the German keywords associated with 9000 entries of a DSGS lexicon also apply for DSGS. We use the semantic relations to extract sub-types of the same type within the concept of double glossing (Konrad 2011). We were able to extract 53 sub-type pairs.|000|annotation, semantic relations, semi-automatic approach, computer-assisted analysis, sign language 5112|DaleOlsen2019|We study the importance of linguistic diversity in the workplace for workplace productivity. While cultural diversity might improve productivity through new ideas and innovation, linguistic diversity might increase communication costs and thereby reduce productivity. We apply a new measure of languages’ linguistic proximity to Norwegian linked employer-employee Manufacturing data from 2003-12, and find that higher workforce linguistic diversity decreases productivity. We find a negative effect also when we take into account the impact of cultural diversity. As expected proficiency in Norwegian of foreign workers improves since their time of arrival in Norway, the detrimental impact disappears.|000|workplace, linguistic diversity, ASJP, workplace productivity, economics 5113|Kolly2016|Which acoustic cues can be used by listeners to identify speakers’ linguistic origins in foreign-accented speech? We investigated accent identification performance in signal-manipulated speech, where (a) Swiss German listeners heard native German speech to which we transplanted segment durations of French-accented German and English-accented German, and (b) Swiss German listeners heard 6-band noise-vocoded French-accented and English-accented German speech to which we transplanted native German segment durations.
Therefore, the foreign accent cues in the stimuli consisted of only temporal information (in a) and only strongly degraded spectral information (in b). Findings suggest that listeners were able to identify the linguistic origin of French and English speakers in their foreign-accented German speech based on temporal features alone, as well as based on strongly degraded spectral features alone. When comparing these results to previous research, we found an additive trend of temporal and spectral cues: identification performance tended to be higher when both cues were present in the signal. Acoustic measures of temporal variability could not easily explain the perceptual results. However, listeners were drawn towards some of the native German segmental cues in condition (a), which biased responses towards ‘French’ when stimuli featured uvular /r/s and towards ‘English’ when they contained vocalized /r/s or lacked /r/.|000|perception, foreign language, speech perception, phonetics, accent, accent identification 5114|Bleyan2019|We discuss two methods that let us easily create grapheme-to-phoneme (G2P) conversion systems for languages without any human-curated pronunciation lexicons, as long as we know the phoneme inventory of the target language and as long as we have some pronunciation lexicons for other languages written in the same script. We use these resources to infer what grapheme-to-phoneme correspondences we would expect, and predict pronunciations for words in the target language with minimal or no language-specific human work. Our first approach uses finite-state transducers, while our second approach uses a sequence-to-sequence neural network. Our G2P models reach high degrees of accuracy, and can be used for various applications, e.g. in developing an automatic speech recognition system.
Our methods greatly simplify a task that has historically required extensive manual labor.|000|orthography, orthography profile, phoneme-to-grapheme, phoneme, grapheme-to-phoneme, NLP, transcription, 5115|Ratliff2010|In some words that begin with a nasal, like «crossbow», secondary nasalization in the rime makes [pb] Hmongic and Mienic difficult to reconcile at the higher level. But when the tone is the only ambiguous element, as in «he/she/it» and «to go», it is reconstructed, with indication (by use of parentheses) that variant forms probably existed in the protolanguage.|30f|kng-alternation, Hmong-Mien, Proto-Hmong-Mien, linguistic reconstruction 5116|Taylor1986|LaPolla: Because I do a lot of that myself, because that's the way I was taught. Li: Hm. I think all such reconstructions are junk.|68|reconstruction, linguistic reconstruction, Sino-Tibetan, Tibeto-Burman, Li Fang-Kuei, interview, critics 5117|Aumann2004|The Mienic languages constitute one of the two main branches of the Hmong-Mien (or Miao-Yao) language family. There have been numerous attempts to link the Hmong-Mien language family with others. Chinese linguists consider that Hmong-Mien is a subfamily of the Sino-Tibetan language family. Benedict has put forward a controversial proposal, called the Austro-Tai hypothesis, that Austronesian, Kadai, and Hmong-Mien are all related (@1975). Later this hypothesis was expanded to include Japanese (@Benedict1990). These are just the two most well known of the numerous attempts to link the Hmong-Mien language family with others (other proposals are listed in @Huffmann1986 : 574 and Voegelin and @Voegelin1977 : 228). The most recent proposal comes from Peiros (@1998), who presents evidence linking Austroasiatic and Hmong-Mien.
None of these proposals are generally accepted and most western linguists simply note this and do not discuss the issue in detail.|000|Hmong-Mien, Mienic languages, subgrouping, 5118|List2019a|Sound correspondence patterns play a crucial role for linguistic reconstruction. Linguists use them to prove language relationship, to reconstruct proto-forms, and for classical phylogenetic reconstruction based on shared innovations. Cognate words that fail to conform with expected patterns can further point to various kinds of exceptions in sound change, such as analogy or assimilation of frequent words. Here I present an automatic method for the inference of sound correspondence patterns across multiple languages based on a network approach. The core idea is to represent all columns in aligned cognate sets as nodes in a network with edges representing the degree of compatibility between the nodes. The task of inferring all compatible correspondence sets can then be handled as the well-known minimum clique cover problem in graph theory, which essentially seeks to split the graph into the smallest number of cliques in which each node is represented by exactly one clique. The resulting partitions represent all correspondence patterns that can be inferred for a given data set. By excluding those patterns that occur in only a few cognate sets, the core of regularly recurring sound correspondences can be inferred. Based on this idea, the article presents a method for automatic correspondence pattern recognition, which is implemented as part of a Python library which supplements the article. 
To illustrate the usefulness of the method, I present how the inferred patterns can be used to predict words that have not been observed before.|000|correspondence patterns, inference, automatic approach, sound correspondences, 5119|Jarvis2019|Although language, and therefore spoken language or speech, is often considered unique to humans, the past several decades have seen a surge in nonhuman animal studies that inform us about human spoken language. Here, I present a modern, evolution-based synthesis of these studies, from behavioral to molecular levels of analyses. Among the key concepts drawn are that components of spoken language are continuous between species, and that the vocal learning component is the most specialized and rarest and evolved by brain pathway duplication from an ancient motor learning pathway. These concepts have important implications for understanding brain mechanisms and disorders of spoken language.|000|spoken language, language origin, vocalization, animal language, 5120|Hagoort2019|In this Review, I propose a multiple-network view for the neurobiological basis of distinctly human language skills. A much more complex picture of interacting brain areas emerges than in the classical neurobiological model of language. This is because using language is more than single-word processing, and much goes on beyond the information given in the acoustic or orthographic tokens that enter primary sensory cortices. This requires the involvement of multiple networks with functionally nonoverlapping contributions.|000|neurolinguistics, human language, language origin, neurobiology, 5121|Gutaker2019|Potato, one of the most important staple crops, originates from the highlands of the equatorial Andes. There, potatoes propagate vegetatively via tubers under short days, constant throughout the year. After their introduction to Europe in the sixteenth century, potatoes adapted to a shorter growing season and to tuber formation under long days.
Here, we traced the demographic and adaptive history of potato introduction to Europe. To this end, we sequenced 88 individuals that comprise landraces, modern cultivars and historical herbarium samples, including specimens collected by Darwin during the voyage of the Beagle. Our findings show that European potatoes collected during the period 1650–1750 were closely related to Andean landraces. After their introduction to Europe, potatoes admixed with Chilean genotypes. We identified candidate genes putatively involved in long-day pre-adaptation, and showed that the 1650–1750 European individuals were not long-day adapted through previously described allelic variants of the CYCLING DOF FACTOR1 gene. Such allelic variants were detected in Europe during the nineteenth century. Our study highlights the power of combining contemporary and historical genomes to understand the complex evolutionary history of crop adaptation to new environments.|000|potato, Europe, plants, plants genetics, 5122|Hellstroem2019|This study investigates early employments of family trees in the modern sciences, in order to historicise their iconic status and now established uses, notably in evolutionary biology and linguistics. Moving beyond disciplinary accounts to consider the wider cultural background, it examines how early uses within the sciences transformed family trees as a format of visual representation, as well as the meanings invested in them. Historical writing about trees in the modern sciences is heavily tilted towards evolutionary biology, especially the iconic diagrams associated with Darwinism. Trees of Knowledge shifts the focus to France in the wake of the Revolution, when family trees were first put to use in a number of disparate academic fields. Through three case studies drawn from across the disciplines, it investigates the simultaneous appearance of trees in natural history, language studies, and music theory.
Augustin Augier’s tree of plant families, Félix Gallet’s family tree of dead and living languages, and Henri Montan Berton’s family tree of chords served diverse ends, yet all exploited the familiar shape of genealogy. While outlining how genealogical trees once constituted a more general resource in scholarly knowledge production—employed primarily as pedagogical tools—this study argues that family trees entered the modern sciences independently of the evolutionary theories they were later made to illustrate. The trees from post-revolutionary France occasionally charted development over time, yet more often they served to visualise organic hierarchy and perfect order. In bringing this neglected history to light, Trees of Knowledge provides not only a rich account of the rise of tree thinking in the modern sciences, but also a pragmatic methodology for approaching the dynamic interplay of metaphor, visual representation, and knowledge production in the history of science.|000|history of science, family tree, stemmatics, linguistics, biology 5123|Francois2019|This paper proposes a new approach for collecting lexical and grammatical data: one that meets the need to control the features to be elicited, while ensuring a fair level of idiomaticity. The method, called conversational questionnaires, consists in eliciting speech not at the level of words or of isolated sentences, but in the form of a chunk of dialogue. Ahead of fieldwork, a number of scripted conversations are written in the area’s lingua franca, each anchored in a plausible real-world situation – whether universal or culture-specific. 
Native speakers are then asked to come up with the most naturalistic utterances that would occur in each context, resulting in a plausible conversation in the target language.|000|questionnaire, conversation, comparative concept, 5124|Hulden2017|This article presents a selection of methods to analyse, compare, verify and formally prove properties about phonological generalisations. Drawing from both well-known and recent results in the domains of model checking and automata theory, a useful methodology for automating the task of comparing analyses and inventing counterexamples is explored. The methods are illustrated by practical case studies that are intended to both resolve concrete issues and be representative of typical techniques and results.|000|phonology, generalization, phonological theory, quantitative analysis, 5125|Muellner2018|The fastcluster package is a C++ library for hierarchical, agglomerative clustering. It efficiently implements the seven most widely used clustering schemes: single, complete, average, weighted/mcquitty, Ward, centroid and median linkage. The library currently has interfaces to two languages: R and Python/SciPy. Part of the functionality is designed as drop-in replacement for existing routines: linkage in the SciPy package scipy.cluster.hierarchy, hclust in R’s stats package, and the flashClust package. Once the fastcluster library is loaded at the beginning of the code, every program that uses hierarchical clustering can benefit immediately and effortlessly from the performance gain. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide.|000|clustering, software, partitioning, Python 5126|Dong2015|Elastic words are those whose length can vary between monosyllabic and disyllabic, without changing the meaning. Though elastic words have known to be many in Chinese, it is still not clear how many words are elastic.
In addition, there is no consensus on the motivation of creating elastic words. This dissertation offers a complete annotation of elastic words in modern Standard Chinese and sample annotations of Middle Chinese, and investigates why elastic words are created. Specifically, it examines four properties of elastic words focusing on the homophone-avoidance theory and the prosody theory. The former, by far the most popular one, proposes that disyllabic words are created to reduce homophony and avoid ambiguity after massive syllable loss. In contrast, the prosody theory proposes that elastic words are created because disyllabic words are needed in prosodically strong positions, due to the requirement of Foot Binarity. First, a study examines the relation between homophony and elastic words, based on a complete length elasticity annotation of Modern Chinese Dictionary (2005). Results show that there is no correlation between homophony and elastic words. The second study examines the effect of word category on elastic words in modern Standard Chinese. Results show that (i) half of words in Chinese lexicon are elastic; (ii) content words have higher percentage of elastic words than function words. The third study examines the historical development of elastic words, with a focus on Middle Chinese, especially Tang poems. Results show that there are many elastic words in Middle Chinese, similar to that in Modern Chinese. The fourth study examines word length in Chinese dialects, focusing on Mandarin and Cantonese. Results show that they have similar percentages of disyllabic words and that the size of syllable inventory has no effect on word length. Various evidence consistently points to the conclusion that the prosody theory offers a better explanation of why elastic words are created in Chinese, despite of the fact that the homophone-avoidance theory seems quite intuitive and natural. 
In other words, elastic words are created to fulfill prosodic requirement rather than to compensate for syllable loss or an increase in homophony.|000|elastic words, Chinese, Mandarin, homophony avoidance, homophony, polysyllabification, 5127|Demoule2019|Professeur émérite de protohistoire européenne à l’université Paris-1 Panthéon-Sorbonne, Jean-Paul Demoule est archéologue et préhistorien. Il est l’auteur de la somme Mais où sont passés les Indo-Européens ? publiée au Seuil en 2014. Avec une forte distance critique et non sans une certaine ironie, cet ouvrage examine les origines, les multiples formes et les enjeux politiques de l’idée indo-européenne qui « hante l’histoire de l’Europe », de la fin du XVIIIe siècle à nos jours. Cet entretien est l’occasion de revenir sur le parcours de Jean-Paul Demoule et sur un malentendu auquel son ouvrage a pu donner lieu : s’il y a bien un « mythe » indo-européen, il ne s’agit pas de nier l’existence d’une « famille » de langues indo-européennes dont les ressemblances doivent être interprétées à partir de modèles complexes et pluridisciplinaires. C’est parce qu’elle ne peut être réglée par une seule discipline, fût-elle l’archéologie, la linguistique, l’histoire ou la génétique, que la question indo-européenne suscite des interrogations méthodologiques stimulantes et difficiles, mais se prête aussi aux réappropriations idéologiques.|000|interview, Indo-European, archaeology, debate, historical linguistics, philosophy of science, 5128|Pyysalo2019a|Proto-Indo-European Lexicon (PIELex) is the generative etymological dictionary of Indo-European (IE) languages at http://pielexicon.hum.helsinki.fi. It is the first dictionary in the world capable of mechanically generating its data entries, i.e. the lexical stems of more than 120 of the most archaic IE languages. In addition, in order to solve the reverse process work has already begun on the problem of the mechanical generation of Proto-Indo-European (PIE) from the IE data. 
The plan of the project as a whole is to run PIE Lexicon using an operating system (OS), a computer, under which the dictionary and its data are exclusively governed by smart features ranging from semantics to morphology, and the very root structure of Proto-Indo-European itself. In principle PIE Lexicon is compatible with all digitized etymological dictionaries of IE languages, and as the operating system is scientifically neutral, material of any language or language family can be implemented onto the platform. By outlining the key features of the future coding plan we hope to offer ideas, assistance and support for other enterprises in the field of electronic lexicography|000|finite state transducer, Indo-European, linguistic reconstruction, etymological dictionary, 5129|Scott2019|Human speech perception is a paradigm example of the complexity of human linguistic processing; however, it is also the dominant way of expressing vocal identity and is critically important for social interactions. Here, I review the ways that the speech, the talker, and the social nature of speech interact and how this may be computed in the human brain, using models and approaches from nonhuman primate studies. I explore the extent to which domain-general approaches may be able to account for some of these neural findings. Finally, I address the importance of extending these findings into a better understanding of the social use of speech in conversations.|000|sociolinguistics, review, human language, human brain, pragmatics 5130|Pylkkaenen2019|Human language allows us to create an infinitude of ideas from a finite set of basic building blocks. What is the neurobiology of this combinatory system? Research has begun to dissect the neural basis of natural language syntax and semantics by analyzing the basics of meaning composition, such as two-word phrases. 
This work has revealed a system of composition that involves rapidly peaking activity in the left anterior temporal lobe and later engagement of the medial prefrontal cortex. Both brain regions show evidence of shared processing between comprehension and production, as well as between spoken and signed language. Both appear to compute meaning, not syntactic structure. This Review discusses how language builds meaning and lays out directions for future neurobiological research on the combinatory system.|000|neurobiology, neurology, neurolinguistics, semantics, review, combinatory system, syntax 5131|Ottoni2017|The cat has long been important to human societies as a pest-control agent, object of symbolic value and companion animal, but little is known about its domestication process and early anthropogenic dispersal. Here we show, using ancient DNA analysis of geographically and temporally widespread archaeological cat remains, that both the Near Eastern and Egyptian populations of Felis silvestris lybica contributed to the gene pool of the domestic cat at different historical times. While the cat’s worldwide conquest began during the Neolithic period in the Near East, its dispersal gained momentum during the Classical period, when the Egyptian cat successfully spread throughout the Old World. The expansion patterns and ranges suggest dispersal along human maritime and terrestrial routes of trade and connectivity. A coat-colour variant was found at high frequency only after the Middle Ages, suggesting that directed breeding of cats occurred later than with most other domesticated animals.|000|cat, genetics, cat dispersal, animal evolution, population genetics, 5132|Spyrou2019|The second plague pandemic, caused by Yersinia pestis, devastated Europe and the nearby regions between the 14th and 18th centuries AD. Here we analyse human remains from ten European archaeological sites spanning this period and reconstruct 34 ancient Y. pestis genomes. 
Our data support an initial entry of the bacterium through eastern Europe, the absence of genetic diversity during the Black Death, and low within-outbreak diversity thereafter. Analysis of post-Black Death genomes shows the diversification of a Y. pestis lineage into multiple genetically distinct clades that may have given rise to more than one disease reservoir in, or close to, Europe. In addition, we show the loss of a genomic region that includes virulence-related genes in strains associated with late stages of the pandemic. The deletion was also identified in genomes connected with the first plague pandemic (541–750 AD), suggesting a comparable evolutionary trajectory of Y. pestis during both events.|000|plague, phylogeography, plague dispersal, archaeogenetics, 5133|BuhrmannDeever2007|Introduced feral populations offer a unique opportunity to study the effects of social interaction and founder effects on the development of geographic variation in learned vocalizations. Introduced populations of Monk Parakeets (Myiopsitta monachus) have been growing in number since the 1970s, with a mixture of isolated and potentially interacting populations. We surveyed diversity in contact calls of Monk Parakeet populations in Connecticut, Texas, Florida, and Louisiana. Contact call structure differed significantly among the isolated populations in each state. Contact call structure also differed significantly among potentially interacting nest colonies in coastal Connecticut, and these differences did not follow a geographic gradient. Limited dispersal distances, founder effects, and social learning preferences may play a role in call structure differences.|000|vocalization, contact call, Monk Parakeet, founder effect 5134|DeJesus2019|Scientific communication poses a challenge: To clearly highlight key conclusions and implications while fully acknowledging the limitations of the evidence. 
Although these goals are in principle compatible, the goal of conveying complex and variable data may compete with reporting results in a digestible form that fits (increasingly) limited publication formats. As a result, authors’ choices may favor clarity over complexity. For example, generic language (e.g., “Introverts and extraverts require different learning environments”) may mislead by implying general, timeless conclusions while glossing over exceptions and variability. Using generic language is especially problematic if authors overgeneralize from small or unrepresentative samples (e.g., exclusively Western, middle-class). We present 4 studies examining the use and implications of generic language in psychology research articles. Study 1, a text analysis of 1,149 psychology articles published in 11 journals in 2015 and 2016, examined the use of generics in titles, research highlights, and abstracts. We found that generics were ubiquitously used to convey results (89% of articles included at least 1 generic), despite that most articles made no mention of sample demographics. Generics appeared more frequently in shorter units of the paper (i.e., highlights more than abstracts), and generics were not associated with sample size. Studies 2 to 4 (n = 1,578) found that readers judged results expressed with generic language to be more important and generalizable than findings expressed with nongeneric language. We highlight potential unintended consequences of language choice in scientific communication, as well as what these choices reveal about how scientists think about their data.|000|generic language, scientific practice, scientific communication, popular science, 5135|Campbell1998|I am not a Nostraticist; rather, I have been involved primarily with American Indian and Uralic linguistics, and also with methods of distant genetic relationship. 
Nevertheless, my work has been mentioned in recent Nostratic literature several times, and I therefore take this as an invitation to express my own views and why I hold them concerning the Nostratic hypothesis. In fact, I have until now purposefully avoided taking a public stand on Nostratic, since it involves many different language families and a full-scale evaluation of it would be an enormous enterprise. Still, while I have tried to remain open-minded with regard to the Nostratic hypothesis, my reading of the Nostratic literature has left me with questions, reservations, and doubts of an empirical nature, i.e. with misgivings based on the evidence that has been presented for the Nostratic hypothesis and not in any way attributable to any prejudice or preconceived bias on my part. Therefore, in response to the commentary on my work, I present in this paper a personal assessment of the Nostratic hypothesis, accompanied by a brief reply to Nostraticist claims about American Indian linguists and their work.|000|review, critics, Nostratic, comparative method, 5136|Chen2018|We analyze the complexity of the problem of determining whether a set of phonemes forms a natural class and, if so, that of finding the minimal feature specification for the class. A standard assumption in phonology is that finding a minimal feature specification is an automatic part of acquisition and generalization. We find that the natural class decision problem is tractable (i.e. is in P), while the minimization problem is not; the decision version of the problem which determines whether a natural class can be defined with k features or less is NP-complete. 
We also show that, empirically, a greedy algorithm for finding minimal feature specifications will sometimes fail, and thus cannot be assumed to be the basis for human performance in solving the problem.|000|feature minimization, distinctive features, automatic approach, phonological theory, 5137|Silfverberg2018|Vector space models of words in NLP—word embeddings—have been recently shown to reliably encode semantic information, offering capabilities such as solving proportional analogy tasks such as man:woman::king:queen. We study how well these distributional properties carry over to similarly learned phoneme embeddings, and whether phoneme vector spaces align with articulatory distinctive features, using several methods of obtaining such continuous-space representations. We demonstrate a statistically significant correlation between distinctive feature spaces and vector spaces learned with word-context PPMI+SVD and word2vec, showing that many distinctive feature contrasts are implicitly present in phoneme distributions. Furthermore, these distributed representations allow us to solve proportional analogy tasks with phonemes, such as p is to b as t is to X, where the solution is that X = d. This effect is even stronger when a supervision signal is added where we extract phoneme representations from the embedding layer of an recurrent neural network that is trained to solve a word inflection task, i.e. a model that is made aware of word relatedness.|000|embedding, phoneme embeddings, Word2Vec, 5138|Salali2019|High-fidelity transmission of information through imitation and teaching has been proposed as necessary for cumulative cultural evolution. Yet, it is unclear when and for which knowledge domains children employ different social learning processes. 
This paper explores the development of social learning processes and play in BaYaka hunter-gatherer children by analysing video recordings and time budgets of children from early infancy to adolescence. From infancy to early childhood, hunter-gatherer children learn mainly by imitating and observing others’ activities. From early childhood, learning occurs mainly in playgroups and through practice. Throughout childhood boys engage in play more often than girls whereas girls start foraging wild plants from early childhood and spend more time in domestic activities and childcare. Sex differences in play reflect the emergence of sexual division of labour and the play-work transition occurring earlier for girls. Consistent with theoretical models, teaching occurs for skills/knowledge that cannot be transmitted with high fidelity through other social learning processes such as the acquisition of abstract information e.g. social norms. Whereas, observational and imitative learning occur for the transmission of visually transparent skills such as tool use, foraging, and cooking. These results suggest that coevolutionary relationships between human sociality, language and teaching have likely been fundamental in the emergence of human cumulative culture.|000|hunter gatherers, social learning, cultural evolution, 5139|Heusinger2004|In Spanish, the direct object can be marked or not by the marker a. The marker a is obligatory, optional or ungrammatical, depending on a variety of parameters. These parameters are the object of controversial discussions and of an immense descriptive and functional literature, often under the heading of “prepositional accusative” since the marker is homonym with the preposition a ‘to’. 
The prepositional accusative is discussed in the context of Spanish grammar (see Torrego Salcedo 1999 for an overview), in the broader context of Romance languages (see Rohlfs 1971, Bossong 1998) and from an even broader typological perspective that discusses the “prepositional” accusative in Spanish as an instance of Differential Object Marking or DOM, which is a widespread phenomenon among the languages of the world (Lazard 1984, Bossong 1985, Aissen 2003 among others). Bossong (1985) and others assume three main parameters that determine whether or not a direct object is marked: (i) animacy, (ii) referential category, and (iii) topicality. Animacy and referential category form each a scale with different values. Topicality is generally described as a simple feature ±top. DOM-languages differ with respect to which parameters and to which particular (transition) point on the relevant scale they are sensitive to.|000|differential object marking, Spanish, summary 5140|Paridon2019|This paper introduces a collection of vector-embeddings models of lexical semantics in 55 languages, trained on a large corpus of pseudo-conversational speech transcriptions from television shows and movies. The models were trained on the OpenSubtitles corpus using the fastText implementation of the skipgram algorithm. Performance comparable with (and in some cases exceeding) models trained on non-conversational (Wikipedia) text is reported on standard benchmark evaluation datasets. A novel evaluation method of particular relevance to psycholinguists is also introduced: prediction of experimental lexical norms in multiple languages. 
The models, as well as code for reproducing the models and all analyses reported in this paper (implemented as a user-friendly Python package), are freely available at: https://github.com/jvparidon/subs2vec/.|000|speech norms, Word2Vec, dataset, word embeddings, cross-linguistic study, 5141|Shlyahova2019|The article reveals the current trends in the study of sound symbolism in foreign discourse in the context of NBIC technologies. It has been established that studies of sound symbolism in recent decades have intensified. In the search for sound symbolic universals, there is an increase in the number of matching languages at the expense of Big Data. The author of the study can embrace 62 % of the world's languages, covering 85 % of language families (Automated Similarity Judgment Program (ASJP), CLLD-Concepticon, etc.), which allows to increase the volume of a material, velocity, veracity and value of research, to take into account the factors of variety and variability of data analysis, and visualize data. Further, the works devoted to the natural basis of sound symbols, including at the level of synesthesia and cross-modal effects, are considered. In modern studies, classical hypotheses of the imitative and gestural origin of the language are actualized; sound symbolism is understood as a pre-semantic phenomenon (originated in the very early stages of language development). It has been established that in the perception of sound symbolism, the order and time in the presentation of visual and auditory stimuli are essential. Studies of sound symbolism in their nature are often pragmatic and focused on the manipulation of consciousness in the field of mass pragmatic communications, which makes it possible to consider sound symbolism as an essential part of NBIC-technology. 
The sound symbolic characteristics of language units make it possible to simulate text with predetermined semantic and pragmatic parameters.|000|sound symbolism, review, overview, cross-linguistic study, 5142|Starostin2016d|Один из наиболее интересных моментов здесь — опознание и каталогизирование тех графических вариантов одного и того же слова, которые оказываются дополнительно распределены относительно фонетического контекста, т. е. по сути являются графическим отражением речевых сандхи. :translation:`[not literal] One very interesting aspect here is detection and catalogisation of graphic variants of one and the same word, which seem to be different enough with respect to their phonetic context, that is, they reflect the graphical representation of speech sandhi.`|399|Chinese writing system, sandhi, 5143|Desagulier2019|Two recent methods based on distributional semantic models (DSMs) have proved very successful in learning high-quality vector representations of words from large corpora: word2vec and GloVe. Once trained on a very large corpus, these algorithms produce distributed representations for words in the form of vectors. DSMs based on deep learning and neural networks have proved efficient in representing the meaning of individual words. In this paper, I assess to what extent state-of-the-art word-vector semantics can help corpus linguists annotate large datasets for semantic classes. 
Although word vectors suggest exciting opportunities for resolving semantic annotation issues, there is still room for improvement in terms of the representation of polysemy, homonymy, and multiword expressions.|000|corpus annotation, semantic annotation, corpus studies, Word2Vec, machine learning, 5144|Desagulier2019|Based on the distributional hypothesis (@Harris1954; @Miller<1991> & Charles 1991), according to which words that appear in similar contexts have similar meanings, distributional semantic modeling contends that it is possible to approximate what humans do when they learn word meanings via similarity judgments. Distributional semantic models approximate this knowledge by operationalizing semantic similarities between linguistic units based on their distributional properties in large bodies of natural language data.|1|distribution, semantics, context, distributional semantics, 5145|Desagulier2019|To reduce the bias of a single annotator, the same table can be annotated by several researchers on the basis of a tagset that has been agreed upon beforehand. The annotation is then verified by means of Cohen’s κ (@Cohen1960, 1968), which measures the degree of agreement between several annotators.|3|Cohen's k, inter-annotator agreement, 5146|Desagulier2019|One major limitation of manual annotation is that once a coding scheme has been chosen, it cannot be amended anymore. The annotator must therefore know the data perfectly before determining what tags should be used. Because of this, human annotation might be considered the gold standard.|3|annotation, gold standard, corpus annotation, 5147|Desagulier2019|On the one hand, very large corpora are a good thing since they allow the linguist to investigate rare linguistic forms. One field in which size is critical is the study of productivity. 
As described by Baayen (2001), productivity measures rely on the idea that the number of hapax legomena of a given grammatical category correlates with the number of neologisms in that category, which in turn correlates with the productivity of the rule at work. Thus, lexical productivity is a factor of both a large number of low-frequency forms and a low number of high-frequency forms.|4|Hapax legomena, productivity, corpus size, 5148|Desagulier2019|On the other hand, corpus linguists are well aware that very large corpora are difficult to handle. Any collection of texts generates some noise, i.e. unwanted data. This phenomenon is captured by the precision vs. recall trade-off. The precision of a corpus query is the proportion of relevant hits and the total number of returned occurrences. Recall is the proportion of relevant retrieved hits with respect to the total number of relevant hits in the corpus. Corpus linguists tend to maximise recall so as to avoid the unhappy situation when the query returns zero hits, a choice which compromises precision. When precision is not optimal, the linguist has to filter the output manually. The larger the corpus and the broader the query, the larger the dataset and the more tedious the clean up. If the dataset is too large, its manual annotation becomes infeasible.|4|precision, recall, corpus size, corpus annotation, 5149|Desagulier2019|For this reason, not all corpus linguists are willing to embrace the age of ‘big data’.|4|big data, corpus linguistics, corpus size, nice quote 5150|Thykostup2018|This dataset provides the first comprehensive diachronic comparison of the Vai script of Liberia, as derived from sixteen sources dated between 1834 and 2005. The Vai syllabary was invented by non-literate speakers of a Mande language and is of interest to scholars in the fields of writing and cultural transmission as an emergent writing system. 
Script samples that entered the dataset were retrieved from a wide variety of published manuscripts. The compiled dataset tracks the evolution of the Vai script via archival records and has reuse potential across various fields of research.|000|Vai script, writing system, Liberia, Mande 5151|Shinde2019|We report an ancient genome from the Indus Valley Civilization (IVC). The individual we sequenced fits as a mixture of people related to ancient Iranians (the largest component) and Southeast Asian hunter-gatherers, a unique profile that matches ancient DNA from 11 genetic outliers from sites in Iran and Turkmenistan in cultural communication with the IVC. These individuals had little if any Steppe pastoralist-derived ancestry, showing that it was not ubiquitous in northwest South Asia during the IVC as it is today. The Iranian-related ancestry in the IVC derives from a lineage leading to early Iranian farmers, herders, and hunter-gatherers before their ancestors separated, contradicting the hypothesis that the shared ancestry between early Iranians and South Asians reflects a large-scale spread of western Iranian farmers east. Instead, sampled ancient genomes from the Iranian plateau and IVC descend from different groups of hunter-gatherers who began farming without being connected by substantial movement of people.|000|population genetics, archaeogenetics, Indus Valley, ancient DNA, 5152|Kleinewillinghoefer2015|Research in NO Nigeria was conducted 1990-1995 within the framework of the research project Kulturentwicklung und Sprachgeschichte im Naturraum Westafrikanische Savanne (SFB 268) of the Goethe-Universität Frankfurt am Main and was generously financed by the Deutsche Forschungsgemeinschaft (DFG).|000|Bikwin Jen, Jen, comparative wordlist, Swadesh list, 5153|Yang2019a|Ground-breaking studies on how Bangkok Thai tones have changed over the past 100 years (Pittayaporn 2007, 2018; Zhu et al. 2015) reveal a pattern that Zhu et al. 
(2015) term the “clockwise tone shift cycle:” low > falling > high level or rising-falling > rising > falling-rising or low. The present study addresses three follow-up questions: (1) Are tone changes like those seen in Bangkok Thai also attested in other languages? (2) What other tone changes are repeated across multiple languages? (3) What phonetic biases are most likely to be the origins of the reported changes? A typological review of 52 tone change studies across 45 Sinitic, Tai-Kadai, Hmong-Mien, and Tibeto-Burman languages reveals that clockwise changes are by far the most common. The paper concludes by exploring how tonal truncation (Xu 2017) generates synchronic variation that matches the diachronic patterns; this suggests that truncation is a key mechanism in tone change.|000|tone change, cross-linguistic study, 5154|Yang2019a|Ratliff (@2015: 249) sums up what many tonologists have found: [pb] “Tones in Asian languages tend to evolve rapidly and in unexpected ways” (emphasis added).|417f|tone change, challenges, nice quote, 5155|Yang2019a|Haspelmath (@2004: 20), concerning the issue of directionality constraints, notes that “diachronic phonologists would benefit enormously from a handbook of attested sound changes in the world’s languages.”|418|sound change, database, patterns of sound change, directionality 5156|Yang2019a|Pittayaporn’s (@2018: 260) tone change model, which posits that “diachronic sound changes are the result of phonologization of synchronic patterns of phonetic variation”, proposes specific phonetic and systemic biases (Garrett & Johnson 2013) that may have shaped Bangkok Thai’s changes. The present study builds on this recent work, addressing three follow-up questions: (1) Are tone changes like those seen in Bangkok Thai also attested in other languages? (2) What other tone changes are repeated across multiple languages? 
(3) What phonetic biases are most likely to be the origins of the reported tone changes?|418|tone change, patterns of sound change, 5157|Yang2019a|The results of the review demonstrate strong crosslinguistic trends in phonetic tone change. Specifically, clockwise changes, the pattern observed in real time in Bangkok Thai, are by far the most common type. The crosslinguistic dominance of this type suggests that this pattern is not language-specific, but rather is attributable to language-general mechanisms of tone production and perception.|418|tone change, cross-linguistic study, patterns of sound change, 5158|Yang2019a|Bangkok Thai holds a unique position in tone change studies, as one of the few tonal languages with recordings dating back over a century, with the earliest acoustic evidence coming from Bradley (1911). Pittayaporn (2007, 2018) and Zhu et al. (2015) examine changes in the phonetic values of Bangkok Thai tones from the past century to today. Examining changes as they have occurred in real time (i.e., analyzing data collected at multiple points in a given time period) provides the most direct and reliable evidence for the directionality of change (Bybee 2015: 9).|420|language change in progress, tone change, 5159|Pittayaporn2018|[Figure taken from @Yang2019a page 420] .. image:: static/img/Pittayaporn2018-259.png :width: 80% Tone change in Bangkok Thai|259|Thai, Bangkok Thai, tone change, chain shift 5160|Yang2019a|Pittayaporn (2007, @2018) proposes several phonetic and systemic biases that influence tone change directionality: Phonetic biases 1. Peak delay: tone peaks tend to slide rightward rather than leftward. 2. Contour reduction: F0 excursion of tonal contours tends to decrease; affects pitch offsets only. 3. Segment-tone interaction: an initial segment’s interaction with tone affects pitch onsets only. Systemic biases 1. 
Contour maximization: tonal variants with greater F0 excursions will be selected for phonologization; affects pitch onsets only. 2. Contour accentuation: a new feature, such as an initial drop before a rise, is selected to enhance the auditory distinctiveness of a tone. 3. Avoidance of similar tones: tones within a tone system tend to be dispersed in phonetic space so as to maximize perceptual contrast.|423|tone change, directionality, factors, 5161|Yang2019a|Studies with two methodological approaches are included: (1) apparent-time studies and (2) analyses of tonal systems that compare their findings with previous (usually decades earlier) analyses of the same location.|427|tone change, study design 5162|Rojas2019|Pronouns as a diagnostic feature of language relatedness have been widely explored in historical and comparative linguistics. In this article, we focus on South American pronouns, as a potential example of items with their own history passing between the boundaries of language families, what has been dubbed in the literature as ‘historical markers’. Historical markers are not a direct diagnostic of genealogical relatedness among languages, but account for phenomena beyond the grasp of the historical comparative method. Relatedness between pronoun systems can thus serve as suggestions for closer studies of genealogical relationships. How can we use computational methods to help us with this process? We collected pronouns for 121 South American languages, grouped them into classes and aligned the phonemes within each class (assisted by automatic methods). We then used Bayesian phylogenetic tree inference to model the birth and death of individual phonemes within cognate sets, rather than the typical practice of modelling whole cognate sets. 
The reliability of the splits found in our analysis was low above the level of language family, and validation on alternative data suggested that the analysis cannot be used to infer general genealogical relatedness among languages. However, many results aligned with existing theories, and the analysis as a whole provided a useful starting point for future analyses of historical relationships between the languages of South America. We show that using automated methods with evolutionary principles can support progress in historical linguistics research.|000|pronoun systems, lexical database, South American languages, computer-assisted analysis, 5163|Narasimhan2019|By sequencing 523 ancient humans, we show that the primary source of ancestry in modern South Asians is a prehistoric genetic gradient between people related to early hunter-gatherers of Iran and Southeast Asia. After the Indus Valley Civilization’s decline, its people mixed with individuals in the southeast to form one of the two main ancestral populations of South Asia, whose direct descendants live in southern India. Simultaneously, they mixed with descendants of Steppe pastoralists who, starting around 4000 years ago, spread via Central Asia to form the other main ancestral population. The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern Europe, tracking a movement of people that affected both regions and that likely spread the distinctive features shared between Indo-Iranian and Balto-Slavic languages.|000|South Asia, peopling of South Asia, population genetics, ancient DNA, Central Asia 5164|Ceolin2019|Historical linguists have been debating for decades about whether the classical comparative method provides sufficient evidence to consider Altaic languages as part of a single genetic unity, like Indo-European and Uralic, or whether the implicit statistical robustness behind regular sound correspondences is lacking in the case of Altaic. 
In this paper, I run a significance test on Swadesh-lists representing Turkish, Mongolian and Manchu to see if there are regular patterns of phonetic similarities or correspondences among word-initial phonemes in the basic vocabulary that cannot be expected to have arisen by chance. The methodology draws on Oswalt (1970), Ringe (1992), Baxter & Manaster Ramer (2000) and Kessler (2001, 2007). The results only partially point towards an Altaic family: Mongolian and Manchu show significant sound correspondences, while Turkish and Mongolian show some marginally significant phonological similarity, that might however be the consequence of areal contact. Crucially, Turkish and Manchu do not test positively under any condition.|000|significance, Altaic, sound correspondences, 5165|Koplenig2019b|In the first volume of Corpus Linguistics and Linguistic Theory, Gries (2005. Null-hypothesis significance testing of word frequencies: A follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1(2). doi:10.1515/cllt.2005.1.2.277. http://www.degruyter.com/view/j/cllt.2005.1.issue-2/cllt.2005.1.2.277/cllt.2005.1.2.277.xml: 285) asked whether corpus linguists should abandon null-hypothesis significance testing. In this paper, I want to revive this discussion by defending the argument that the assumptions that allow inferences about a given population – in this case about the studied languages – based on results observed in a sample – in this case a collection of naturally occurring language data – are not fulfilled. 
As a consequence, corpus linguists should indeed abandon null-hypothesis significance testing.|000|statistics, historical linguistics, corpus linguistics, significance, significance testing 5166|Koplenig2019b|In this paper, I want to revive this discussion by defending the argument that the assumptions that allow inferences about a given population – in this case about the studied languages – based on results observed in a sample – in this case a collection of naturally occurring language data – are not fulfilled.|321|significance testing, significance, corpus linguistics, historical linguistics, linguistic typology, critics 5167|Koplenig2019b|As [pb] a consequence, corpus linguists should indeed abandon null-hypothesis significance testing.|321f|corpus linguistics, significance testing, significance, statistics 5168|Koplenig2019b|Many empirical research projects face the problem that it is not possible or far too expensive to study the whole population, i.e. all objects of interest, e.g. all citizens of a country, all animals of a given species or all stars in the Milky Way. Fortunately, it is not necessary to investigate all items of the population in the majority of situations. The main idea behind statistical frequentist inference is to use the distributional information from a sample of objects to estimate the characteristics of the unknown population from where the sample was taken. This is possible because under certain circumstances probability theory can be used to show that the distribution function of the population can be approximated by the distribution function of the sample (Jann 2005: 124–127).|322|statistics, frequentist statistics, 5169|Koplenig2019b|The theory behind this rests on the assumption that the elements of the sample are chosen randomly from the population. And, as Berk and Freedman (2003: 2) put it: “Conventional statistical inferences (e.g., formulas for the standard error of the mean, t-tests, etc.) 
depend on the assumption of random sampling. This is not a matter of debate or opinion; it is a matter of mathematical necessity.”|322|randomness, statistical inference, significance testing, 5170|Koplenig2019b|The idea of statistical significance follows from this argument: A result found in a sample is considered statistically significant if the probability of observing such an effect (given that it does not actually exist in the population of interest) is smaller than or equal to a chosen level of significance, for example 5 %. In our example, this means that the number of samples in which we find an apparent relationship must not exceed 50,000 of all 1,000,000 samples. The p-value in this context is the probability of observing a result that is equal, or even more extreme, than the one we found in our sample, given the fact that there is actually no relationship in the population.|324|p-value, significance, significance testing, statistical inference, introduction, 5171|Koplenig2019b|This leads us to the general form of a null hypothesis statistical significance test: a result based on a random sample is called statistically significant if the probability of observing the data plus more extreme data in all potential randomly drawn samples of the same sample size is lower than some pre-selected threshold (e.g. 
1 % or 5 %) if the null hypothesis is true and if the assumptions in the statistical test are all satisfied (Schneider 2013).|325|significance testing, null hypothesis, statistical inference, introduction, terminology, 5172|Koplenig2019b|In general, the probability of observing the data under the assumption that the null hypothesis is true depends on: (i) the magnitude of the observed difference (also called effect size) and (ii) the sample size.|326|significance testing, sample size, effect size, introduction 5173|Koplenig2019b|A (synchronic) corpus can be defined as: “a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety” (McEnery et al. 2006: 5; for a similar definition see; Gilquin and Gries 2009: 6).|327|definition, corpus, corpus linguistics 5174|Koplenig2019b|In an influential paper, @Kilgarriff<2005> (2005: 273) argued that: “Language users never choose words randomly, and language is essentially non-random. Statistical hypothesis testing uses a null hypothesis, which posits randomness. Hence, when we look at linguistic phenomena in corpora, the null hypothesis will never be true.”|330|randomness, significance testing, language data, linguistic data, 5175|Biber1993|Whether or not a sample is ‘representative’, however, depends first of all on the extent to which it is selected from the range of text types in the target population; an assessment of this representativeness thus depends on a prior full definition of the ‘population’ that the sample is intended to represent, and the techniques used to select the sample from that population. 
:comment:`Quoted after` @Koplenig2019a (335) |243|representativity, sampling, representative samples, statistical inference, corpus linguistics, 5176|Koplenig2019b|From a statistical point of view, balancing is problematic because it is subjective per definition - different researchers might [pb] have different opinions on defining different registers and sub-registers as well as genres and subgenres that constitute different sections of our imaginary library.|336f|balanced sample, corpus linguistics, problems, sampling, significance, 5177|Koplenig2019b|To recapitulate the argument outlined in the preceding sections, probability theory helps us to quantify the amount of uncertainty that results from the data collection process. Or put differently, it gives us an idea of what is likely happen were the study to be repeated, without actually having to repeat it. However, this comes at the price of rather restrictive assumptions: (i) the researcher must be able to define the (existing) population of interest (in a non-imaginary way) from which the data are assumed to be (ii) a random sample. For corpus linguistics, both assumptions are not met.|338|significance testing, sampling, corpus linguistics 5178|Koplenig2019b|If a result holds true across different corpora and – even better – for different types of linguistic data, we can use this form of converging evidence to cautiously postulate a general relationship – maybe even for the language as a whole.|339|resilience, resilience theory, cumulative evidence, converging evidence, corpus linguistics 5179|Koplenig2019b|This is why the fact that corpora are not representative in a statistical sense does not render the use of quantitative statistical models useless, because the fitting procedure and its inherent logic have nothing to do with statistical inference. This can be easily seen by recalling that, in this example, we have collected data for all members of our population of interest (i.e. 
German chancellors), so there is no inference at all. However, the regression analysis still helps us to appropriately describe a potential relationship found in the data and, converging evidence again, can be used for predictions.|341|exploratory data analysis, significance testing, corpus linguistics, 5180|Koplenig2019b|Or, take the statement from the ASA, the world’s largest professional association of statisticians published in 2016: “The widespread use of ‘statistical significance’ (generally interpreted as “p ≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process” (Wasserstein and Lazar 2016: 9).|342|significance testing, problem, nice quote, 5181|Zwicky1976|a. One (or both) of the matched vowels is unstressed [...]. This is light or tangential, rhyme [...]. b. The stressed syllables match, but following unstressed syllables do not. Usually, one word has an additional syllable lacking in the other [...]. This is apocopated rhyme [...]. c. The stressed vowels do not match, though the following consonants do [...]. This is consonance [...]. d. The stressed vowels match, but the following consonants do not [...]. This is assonance [...].|676|English, poetry, rhymes, impure rhymes, definition, terminology 5182|Zwicky1976|However, they are not equally common, assonance being by far the most frequent poetic device of the four.|676|English, assonance, impure rhymes, 5183|Zwicky1976|[...] the traditional classification into assonance and consonance is not particularly useful in the analysis of popular verse, which instead can be referred to two major principles: a. Feature rhyme: segments differing minimally in phonological features count as rhyming. [...] b. Subsequence rhyme: X counts as rhyming with XC, where C is a consonant (X may end with a consonant itself [...]). 
|677|English, rhyme analysis, terminology, impure rhymes 5184|Zwicky1976|Imperfect rhymes can also be linked in a chain: X is rhymed (imperfectly) with Y, and Y with Z, so that X and Z may count as rhymes thanks to the mediation of Y, even when X and Z satisfy neither the feature nor the subsequence principle. |677|rhyme chain, intransitive rhyming, definition, rhyming, impure rhymes, 5185|Fox1995|[...] there is [...] little in semantic change which bears any relationship to regularity in phonological change.|111|semantic change, problems, 5186|Finley2017|Analogy completion via vector arithmetic has become a common means of demonstrating the compositionality of word embeddings. Previous work have shown that this strategy works more reliably for certain types of analogical word relationships than for others, but these studies have not offered a convincing account for why this is the case. We arrive at such an account through an experiment that targets a wide variety of analogy questions and defines a baseline condition to more accurately measure the efficacy of our system. We find that the most reliably solvable analogy categories involve either 1) the application of a morpheme with clear syntactic effects, 2) male–female alternations, or 3) named entities. These broader types do not pattern cleanly along a syntactic–semantic divide. We suggest instead that their commonality is distributional, in that the difference between the distributions of two words in any given pair encompasses a relatively small number of word types. 
Our study offers a needed explanation for why analogy tests succeed and fail where they do and provides nuanced insight into the relationship between word distributions and the theoretical linguistic domains of syntax and semantics.|000|machine learning, word2vec, problems, analogies, NLP 5187|Levy2016|Recent trends suggest that neural-network-inspired word embedding models outperform traditional count-based distributional models on word similarity and analogy detection tasks. We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.|000|distribution semantics, Word2Vec, word embeddings, problems, NLP, critics, 5188|Joo2019|Based on the vocabulary of 66 genealogically distinct languages, this study reveals the biased association between phonological features and the 100 lexical meanings of the Leipzig-Jakarta List. Morphemes whose meanings are related to round shapes (‘egg’, ‘navel’, ‘neck’, and ‘knee’) tend to contain phonemes that bear the [+round] feature. Also observable is the positive association between buccal actions and the phonological features they resemble (‘to blow’ with [+labial] and ‘to suck’ with [+delayed release]). Grammatical morphemes related to proximity (‘this’, ‘in’, 1 SG and 2 SG pronoun) are positively associated with [+nasal]. 
The phonosemantic patterns found in the most basic vocabulary of spoken languages further confirm that the sound-meaning association in natural languages is not completely arbitrary but may be motivated by human cognitive biases.|000|phonosemantic bias, sound symbolism, iconicity, wordlist, 5189|Warren2019|Where DNA degrades, proteins might persist. So scientists are looking to fill in early human history using some very old amino acids.|000|popular science, ancient DNA, ancient proteins, archaeogenetics, 5190|Warinner2014|Milk is a major food of global economic importance, and its consumption is regarded as a classic example of gene-culture evolution. Humans have exploited animal milk as a food resource for at least 8500 years, but the origins, spread, and scale of dairying remain poorly understood. Indirect lines of evidence, such as lipid isotopic ratios of pottery residues, faunal mortality profiles, and lactase persistence allele frequencies, provide a partial picture of this process; however, in order to understand how, where, and when humans consumed milk products, it is necessary to link evidence of consumption directly to individuals and their dairy livestock. Here we report the first direct evidence of milk consumption, the whey protein β-lactoglobulin (BLG), preserved in human dental calculus from the Bronze Age (ca. 3000 BCE) to the present day. Using protein tandem mass spectrometry, we demonstrate that BLG is a species-specific biomarker of dairy consumption, and we identify individuals consuming cattle, sheep, and goat milk products in the archaeological record. 
We then apply this method to human dental calculus from Greenland’s medieval Norse colonies, and report a decline of this biomarker leading up to the abandonment of the Norse Greenland colonies in the 15th century CE.|000|human dental calculus, ancient proteins, milk, archaeogenetics, 5191|Walker2019|Syllables show language-specific size restrictions that characterise upper limits on their constituents. Such restrictions have been characterised as syllable templates, defined by a frame such as moraic structure or CV skeletal slots. A variety of phenomena have been attributed to enforcement of a syllable template in numerous languages, including closed syllable vowel shortening, degemination, epenthesis and deletion (Clements & Keyser 1983, Itô 1989, Archangeli 1991, Zec 1995).|458|syllable structure, syllable template, phonotactics, phonotactic restrictions, CVCV, 5192|Walker2019|Language-specific maximal size restrictions on syllables have been defined using frames such as moraic structure. In General American English, a trimoraic syllable template makes largely successful predictions about contexts where tense/lax vowel contrasts are neutralised, but neutralisation preceding a coda rhotic has not been adequately explained. We attribute the apparent special properties of coda /ɹ/ to two characteristics of its representation, informed by our articulatory investigation: sequential coordination of dorsal and coronal subsegmental units and a high blending strength specification, corresponding to high coarticulatory dominance. Characteristics of coda laterals are compared. Our approach employs phonological representations where sequencing is encoded directly among subsegments, and coordination is sensitive to strength. Mora assignment is computed over sequencing of subsegments, predicting that complex segments may be bimoraic. 
The account brings phonotactics for rhymes with postvocalic liquids into line with the trimoraic template, and supports representing coordination and strength at the subsegmental level.|000|mora, American English, rhotic sounds, syllable template, syllable structure, CVCV, phonology 5193|Racz2019|Languages do not replace their vocabularies at an even rate: words endure longer if they are used more frequently. This effect, which has parallels in evolutionary biology, has been demonstrated for the core vocabulary, a set of common, unrelated meanings. The extent to which it replicates in closed lexical classes remains to be seen, and may indicate how general this effect is in language change. Here, we use phylogenetic comparative methods to investigate the history of 10 kinship categories, a type of closed lexical class of content words, across 47 Indo-European languages. We find that their rate of replacement is correlated with their usage frequency, and this relationship is stronger than in the case of the core vocabulary, even though the envelope of variation is comparable across the two cases. We also find that the residual variation in the rate of replacement of kinship terms is related to genealogical distance of referent to kin. We argue that this relationship is the result of social changes and corresponding shifts in the entire semantic class of kinship terms, shifts typically not present in the core vocabulary. Thus, an understanding of the scope and limits of social change is needed to understand changes in kinship systems, and broader context is necessary to model cultural evolution in particular and the process of system change in general.|000|kinship terms, lexical replacement, Indo-European, gain-loss mapping, speech norms, 5194|Carling2019|All languages borrow words from other languages. Some languages are more prone to borrowing, while others borrow less, and different domains of the vocabulary are unequally susceptible to borrowing. 
Languages typically borrow words when a new concept is introduced, but languages may also borrow a new word for an already existing concept. Linguists describe two causalities for borrowing: need, i.e., the internal pressure of borrowing a new term for a concept in the language, and prestige, i.e., the external pressure of borrowing a term from a more prestigious language. We investigate lexical loans in a dataset of 104 concepts in 115 Eurasian languages from 7 families occupying a coherent contact area of the Eurasian landmass, of which Indo-European languages from various periods constitute a majority. We use a cognacy-coded dataset, which identifies loan events including a source and a target language. To avoid loans for newly introduced concepts in languages, we use a list of lexical concepts that have been in use at least since the Chalcolithic (4000–3000 BCE). We observe that the rates of borrowing are highly variable among concepts, lexical domains, languages, language families, and time periods. We compare our results to those of a global sample and observe that our rates are generally lower, but that the rates between the samples are significantly correlated. To test the causality of borrowing, we use two different ranks. Firstly, to test need, we use a cultural ranking of concepts by their mobility (of nature items) or their labour intensity and “distance-from-hearth” (of culture items). Secondly, to test prestige, we use a power ranking of languages by their socio-cultural status. We conclude that the borrowability of concepts increases with increasing mobility (nature), and with increased labour intensity and “distance-from-hearth” (culture). We also conclude that language prestige is not correlated with borrowability in general (all languages borrow, independently of prestige), but prestige predicts the directionality of borrowing, from a more prestigious language to a less prestigious one. 
The process is not constant over time, with a larger inequality during the ancient and modern periods, but this result may depend on the status of the data (non-prestigious languages often remain unattested). In conclusion, we observe that need and prestige compete as causes of lexical borrowing.|000|borrowing, borrowability, empirical study, 5195|Scotto2017|En este trabajo se exploran las conexiones metodológicas entre dos métodos de investigación lingüística y filológica: el de August Schleicher y el de Karl Lachmann. En ambos casos se analizan las coincidencias que habilita la utilización, por ambos académicos, del método stemmatico y sus esfuerzos por reconstruir una forma (ya sea textual o lingüística) originaria, nombrada con el prefijo Ur-. Se acompaña este análisis con un enfoque que incluye la influencia del contexto en el territorio alemán durante el siglo XIX, momento en que surgen estas metodologías disciplinares, entendiendo por “contexto” la influencia de la filosofía romántica que sostiene la búsqueda de los orígenes en un pasado perdido y reivindicado, así como la estructura académica que aportó el prestigio y la pretensión científica asignada al estudio de las Humanidades, que en ese territorio tomó la forma de la Aufklärung. Hacia el final se presentarán fundamentos que sostienen la hipótesis central de este trabajo, que es precisamente que es posible advertir en los métodos de Schleicher y Lachmann conexiones directas que vinculan tanto sus producciones académicas como las disciplinas a cuyo establecimiento institucional colaboraron. :translation:`This paper explores the methodological connections between two linguistic and philological research methods separately put forward by August Schleicher and Karl Lachmann. In both cases, we will analyze the coincidences that enabled the use of the stemmatic method by both academics and their efforts to reconstruct an original (either textual or linguistic) form, named with the prefix Ur-. 
This analysis also includes the influence of the cultural context in the German territory during the 19th century, the time and place where these disciplinary methodologies arose. By “context” here we mean the influence of the Romantic philosophy that sustained the search for origins in a lost and vindicated past, as well as the academic structure that brought the prestige and scientific claim assigned to the study of the Humanities, which in that territory was called Aufklärung. Towards the end, we will present the evidence that support the central hypothesis of this paper: there are indeed direct connections between the methods of Schleicher and Lachmann, connections that link both their academic productions and the disciplines that they helped to establish institutionally.`|000|August Schleicher, stemmatics, history of science, family tree, Karl Lachmann 5196|Fiske2019|Vernacular lexemes appear self-evident, so we unwittingly reify them. But the words and phrases of natural languages comprise a treacherous basis for identifying valid psychological constructs, as I illustrate in emotion research. Like other vernacular lexemes, the emotion labels in natural languages do not have definite, stable, mutually transparent meanings, and any one vernacular word may be used to denote multiple scientifically distinct entities. In addition, the consequential choice of one lexeme to name a scientific construct rather than any of its partial synonyms is often arbitrary. Furthermore, a given vernacular lexeme from any one of the world’s 7000 languages rarely maps one-to-one into an exactly corresponding vernacular lexeme in other languages. Words related to anger in different languages illustrate this. Since each language constitutes a distinct taxonomy of things in the world, most or all languages must fail to cut nature at its joints. In short, it is pernicious to use one language’s dictionary as the source of psychological constructs. 
So scientists need to coin new technical names for scientifically derived constructs—names precisely defined in terms of the constellation of features or components that characterize the constructs they denote. The development of the kama muta construct illustrates one way to go about this. Kama muta is the emotion evoked by sudden intensification of communal sharing— universally experienced but not isomorphic with any vernacular lexeme such as heart warming, moving, touching, collective pride, tender, nostalgic, sentimental, Awww—so cute!.|000|emotion, emotion concepts, psychology, critics 5197|Orjuela2019|Este artículo presenta una herramienta diseñada para recoger datos fértiles, verificables y comparables sobre clases de palabra en diversas lenguas. Su diseño resulta de la reflexión sobre los problemas teóricos y metodológicos que subyacen a la identificación y recolección de datos sobre las clases de palabra. La herramienta consiste en una serie de actividades lúdicas que se basan en la presentación de estímulos a grupos de hablantes de una lengua. Su objetivo es guiar al investigador en la planeación y recolección de proferencias de enunciados en contextos discursivos que contengan elementos lingüísticos que refieran a entidades, propiedades y eventos contextualizados. Para probar su viabilidad se llevaron a cabo pruebas piloto con hablantes de la lengua yuhup. Con esta herramienta se espera aportar a la investigación futura sobre clases de palabra y contribuir a la construcción de una base de datos comparable que alimente el análisis de las lenguas y la teoría lingüística. :translation:`The article presents a tool designed to gather fruitful, verifiable, and comparable data on word classes in different languages. Its design is the result of reflection on the theoretical and methodological problems underlying the identification and collection of data on word classes. 
The tool consists of a series of play activities based on the presentation of stimuli to groups of speakers of a language. Their objective is to guide the researcher in the planning and collection of utterances in discursive contexts that include linguistic elements referring to contextualized entities, properties, and events. Pilot tests were carried out with Yuhup language speakers in order to check the tool’s viability. With this tool, we expect to contribute to future research on word classes and to the construction of a comparable database that fosters the analysis of languages and linguistic theory.`|000|field work, tools, data collection, word classes, 5198|Lin2019|This paper investigates how prosody is hidden behind transcriptions in historical resources. Three historical sources are used in the analysis. They are Chinese transcriptions from the 15th century in which Japanese, Korean and Ryukyuan phrases are recorded using Chinese characters. The argument concentrates on the prosodic patterns of disyllabic nouns in the three historical sources. The results of chi-square tests show that in the transcriptions Korean is significantly different from Japanese and Ryukyuan. In disyllabic nouns, the Chinese tonal category shăngshēng is favored in the first syllable of the Korean data to show changes from low to high tone. On the other hand, the transition is not salient in the Japanese and Ryukyuan data. In addition, the Chinese tonal category yīnpíng is disfavored in the first syllable of the Korean data, whereas Chinese yīnpíng is not overtly excluded from the first syllable of Japanese and Ryukyuan data. 
This paper also discusses the projection of prosodic characteristics from Chinese onto the transcriptions: the second syllable in a disyllabic noun tends to be qùshēng.|000|transliteration, Chinese, Japanese, Korean, Chinese writing system, 5199|Joseph2019|I explore here how aware speakers are of the history of their language as they use it and how aware of typology they are. I advocate for a speaker-oriented viewpoint and argue ultimately that speakers know little to nothing about language history and less about typology, and yet they behave in ways that essentially create typology and history. I offer a number of examples, mainly from Sanskrit and Greek, covering sound change and grammatical change and discuss issues regarding naturalness, gradualness, and social indexing.|000|diachronic competence, speaker awareness, language history, awareness of language change, 5200|Joseph2019|There are clear points of contact between historical linguistics, understood as the investigation of language change and language history, and typology, understood as the investigation of cross-linguistic generalizations about language structure and the categorization of observed types of structures along different parameters. Both are concerned with the notion of “possible human language”: typology explores the range of existing, synchronic, variation within that notion and historical linguistics explores the range of changes within the scope of that notion, thus offering essentially a diachronic typology.|33|historical linguistics, linguistic typology, diachronic typology, diversity linguistics, 5201|Joseph2019|By contrast, I advocate here for the speaker’s point of view, taking my cue from James Milroy’s response to Roger Lass: @Lass<1997> (1997: 309) wrote that languages make use of the detritus left over from older systems via “bricolage,” i.e. 
the recycling and repurposing of bits left “lying around” in a language, to which @Milroy<1999> (1999: 188) responded by asking how we can “make sense of all this without ... an appeal to speakers? ... If there is bricolage, who is the bricoleur? Does the language do the bricolage independently of those who use it?”|34|tinkering, language change, bricolage, scaffolding, scaffolded evolution, 5202|Joseph2019|My key points are that speakers participate in and shape “history” without being aware of history, and that they ultimately are the source of typological generalizations—again, without being aware of typology. This view can be summed up in a single evocative (and provocative) question (with apologies to Freud): “What does a speaker want?” The answer to the question, drawing on the insights of Henning Andersen (p.c.), is that speakers want to engage in successful communication, they want to find solutions to linguistic “problems” that their language poses for them, and they want to do all of this in a socially acceptable way that offers them no stigma or harm.|35|speaker, agent-based modeling, language change, 5203|Jacobs2019|The aim of this special issue is to explore the ways in which language change that is carried out intentionally may differ from language change that occurs non-intentionally. It seems obvious that, as with most human-driven processes, (the outcomes of) language change will differ depending on whether the change is carried out intentionally or not. However, the potentially far-reaching effects of intentionality on language change still suffer from a lack of recognition in the field of historical linguistics and the number of publications dealing specifically with this topic is correspondingly limited.|000|special issue, intentional language change, language change, 5204|Thomason2007|Historical linguists have always known that some linguistic changes result from deliberate, conscious actions by speakers. 
But the general assumption has been that such changes are relatively trivial, confined mainly to the invention or borrowing of new words, changes in lexical semantics, and the adoption of a few structural features from a prestige dialect. :comment:`Quoted after `@Jacobs<2019> :comment:`127`|1|intentionality, language change, intentional language change 5205|Inoue2008|In this paper a new technique for representing dialectal differences will be introduced. A kind of simplification will be attempted to represent the distribution patterns of the lexical items of standard Japanese. In order to simplify the geographical distribution patterns, railway distance center graph is utilized. In this technique geographical locations are plotted on a one-dimensional line from a cultural center. The shift of the main cultural center of Japan from the west to the east is reflected in the graphs obtained from factor analysis and cluster analysis, and in the geographical distribution patterns of the standard Japanese words making use of the railway distance. The process of dissemination has been concisely summarized by the railway distance center graph. Multivariate analysis allows us to grasp an overall picture of the relationship between dialectal distribution and the historical background of words. After applying multivariate techniques the results can be represented by more concise and simplified numerical techniques.|000|Japanese dialects, quantitative analysis, 5206|Spike2018|Rule-like behaviour is found throughout human language, provoking a number of apparently conflicting explanations. This paper frames the topic in terms of Tinbergen’s four questions and works within the context of rule-like behaviour seen both in nature and the non-linguistic domain in humans. 
I argue for a minimal account of linguistic rules which relies on powerful domain-general cognition, has a communicative function allowing for multiple engineering solutions, and evolves mainly culturally, while leaving the door open for some genetic adaptation in the form of learning biases.|000|Tinbergen's four questions, linguistic rules, learning bias, cultural evolution, rule learning, 5207|Spike2018|Although human languages are shot through with irregularities, linguists are typically drawn towards their ubiquitous regularities and rule-like structures. This is reflected in the vast [pb] literature on linguistic theory. These theories are sometimes framed — where they are framed at all — in terms of descriptive accuracy, with no recourse to psychological or evolutionary plausibility. Elsewhere, we see the opposite: @Pinker<1991> (1991, 1999) has argued that rule-like behaviour is the defining characteristic of human language, and that these rules are innate, domain-specific, and evolved via natural selection under communicative pressures. I will argue mostly against this view, in light of work on human and comparative cognitive evolution.|1f/18|universal grammar, rule learning, language learning, discussion 5208|Spike2018|Here are some rule-like behaviours found in many varieties of English. They are written out in a standard linguistic formalism, but are explained afterwards. (1) `*`ŋ ∕ #`_` (2) t → [ɾ]V_V (3) Evaluative > General property > Age > Colour > Provenance > Manufacture > Type |3/18|rule learning, linguistic rules, 5209|Spike2018|For ‘naturalists’, rules are an emergent phenomenon, contingent on history alone, so they focus on historical process. ‘Cognitivists’ see rule-like behaviour as a ‘surface’ phenomenon derived from underlying sets of interacting representations, rules, or constraints, so these become the target of their investigation. 
Despite this, neither account can dispense with some kind of synchronic/ cognitive or diachronic/cultural explanation; rather, it is often left implicit.|4/18|rule learning, debate, naturalists, cognitivists, linguistic rules, 5210|Spike2018|:comment:`Tinbergen's four questions` 1. Mechanism * → sophisticated, domain (and sub-domain) specific * → sophisticated, domain-general mechanisms for processing and interaction 2. Function - → externally but not ultimately for social communication because: * no ultimate function * ultimately non-communicative function - → ultimately for social communication 3. Acquisition - → learning selects a subset of rules from rich, innately-specified knowledge - → learning infers rules from rich, culturally-transmitted knowledge 4. Evolution - → biological evolution alone - → cultural evolution alone - → gene-culture co-evolution |13/18|Tinbergen's four questions, linguistic rules, rule learning, language evolution, 5211|Michaud2019|A puzzling fact about linguistic norms is that they are mainly stable, but the conventional variant sometimes changes. These transitions seem to be mostly S-shaped and, therefore, directed. Previous models have suggested possible mechanisms to explain these directed changes, mainly based on a bias favoring the innovative variant. What is still debated is the origin of such a bias. In this paper, we propose a refined taxonomy of mechanisms of language change and identify a family of mechanisms explaining self-actuated language changes. We exemplify this type of mechanism with the preference-based selection mechanism that relies on agents having dynamic preferences for different variants of the linguistic norm. The key point is that if these preferences align through social interactions, then new changes can be actuated even in the absence of external triggers. 
We present results of a multi-agent model and demonstrate that the model produces trajectories that are typical of language change.|000|language change, simulation studies, artificial agents, agent-based modeling 5212|Perea2008|In Catalonia, from a general point of view and concerning Geolinguistics, three assessments can be done: a) no new initiatives for creating a general linguistic atlas are expected; on the contrary, the tendency would be to create regional or local atlases or, disregarding cartography, to develop of monographs concerning several linguistic aspects of a certain dialectal area; b) there is no perceived need for an electronic publication of the atlas or the release of an internet version (the general format used is paper); and c) there is a possibility of computerising the data contained in old atlases. The main aim of this paper is to describe the processes of systematisation and mapping of dialectal data based on “La flexió verbal en els dialectes catalans”. The paper is structured in five parts: a) The corpus of morphological and phonetic data; b) Mapping the data; c) Using the program; d) Sound maps; e) Conclusions.|000|geographic map, Catalan dialects, 5213|Pakendorf2015|At first glance, it might come as a surprise to find a chapter on molecular anthropology in a handbook of historical linguistics. And yet, as will be outlined below, molecular anthropological studies can provide insights into prehistoric processes that may have had an impact on language change, thus offering the potential of deepening our understanding of such changes. The reasons for this potential are that both ‘genes’ (DNA molecules) and languages are passed on by human beings through social interactions, and both genes and languages can retain traces of prehistory, leading to the expectation that genes and languages should coevolve. 
As will be outlined briefly in section 2 below, this potential coevolution of genes and languages has stimulated research predominantly among geneticists who are interested in elucidating whether cultural factors like language might have an impact on biological evolution. A different approach to genetic insights into language change, which is driven by questions concerning language evolution (specifically, contact-induced language change) rather than genetic evolution, is at the heart of this chapter and will be described in section 3. Since this is still a very young field of research, the focus will be not so much on a review of results, but rather on introducing this interdisciplinary approach to population and language contact and the insights it can provide into processes underlying language change. For readers who might need a (re-)introduction to genetics, the Appendix provides a brief overview of some of the most important concepts needed to follow this chapter.|000|anthropology, archaeogenetics, gene-language co-evolution, language evolution 5214|Lewis2010|In this article, we review the process of building ODIN, the Online Database of Interlinear Text (http://odin.linguistlist.org) a multilingual repository of linguistically analyzed language data. ODIN is built from interlinear text that has been harvested from scholarly linguistic documents posted on the web. At the time of this writing, ODIN holds nearly 190,000 instances of interlinear text representing annotated language data for more than 1,000 languages (representing data from >10% of the world’s languages). ODIN’s charter has been to make these data available to linguists and other language researchers via search, providing the facility to find instances of language data and related resources (i.e. the documents from which data were extracted) by language name, language family, and even annotations used to markup the data (e.g. NOM, ACC, ERG, PST, 3SG). 
Further, we have sought to enrich the data we have collected and extract ‘knowledge’ from the enriched content. To enrich the data, we use a variety of statistical tagging and parsing methods applied in the English translations. An enhanced search facility allows users to find data across languages for a variety of syntactic constructions and constituent orders, facilitating unprecedented automated and online discovery of language data.|000|inter-linear-glossed text, corpus, cross-linguistic study, 5215|Zemkova2018|Although language is something deeply embedded in our nature, the question of its origin is of the same order as the misty question of the origin of life. I point out that the core of the problem can be rooted in the dichotomy between language and speech, similar to the dichotomy of genotype and phenotype in biology. Following the ontogeny–phylogeny framework, I propose that studies of language ontogeny, especially its early stages, can bring a new understanding to language, same as the study of communication in non-human primates.|000|biological parallels, ontogeny, language origin, communication 5216|Wichmann2020|In many scientific disciplines it is often necessary to refer to geographical travel distances. While online services can provide such distances, they fail for larger distances or for distances between points not connected by roads, and they do not allow for the calculation of many distances. Here we describe two novel methods of measuring travel distances which overcome these problems. Both use waypoints of populated places from the geonames.org database. The more efficient and accurate of the two uses the Dijkstra algorithm to find the shortest path through a Delaunay graph of neighbouring populated places.|000|distances, walking distances, geography, geographic map, travel distances 5217|Chirkova2019|Convergence is an oft-used notion in contact linguistics and historical linguistics. 
Yet it is problematic as an explanatory account for the changes it represents. In this study, we model one specific case of convergence (Duoxu, an endangered Tibeto-Burman language with 9 remaining speakers) to contribute to a more systematic understanding of the mechanisms underlying this phenomenon. The goals are (1) to address the role of some linguistic and social factors assumed to have an effect on the process of convergence, and (2) to test the following explanations of empirical observations related to phonological convergence: (a) the loss of phonological segments in a language that has undergone convergence is correlated with the relative frequency and markedness of these segments in the combined bilingual repertoire, and (b) widespread bilingualism is a prerequisite for convergence. The results of our agent-based simulation affirm the importance of frequency and markedness of phonological segments in the process of convergence. At the same time, they suggest that the explanation related to widespread bilingualism may not be valid. Our study suggests computer simulations as a promising tool for investigation of complex cases of language change in contact settings.|000|bilingualism, convergence, language contact, agent-based modeling 5218|Flaksman2017|The paper deals with the process of iconicity loss and touches upon the enigma of iconic words’ continuous and persistent appearance in languages all over the globe. The hypothesis introduced aims at explaining the reasons behind the never-ceasing iconic coinage. 
The paper also presents a universal four-step model of iconicity loss.|000|iconicity, sound symbolism, language change, lexical change, word formation 5219|Martin2019|Regarding the introductory aspect: when it comes to the practice of visual display, there is a difference between the use of visualizations for exploration or for the production of new results (or hypotheses), and their use for communication of known results.|607|visualization, exploratory data analysis, 5220|Rama2018b|Bayesian linguistic phylogenies are standardly based on cognate matrices for words referring to a fixed set of meanings—typically around 100-200. To this day there has not been any empirical investigation into which datasize is optimal. Here we determine, across a set of language families, the optimal number of meanings required for the best performance in Bayesian phylogenetic inference. We rank meanings by stability, infer phylogenetic trees using first the most stable meaning, then the two most stable meanings, and so on, computing the quartet distance of the resulting tree to the tree proposed by language family experts at each step of datasize increase. When a gold standard tree is not available we propose to instead compute the quartet distance between the tree based on the n-most stable meaning and the one based on the n + 1-most stable meanings, increasing n from 1 to N − 1, where N is the total number of meanings. The assumption here is that the value of n for which the quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve. We show that this assumption is borne out. 
The results of the two methods vary across families, and the optimal number of meanings appears to correlate with the number of languages under consideration.|000|sample size, sampling, Bayesian analysis, phylogenetic reconstruction, 5221|Vagheesh2019|By sequencing 523 ancient humans, we show that the primary source of ancestry in modern South Asians is a prehistoric genetic gradient between people related to early hunter-gatherers of Iran and Southeast Asia. After the Indus Valley Civilization’s decline, its people mixed with individuals in the southeast to form one of the two main ancestral populations of South Asia, whose direct descendants live in southern India. Simultaneously, they mixed with descendants of Steppe pastoralists who, starting around 4000 years ago, spread via Central Asia to form the other main ancestral population. The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern Europe, tracking a movement of people that affected both regions and that likely spread the distinctive features shared between Indo-Iranian and Balto-Slavic languages.|000|South Asia, archaeogenetics, population genetics, South-East Asia, 5222|Glaubrecht2019|Im Abstand von 32 Jahren segeln zwei Gelehrte, ein Preuße und ein Brite, in die Welt, erforschen Tiere, Pflanzen, Vulkane und ziehen ihre Schlüsse: Universelle Harmonie sieht Humboldt. Stetigen Wandel erkennt Darwin. Am Ende steht ein epochaler Umbruch -- eine neue Sicht auf das Leben. Und ein Genie, das sich mit fremden Federn schmückte.|000|biography, history of science, Alexander von Humboldt, Charles Darwin, popular science, 5223|McCoy2018|Edit distance is commonly used to relate cognates across languages. This technique is particularly relevant for the processing of low-resource languages because the sparse data from such a language can be significantly bolstered by connecting words in the low-resource language with cognates in a related, higher-resource language. 
We present three methods for weighting edit distance algorithms based on linguistic information. These methods base their penalties on (i) phonological features, (ii) distributional character embeddings, or (iii) differences between cognate words. We also introduce a novel method for evaluating edit distance through the task of low-resource word alignment by using edit-distance neighbors in a high-resource pivot language to inform alignments from the low-resource language. At this task, the cognate-based scheme outperforms our other methods and the Levenshtein edit distance baseline, showing that NLP applications can benefit from information about cross-linguistic phonological patterns.|000|edit distance, low resource languages, NLP, 5224|Easterday2019|The syllable is a natural unit of organization in spoken language whose strongest cross-linguistic patterns are often explained in terms of a universal preference for the CV structure. Syllable patterns involving long sequences of consonants are both typologically rare and theoretically marginalized, with few approaches treating these as natural or unproblematic structures. This book is an investigation of the properties of languages with highly complex syllable patterns. The two aims are (i) to establish whether these languages share other linguistic features in common such that they constitute a distinct linguistic type, and (ii) to identify possible diachronic paths and natural mechanisms by which these patterns come about in the history of a language. These issues are investigated in a diversified sample of 100 languages, 25 of which have highly complex syllable patterns. Languages with highly complex syllable structure are characterized by a number of phonetic, phonological, and morphological features which serve to set them apart from languages with simpler syllable patterns. 
These include specific segmental and suprasegmental properties, a higher prevalence of vowel reduction processes with extreme outcomes, and higher average morpheme/word ratios. The results suggest that highly complex syllable structure is a linguistic type distinct from but sharing some characteristics with other proposed holistic phonological types, including stress-timed and consonantal languages. The results point to word stress and specific patterns of gestural organization as playing important roles in the diachronic development of these patterns out of simpler syllable structures.|000|syllable structure, typology, dataset, 5225|Blaxter2019|Tracing the diffusion of linguistic innovations in space from historical sources is challenging. The complexity of the datasets needed in combination with the noisy reality of historical language data mean that it has not been practical until recently. However, bigger historical corpora with richer spatial and temporal information allow us to attempt it. This paper presents an investigation into changes affecting first person non-singular pronouns in the history of Norwegian: first, individual changes affecting the dual (vit > mit) and plural (vér > mér), followed by loss of the dual-plural distinction by merger into either form or replacement of both by Danish-Swedish vi. To create dynamic spatial visualisations of these changes, the use of kernel density estimation is proposed. This term covers a range of statistical tools depending on the kernel function. The paper argues for a Gaussian kernel in time and an adaptive uniform (k-nearest neighbours) kernel in space, allowing uncertainty or multiple localisation to be incorporated into calculations. 
The results for this dataset allow us to make a link between Modern Norwegian dialectological patterns and language use in the Middle Ages; they also exemplify different types of diffusion process in the spread of linguistic innovations.|000|linguistic diffusion, diffusion, kernel density estimation, KDE, Norwegian dialects, Norway 5226|Olander2019|The study examines the terminology currently in use for the higher-level subgroups of the Indo-European family tree. Based on the observation that the terminology is heterogeneous and confusing, the study discusses the central terms, suggesting that the whole language family and its ancestor should be referred to as “Indo-European” and “Proto-Indo-European” respectively. Under the hypothesis that the three first subgroups to branch off were Anatolian, Tocharian and Italo-Celtic, “Indo-Tocharian” is recommended as a suitable name for the non-Anatolian subgroup, and “Indo-Celtic” for the non-Anatolian and non-Tocharian subgroup.|000|subgrouping, Indo-European, terminology, 5227|Kalkhoff2019|The mutual interaction, i.e. attraction and repulsion, of bodies across space without direct mechanical contact, such as the movement of planets, gravity, magnetism, electricity, or light, posed a theoretical and practical problem for physics until the middle of the nineteenth century (see for what follows about physical fields McMullin 2002). Up until that time, Newtonian mechanics provided generally accepted basic assumptions about the nature of matter and its movement, such as the dualism of matter and acting forces and the static space and absolute time. 
Movements of physical bodies across space were conceived as mathematical dispositions over the space that could be calculated and observed, but their driving forces were not understood.|000|history of science, linguistic relativity, field, phonology, Edward Sapir, Benjamin Lee Whorf, Albert Einstein 5228|Sapir1925|There used to be and to some extent still is a feeling among linguists that the psychology of a language is more particularly concerned with its grammatical features, but that its sounds and its phonetic processes belong to a grosser physiological substratum. Thus, we sometimes hear it said that such phonetic processes as the palatalizing of a vowel by a following i or other front vowel ("umlaut") or the series of shifts in the manner of articulating the old Indo-European stopped consonants which have become celebrated under the name of "Grimm's Law" are merely mechanical processes, consummated by the organs of speech and by the nerves that control them as a set of shifts in relatively simple sensorimotor habits. It is my purpose in this paper, as briefly as may be, to indicate that the sounds and sound processes of speech cannot be properly understood in such simple, mechanical terms.|000|phonology, sound patterns, phonetics, sound system, 5229|Sapir1925|It is time to escape from a possible charge of phonetic metaphysics and to face the question, "How can a sound be assigned a 'place' in a phonetic pattern over and above its natural classification on organic and acoustic grounds?" The answer is simple. "A 'place' is intuitively found for a sound (which is here thought of as a true 'point in the pattern,' not a mere conditional variant) in such a system because of a general feeling of its phonetic relationship resulting from all the specific phonetic relationships (such as parallelism, contrast, combination, imperviousness to combination, and so on) to all other sounds." 
These relationships may, or may not, involve morphological processes (e.g., the fact that in English we have morphological alternations like wife: wives, sheath: to sheathe, breath: to breathe, mouse: to mouse helps to give the sounds f, θ, s an intuitive pattern relation to their voiced correlates v, ð, z which is specifically different from the theoretically analogous relation p, t, k: b, d, g; in English, f is nearer to v than p is to b, but in German this is certainly not true).|48|sound system, phonology, sound patterns, 5230|Schweizer2019|Dans sa Lettre sur la musique française (1753 : 91), Rousseau déclare « qu’il n’y a ni mesure ni mélodie dans la Musique Française, parce que la langue n’en est pas susceptible ». Par conséquent, son « chant n’est qu’un aboyement continuel, insupportable à toute oreille non prévenue ». Le français est une langue inadaptée pour chanter, ainsi l’opinion souvent exprimée (comme ici chez Rousseau), et ceci jusqu’à aujourd’hui. La préférence est généralement donnée à la langue italienne dont Giambattista Mancini (1776 : 199) fait l’éloge ici : « Toutes les nations sont obligées, bon gré ou malgré [sic !], de convenir que la langue italienne est, de toutes les langues, la plus harmonieuse, la plus douce, la plus suave, la plus propre, en un mot, à être adaptée à une bonne musique. »|000|music, speech perception, singing, Italian, French, history of science, 5231|Torre2019|The first issue of WORD was launched in 1945, announced on its front cover as “the journal of the Linguistic Circle of New York, devoted to the study of linguistic science in all its aspects.” At the time, the only other general linguistics journal published in the United States was Language, the organ of the Linguistic Society of America, which – at least according to the received view – was firmly in the hands of mechanist post-Bloomfieldians. 
Indeed, under Bernard Bloch’s (1907–1965) editorship, most contributions accepted in Language were either papers on historical linguistics or strictly formal descriptions of linguistic phenomena. As scholars of the mechanist orientation were increasingly perceived as becoming elitist and the field seemed to be narrowing, a sense of discontent began to spread among fellow linguists who did not recognize themselves in that approach (Householder 1978).|000|journal, history of science, WORD journal, Language journal, André Martinet, 5232|Handel2019|In a recent article, @Fellner<2019> & Hill (this volume) level a strong critique against what they view as the misguided prevailing methodology of historical-comparative reconstruction in the Sino-Tibetan (aka Trans-Himalayan) language family. The central focus of their criticism is the assembling of “word families” and the reconstruction of ST proto-forms exhibiting variation to account for those word families. In this response, I argue that the methodology is basically sound and is appropriate to the current state of our knowledge. At the same time, I dispute some of the assertions made by Fellner & Hill, which I believe are mischaracterizations of the methods and assumptions underlying the work of Sino-Tibetan scholars.|000|word family, Sino-Tibetan, Old Chinese, debate 5233|Fellner2019|Linguists researching the Trans-Himalayan family do not have a self-perception as working outside the mainstream of historical linguistics, but ‘word families’ and ‘allofams’ are important elements in their thinking despite the absence of these terms in the wider discipline. 
A close examination of the practice of historical linguistics in Indo-European and Trans-Himalayan leads to the conclusion that those phenomena treated as word families admit superior analyses in more traditional terms.|000|word family, debate, Sino-Tibetan, Old Chinese 5234|Fellner2019a|The replies to Fellner and Hill (this volume) [@Schuessler2019, @Handel2019, @Thurgood2019 ] present the practice of historical linguistics in the study of the Trans-Himalayan family as on the trail our Indo-European forbears blazed. The replies further present “word families” and “allofams” as beacons that light this path; we disagree. Our respondents overlook the different status of reconstructions in the two families. Research at the subgroup level that they point to as Neogrammarian implements a formalist approach to reconstruction, which, fine as far as it goes, lacks the sophistication of reconstructions in more mature disciplines. Not appreciating the different status of reconstruction in the two families, our respondents exaggerate the extent to which Indo-European evinces “word family”-like phenomena and present allofams as more synchronically plausible than they are.|000|word family, allofams, Sino-Tibetan, debate, Old Chinese 5235|Fellner2019|@Fellner<2019> and Hill (this volume) argue that the recourse to the notion of word families has prevented scholars specializing in Sino-Tibetan comparative linguistics from working out regular sound correspondences. 
This paper disputes this evaluation of the state of the art in the field, and suggests that F&H’s appraisal is due to severe misunderstandings.|000|Sino-Tibetan, word family, allofams, debate, Old Chinese 5236|Schuessler2019|This response to @Fellner<2019> and Hill defends the concept of word family and allofam.|000|debate, Sino-Tibetan, allofams, word family, Old Chinese 5237|Levshina2019a|The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification, which would be difficult or impossible to achieve with the traditional, type-based approach. This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures, which manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks. The case studies are based on multilingual corpora, which have been parsed using the Universal Dependencies annotation scheme.|000|corpus linguistics, linguistic typology, entropy, universal dependencies, 5238|Lanwermeyer2019|This paper focuses on phonetic variation within the standard German language register of Austria. While the norm status and a high socio-symbolic value are attributed to certain lexical variants of standard language in Austria, the norm and usage status of characteristic phonetic properties remain unclear, due to lack of empirical analyses. By investigating the relation between standard language norms and “standard usage” (Gebrauchsstandard) in Austria, our study aims to close this research gap by using the example of unstressed ‹-ig›. 
The analyses are based on data gathered from 52 speakers from two generations, covering all Austrian dialect regions. Elicitation settings varied from strongly standardized tasks with a graphic or visual stimulus (reading aloud tasks, picture naming tasks) to translation tasks (translation from dialect into standard) with oral stimuli. The results demonstrate that although ‹-ig› is predominantly pronounced like [ɪk], (socio-)linguistic factors as phonetic context, part of speech, setting, gender and regional background influence the ‹-ig›-variation. In total, the data suggest that German speaking Austrians are situated in a conflict between transnationally diverging norms and intra-nationally varying model speakers of German.|000|language variation, Austrian German, Austrian dialects, sociolinguistics, language change in progress, 5239|Handschuh2019|Personal names can be specified as male or female in almost all languages of the world. Languages differ, however, whether the sex of the referent is lexical knowledge or overtly coded in the form of the name. Symmetrical systems – with overt marking on both male and female names – can be distinguished from asymmetrical ones, of which one subtype, overt coding of female names, is by far the most frequent. In addition, the morpho-syntactic system of encoding the sex of the referent can be either limited to personal names or use morphological material also employed on other types of nominals. This paper investigates the morpho-syntactic means used for the classification of personal names in the languages of the world as well as the integration of personal names into classificatory systems used for common nouns, namely gender and classifiers.|000|classification, personal names, typological study, 5240|Martins2019|Recently, prominent theoretical linguists have argued for an explicit scenario for the evolution of the human language capacity on the basis of its computational properties. 
Concretely, the simplicity of a minimalist formulation of the operation Merge, which allows humans to recursively compute hierarchical relations in language, has been used to promote a sudden-emergence, single-mutation scenario. In support of this view, Merge is said to be either fully present or fully absent: one cannot have half-Merge. On this basis, it is inferred that the emergence of our fully fledged language capacity had to be sudden. Thus, proponents of this view draw a parallelism between the formal complexity of the operation at the computational level and the number of evolutionary steps it must imply. Here, we examine this argument in detail and show that the jump from the atomicity of Merge to a single-mutation scenario is not valid and therefore cannot be used as justification for a theory of language evolution along those lines.|000|merge, Chomsky syntax, discussion, debate 5241|Berwick2019|In their Essay on the evolution of human language, @Martins<2019> and Boeckx seek to refute what they call the “half-Merge fallacy”—the conclusion that the most elementary computational operation for human language syntax, binary set formation, or “Merge,” evolved in a single step. We show that their argument collapses. It is based on a serious misunderstanding of binary set formation as well as formal language theory. Furthermore, their specific evolutionary scenario counterproposal for a “two-step” evolution of Merge does not work. 
Although we agree with their Essay on several points, including that there must have been many steps in the evolution of human language and the importance of understanding how language and language syntax are implemented in the brain, we disagree that there is any justification, empirical or conceptual, for the decomposition of binary set formation into separate steps.|000|merge, Chomsky syntax, debate, discussion, origin of language 5242|Nosek2019|Preregistration clarifies the distinction between planned and unplanned research by reducing unnoticed flexibility. This improves credibility of findings and calibration of uncertainty. However, making decisions before conducting analyses requires practice. During report writing, respecting both what was planned and what actually happened requires good judgment and humility in making claims.|000|preregistration, p-hacking, hypothesis testing 5243|Hua2019a|Language diversity is distributed unevenly over the globe. Intriguingly, patterns of language diversity resemble biodiversity patterns, leading to suggestions that similar mechanisms may underlie both linguistic and biological diversification. Here we present the first global analysis of language diversity that compares the relative importance of two key ecological mechanisms – isolation and ecological risk – after correcting for spatial autocorrelation and phylogenetic non-independence. We find significant effects of climate on language diversity, consistent with the ecological risk hypothesis that areas of high year-round productivity lead to more languages by supporting human cultural groups with smaller distributions. Climate has a much stronger effect on language diversity than landscape features, such as altitudinal range and river density, which might contribute to isolation of cultural groups. 
The association between biodiversity and language diversity appears to be an incidental effect of their covariation with climate, rather than a causal link between the two.|000|climate, language diversity, environmental factors, cross-linguistic study, 5244|Boyd2019|There is, fortunately, an increasing focus on making governments representative of the diversity of the population they serve in terms of gender, race, and sexual orientation. But diversity also needs to embrace different intellectual approaches. The structured thinking and disciplined methodologies of science add to diversity, but these are aspects that can challenge vested interests. The blunt, socially insensitive, scientist speaking truth to power is certainly a caricature, but it is sufficiently real to warrant careful management by governments. There is also often suspicion that scientists operate their own agendas.|000|editorial, science, politics, discussion 5245|Fleck2013|Knowledge of Panoan languages and linguistics has increased significantly over the last several decades. The present paper draws upon this new information to produce a current internal classification of all the extant and extinct languages in the Panoan family based on lexical, phonological, and grammatical comparisons. This classification pays special attention to distinguishing dialects from independent languages and to mismatches that exist between linguistically defined languages and socially defined ethnic groups. An evaluation of previously proposed genetic relations to other language families is followed by a discussion of lexical borrowing and possible areal diffusion of grammatical features from and into neighboring non-Panoan languages and Kechua. The history of Panoan linguistics is chronicled from the first Jesuit and Franciscan vocabularies to the most recent contributions, and priorities for future research are suggested. 
A typological overview of Panoan phonology, morphology, and syntax is provided along with descriptions of some of the extraordinary linguistic features found in the family. Name taboos, postmortem word taboos, in-law avoidance languages, trade languages, ceremonial languages, and other ethnolinguistic phenomena found in the Panoan family are also discussed.|000|Panoan languages, South American languages, linguistic reconstruction, 5246|Dobo2019|Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Although distributional semantic models designed for this task have many different parameters, such as vector similarity measures, weighting schemes and dimensionality reduction techniques, there is no truly comprehensive study simultaneously evaluating these parameters while also analysing the differences in the findings for multiple languages. We would like to address this gap with our systematic study by searching for the best configuration in the creation and comparison of feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then comparing our findings across these languages. During our extensive analysis we test a large number of possible settings for all parameters, with more than a thousand novel variants in case of some of them. As a result of this we were able to find such configurations that significantly outperform conventional configurations and achieve state-of-the-art results.|000|semantic similarity, distributional semantics, semantic vectors, 5247|Ioannou2019|This article inquires into specific aspects of the relation between conceptual contiguity found in metonymic shifts and the online construction of frames, seen as a dynamic process of construal. It first reviews the theory of metonymy regarding the conceptual, lexical and contextual facets of the phenomenon. 
It then explores the possibility of extending the conceptual relevance of metonymy beyond the traditional typological approach of metonymic categorization, re-interpreting it as a frame-integration mechanism, or blending, whereby two frames are brought together into an extended ICM. Metonymic blending is formulated as a partial integration between two input spaces discursively driven, whereby an ad hoc identification of a referential commonness plays the role of the generic space of the blending. Subsequently, in the light of the assumption that frame-extension is not given categorically but it also includes – beyond its cognitive relevance – an interactional aspect, this analysis draws an interesting link: that between the generic space of metonymic blend, and common ground. The latter is precisely what facilitates the metonymic blend, regulating the distance between the integrated frames, at the same time remaining silent as discursively given information.|2019|metonymy, frames, metonymic blending, semantics 5248|Kraus2019|Economic inequality is at its highest point on record and is linked to poorer health and well-being across countries. The forces that perpetuate inequality continue to be studied, and here we examine how a person’s position within the economic hierarchy, their social class, is accurately perceived and reproduced by mundane patterns embedded in brief speech. Studies 1 through 4 examined the extent that people accurately perceive social class based on brief speech patterns. We find that brief speech spoken out of context is sufficient to allow respondents to discern the social class of speakers at levels above chance accuracy, that adherence to both digital and subjective standards for English is associated with higher perceived and actual social class of speakers, and that pronunciation cues in speech communicate social class over and above speech content. 
In study 5, we find that people with prior hiring experience use speech patterns in preinterview conversations to judge the fit, competence, starting salary, and signing bonus of prospective job candidates in ways that bias the process in favor of applicants of higher social class. Overall, this research provides evidence for the stratification of common speech and its role in both shaping perceiver judgments and perpetuating inequality during the briefest interactions.|000|sociolinguistics, social class, prejudice, bias 5249|Li2004|This paper examines factors complicating the definition of Standard Chinese, including register and socio-geographical variation, sound change and folk etymology, foreign loans and contact-induced structural change, and inherent imprecisions in the national spelling system. Also examined are reactions to change in the linguistic and language-teaching communities, how lay and academic attitudes towards impurities and linguistic innovation differ, and how differences between Chinese and western notions of ‘language’ and ‘dialect’ serve to further widen the gap between the textbook standard and perceived standardness. Predictions are made regarding the future development of Modern Standard Chinese that take into consideration the popular appeal of the language of westernized Chinese societies (e.g., Hong Kong and Taiwan) and the effect of the growth of native speakers of Mandarin in the Chinese-speaking world.|000|standard language, Dachsprache, Mandarin, Chinese, purification 5250|Li2004|The problem with ‘ing’ is that in most dialects of Mandarin, Beijing included, the rhyme is not pronounced as it is spelled, but rather with a schwa offglide—[iəŋ]. 
It is for this reason that in traditional verse and nursery rhymes from the last century ‘ing’ is always allowed to rhyme with ‘eng’—a violation of rhyme conventions if ‘ing’ is not represented as phonological /iəŋ/.|115|rhymes, impure rhymes, Mandarin, Chinese 5251|Duda2016|Article is in so far interesting as it compares linguistic classifications by Ethnologue and Greenberg and Ruhlen against the genetic "supertree" classifications. The fact that families like Altaic and Amerind do not fit well here is interesting, and can be used to explore a bit more which connections might seem most plausible in approaches that go beyond the typical language-family level.|000|supertree, world tree, global tree, human prehistory, population genetics, 5252|Walkden2019a|Recent work has cast doubt on the idea that all languages are equally complex; however, the notion of syntactic complexity remains underexplored. Taking complexity to equate to difficulty of acquisition for late L2 acquirers, we propose an operationalization of syntactic complexity in terms of uninterpretable features. Trudgill’s sociolinguistic typology predicts that sociohistorical situations involving substantial late L2 acquisition should be conducive to simplification, i.e. loss of such features. We sketch a programme for investigating this prediction. In particular, we suggest that the loss of bipartite negation in the history of Low German and other languages indicates that it may be on the right track.|000|complexity, linguistic complexity, second language acquisition, operationalization, syntactic complexity 5253|Jaeger2019|Computational approaches to historical linguistics have been proposed for half a century. Within the last decade, this line of research has received a major boost, owing both to the transfer of ideas and software from computational biology and to the release of several large electronic data resources suitable for systematic comparative work. 
In this article, some of the central research topics of this new wave of computational historical linguistics are introduced and discussed. These are automatic assessment of genetic relatedness, automatic cognate detection, phylogenetic inference and ancestral state reconstruction. They will be demonstrated by means of a case study of automatically reconstructing a Proto-Romance word list from lexical data of 50 modern Romance languages and dialects. The results illustrate both the strengths and the weaknesses of the current state of the art of automating the comparative method.|000|computational historical linguistics, introduction, workflow, discussion 5254|Hammarstroem2019|We welcome Gerhard Jäger’s framing of Computational Historical Linguistics: its history and background, its goals and ambitions as well as the concrete implementation by Jäger himself. As Jäger explains (pp. 151–153), the comparative method can be broken down into seven steps and there have been attempts to formalise/automatise (some of) the steps since the 1950s. However, Jäger contrasts the work in the 1960–2000s on various steps as “mostly constituting isolated efforts” and, in contrast, characterises the biologically inspired work of the 2000s as a “major impetus”. It is difficult to find the motivation for this division as the latter group, like the former, also concern themselves with only a subpart of the comparative method and, furthermore, rely fundamentally on subjective cognate judgments done by humans (as also acknowledged by Jäger later, pp. 156–157).|000|discussion, computational historical linguistics, reply, 5255|Hammarstroem2019|Simple models are better than more complex ones. Hidden overfitting (or in fact overfitting of any kind) is more likely to happen if our models contain a lot of parameters that we can adjust based on the data. Therefore, if a simple and a more complex model explain a process equally well, the simple model should always be preferred. 
This is a general scientific principle known as Ockham’s razor, but we think it is useful to reconsider the principle in relation to the danger of hidden overfitting.|237|Okham's Razor, model complexity, historical linguistics, 5256|List2019e|Jäger’s target article [@Jaeger2019] illustrates how a fully automated workflow of the comparative method, starting from the proof of genetic relationship up to the reconstruction of proto-forms can be applied to a small set of Romance languages. There are quite a few points deserving further discussion in this very interesting and very well-written overview on the current state-of-the-art in computational historical linguistics. For example, I do not completely agree with Jäger’s claim that methods that make use of orthographical string representations would not qualify as part of NLP-near computational historical linguistics. I think that the current practice in NLP is much less rigorous than Jäger assumes here, with many of the first but also the more recent algorithms that addressed quantitative tasks in historical linguistics making exclusive use of orthographic language data (Hauer and Kondrak 2011; Serva and Petroni 2008; Ciobanu and Dinu 2018).|000|linguistic reconstruction, evaluation, reply, discussion 5257|Jaeger2019a|This is a reply to the comments by Hammarström et al. [@Hammarstroem2019] (This volume) and @List<2019e> (This volume) on the target article Computational Historical Linguistics (This volume). There I proposed several methodological principles for research in Computational Historical Linguistics pertaining to suitable techniques for model fitting and model evaluation. Hammarström et al. debate the usefulness of these principles, and List proposes a novel evaluation measure specifically aimed at the task of proto-form reconstruction. 
This reply will focus on the role of model evaluation in our field.|000|reply, computational historical linguistics, discussion 5258|Greenberg1969|The comparative method, or more accurately stated, comparative methods, since a multiplicity of them exists, have a fundamental place in the disciplines concerned with man in his social and cultural aspects. These disciplines, in contrast with the physical and biological sciences, never encounter in pure form the phenomena concerning which they seek for understanding and the formulation of regularities. Such entities as culture, society, religion, or language are always encountered in the concrete form of particular, historically conditioned cultures, societies, religions, languages, and so on. One basic approach is, therefore, the comparative one; and a fundamental purpose often served by such an approach is the uncovering of constancies of structure or of developmental tendencies underlying the individual variant forms. Hence we may study culture by means of cultures and language by means of languages.|000|comparative linguistics, historical language, typological language comparison, linguistic typology, theory, nice quote, comparative method 5259|Greenberg1969|Given that the conditions of application of synchronically derived universals are general, when one encounters a new language it is possible, from certain characteristics, to predict others. Such fresh data will also constitute an empirical test of the validity of the hypothesis. On the other hand, it makes no sense to ask such questions as whether there are any languages in the world which violate Grimm's Law, since it only applies to a given language during a specified chronological period.|150|linguistic typology, Grimm's Law, historical language comparison, prediction, 5260|Greenberg1969|There exists the possibility of a more comprehensive mode of diachronic comparison which shares with synchronic typology the attribute of generality. 
The individual items of such comparison are not the cognate forms, but the changes themselves, as formulated in rules and occurring in historically independent cases, and hence subject to classification (corresponding to synchronic typology) and generalization without proper-name restriction. For reasons that will presently appear, it is appropriate to call such comparison processual. That this type of comparison has received only marginal and unexplicit attention rests at least partly on the preemption of the term "comparative" to particular applications of the genetic method, so that diachronic linguistics has appeared already to possess its own comparative method.|151|change, sound change, historical language comparison, diachronic typology, nice quote 5261|Greenberg1969|Interesting article presenting certain proposed universals that concentrate on sound patterns, but also discusses the role of historical language comparison compared to typological language comparison, and tries to propagate a diachronic typology.|000|diachronic typology, nice paper, implicational universals, sound patterns, 5262|Chappell2019|Recent accounts on the typology of predicative possession, including those by Stassen, recognise a Topic Possessive type with the possessee coded like the figure in an existential predication, and the possessor coded as a topic that is not subcategorised by the predicate and is not related to any syntactic position in the comment, literally: As for Possessor, there is Possessee. The Asian region is explicitly singled out as being a Topic Possessive area. 
On the basis of a sample of 71 languages from the four main language families of continental East and Southeast Asia – Sino-Tibetan, Hmong-Mien, Tai-Kadai and Austroasiatic, contrary to these previous accounts of the distribution of the main types of predicative possession in the world’s languages, we argue that this area should rather be considered as showing a particularly high concentration of Have-Possessives, with the additional particularity that the verbs occurring in the Have-Possessive constructions in this linguistic area are polysemous verbs also used for existential predication. After briefly reviewing Stassen’s typology of predicative possession, we discuss his account of the Topic Possessive type and then present five arguments for considering why the possessor NP of the existential/possessive verb yǒu 有 in Standard Mandarin Chinese cannot be analysed as invariably occupying the position of a topic, and consequently, that the construction should be reclassified as an instance of the Have-Possessive type. In the final sections, the situation is examined for other Southeast Asian languages showing the same configuration for predicative possession and existential predication as Standard Mandarin, to the extent that data is available.|000|topic and comment, possession, linguistic typology, Sinitic, 5263|Negesse2019|The Swadesh’s wordlist has been used for more than half a century to collect data for studies in comparative and historical linguistics. The current study compares the classification results of the Swadesh’s 100 wordlist with those of its subsets to determine if reducing the size of the wordlist impacts its effectiveness. In the comparison, the 100, 50 and 40 wordlists were used to compute lexical distances of 29 Cushitic and Semitic languages spoken in Ethiopia and neighboring countries. Gabmap, a web-based application, was employed to compute the lexical distances and to divide the languages into related clusters. 
The comparison shows that the subsets are not as effective as the 100 wordlist in clustering languages into smaller related subgroups, but they are equally effective in dividing languages into bigger groups such as subfamilies. It is observed that the subsets may lead to an erroneous classification whereby unrelated languages by chance form a cluster which is not attested by a comparative study. The chance to get a wrong result will be higher when the subsets are used to classify languages which are not closely related. Though a further study is still needed to settle the issues around the size of the Swadesh’s wordlist, this study indicates that the 50 and 40 wordlists cannot be recommended as reliable substitutes for the 100 wordlist under all circumstances. The choice seems to be determined by the objective of a researcher and the degree of affiliation among the languages to be classified.|000|Swadesh list, Semitic languages, Cushitic, lexicostatistics, Gabmap, missing data, missing code 5264|Parker2018|This thesis is the result of a three and a half year study on the tone system of the Tangsa-Nocte language varieties. The Tangsa-Nocte languages are a group of under-documented varieties spoken in the Patkai Mountains in Upper Myanmar and eastern Arunachal Pradesh, India. They belong to the Sino-Tibetan family, and are part of the Northern Naga branch of the Sal languages within Tibeto-Burman. The vast majority of Tangsa-Nocte speakers can be found in the Patkai hills south of the town of Margherita in Upper Assam, India, located at 27°17’N 95°40’E, and north of Singkaling Hkamti in Sagaing Region, Myanmar, located at 26°N 95°40’E, as well as in the regions immediately surrounding these towns. 
In recent decades a number of Tangsa-Nocte speakers have migrated to the area immediately northeast of Margherita in Miao Circle, Arunachal Pradesh.|000|PhD, Tangsa-Nocte, Sino-Tibetan, tone, phonetic study, experimental phonetics, tone change, 5265|Buckley2019|This paper applies character N-grams to the study of diachronic linguistic variation in a historical language. The period selected for this initial exploratory study is medieval English, a well-studied period of great linguistic variation and language contact, whereby the efficacy of computational techniques can be examined through comparison to the wealth of thorough scholarship on medieval linguistic variation. Frequency profiles of character N-gram features were generated for several epochs in the history of English and a measure of language distance was employed to quantify the similarity between English at different stages in its history. Through this a quantification of internal change in English was achieved. Furthermore similarity between English and other medieval languages across time was measured allowing for a measurement of the well-known period of contact between English and Anglo-Norman French. This methodology is compared to traditional lexicostatistical methods and shown to be able to derive the same patterns as those derived from expert-created feature lists (i.e. Swadesh lists). The use of character N-gram profiles proved to be a flexible and useful method to study diachronic variation, allowing for the highlighting of relevant features of change. 
This method may be a complement to traditional qualitative examinations.|000|n-gram model, language change, English, Middle English, corpus studies, missing data, missing code, 5266|Bach2009|Interesting article on semantic universals that is potentially important in the context of discussing lexical change, semantics, and lexical typology.|000|lexical universals, semantic universals, language universals, linguistic typology, lexical typology 5267|Maurits2020|The use of computational methods to assign absolute datings to language divergence is receiving renewed interest, as modern approaches based on Bayesian statistics offer alternatives to the discredited techniques of glottochronology. The datings provided by these new analyses depend crucially on the use of calibration, but the methodological issues surrounding calibration have received comparatively little attention. Especially underappreciated is the extent to which traditional historical linguistic scholarship can contribute to the calibration process via loanword analysis. Aiming at a wide audience, we provide a detailed discussion of calibration theory and practice, evaluate previously used calibrations, recommend best practices for justifying calibrations, and provide a concrete example of these practices via a detailed derivation of calibrations for the Uralic language family. 
This article aims to inspire a higher quality of scholarship surrounding all statistical approaches to language dating, and especially closer engagement between practitioners of statistical methods and traditional historical linguists, with the former thinking more carefully about the arguments underlying their calibrations and the latter more clearly identifying results of their work which are relevant to calibration, or even suggesting calibrations directly.|000|dating, phylogeny, language tree, tutorial, Bayesian approaches, BEAST 5268|MacMillam2020|Interlinear Glossed Text (IGT) is a rich data type produced by linguists for the purposes of presenting an analysis of a language's semantic and grammatical properties. I combine linguistic knowledge and statistical machine learning to develop a system for automatically annotating low-resource language data. I train a generative system for each language using on the order of 1000 IGT. The input to the system is the morphologically segmented source language phrase and its English translation. The system outputs the predicted linguistic annotation for each morpheme of the source phrase. The final system is tested on held-out IGT sets for Abui [abz], Chintang [ctn], and Matsigenka [mcb] and achieves 71.7%, 80.3%, and 84.9% accuracy, respectively.|000|inter-linear-glossed text, automatic approach, 5269|Mickus2020|Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen as an extension of previous non-contextual distributional semantic models. In this work, we focus on BERT, a deep neural network that produces contextualized embeddings and has set the state-of-the-art in several semantic tasks, and study the semantic coherence of its embedding space. While showing a tendency towards coherence, BERT does not fully live up to the natural expectations for a semantic vector space. 
In particular, we find that the position of the sentence in which a word occurs, while having no meaning correlates, leaves a noticeable trace on the word embeddings and disturbs similarity relationships.|000|word embeddings, BERT, neural networks, problem, semantics, semantic vectors, 5270|Zhu2020|This paper extends the empirical coverage of the Autosegmental Input Strictly Local (A-ISL) framework (Chandlee and Jardine, 2019) by analyzing three tonal processes: floating tone suffixation in Cantonese, metrical dominance effect in Shanghai Chinese, and a combination of floating tones and metrical dominance in Suzhou Chinese. I show both the adequacy and inadequacy of the current A-ISL framework: it locally resolves some tonal processes that are otherwise non-local (Shanghai), but fails to account for other empirical data due to a lack of tonal membership specification (Suzhou). With the addition of a morphological affiliation tier, I propose an analysis for the Suzhou data. The paper contributes to our typological knowledge of computational locality and autosegmental phonological representations.|000|floating tone, Cantonese, autosegmental phonology, Autosegmental Input Strictly Local framework, Shanghainese, Chinese dialects, tone 5271|Mayer2020|Computational models of phonotactics share much in common with language models, which assign probabilities to sequences of words. While state of the art language models are implemented using neural networks, phonotactic models have not followed suit. We present several neural models of phonotactics, and show that they perform favorably when compared to existing models. In addition, they provide useful insights into the role of representations on phonotactic learning and generalization. 
This work provides a promising starting point for future modeling of human phonotactic knowledge.|000|phonotactics, neural network, language model, pseudo word, n-gram model, 5272|McMillan2020|Interlinear Glossed Text (IGT) is a rich data type produced by linguists for the purposes of presenting an analysis of a language’s semantic and grammatical properties. I combine linguistic knowledge and statistical machine learning to develop a system for automatically annotating low-resource language data. I train a generative system for each language using on the order of 1000 IGT. The input to the system is the morphologically segmented source language phrase and its English translation. The system outputs the predicted linguistic annotation for each morpheme of the source phrase. The final system is tested on held-out IGT sets for Abui [abz], Chintang [ctn], and Matsigenka [mcb] and achieves 71.7%, 80.3%, and 84.9% accuracy, respectively.|000|inter-linear-glossed text, automatic approach, annotation 5273|Sims2020|The interpredictability of the inflected forms of lexemes is increasingly important to questions of morphological complexity and typology, but tools to quantify and visualize this aspect of inflectional organization are lacking, inhibiting effective cross-linguistic comparison. In this paper I use metrics from graph theory to describe and compare the organizational structure of inflectional systems. Graph theory offers a well-established toolbox for describing the properties of networks, making it ideal for this purpose. Comparison of nine languages reveals previously unobserved generalizations about the typological space of morphological systems. 
This is the first paper to apply graph-theoretic tools to the goal of inflectional typology.|000|inflection, morphology, network approaches, rule induction 5274|Honeybone2019|This article revisits, extends and interrogates the position advocated in Honeybone (2019) — that phonotactic constraints are psychologically real phonological entities (namely: constraints on output-like forms), which have a diachrony of their own, and which can also interfere with diachronic segmental change by inhibiting otherwise regular innovations. I focus in the latter part of the article on the role of one phonotactic constraint in the history of English: *Rime-xxŋ. I argue that we need to investigate the emergence of such constraints in the history of languages and I show how this particular constraint, once innovated, can be understood to have inhibited the patterning of ash-tensing in certain varieties of American English (and also that it may now have been lost in some varieties). To do this, I adopt a phonological model which combines aspects of Rule-Based Phonology and aspects of Constraint-Based Phonology, and which is firmly rooted in the variation that exists when changes are innovated. Finally, I evaluate the extent to which the type of phonotactically-driven process-inhibition that I propose here involves prophylaxis in phonological change (I show that it doesn't), and I consider the interaction of these ideas with the proposal that all change occurs in language acquisition (‘acquisitionism’).|000|phonotactics, phonotactic restrictions, constraints, psychological aspects, phonological theory, 5275|Whorf1950|I find it gratuitous to assume that a Hopi who knows only the Hopi language and the cultural ideas of his own society has the same notions, often supposed to be intuitions, of time and space that we have, and that are generally assumed to be universal. 
In particular, he has no general notion or intuition of TIME as a smooth flowing continuum in which everything in the universe proceeds at an equal rate, out of a future, through a present, into a past; or, in which, to reverse the picture, the observer is being carried in the stream of duration continuously away from a past and into a future.|67|Sapir-Whorf hypothesis, nice quote, 5276|Rzymski2020|Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and show-cases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.|000|CLICS, colexification, reproducibility, 5277|Gao2020|In order to test and analyze the model of phylogenetic network of Tibeto-Burman languages, this study tried to reconstruct the phylogenetic network of Tibeto-Burman languages using the NeighborNet method. The materials in use were the 100 core cognates of 51 Tibeto-Burman languages. 
The network graphs showed that the evolutionary model of these languages is mainly tree-like, which shows that splitting is still the dominant mechanism of evolution and the horizontal transmission plays a significant role in the history as well. Language contact can spread deeply into the core vocabulary of other languages and may even impact the phylogenetic position of related languages.|000|phylogeny, Sino-Tibetan, missing code, missing data, 5278|Moran2020|Here we present an expanded version of BDPROTO, a database comprising phonological inventory data from 257 ancient and reconstructed languages. These data were extracted from historical linguistic reconstructions and brought together into a single unified, normalized, accessible, and Unicode-compliant language resource. This dataset is publicly available and we aim to engage language scientists doing research on language change and language evolution. Furthermore, we identify a hitherto undiscussed temporal bias that complicates the simple comparison of ancient and reconstructed languages with present-day languages. Due to the sparsity of the data and the absence of statistical and computational methods that can adequately handle this bias, we instead directly target rates of change within and across families, thereby providing a case study to highlight BDPROTO’s research viability; using phylogenetic comparative methods and high-resolution language family trees, we investigate whether consonantal and vocalic systems differ in their rates of change over the last 10,000 years. 
In light of the compilation of BDPROTO and the findings of our case study, we discuss the challenges involved in comparing the sound systems of reconstructed languages with modern day languages.|000|sound inventories, proto-language, dataset 5279|Michael2019|In recent years, South Americanist linguists have embraced computational phylogenetic methods to resolve the numerous outstanding questions about the genealogical relationships among the languages of the continent. We provide a critical review of the methods and language classification results that have accumulated thus far, emphasizing the superiority of character-based methods over distance-based ones and the importance of developing adequate comparative datasets for producing well-resolved classifications.|000|phylogenetic reconstruction, subgrouping, South American languages, review, overview, 5280|Hahn2020|The universal properties of human languages have been the subject of intense study across the language sciences. We report computational and corpus evidence for the hypothesis that a prominent subset of these universal properties—those related to word order—result from a process of optimization for efficient communication among humans, trading off the need to reduce complexity with the need to reduce ambiguity. We formalize these two pressures with information-theoretic and neural-network models of complexity and ambiguity and simulate grammars with optimized word-order parameters on large-scale data from 51 languages. 
Evolution of grammars toward efficiency results in word-order patterns that predict a large subset of the major word-order correlations across languages.|000|word order, universals, communicative efficiency, quantitative study, simulation studies, universal dependencies, 5281|Evans2019|The normal result of language contact is widely assumed to be convergence, as manifested in classic Sprachbünde and caused through metatypy, cognitive economy, shared norms of conversational practice, etc. Yet at the same time there is growing evidence that contact can also produce divergence, originating with Larsen's idea of 'neighbor opposition' and developed through Thurston's work on *esoterogeny* (elaboration of difference and impenetrability) to account for the apparently deliberate cultivation of language difference found in many parts of Melanesia. I argue that contact-induced divergence is more prevalent than previously thought, drawing on case studies from New Guinea and Northern Australia.|000|language contact, divergence, case study, Australian languages, New Guinea languages, 5282|Haspelmath2020|This paper discusses the widely held idea that the building blocks of languages (features, categories, and architectures) are part of an innate blueprint for Human Language, and notes that if one allows for convergent cultural evolution of grammatical structures, then much of the motivation for it disappears. I start by observing that human linguisticality (=the biological capacity for language) is uncontroversial, and that confusing terminology (“language faculty,” “universal grammar”) has often clouded the substantive issues in the past. I argue that like musicality and other biological capacities, linguisticality is best studied in a broadly comparative perspective. 
Comparing languages like other aspects of culture means that the comparisons are of the Greenbergian type, but many linguists have presupposed that the comparisons should be done as in chemistry, with the presupposition that the innate building blocks are also the material that individual grammars are made of. In actual fact, the structural uniqueness of languages (in lexicon, phonology, and morphosyntax) leads us to prefer a Greenbergian approach to comparison, which is also more in line with the Minimalist idea that there are very few domain-specific elements of the biological capacity for language.|000|linguisticality, language faculty, theoretical problems, language evolution, origin of language, 5283|Szeto2019a|Decades of works dedicated to the description of (previously) lesser-known Sinitic languages have effectively dispelled the common myth that these languages share a single “universal Chinese grammar”. Yet, the underlying cause of their grammatical variation is still a matter for debate. This thesis focuses on the typological variation across Sinitic varieties. Through comparing the typological profiles of various Sinitic languages with those of their non-Sinitic neighbors, we discuss to what extent the variation within the Sinitic branch can be attributed to areal diffusion. Variation across Sinitic is often explained from the perspective of language contact – sandwiched between Altaic languages to its north and Mainland Southeast Asian (MSEA) languages to its south, Sinitic can be considered typologically intermediate between these two groups of languages, where Northern Sinitic shows signs of convergence towards Altaic languages and Southern Sinitic towards MSEA languages. For example, the northern varieties tend to have a smaller number of classifiers, tones and codas, as well as a stronger tendency to disyllabicity and head-final constructions. However, the notion of “Altaicization” (Hashimoto 1976) is a moot point. 
Despite the typological differences between Northern Sinitic and Southern Sinitic, as Bennet (1979) argues, there is little evidence for “Altaicization” as many of such differences can hardly be put down to Altaic influence; instead, they are more likely due to the typological convergence between Southern Sinitic and MSEA languages. Moreover, there is evidence that the typological variation across Sinitic cannot be amply explained by areal influence from non-Sinitic languages. Some Sinitic varieties are known to exhibit certain distinct typological characteristics. For instance, analyzing the disposal, passive, and comparative constructions across the Sinitic branch, Chappell (2015b) argues that there are no fewer than five principal linguistic areas in China. Taking into account over 350 language varieties of seven different genetic affiliations (Sinitic, Turkic, Mongolic, Tungusic, Hmong-Mien, Tai-Kadai, Austroasiatic) and 30 linguistic features, we conduct a typological survey with the aid of the phylogenetic program NeighborNet (Bryant & Moulton 2004). Our results suggest that convergence towards their non-Sinitic neighbors has indeed played a pivotal role in the typological diversity of Sinitic languages. Based primarily on their degree of Altaic/MSEA influence, the Sinitic varieties in our database are classified into four areal groups, namely 1) Northern, 2) Transitional, 3) Central Southeastern, 4) Far Southern. This classification scheme reflects the intricate interplay between areal convergence, regional innovations, and retention of archaic features. The findings suggest that contact-induced typological change can occur rather rapidly, especially if given the appropriate sociolinguistic conditions. Furthermore, this thesis highlights the interdependence between the meticulous analysis of qualitative linguistic data and the proper application of quantitative tools in typological studies. 
Although this study is chiefly concerned with Sinitic typology, the quantitative approach adopted herein can potentially help shed new light on the challenge of typological comparison in other areas.|000|dissertation, thesis, South-East Asia, structural data, convergence, linguistic area, 5284|Zettersten2020|What are the cognitive consequences of having a name for something? Having a word for a feature makes it easier to communicate about a set of exemplars belonging to the same category (e.g., “the red things”). But might it also make it easier to learn the category itself? Here, we provide evidence that the ease of learning category distinctions based on simple visual features is predicted from the ease of naming those features. Across seven experiments, participants learned categories composed of colors or shapes that were either easy or more difficult to name in English. Holding the category structure constant, when the underlying features of the category were easy to name, participants were faster and more accurate in learning the novel category. These results suggest that compact verbal labels may facilitate hypothesis formation during learning: it is easier to pose the hypothesis “it is about redness” than “it is about that pinkish-purplish color”. Our results have consequences for understanding how developmental and cross-linguistic differences in a language's vocabulary affect category learning and conceptual development.|000|learnability, naming, cognition, experimental study, 5285|Zettersten2020|Interesting investigation which in some part confirms the intuition that having a name for something helps in understanding and sorting one's mind much better.|000|naming, cognition, learnability, experimental study 5286|Sun2019|This book presents a grammatical synopsis and a collection of fully annotated spoken texts of Tshobdun, a morphology-rich Sino-Tibetan language of northwestern Sichuan, China. 
Tshobdun is a member of the Rgyalrong cluster under the Rgyalrongic subgroup in the Sino-Tibetan language family. The Tshobdun forms in this book represent the native speech of the second author Bstanblo (Bstan’dzin Blogros), who is from Kakhyoris Village of Tshobdun Township. The data were collected during extensive fieldwork conducted over the past two decades. The oral tradition of storytelling used to be a primary means of cultural transmission and entertainment, as well as an important participatory activity in the Rgyalrong communities (as depicted in the text “Story-telling” under the “Local History and Culture” section). Sadly, this tradition is no longer practiced in Tshobdun and other Rgyalrong areas, where nowadays folklore survives only in the memories of certain elders. Endeavors have been witnessed in recent years to gather and publish annotated folkloric texts in the Rgyalrong languages, including book-length publications such as Jacques & Chen 2010 (six texts, in Japhug Rgyalrong) and Lin 2016 (sixteen texts, in the Cogrtse variety of Situ Rgyalrong) and a number of MPhil and PhD theses containing annotated sample texts; e.g. Prins 2016 (on the Kyomkyo variety of Situ Rgyalrong), Zhang 2016 (on the Bragdbar variety of Situ Rgyalrong), and Gong 2018 (on Showu or Zbu Rgyalrong). The current volume of seventy-five carefully selected texts represents a substantial new contribution to the preservation and linguistic analysis of Rgyalrong spoken data. The texts included fall under five different genres: personal anecdotes, accounts of local history and culture, procedural texts, folklore, and miscellaneous other texts. 
In order to maintain dialectal consistency, texts recorded by speakers from other Tshobdun villages were re-told in the Kakhyoris dialect by the second author.|000|Tshobdun, Sino-Tibetan, Sichuan, China, bilingual texts, annotation, spoken language, Rgyalrong, corpus, 5287|Felsenstein1995|The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with three rates, the effort involved is no greater than three times that for a single rate. This “Hidden Markov Model” method allows for rates to differ between sites and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However, it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. 
An example is given using β-hemoglobin DNA sequences in eight mammal species; the regions of high and low evolutionary rates are inferred and also the average length of patches of similar rates.|000|variation among sites, evolutionary model, statistical model, evolutionary biology, phylogenetic reconstruction 5288|Hunter2020|This paper has two closely related aims. The main aim is to lay out one specific way in which the derivational aspects of a grammatical theory can contribute to the cognitive claims made by that theory, to demonstrate that it is not only a theory’s posited representations that testable cognitive hypotheses derive from. This requires, however, an understanding of grammatical derivations that initially appears somewhat unnatural in the context of modern generative syntax. The second aim is to argue that this impression is misleading: certain accidents of the way our theories developed over the decades have led to a situation that makes it artificially difficult to apply the understanding of derivations that I adopt to modern generative grammar. Comparisons with other derivational formalisms and with earlier generative grammars serve to clarify the question of how derivational systems can, in general, constitute hypotheses about mental phenomena.|000|cognition, grammatical theory, Chomsky syntax, generative grammar, 5289|Gil2020|This paper presents a preliminary and tentative formulation of a novel empirical generalization governing the relationship between grammar and cognition across a variety of independent domains. Its point of departure is an abstract distinction between two kinds of cognitive structures: symmetric and asymmetric. While in principle any feature whatsoever has the potential for introducing asymmetry, this paper focuses on one specific feature, namely thematic-role assignment. 
Our main empirical finding concerns the role of language, or, more specifically, grammar, in effecting and maintaining the distinction between symmetric and asymmetric cognitive structures. Specifically, whereas symmetric structures devoid of thematic-role assignment more commonly occur in a non-grammatical and usually also non-verbal medium, asymmetric structures involving thematic-role assignment are more likely to be associated with a grammatical medium. Our work draws together three independent strands of empirical research associated with three diverse phenomenological domains: compositional semantics, metaphors and schematological hybrids. These three domains instantiate conceptual combinations, bringing together two or more subordinate entities into a single superordinate entity. For compositional semantics this consists of a juxtaposition of constituent signs to form a single more complex sign; for metaphors this entails the bringing together of two different concepts in order to produce a comparison; while for schematological hybrids this involves the combination of different entities to form a single new hybrid entity. Our empirical results reveal a remarkable parallelism between the above three domains. Within each domain, symmetric structures tend to be associated with a non-verbal or otherwise non-grammatical medium, while asymmetric structures are more frequently associated with a grammatical medium. Thus, within each domain, grammar introduces asymmetry. More specifically, we find that in all three domains, the asymmetry in question is one that involves the assignment of thematic roles. To capture this effect, we posit two distinct levels, or tiers, of cognition: non-grammatical cognition, more commonly associated with symmetric structures, and grammatical cognition more conducive to asymmetric structures. 
Within each of the three phenomenological domains, we find the distinction between non-grammatical and grammatical cognition to be manifest in three independent realms, phylogeny, ontogeny, and the architecture of human cognition. Thus, grammar constitutes the driving force behind the transition from symmetric to asymmetric cognitive structures.|000|generative grammar, discussion, grammatical theory, cognition, 5290|Tanaka2020|Ever since the inception of generative linguistics, various dependency patterns have been widely discussed in the literature, particularly as they pertain to the hierarchy based on “weak generation” – the so-called Chomsky Hierarchy. However, humans can make any possible dependency patterns by using artificial means on a sequence of symbols (e.g., computer programing). The differences between sentences in human language and general symbol sequences have been routinely observed, but the question as to why such differences exist has barely been raised. Here, we address this problem and propose a theoretical explanation in terms of a new concept of “Merge-generability,” that is, whether the structural basis for a given dependency is provided by the fundamental operation Merge. In our functional magnetic resonance imaging (fMRI) study, we tested the judgments of noun phrase (NP)-predicate (Pred) pairings in sentences of Japanese, an SOV language that allows natural, unbounded nesting configurations. We further introduced two pseudo-adverbs, which artificially force dependencies that do not conform to structures generated by Merge, i.e., non-Merge-generable; these adverbs enable us to manipulate Merge-generability (Natural or Artificial). By employing this novel paradigm, we obtained the following results. 
Firstly, the behavioral data clearly showed that an NP-Pred matching task became more demanding under the Artificial conditions than under the Natural conditions, reflecting cognitive loads that could be covaried with the increased number of words. Secondly, localized activation in the left frontal cortex, as well as in the left middle temporal gyrus and angular gyrus, was observed for the [Natural – Artificial] contrast, indicating specialization of these left regions in syntactic processing. Any activation due to task difficulty was completely excluded from activations in these regions, because the Natural conditions were always easier than the Artificial ones. And finally, the [Artificial – Natural] contrast resulted in the dorsal portion of the left frontal cortex, together with wide-spread regions required for general cognitive demands. These results indicate that Merge-generable sentences are processed in these specific regions in contrast to non-Merge-generable sentences, demonstrating that Merge is indeed a fundamental operation, which comes into play especially under the Natural conditions.|000|cognition, neurolinguistics, experimental study, merge, generative grammar 5291|Clark2020|Work within the minimalist program attempts to meet the criterion of evolvability: “any mechanisms and primitives ascribed to UG rather than derived from independent factors must plausibly have emerged in what appears to have been a unique and relatively sudden event on the evolutionary timescale” (Chomsky et al., 2017). 
On minimalist assumptions the evolution of the language faculty must have involved at least three major developments: (i) the evolution of computational atoms, lexical items, understood as bundles of features, (ii) the evolution of a single, simple recursive operation that glues together lexical items and complexes of lexical items, and (iii) externalization linking the syntactic component of the language faculty to the cognitive systems that humans use for sound and gesture. The first development, the evolution of lexical items and the lexicon, is especially poorly understood. A complete account of the evolution of lexical items will state what evolved, how, and why. The focus of this article is the first question: what evolved. What properties do lexical items have, what determines these properties, and what is the internal structure of lexical entries? The article identifies what the key open problems are for a minimalist account of the evolution of words that strives to meet the criterion of evolvability.|000|lexicon, generative grammar, derivation, evolution, discussion, theoretical problems, 5292|Baumann2019|This paper explores the hypothesis that morphotactically ambiguous segment sequences should be dispreferred and selected against in the evolution of languages. We define morphotactically ambiguous sequences as sequences that can occur both within morphemes and across boundaries, such as final /nd/ or /mz/ in ModE, which occur in simple forms like wind or alms and in complex ones like sinned or seems. We test the hypothesis in two diachronic corpus studies of Middle and Early Modern English word forms ending in clusters of sonorants followed by /d/ or /t/ and /s/ or /z/. These clusters became highly frequent after the loss of unstressed vowels in final syllables and were highly ambiguous when they emerged. 
Our data show that the ambiguity of these final clusters was indeed reduced so that the distribution of the final clusters became increasingly skewed: clusters ending in voiceless coronals became significantly clearly indicative of simple forms, while clusters ending in voiced ones came to signal inflectional complexity more reliably.|000|language evolution, selection, English, Middle English, Old English, corpus studies, lexical evolution 5293|Allesandro2020|In May 2015, a group of eminent linguists met in Athens to debate the road ahead for generative grammar. There was a lot of discussion, and the linguists expressed the intention to draw a list of achievements of generative grammar, for the benefit of other linguists and of the field in general. The list has been sketched, and it is rather interesting, as it presents a general picture of the results that is very ‘past-heavy’. In this paper I reproduce the list and discuss the reasons why it looks the way it does.|000|generative grammar, state-of-the-art, summary, open problems 5294|Chomsky2020|This is an annotated transcription of Noam Chomsky’s keynote presentation at the University of Reading, in May 2017. Here, Chomsky reviews some foundational aspects of the theory of structure building: essentially, Merge and Label. The aim is to eliminate what he refers to as extensions of Merge which are seemingly incompatible with the Strong Minimalist Thesis while still accounting for recursive structure, displacement, and reconstruction (as the main empirical goals of the Minimalist Program). These include sidewards movement, multi-dominance, and late-Merge; all of which have been developed throughout the life cycle of transformational generative grammar. Furthermore, Chomsky formulates a series of conditions that an adequate formulation of Merge must meet, and sketches how the aforementioned extensions may violate these conditions. 
Chomsky arrives at a formulation of an operation MERGE, which maintains the core properties of Merge but is further restricted by limitations over what MERGE can do to the workspaces where syntactic operations apply.|000|transcript, speech, generative grammar, summary, linguistic theory, Chomsky syntax, 5295|Honkasalo2019|Adopting a functional-typological framework, this dissertation describes Eastern Geshiza (autoglossonyms: bæ-skæ, roŋ-skæ, rgævɕetsa-skæ, ŋæ=ɲi skæ ), a previously insufficiently known Trans-Himalayan (Sino-Tibetan) Horpa language spoken primarily in eastern Geshiza Valley of Danba County in the People’s Republic of China. The approximately 5000 speakers of Eastern Geshiza are categorised as ethnic Tibetans, practice agriculture, and follow the religious traditions of Bön and Tibetan Buddhism. Following an approach emphasising linguistic ecology, this descriptive grammar aims to anchor the grammatical description to the various contexts of the language. Eastern Geshiza is currently endangered. Almost all speakers of the language are now bilingual: Eastern Geshiza functions as an in-group language while Sichuanese Mandarin, also acquired since childhood, is used for external communication as a regional lingua franca. Knowledge of Tibetan lects and Written Tibetan, however, is low among the speakers. A substantial influx of new lexical loans from Chinese and a gradual language shift towards Chinese among the young constitute issues that will greatly affect both the future shape and vitality of the language. Eastern Geshiza exhibits complex phonology. It possesses an extensive phoneme inventory that contains 8 fully phonemic vowels and 37 fully phonemic consonants. The language abounds in complex consonant clusters of up to three members. Eastern Geshiza is morphologically complex. 
The complexity is particularly prominent in verb morphology that is characterised by an argument indexation system based on accessibility hierarchy and a set of multifunctional verbal prefixes that encode orientation, aspect, and mood. Like many of the other regional languages, Eastern Geshiza is rich in evidential categories and includes the grammatical category of engagement. Typological peculiarities of the language make it an important source of data for typological research.|000|grammar, Geshiza, Sino-Tibetan, wordlist, Horpa 5296|Egorova2019a|The paper presents a theoretical analysis and computer simulations of the distribution and changes of the linguistic information in two model language communities: Proto-Indo-European and Proto-Chinese. Simulations show that out of two main hypotheses of the formation of the Proto-Indo-European languages, the Anatolian hypotheses and the Kurgan hypotheses, the latter is better consistent with the time estimates obtained in this study. The results obtained for Proto-Indo-European communities may also be used in the analysis of Asian language communities. In particular, the similarity of Chinese and Proto-Indo-European languages in terms of the relationship between the verb and the noun opens the possibility of applying our method to the analysis of the Proto-Sino-Tibetan language family. A possibility of creating a single national language Pǔtōnghuà (普通话) in the modern China was investigated. The results of the present study also suggest that the developed models look like a quite promising new instrument for studying linguistic information transfer in complex social and linguistic systems.|000|Indo-European, Sino-Tibetan, simulation studies, missing data, missing code 5297|Clark2008|Kim Mun is a sub-grouping from a family of languages known as the Hmong-Mien. This language family is sometimes called Miao-Yao, particularly by Chinese linguists, after the Chinese minorities of the same name. 
Aumann (2000: 2) points out that this name is misleading because some speakers belonging to the Miao minority do not speak a Hmongic language, and many members of the Yao minority do not speak a Mienic language. There is also the She minority with some members speaking the Hmongic language She. Therefore, the names for the two largest branches of this family, Hmong and Mien, are preferred by Western linguists.|000|Kim Mun, Hmong-Mien, thesis, wordlist, 5298|Moran2015|There are numerous phylogenetic reconstruction methods and models available—but which should you use and why? Important considerations in phylogenetic analyses include data quality, structure, signal, alignment length and sampling. If poorly modelled, variation in rates of change across proteins and across lineages can lead to incorrect phylogeny reconstruction which can then lead to downstream misinterpretation of the underlying data. The risk of choosing and applying an inappropriate model can be reduced with some critical yet straightforward steps outlined in this paper. We use the question of the position of the root of placental mammals as our working example to illustrate the topological impact of model misspecification. Using this case study we focus on using models in a Bayesian framework and we outline the steps involved in identifying and assessing better fitting models for specific datasets.|000|phylogenetic reconstruction, guide, tutorial, multi-state models, 5299|Coblin2019a|This work undertakes a comparative phonological reconstruction for the Neo-Hakka dialect group. The term Neo-Hakka is an English rendering of Chinese 新客家話, a new expression now increasingly being used by Chinese dialectologists and Hakka specialists to refer to what was earlier simply called “Hakka” (客家話). 
This Neo-Hakka group includes both the better known “Mainstream Hakka” dialects of the Méixiàn type and the lesser known ones of southern Jiāngxī and contiguous areas, whose speakers do not self-identify as ethnically Hakka or understand Mainstream Hakka when they hear it spoken. Thus, the Common Neo-Hakka comparative system developed here goes beyond the earlier Proto-Hakka phonological reconstruction of Keven O’Connor (1976), who worked exclusively with a number of Mainstream Hakka dialects.|000|reconstruction, linguistic reconstruction, Hakka, Sinitic, Chinese 5300|Levison2019|This paper explores “Anglocentrism” as a bias in contemporary linguistics and cognitive sciences. Anglo concepts dominate international discourse on language and cognition, but the influence that this Anglocentric metalinguistic discourse has on global knowledge production, research methods, and the theoretical framing of research questions is rarely debated. Three case studies on heavily “Anglicised” discursive domains are provided: (i) “the mind” – and the Anglicisation of global discourse of human personhood; (ii) “happiness” – and the Anglicisation of the global discourse of human values; (iii) “community” – and the Anglicisation of the global discourse of human sociality. With cross-linguistic evidence from Europe (Danish), and the Pacific (Bislama), the paper denaturalises the English words mind, happiness, and community and the cognitive models they stand for, demonstrating that these words are not “neutral” nor “innocent” metalinguistic descriptors. Rather, they are quintessential Anglo constructs, and as such they provide a lens on humanity that is biased towards an Anglo interpretation of the world. Finally, the paper explores the “bias” concept. Paradoxically, the bias concept is in itself a product of the Anglosphere, and as such a part of the problem. 
However, due to this word’s meta-discursive function, the paper argues that the bias concept can become a useful Trojan Horse, a concept through which we can fight Anglocentrism from within, and pave the way for a more adequate representation of human diversity in linguistics and cognitive sciences.|000|anglo-centrism, bias, English, theoretical problems, discrimination, WEIRD data, critics, 5301|Dziubalska-Kolaczyk2019|This paper shows how preferability measures can help to explain the cross-linguistic distribution of consonant clusters, their acquisition, as well as aspects of their diachronic development. Phonological preferability is measured in terms of cluster size and Net Auditory Distance, which interact with morphological complexity and frequency. Predictions derived from the preferability of clusters are tested against the evidence of language specific phonotactics, language use, language acquisition, psycholinguistic processing, and language change.|000|consonant cluster, sound change, tendencies, Polish, English, missing data, missing code 5302|Dressler2019|Morphonotactics determines phonological conditions on sound sequences produced by morphological operations both with morphemes and across boundaries. This paper examines the historical emergence and the development of morphonotactic consonant clusters in Germanic, Slavic, Baltic, Romance and other languages. It examines the role of the following morphological preference parameters: (i) morphotactic transparency/opacity, (ii) morphosemantic transparency/opacity, (iii) morphological richness. We identify several diachronic processes involved in cluster emergence, production and change: vowel loss, Indo-European ablaut (and comparable Arabic processes), affixation, compounding, metathesis, final and consonant epenthesis. Additionally, we discuss predictions derived from the Net Auditory Distance principle, psycholinguistic evidence and language acquisition. 
We show that the majority of morphonotactic clusters arise, phonologically, from vowel loss, and morphologically from concatenation.|000|phonotactics, consonant cluster, evolution, Germanic, Slavic, Baltic, Romance, emergence 5303|Chomsky2020a|This paper provides an overview of what we take to be the key current issues in the field of Generative Grammar, the study of the human Faculty of Language. We discuss some of the insights this approach to language has produced, including substantial achievements in the understanding of basic properties of language and its interactions with interfacing systems. This progress in turn gives rise to new research questions, many of which could not even be coherently formulated until recently. We highlight some of the most pressing outstanding challenges, in the hope of inspiring future research.|000|language faculty, theoretical problems, generative grammar, Chomsky syntax, language evolution, origin of language, 5304|Urban2019a|Against a multidisciplinary background this contribution explores the areal typology of western Middle and South America. Based on a new language sample and a typological questionnaire that is specifically designed to bring some of the poorly documented and extinct languages into the debate, we explore the areal distribution of 77 linguistic traits in 44 languages. While one of the goals of the present article is to provide a general up-to-date view of the areal patterning of these traits on a large scale, we also explore a number of specific questions in more detail. In particular, we address the relationship between known language areas like Mesoamerica and the Central Andes with their respective peripheries, the possibility of detecting an areal-typological signal that predates the rise of these linguistic areas, and, finally, the question of linguistic convergence along the Pacific coast. 
We find that, while the languages of the Mesoamerican periphery are rather diffuse typologically, the structural profiles of the Central Andean languages are embedded organically into a more general cluster of Andean typological affinities that alters continuously as one moves through geographical space. In different ways, the typological properties of the peripheral languages may reflect a situation that goes back to time depths which are greater than that of the emergence of the Mesoamerican and Central Andean linguistic areas. Finally, while we can confirm typological affinities with Mesoamerica for some languages of coastal South America, we do not find support for large-scale linguistic convergence on the Pacific coast.|000|Meso-America, structural data, missing code, missing data, convergence, linguistic area 5305|Trudgill2017|The term sociolinguistic typology refers to research which attempts to apply sociolinguistic data and insights to the study of the typology of the world’s languages. The goal is to investigate whether, and to what extent, the typological characteristics of the world’s languages are influenced by social structure and social organization – by the sociolinguistic characteristics of the communities in which they are spoken.|000|sociolinguistic typology, sociolinguistics, typology, sociology, introduction 5306|Wedel2019|Listeners incrementally process words as they hear them, progressively updating inferences about what word is intended as the phonetic signal unfolds in time. As a consequence, phonetic cues positioned early in the signal for a word are on average more informative about word-identity because they disambiguate the intended word from more lexical alternatives than cues late in the word. In this contribution, we review two new findings about structure in lexicons and phonological grammars, and argue that both arise through the same biases on phonetic reduction and enhancement resulting from incremental processing. 
(i) Languages optimize their lexicons over time with respect to the amount of signal allocated to words relative to their predictability: words that are on average less predictable in context tend to be longer, while those that are on average more predictable tend to be shorter. However, the fact that phonetic material earlier in the word plays a larger role in word identification suggests that languages should also optimize the distribution of that information across the word. In this contribution we review recent work on a range of different languages that supports this hypothesis: less frequent words are not only on average longer, but also contain more highly informative segments early in the word. (ii) All languages are characterized by phonological grammars of rules describing predictable modifications of pronunciation in context. Because speakers appear to pronounce informative phonetic cues more carefully than less informative cues, it has been predicted that languages should be less likely to evolve phonological rules that reduce lexical contrast at word beginnings. A recent investigation through a statistical analysis of a cross-linguistic dataset of phonological rules strongly supports this hypothesis. Taken together, we argue that these findings suggest that the incrementality of lexical processing has wide-ranging effects on the evolution of phonotactic patterns.|000|selection, language evolution, phonotactics, predictability, Zipf's law, 5307|Hsiu2015|Na Meo is a language spoken in a cluster of villages encompassing the northern Vietnam provinces of Lang Son, Cao Bang, and Bac Kan, as well as a single village in Tuyen Quang province (Nguyen 2007). Its existence as a divergent Hmong-Mien language has been noted by the Vietnamese government since 1975. However, Na Meo has remained very poorly documented, and is currently still listed as an unclassified language in Ethnologue (ISO 639-3 code [neo]) due to the lack of published data. 
The lengthiest published word list to date is that of Nguyen (2007), which has 67 Na Meo words in non-IPA orthography.|000|Hmong-Mien, Na Meo, genetic classification, subgrouping, dataset 5308|Hombert1979|The development of contrastive tone because of the articulatory reinterpretation of segmentally-caused perturbations in intrinsic fundamental frequency is well attested in a number of unrelated languages. Considering the wide-spread character of this process, it is likely that its 'seeds' can be found in the functioning of the human articulatory and/or auditory mechanisms. This paper reviews what the authors consider promising explanations for well-attested tonal sound patterns, e.g. tone originating from the effect of prevocalic stop consonants or postvocalic glottal consonants, and tone rarely or never originating from the influence of postvocalic non-glottal consonants or from vowel height.|000|tone evolution, tonogenesis, overview, phonetics 5309|Malouf2020|Large typological databases have permitted new ways of studying cross-linguistic morphological variation. Recently, computational modelers with typological interests have begun to turn to broad multilingual text databases. In this paper, we will focus particularly on the UniMorph database, a collection of morphological paradigms, mostly gathered automatically from the crowd-sourced multi-lingual dictionary Wiktionary. It was designed to make the large quantity of data contained in Wiktionary available for NLP researchers by standardizing the data and putting it into a form that is easy to access. For typological studies, however, the requirements for a linguistically informed view of morphological variation are quite different. They involve using a morphological database as a scientific instrument to both formulate and test hypotheses about the nature and organization of language systems. The requirements are, accordingly, much higher. 
In this paper, we survey some of the methodological challenges and pitfalls involved in using corpora for typological research, and we end with a proposal for best practices and directions for further research.|000|UniMorph, database, critics, morphology, paradigmatic morphology, 5310|Malouf2020|Based on the results so far, there is suggestive evidence for a relationship between the number of cells in a paradigm and as predicted by an encoder-decoder model. The final step in any typological study has to be to show that these metrics applied in this way to this dataset connect to a relevant linguistic notion. In this case, a crucial question is whether , a measure of how well a model predicts forms, is a reasonable measure of the I-complexity of a paradigm, or how predictable forms are. This is the question of construct validity: does the test measure what it claims to measure?|304|construct validity, example, computational linguistics 5311|Casillas2020|Daylong at-home audio recordings from 10 Tseltal Mayan children (0;2–3;0; Southern Mexico) were analyzed for how often children engaged in verbal interaction with others and whether their speech environment changed with age, time of day, household size, and number of speakers present. Children were infrequently directly spoken to, with most directed speech coming from adults, and no increase with age. Most directed speech came in the mornings, and interactional peaks contained nearly four times the baseline rate of directed speech. Coarse indicators of children’s language development (babbling, first words, first word combinations) suggest that Tseltal children manage to extract the linguistic information they need despite minimal directed speech. 
Multiple proposals for how they might do so are discussed.|000|language acquisition, Tseltan Mayan, Mayan, parental engagement, minimal directed speech, 5312|Boe2019|Recent articles on primate articulatory abilities are revolutionary regarding speech emergence, a crucial aspect of language evolution, by revealing a human-like system of proto-vowels in nonhuman primates and implicitly throughout our hominid ancestry. This article presents both a schematic history and the state of the art in primate vocalization research and its importance for speech emergence. Recent speech research advances allow more incisive comparison of phylogeny and ontogeny and also an illuminating reinterpretation of vintage primate vocalization data. This review produces three major findings. First, even among primates, laryngeal descent is not uniquely human. Second, laryngeal descent is not required to produce contrasting formant patterns in vocalizations. Third, living nonhuman primates produce vocalizations with contrasting formant patterns. Thus, evidence now overwhelmingly refutes the long-standing laryngeal descent theory, which pushes back “the dawn of speech” beyond ~200 ka ago to over ~20 Ma ago, a difference of two orders of magnitude.|000|primate vocalization, primate linguistics, overview, review 5313|Abadi2019|Determining the most suitable model for phylogeny reconstruction constitutes a fundamental step in numerous evolutionary studies. Over the years, various criteria for model selection have been proposed, leading to debate over which criterion is preferable. However, the necessity of this procedure has not been questioned to date. Here, we demonstrate that although incongruency regarding the selected model is frequent over empirical and simulated data, all criteria lead to very similar inferences. When topologies and ancestral sequence reconstruction are the desired output, choosing one criterion over another is not crucial. 
Moreover, skipping model selection and using instead the most parameter-rich model, GTR+I+G, leads to similar inferences, thus rendering this time-consuming step nonessential, at least under current strategies of model selection.|000|phylogenetic reconstruction, model selection, parameter selection, 5314|Haugen2019|Uto-Aztecan subgrouping has long been the subject of debate. We aim to establish a more up-to-date foundation for Uto-Aztecan lexicostatistics by reexamining Wick Miller’s influential lexicostatistic classification. Miller’s cognate density measure yields a symmetrical table based on the number of cognates each language pairing shares on a modified Swadesh-100 wordlist. However, no language has cognate sets for all word meanings (glosses) on that list. We offer an improved metric, relative cognate density, for analyzing an updated database of Kenneth Hill’s Uto-Aztecan cognate sets. We generate an asymmetrical table dividing the number of pairwise shared cognates (“match counts”) by the number of cognate sets each comparison language actually has available. Employing more standard distance-based clustering algorithms (UPGMA, Neighbor Joining, and NeighborNet), our results align with Miller’s in some respects (e.g., in identifying a Southern-Uto-Aztecan branch but not a Northern-Uto-Aztecan one) but differ in others (e.g., in not identifying “Sonoran”).|000|Uto-Aztekan, lexicostatistics, Swadesh list, missing data, missing code 5315|Robson2019|This is a book about how knowledge travels, in minds and bodies, writings and performances. It explores the forms knowledge takes, the meanings it accrues and how they are shaped by the peoples and places that use it. This is also a book about the relationships between political power, family ties and literate scholarship in the ancient Middle East of the first millennium bc (see Tables 3a and 5a for chronological overviews). 
Its particular focus is on two regions where cuneiform script was the predominant writing medium: Assyria in the north of modern-day Syria and Iraq; and Babylonia to the south of modern-day Baghdad (Fig. 1.1). And third, this is a book about Assyriological and historical method, both now and over the past two centuries. It asks how the field has shaped and been shaped by the academic concerns and fashions of the day. But perhaps above all this book is an experiment in writing about ‘Mesopotamian science’, as it has often been known. By focusing on the geographical and the social I hope to shed new light on the historical and intellectual too. Although I have included a lot of technical detail and evidential data, I have tried to make the book accessible to those without a specialist training in cuneiform studies. In particular, the following introduction aims to set the scene and explain my rationale, while maps, online glossaries and other resources will, I hope, give some further support to non-expert readers.|000|cuneiform script, knowledge, knowledge dissimination, Assyria, Middle East, history, linguistics, 5316|Ringen2019|Some human subsistence economies are characterized by extensive daily food sharing networks, which may buffer the risk of shortfalls and facilitate cooperative production and divisions of labor among households. Comparative studies of human food sharing can assess the generalizability of this theory across time, space, and diverse lifeways. Here we test several predictions about daily sharing norms–which presumably reflect realized cooperative behavior–in a globally representative sample of nonindustrial societies (the Standard Cross-Cultural Sample), while controlling for multiple sources of autocorrelation among societies using Bayesian multilevel models. 
Consistent with a risk-buffering function, we find that sharing is less likely in societies with alternative means of smoothing production and consumption such as animal husbandry, food storage, and external trade. Further, food sharing was tightly linked to labor sharing, indicating gains to cooperative production and perhaps divisions of labor. We found a small phylogenetic signal for food sharing (captured by a supertree of human populations based on genetic and linguistic data) that was mediated by food storage and social stratification. Food sharing norms reliably emerge as part of cooperative economies across time and space but are culled by innovations that facilitate self-reliant production.|000|anthropology, food-sharing, cultural evolution, 5317|Sims2020a|Prior work has suggested that proto-Rma was a non-tonal language and that tonal varieties underwent tonogenesis (Liú 1998, Evans 2001a-b). This paper re-examines the different arguments for the tonogenesis hypothesis and puts forward subgroup-internal and subgroup-external evidence for an alternative scenario in which tone, or its phonetic precursors, was present at the stage of proto-Rma. The subgroup-internal evidence comes from regular correspondences between tonal varieties. These data allow us to put forward a working hypothesis that proto-Rma had a two-way tonal contrast. Furthermore, existing accounts of how tonogenesis occurred in the tonal varieties are shown to be problematic. The subgroup-external evidence comes from regular tonal correspondences to two closely related tonal Trans-Himalayan subgroups: Prinmi, a modern language, and Tangut, a mediaeval language attested by written records from the 11th to 16th centuries. 
Regular correspondences among the tonal categories of these three subgroups, combined with the Rma-internal evidence, allow us to more confidently reconstruct tone for proto-Rma.|000|Rma, tonogenesis, linguistic reconstruction, Sino-Tibetan, Prinmi, Tangut, etymological data 5318|Dziubalska-Kolaczyk2014|This tool automatically calculates NAD for two- and three-consonant clusters in word initial, medial and final position.|000|phonotactics, consonant clusters, online, tools 5319|Pustka2019|Focusing on sibilant-stop onsets, this paper deals with syllabic complexity in Romance languages. At its core are two empirical studies that address the complex case of French: a type-level study is based on the Petit Robert, and a token-level study uses Parisian and Southern French corpus data elaborated in the framework of the PFC program (Phonologie du Français Contemporain). The paper identifies three factors behind the emergence of phonotactic complexity: (a) vowel elision, (b) borrowing, and (c) expressivity.|000|phonotactics, consonant cluster, French, corpus studies 5320|Gasser2020|The northwestern part of the island of New Guinea has been the site of intense contact between a hugely diverse set of languages. Languages from at least nine non-Austronesian families (plus several isolates) are spoken alongside Austronesian languages from the South Halmahera-West New Guinea branch, which arrived in the region roughly 3500 years ago. This paper looks at lexical items in the semantic areas of flora, fauna, and color terms and catalogues apparent loans between 52 of these languages, some relatively widespread (‘crocodile’, ‘chicken’, ‘dog’) and some much more limited in their scope. 
So far as the direction of borrowing can be established, the patterns of shared forms indicate ongoing lexical transfer across the region with a strong preference for Austronesian-to-Papuan borrowing, suggesting a historical pattern of Austronesian cultural influence in the region.|000|borrowing, Trans-New-Guinean languages, missing data 5321|Kilani2020|In this article I present FAAL, a new Feature-based Aligning ALgorithm that can be used for the alignment of phonetically transcribed lexical items according to the comparison of phonetic features. The algorithm can be run on any pair of phonetically transcribed words without requiring any specific setting or tuning, although various parameters of its implementation as .jar library can indeed be modified by the user, if needed. The structure and work-flow of the algorithm are described in the first part of the paper. In the second part FAAL is tested against previously proposed algorithms in a test case involving two different datasets. FAAL in its default configuration outperforms all of them. FAAL is distributed in the form of a .jar Java library. The description of this Java library and the details about the settings and configuration of its methods are provided on-line, in my github repository at https://github.com/MKilani/FAAL.|000|alignment, phonetic alignment, algorithm, distinctive features, 5322|Fedorenko2020|Theories of human cognition prominently feature 'Broca’s area', which causally contributes to a myriad of mental functions. However, Broca’s area is not a monolithic, multipurpose unit – it is structurally and functionally heterogeneous. Some functions engaging (subsets of) this area share neurocognitive resources, whereas others rely on separable circuits. 
A decade of converging evidence has now illuminated a fundamental distinction between two subregions of Broca’s area that likely play computationally distinct roles in cognition: one belongs to the domain-specific 'language network', the other to the domain-general 'multiple-demand (MD) network'. Claims about Broca’s area should be (re)cast in terms of these (and other, as yet undetermined) functional components, to establish a cumulative research enterprise where empirical findings can be replicated and theoretical proposals can be meaningfully compared and falsified.|000|Broca's Area, review, summary, introduction 5323|Mitterer2020|Second language (L2) learners are often aware of the typical pronunciation errors that speakers of their native language make, yet often persist in making these errors themselves. We hypothesised that L2 learners may perceive their own accent as closer to the target language than the accent of other learners, due to frequent exposure to their own productions. This was tested by recording 24 female native speakers of German producing 60 sentences. The same participants later rated these recordings for accentedness. Importantly, the recordings had been altered to sound male so that participants were unaware of their own productions in the to-be-rated samples. We found evidence supporting our hypothesis: participants rated their own altered voice, which they did not recognize as their own, as being closer to a native speaker than that of other learners. This finding suggests that objective feedback may be crucial in fostering L2 acquisition and reduce fossilization of erroneous patterns.|000|English, second language learning, speech perception, accent 5324|Chen2020|Blasi et al. (2019) offer evidence that post-neolithic changes in bite configuration, owed to the adoption of agriculture, have led to the innovation and proliferation of labiodental consonants in the world’s languages. 
Here we investigate the putative association between agriculture and labiodental consonants via a new approach that does not rely on phoneme inventories. Given that labiodentals are apparently characterized by reduced muscular effort in populations with agriculture-influenced bite configurations, we test whether labiodental sounds are actually more prevalent in languages whose speakers rely on agriculture. We rely on word lists from the Automated Similarity Judgement Program (Wichmann et al. 2018), which contains transcribed lists of common words in thousands of languages. We analyze the relative frequency of sound types in the word lists of agricultural and hunter-gatherer populations, respectively, finding differing mean rates of labiodental usage in populations with distinct subsistence strategies. Using a linear mixed-effects model to control for relatedness and contact, we find support for an association between the frequency of labiodental consonants and the use of agriculture.|000|labial sounds, bite configuration, missing data, missing code, phonetics, labiodentals 5325|Ji2020|Descent and residence rules have long been of interest to anthropologists and biologists, as they structure populations and determine patterns of kinship, relatedness and cooperation. Despite the prevalence of patrilineal descent and patrilocal residence among extant Sino-Tibetan groups, belief in a matrilineal and matrilocal ancestry persists in China. Although some evidence on ancestral Sino-Tibetan kinship is now becoming available from both genetic and archaeological studies, the findings are contradictory 1,2 . |000|kinship terms, Sino-Tibetan, missing data, missing code 5326|Savelyev2020|Despite more than 200 years of research, the internal structure of the Turkic language family remains subject to debate. Classifications of Turkic so far are based on both classical historical–comparative linguistic and distance-based quantitative approaches. 
Although these studies yield an internal structure of the Turkic family, they cannot give us an understanding of the statistical robustness of the proposed branches, nor are they capable of reliably inferring absolute divergence dates, without assuming constant rates of change. Here we use computational Bayesian phylogenetic methods to build a phylogeny of the Turkic languages, express the reliability of the proposed branches in terms of probability, and estimate the time-depth of the family within credibility intervals. To this end, we collect a new dataset of 254 basic vocabulary items for thirty-two Turkic language varieties based on the recently introduced Leipzig–Jakarta list. Our application of Bayesian phylogenetic inference on lexical data of the Turkic languages is unprecedented. The resulting phylogenetic tree supports a binary structure for Turkic and replicates most of the conventional sub-branches in the Common Turkic branch. We calculate the robustness of the inferences for subgroups and individual languages whose position in the tree seems to be debatable. We infer the time-depth of the Turkic family at around 2100 years before present, thus providing a reliable quantitative basis for previous estimates based on classical historical linguistics and lexicostatistics.|000|Turkic, Bayesian analysis, lexicostatistics, comparative wordlist, 5327|Miyakawa2020|A reproducibility crisis is a situation where many scientific studies cannot be reproduced. Inappropriate practices of science, such as HARKing, p-hacking, and selective reporting of positive results, have been suggested as causes of irreproducibility. In this editorial, I propose that a lack of raw data or data fabrication is another possible cause of irreproducibility. As an Editor-in-Chief of Molecular Brain, I have handled 180 manuscripts since early 2017 and have made 41 editorial decisions categorized as “Revise before review,” requesting that the authors provide raw data. 
Surprisingly, among those 41 manuscripts, 21 were withdrawn without providing raw data, indicating that requiring raw data drove away more than half of the manuscripts. I rejected 19 out of the remaining 20 manuscripts because of insufficient raw data. Thus, more than 97% of the 41 manuscripts did not present the raw data supporting their results when requested by an editor, suggesting a possibility that the raw data did not exist from the beginning, at least in some portions of these cases. Considering that any scientific study should be based on raw data, and that data storage space should no longer be a challenge, journals, in principle, should try to have their authors publicize raw data in a public database or journal site upon the publication of the paper to increase reproducibility of the published results and to increase public trust in science.|000|missing data, review, problems, scientific practice, scientific data 5328|Mallory2020|The method of linguistic palaeontology (or palaeolinguistics) has a controversial status within archaeology. According to its defenders, it promises the ability to see into the social and material cultures of prehistoric societies and uncover facts about peoples beyond the reach of archaeology. Its critics see it as essentially flawed and unscientific. Using a particular case-study, the Indo-European homeland problem, this paper attempts to discern the kinds of inference which proponents of linguistic palaeontology make and whether they can be warranted. 
I conclude that, while the case for linguistic palaeontology has often been overstated, so has the case against it.|000|linguistic palaeography, linguistic palaeontology, linguistic reconstruction, critics, review, history of science, 5329|Mallory2020|In 1710, the same year he published his proof that we were living in the best of all possible worlds, Gottfried Wilhelm Leibniz wrote an article on linguistics in which he argued not only that the Germanic and Celtic languages were descended from a common ancestor but that this common ancestor shared a further ancestor language with all Turkish, Slavic, Finnish and Greek languages. While he was wrong to lump the Finnish and Turkish languages in here, the overall idea was both correct and in direct conflict with the Biblical account of linguistic descent.|1|Gottfried Wilhelm Leibniz, history of science, deep genetic relations, 5330|Mallory2020|The practice of making inferences about the cultures of language users on the evidence of reconstructed languages is called linguistic palaeontology (henceforth LP).|1|linguistic palaeontology, linguistic palaeography, definition, terminology, 5331|Mallory2020|There are numerous terms for this. While the term palaeolinguistics is becoming more common within the linguistic literature (@Crowley<2010> and Bowern 2010, p. 316), linguistic palaeontology is the term found in the relevant archaeological literature. Cultural reconstruction is also a common term but suggests a focus on cultural artefacts over environmental features like salmon. Wörter und Sachen may actually be the most accurate term but no longer appears to be in use.|F2|linguistic palaeography, linguistic palaeontology, Wörter und Sachen, linguistic reconstruction, definition 5332|Mallory2020|The ultimate goal at this stage is the discovery of sound laws—though the term ‘law’ can be misleading. 
To prefigure later discussion, some linguists will be instrumentalists about these laws and simply identify them with regular or systematic correspondences in sound without making any further commitments. Alternatively, you can identify the laws with posited historical events that are particular to certain languages (see Greenberg 1979 for discussion of this point).|2|sound law, definition, terminology 5333|Mallory2020|LP relies upon a reconstruction of elements of a proto-language which researchers can then use to draw inferences. These reconstructions are produced through the comparison of similar expressions in different languages in order to make inferences about their common ancestors. This comparative method is not merely based on the fact that certain words appear similar but that the similarities between them are systematic.|2|comparative method, definition, terminology 5334|Mallory2020|Once a law or systematic correspondence has been discovered, it can then be reversed to reconstruct the state of the language before the shift occurred. This enables linguists to frame hypotheses about the phonological forms of the proto-language. 
One of the reasons that sound laws have the reputation they do is the success of particular predictions.|3|sound correspondences, sound law, terminology, methodology 5335|Mallory2020|We have an instance here of a debate between archaeologists such as Renfrew and Anthony which hinges in large part on the legitimacy of linguistic palaeontology as a method.|4|linguistic palaeography, linguistic palaeontology, Indo-European homeland, debate, 5336|Anthony2007|The proto-lexicon contains much more, including clusters of words, suggesting that the speakers of PIE inherited their rights and duties through the father’s bloodline only (patrilineal descent); probably lived with the husband’s family after marriage (patrilocal residence); recognized the authority of chiefs who acted as patrons and givers of hospitality for their clients; likely had formally instituted warrior bands; practiced ritual sacrifices of cattle and horses; drove wagons; recognized a male sky deity; probably avoided speaking the name of the bear for ritual reasons; and recognized two senses of the sacred (“that which is imbued with holiness” and “that which is forbidden”). Many of these practices and beliefs are simply unrecoverable through archaeology. :comment:`Quoted after ` @Mallory2020 :comment:`4`|15|Indo-European homeland, linguistic palaeontology, linguistic palaeography, linguistic reconstruction 5337|Mallory2020|Before we continue, certain familiar caveats are required. The acceptance of the claim that Indo-European languages share a common ancestor does not commit one to the idea that there ever existed a homogeneous material culture or a unified linguistic community that spoke it. These ideas might be motivated by further archaeological evidence but the linguistic evidence does not compel them. Similarly, talk of a ‘homeland’ must be shorn of nationalist connotations. 
The homogeneity enforced, often violently, by the modern nation-state shouldn’t be projected upon prehistoric communities. There is little reason to believe that speakers of a proto-language, in virtue of sharing reconstructed linguistic items, considered themselves to be a single political community or that they maintained rigid borders with other communities.|4|linguistic palaeography, linguistic palaeontology, debate, critics, Indo-European homeland 5338|Ratcliffe2020|This paper presents a methodology for quantifying diversity within a group of related languages and correlating the patterns found with known historical developments, as a way of testing a variety of hypotheses, regarding subclassification, reconstruction, the influence of language contact, the relative consistency of the speed of language change, etc. The methodology is applied to Arabic dialects, for which there is a wealth of synchronic variation as well as considerable historical documentation on both linguistic and migration history. The goal is to establish a more solid empirical basis for inferring diachronic conclusions based on comparative analysis of synchronic data.|000|Arabic, dialect data, comparative wordlist, lexicostatistics, dataset, dialect classification, 5339|Johansson2020|Sound symbolism emerged as a prevalent component in the origin and development of language. However, as previous studies have either been lacking in scope or in phonetic granularity, the present study investigates the phonetic and semantic features involved from a bottom-up perspective. By analyzing the phonemes of 344 near-universal concepts in 245 language families, we establish 125 sound-meaning associations. The results also show that between 19 and 40 of the items of the Swadesh-100 list are sound symbolic, which calls into question the list’s ability to determine genetic relationships. 
In addition, by combining co-occurring semantic and phonetic features between the sound symbolic concepts, 20 macro-concepts can be identified, e. g. basic descriptors, deictic distinctions and kinship attributes. Furthermore, all identified macro-concepts can be grounded in four types of sound symbolism: (a) unimodal imitation (onomatopoeia); (b) cross-modal imitation (vocal gestures); (c) diagrammatic mappings based on relation (relative); or (d) situational mappings (circumstantial). These findings show that sound symbolism is rooted in the human perception of the body and its interaction with the surrounding world, and could therefore have originated as a bootstrapping mechanism, which can help us understand the bio-cultural origins of human language, the mental lexicon and language diversity.|000|sound symbolism, dataset, concept list, Swadesh list, 5340|Wilson2019|Based on the number of words per meaning across the Indo-European Swadesh list, Pagel et al. (2007) suggest that frequency of use is a general mechanism of linguistic evolution. We test this claim using within-language change. From the IDS (Key & Comrie 2015) we compiled a comparative word list of 1,147 cognate pairs for Classical Latin and Modern Spanish, and 1,231 cognate pairs for Classical and Modern Greek. We scored the amount of change for each cognate pair in the two language histories according to a novel 6-point scale reflecting increasing levels of change from regular sound change to external borrowing. We find a weak negative correlation between frequency of use and lexical change for both the Latin-Spanish and Classical-Modern Greek language developments, but post hoc tests reveal that low frequency of use of borrowed words drive these patterns, casting some doubt on frequency of use as a general mechanism of language change. 
|000|Latin, Greek, lexical change, modeling, comparative wordlist, 5341|Gafni2019|This paper describes a structural account of phonetic symbolism and submits it to empirical investigation. To enable testing for possible iconic sound–emotion relations, participants compared pairs of syllables (e.g., ma – ba) as well as pairs of emotional states (e.g., joyful – sad) on various perceptual scales (e.g., softness). In addition, we replicated the classic ‘bouba/kiki’ experiment to investigate sound-shape symbolism. In accordance with the theoretical model, the results of the experimental tasks suggest that participants can detect abstract similarities between speech sounds and emotions as well as geometrical shapes. We discuss the theoretical model and the experimental results in relation to previous empirical findings and conflicting evidence from the study of affective iconicity in poetry.|000|emotion, phonology, sound symbolism, missing data, missing code 5342|Gafni2019|The foregoing discussion suggests that features of speech sounds can be abstracted and paralleled with properties of the behavioural expressions of emotions.|56|emotion, speech sounds, phonology, interaction, sound symbolism, 5343|Gafni2019|The results of the experiment suggest that participants can detect qualitative differences within pairs of individual speech sounds and pairs of emotions. Although the nature of the tasks was rather unusual, the relatively large effect sizes suggest that many participants shared the same intuitions about many of the comparisons. This in turn suggests that these qualitative differences are grounded in perception.|66|emotion, speech sounds, sound symbolism, phonology 5344|Mollica2019|We introduce theory-neutral estimates of the amount of information learners possess about how language works. We provide estimates at several levels of linguistic analysis: phonemes, wordforms, lexical semantics, word frequency and syntax. 
Our best guess is that the average English-speaking adult has learned 12.5 million bits of information, the majority of which is lexical semantics. Interestingly, very little of this information is syntactic, even in our upper bound analyses. Generally, our results suggest that learners possess remarkable inferential mechanisms capable of extracting, on average, nearly 2000 bits of information about how language works each day for 18 years.|000|information, language acquisition, English, phonology, lexicon, semantics, syntax, estimate 5345|Mollica2019|These difficulties are in part why the Fermi approach is so useful: we do not need to make strong theoretical commitments in order to study the problem if we focus on rough estimation of orders of magnitude. Estimates of the number of words children acquire range in the order of 20 000 –80 000 total wordforms [13]. However, when words are grouped into families (e.g. ‘dog’ and ‘dogs’ are not counted separately) the number known by a typical college student is more in the range of 12 000 –17 000 [14,15] [@DAnna1991] (although see [16][@Brysbaert2016] for an estimate over twice that size). Lexical knowledge extends beyond words, too. Jackendoff [17] estimates that the average adult understands 25 000 idioms, items out of the view of most vocabulary studies. Our estimates of capacity could, of course, be based on upper bounds on what people could learn, which, to our knowledge, have not been found. Looking generally at these varied numbers, we will use an estimate of 40 000 as the number of essentially unique words/idioms in a typical lexicon.|4|number of words, language acquisition, knowledge, 5346|DAnna1991|Studies using dictionary-sampling methods to estimate vocabulary size have left a bewildering trail of widely differing estimates. We argue that many estimates are misleading (generally too high) principally because the definition of a word is too liberal. 
For practical purposes (e.g., planning vocabulary instruction), it makes sense to define a word as a base form, resembling what linguists call a lemma, and to disregard certain word forms from estimates of vocabulary size (e.g., proper names and archaic words). By providing a clear rationale for the word source which was sampled, and by using clearly defined operational criteria for what constitutes a word as well as for the procedures used in the estimation task, we found that the average number of different words known by a college student is 16,785. We suggest that vocabulary size, and corresponding rates of vocabulary growth, may not be as great, nor attempts to directly teach vocabulary as futile, as some would suggest.|000|number of words, language acquisition, knowledge, estimate, vocabulary size 5347|Ohagan2019|The question of where Proto-Tupí-Guaraní (PTG) was spoken has been a point of considerable debate. Both northeastern and southwestern Amazonian homelands having been proposed, with evidence from both archaeology and linguistic classification playing key roles in this debate. In this paper we demonstrate that the application of linguistic migration theory to a recent phylogenetic classification of the Tupí-Guaraní family lends strong support to a northeastern Amazonian homeland.|000|missing data, missing code, Tupi-Guarani, homeland, phylogenetic reconstruction, 5348|Dellert2020|This article describes the first release version of a new lexicostatistical database of Northern Eurasia, which includes Europe as the most well-researched linguistic area. Unlike in other areas of the world, where databases are restricted to covering a small number of concepts as far as possible based on often sparse documentation, good lexical resources providing wide coverage of the lexicon are available even for many smaller languages in our target area. This makes it possible to attain near-completeness for a substantial number of concepts. 
The resulting database provides a basis for rich benchmarks that can be used to test automated methods which aim to derive new knowledge about language history in underresearched areas.|000|database, European languages, Asian languages, wordlist, comparative wordlist, 5349|Lund2020| The Tupí–Guaraní languages Omagua [omg] and Kokama [cod] constitute interesting examples of heavy language contact in Amazonia. This is evident from their lexicon, which is mostly Tupí–Guaraní, but with a high percentage of non-Tupí–Guaraní forms, and the grammar, which is very distinct from other Tupí–Guaraní languages. The lexifying Tupí–Guaraní language in this contact situation is believed to be a language similar to Tupinambá [tpn], now extinct, but well-known from 16th century Jesuit grammars and texts. The circumstances which yielded the contact situation between the ancestral language of Omagua and Kokama and the non-Tupí–Guaraní language(s) are not widely known. Nor have the non-Tupí–Guaraní language(s) so far been identified. This thesis compares the phonology of Omagua and Kokama with their closest relative Tupinambá, and reconstructs the phonology of their most recent common ancestor, Proto-Omagua–Kokama–Tupinambá. In doing this, the thesis identifies which phonological changes were involved in the genesis of Omagua and Kokama, and what we can infer about the phonologies of the non-Tupí–Guaraní languages involved in the contact situation. This is of interest to the field of contact linguistics, as examples of contact languages of pre-Columbian origin in the Americas are rare. |000|dataset, sound correspondences, phonological reconstruction, Tupi-Guarani, 5350|Daniel2019|We analyze the dynamics of dialect loss in a cluster of villages in rural northern Russia based on a corpus of transcribed interviews, the Ustja River Basin Corpus. 
Eleven phonological and morphological variables are analyzed across 33 speakers born between 1922 and 1996 in a series of logistic regression models. We propose three characteristics for a comparison of the rate of loss of different variables: initial level, steepness, and turning point. We show that the dynamics of loss differs significantly across variables and discuss possible reasons for such differences, including perceptual salience, initial variation in the dialect, and convergence with regionally or socially defined varieties of Russian. In conclusion, we discuss the pros and cons of logistic regression as an approach to quantitative modeling of dialect loss. Our paper contributes to the study and documentation of Russian dialects, most of which are on the verge of extinction.|000|dialect loss, dialectology, Russian, logistic regression, Russian dialects 5351|Gamallo2020|Phylogenetics is a sub-field of historical linguistics whose aim is to classify a group of languages by considering their distances within a rooted tree that stands for their historical evolution. A few European languages do not belong to the Indo-European family or are otherwise isolated in the European rooted tree. Although it is not possible to establish phylogenetic links using basic strategies, it is possible to calculate the distances between these isolated languages and the rest using simple corpus-based techniques and natural language processing methods. The objective of this article is to select some isolated languages and measure the distance between them and from the other European languages, so as to shed light on the linguistic distances and proximities of these controversial languages without considering phylogenetic issues. 
The experiments were carried out with 40 European languages including six languages that are isolated in their corresponding families: Albanian, Armenian, Basque, Georgian, Greek, and Hungarian.|000|n-gram model, distance-based methods, phylogenetic reconstruction, Ward clustering, European languages, corpus studies, 5352|Gamallo2020|Data and code for their approach are available from GitHub: * https://github.com/gamallo/LanguageDistance/tree/master/corpus |000|corpus studies, automatic language comparison, phylogenetic reconstruction, distance-based methods, 5353|Mallory2020|A proto-language is necessarily a historical object; it is the posited last point at which two historically attested cognates met. I don’t think this is a radically damning problem but it does suggest that we need to develop an account of the relationship between a reconstructed language and a historical language which captures the temporal nature of the historical language and the ideal nature of the reconstruction. The realist position has some positive features but, if it is to help us connect historical linguistics and archaeology, it cannot be endorsed in its Platonist form. That is, we should not commit ourselves to the claim that words literally are abstract objects.|10|methodology, linguistic palaeography, linguistic palaeontology, construct validity 5354|Bahnsen2020|Von der Vorstellung eines übersichtlichen Stammbaums müssen wir uns verabschieden. In der Frühgeschichte des Menschen ging es wild durcheinander.|000|homo sapiens, human prehistory, overview, popular science, 5355|Bahnsen2020|Interesting summary in Die ZEIT about Homo Sapiens and their early relatives.|000|homo sapiens, human prehistory, popular science, summary 5356|Derebucha2020|Vor 150 Jahren begann Heinrich Schliemann mit den Ausgrabungen in Troja. Was wir über die Stadt zu wissen glauben, stimmt nicht alles. 
Ein Faktencheck|000|Heinrich Schliemann, Troja, excavations, history of science, infographics 5357|Derebucha2020|Nice infographic about the excavations in Troja and their history in Die ZEIT.|000|history of science, excavations, Troja, Heinrich Schliemann 5358|Klein2020|Der Anthropologe Joseph Henrich über unseren Hang zur Treue – und seine These, dass es die Blüte der europäischen Kultur ohne den kirchlichen Zwang zur Monogamie nie gegeben hätte.|000|interview, fidelity, European culture, European society, church, sociology, anthropology, Joseph Henrich, 5359|Gerste2020|US-Gründervater Thomas Jefferson war gegen die Sklaverei – und gegen ihre Abschaffung. Hunderte Männer und Frauen gehörten zu seinem Besitz, unter ihnen die junge Sally Hemings, die er sich zur Geliebten nahm.|000|Thomas Jefferson, history, slavery, popular science 5360|Menne2020|Künstliche Intelligenz soll alles können: Autos steuern, Krebs erkennen, die Wirtschaft revolutionieren. Um die besten Experten wird weltweit gestritten. Auch Deutschland rangelt mit.|000|artificial intelligence, overview, popular science, Die ZEIT 5361|Schmid2007|This paper presents a statistical method for the segmentation of words into syllables which is based on a joint n-gram model. Our system assigns syllable boundaries to phonetically transcribed words. The syllabification task was formulated as a tagging task. The syllable tagger was trained on syllable- annotated phone sequences. In an evaluation using ten-fold cross-validation, the system correctly predicted the syllabification of German words with an accuracy by word of 99.85%, which clearly exceeds results previously reported in the literature. The best performance was observed for a context size of five preceding phones. 
A detailed qualitative error analysis suggests that a further reduction of the error rate by up to 90% is possible by eliminating inconsistencies in the training database.|000|automatic syllable segmentation, n-gram model, German, missing code 5362|Krantz2019|The identification of syllables within phonetic sequences is known as syllabification. This task is thought to play an important role in natural language understanding, speech production, and the development of speech recognition systems. The concept of the syllable is cross-linguistic, though formal definitions are rarely agreed upon, even within a language. In response, data-driven syllabification methods have been developed to learn from syllabified examples. These methods often employ classical machine learning sequence labeling models. In recent years, recurrence-based neural networks have been shown to perform increasingly well for sequence labeling tasks such as named entity recognition (NER), part of speech (POS) tagging, and chunking. We present a novel approach to the syllabification problem which leverages modern neural network techniques. Our network is constructed with long short-term memory (LSTM) cells, a convolutional component, and a conditional random field (CRF) output layer. Existing syllabification approaches are rarely evaluated across multiple language families. 
To demonstrate cross-linguistic generalizability, we show that the network is competitive with state of the art systems in syllabifying English, Dutch, Italian, French, Manipuri, and Basque datasets.|000|automatic syllable segmentation, cross-linguistic study, neural sequence labeling, 5363|Krantz2019|Data and code for testing and application are available from GitHub: https://github.com/jacobkrantz/lstm-syllabify|000|automatic syllable segmentation, cross-linguistic study, code 5364|Adsett2008|Although automatic syllabification is an important component in several natural language tasks, little has been done to compare the results of data-driven methods on a wider set of languages. This thesis compares the results of four data-driven syllabification algorithms (IB1, the Look-up Procedure, Liang’s algorithm, and Syllabification by Analogy) on nine European languages (Basque, Dutch, English, French, Frisian, German, Italian, Norwegian, and Spanish). Three questions are investigated: which algorithm performs best, which domain (spelling or pronunciation) is easier for automatic syllabification, and which languages are more straightforward to syllabify. Firstly, findings show that Syllabification by Analogy performs better than the other algorithms tested with a mean word accuracy of 96.84%. Secondly, contrary to claims in the field, no significant difference was found between automatic syllabification performance in the two domains. Finally, the ranking of the languages in terms of syllabic complexity matches the results of previous work using alternate approaches.|000|automatic syllable segmentation, Basque, Dutch, English, French, Frisian, German, Italian, Norwegian, Spanish, European languages, comparison, algorithms, 5365|Fagan2020|This squib provides evidence from the superlative in support of Wiese’s (1996) position that s (sibilant) + stop sequences in German behave as complex segments. 
With the exception of the sequence /sk/, the consonants that require schwa epenthesis before the superlative suffix are all coronal obstruents: nettest- [ˈnɛtəst] ‘nicest’, süßest- [ˈzyːsəst] ‘sweetest’, frischest- [ˈfrɪʃəst] ‘freshest’, brüskest- [ˈbrʏskəst] ‘most abrupt’. If one assumes that the sequence /sk/ is a single, complex segment with the feature [coronal] as well as [dorsal], the formation of the superlative can be accounted for with a simple rule of schwa epenthesis.|000|phonology, German, Standard German, affricates, phonological analysis, segmentation 5366|Fagan2020|The interesting aspect of this article is that it provides a good summary of the discussion on whether certain sound combinations in a language like German are better treated as one unit rather than as several units. This might have crucial practical implications, as it may facilitate specific tasks like automatic syllabification, pseudo-word generation, and morphological segmentation, if a more flexible treatment of units in the phonology of a language is allowed for.|000|phoneme syllabification, phonological analysis, pseudo word, representation, 5367|Floyd2020|Many words are associated with more than a single meaning. Words are sometimes “ambiguous,” applying to unrelated meanings, but the majority of frequent words are “polysemous” in that they apply to multiple related meanings. In a preregistered design that included 2 tasks, we tested adults’ and 4.5- to 7-year-old children’s ability to learn 4 novel polysemous words or 4 novel ambiguous words. Both children and adults demonstrated a polysemy over ambiguity learning advantage on each task after exposure, showing better learning of novel words with multiple related meanings than novel words with unrelated meanings. Stimuli in the polysemy condition were designed and then normed to guard against learners relying on a simple definition to distinguish the multiple target meanings for each word from foils. 
We retested available participants after a week-long delay without providing additional exposure and found that adults’ performance remained strong in the polysemy condition in 1 task, and children’s performance remained strong in the polysemy condition in both tasks. We conclude that participants are adept at learning polysemous words that vary along multiple dimensions. Current results are consistent with the idea that ambiguous meanings of a word compete, but polysemous meanings instead reinforce one another.|000|polysemy, colexification, ambiguity, language acquisition, psycholinguistics, experimental study 5368|Plummer2017|The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects. While our baseline rivals in accuracy more complex state-of-the-art models, we show that its gains cannot be easily parlayed into improvements on such tasks as image-sentence retrieval, thus underlining the limitations of current methods and the need for further research.|000|dataset, sentence-based image description, sentences, corpus, automatic image description 5369|Young2014|We propose to use the visual denotations of linguistic expressions (i.e. 
the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituents and their denotations, based on a large corpus of 30K images and 150K descriptive captions.|000|automatic image description, sentence-based image description, dataset, 5370|Grimm1819|Diese Sprachkünstler scheinen nicht zu fühlen, daß es kaum eine Regel gibt, die sich steif überall durchführen läßt; jedes Wort hat seine Geschichte und lebt sein eigenes Leben, es gilt daher gar kein sicherer Schluß von den Biegungen und Entfaltungen des einen auf die des anderen, sondern erst das, was der Gebrauch in beiden gemeinschaftlich anerkennt, darf von der Grammatik angenommen werden.|XIV|nice quote, Jacob Grimm, word history, 5371|Klubkova2016|This article examines a brochure “Idea et desiderata” (1773) by H. L. Ch. Bacmeister which includes parallel texts in Russian, German, French and Latin. The text is essentially an instruction to the collection of linguistic material that has not been previously any content analysis. We analyze the relationship of this text with “Russian grammar” by Schlözer and the concept of “different languages” which underlies the language differentiation in the Pallas dictionary. The brochure consists of three parts: the instruction, the questionnaire and the example. The instruction contains information about transcription and data which are to be collected. The questionnaire is not a list of words but sentences containing basic vocabulary and grammar. The example is a translation of a Bible quotation. The article contains a page of translation from Russian into Arabic. 
The article suggests a place for Bacmeister’s composition in the history of linguistics: Bacmeister’s recommendations for collectors of linguistic material may be considered as a first guide to field linguistics, first in Russia.|000|questionnaire, history of science, Russian, word list, concept list, Hartwig Ludwig Christian Bacmeister 5372|Bacmeister1773|Von dem kurzen Aufsatze, welcher hiebei (Seite 23 bis 30) in Rußischer, Französischer, Lateinischer und Deutscher Sprache gedruckt ist, werde ich Uebersetzungen in so viele verschiedene lebende Sprachen saammlen; als ich nur bekommen kann. Verschiedene Sprachen nenne ich alle diejenigen, die von zweien Völkern geredet werden, deren eines das andere nicht ohne Uebung versteht, wenn gleich diese Sprachen sich sonst sehr ähnlich sind. Zu den lebenden Sprachen rechne ich auch solche, die nur noch bei gewissen Gelegenheiten wie die Slawonische und Alt-griechische beim Gottesdienste, gebraucht werden. Die Schriftzüge, die Aussprache und die Bedeutung eines jeden Wortes der Uebersetzung hoffe ich mit einiger Genauigkeit angezeigt zu erhalten. |6|mutual intelligibility, language boundaries, history of science, nice quote, concept list, 5373|Klein1978|In the late eighteenth century, the Russian Court and the Royal Academy became a center of linguistic studies. Especially under the rule of Catherine II (1762-1796), the St. Petersburg Academy of Science began a systematic collection of the languages of the world. Though initially the concern was with the languages of the Russian empire and its neighbors, this interest expanded rapidly to include the languages of Africa and the Americas as well. [...] thus when J. C. @Bacmeister<1773>, one of her academicians, expressed an interest in comparative linguistics, Catherine urged him to appeal to the scholars of the world for their assistance in the collection of this linguistic material. 
|137|word lists, concept list, comparative linguistics, history of science, Catherine the Great, 5374|Watts2020|Theories differ over whether religious and secular worldviews are in competition or represent overlapping and compatible frameworks. Here we test these theories by examining homogeneity and overlap in Christian and non-religious people's explanations of the world. Christian and non-religious participants produced free text explanations of 54 natural and supernatural phenomena. Using a new text analytic approach, we quantitatively measure the similarity between 7613 participant generated explanations. We find that the relative homogeneity of Christian and non-religious people's explanations vary depending on the kind of phenomena being explained. Non-religious people provided more similar explanations for natural than supernatural phenomena, whereas Christian explanations were relatively similar across both natural and supernatural phenomena. This challenges the idea that religious systems standardize and restrict people's worldviews in general, and instead suggest this effect is domain specific. We also find Christian and non-religious participants used largely overlapping concepts to explain natural and supernatural phenomena. This suggests that religious systems supplement rather than compete with secular based worldviews, and demonstrates how text analytics can help understand the structure of group differences.|000|religion, NLP, text comparison, explanation, Christianity, network approaches, 5375|Bauer2020|In Nigeria sterben Tausende von Menschen im Kampf um Weide- und Ackerflächen. Über den Bürgerkrieg zwischen Nomaden und Bauern wird kaum berichtet, obwohl er einer der schlimmsten Konflikte weltweit ist. 
Unsere Reporter waren auf beiden Seiten des Frontverlaufs|000|Fulfulde, Fulani, Nigeria, conflicts, civil war, nomads, newspaper article, 5376|Hewson1977|The prime principle lying behind all comparative linguistics is the regularity of sound change. Without this principle, comparative linguistics would be mere empty speculation: any ad hoc rule could be created on the spur of the moment to justify the most fanciful etymology. But with the principle of the regularity of sound change comparative linguistics becomes a rigorous science: it is possible to propose a hypothesis, and then demonstrate clearly from the data whether the hypothesis works or not. In classic scientific method if the hypothesis does not work, it is to be abandoned; if it does work it must be shown to apply to all of the data in a completely coherent fashion. All apparent anomalies must therefore be either explained as the effects of some other cause: as a result of analogy, or borrowing or dialectal interference. [...] Our aim, therefore, was: to devise a programme that would use the correspondences and reflexes of the daughter languages to find cognates in an input of raw data in these languages and to reconstruct proto forms for these cognates. The proto forms and the cognates from which they had been generated would then be assembled on the computer in the form of a proto language lexicon or dictionary.|000|Proto-Algonkian, computational historical linguistics, linguistic reconstruction, automatic linguistic reconstruction, 5377|Hewson1989|1 Comparative and Historical Linguistics 2 Upstream and Downstream Algorithms 3 Historical Linguistics 4 Lexicostatistics 5 Comparative Reconstruction 6 Conclusion 7 Literature (selected)|000|computational historical linguistics, computer-assisted analysis, summary, overview, 5378|Wang2020|The Tungusic languages form a language family spoken in Xinjiang, Siberia, Manchuria and the Russian Far East. 
There is a general consensus that these languages are genealogically related and descend from a common ancestral language, conventionally called ‘Proto-Tungusic’. However, the exact geographical location where the ancestral speakers of Proto-Tungusic originated from is subject to debate. Here we take an unprecedented approach to this problem, by integrating linguistic, archaeological and genetic evidence in a single study. Our analysis of ancient DNA suggests genetic continuity between an ancient Amur genetic lineage and the contemporary speakers of the Tungusic languages. Adding an archaeolinguistic perspective, we infer that the most plausible homeland for the speakers of Proto-Tungusic is the region around Lake Khanka in the Russian Far East. Our study pushes the field forward in answering the tantalizing question about the location of the Tungusic homeland and in illustrating how these three disciplines can converge into a holistic approach to the human past.|000|study without data, Proto-Tungusic, Altaic, homeland 5379|Brody2019|This paper explores the notion of analyzing cross-linguistically uncommon morphosyntactic structures in terms of their historical development. What may seem extraordinary in the synchronic snapshot of a language can often be clearly accounted for through diachronic considerations. To illustrate this, the current study examines the typologically uncommon phenomenon of multiple exponence, the realization of the same grammatical information in multiple places within an inflected word, in the Kiranti (Tibeto-Burman) languages. Typologically speaking, we do see a strong tendency cross-linguistically towards encoding grammatical information once within an inflected word, and against multiple exponence. Yet the phenomenon of multiple exponence is attested in a number of languages. 
This paper presents comparative evidence from the Kiranti languages that supports the claim that multiple exponence in synthetic verbs in the modern Kiranti languages comes as a result of the interaction between language(family)-specific typology (multiple agreement in periphrastic verbs) and an uncontroversial language change process (coalescence of periphrastic forms into synthetic forms).|000|morphosyntax, historical linguistics, linguistic reconstruction, grammatical change, Kiranti 5380|Eden2018|Three independent approaches to measuring cross-language phonological distance are pursued in this thesis: exploiting phonological typological parameters; measuring the cross-entropy of phonologically transcribed texts; and measuring the phonetic similarity of non-word nativisations by speakers from different language backgrounds. Firstly, a set of freely accessible online tools are presented to aid in establishing parametric values for syllable structure and phoneme inventory in different languages. The tools allow researchers to make differing analytical and observational choices and compare the results. These tools are applied to 16 languages, and correspondence between the resulting parameter values is used as a measure of phonological distance. Secondly, the computational technique of cross-entropy measurement is applied to texts from seven languages, transcribed in four different ways: a phonemic IPA transcription; with Elements; and with two sets of binary distinctive features in the SPE tradition. This technique results in consistently replicable rankings of phonological similarity for each transcription system. It is sensitive to differences in transcription systems. It can be used to probe the consequences for information transfer of the choices made in devising a representational system. Thirdly, participants from different language backgrounds are presented with non-words covering the vowel space, and asked to nativise them. 
The accent distance metric ACCDIST is applied to the resulting words. A profile of how each speaker’s productions cluster in the vowel space is produced, and ACCDIST measures the similarity of these profiles. Averaging across speakers with a shared native language produces a measure of similarity between language profiles. Each of these three approaches delivers a quantitative measure of phonological similarity between individual languages. They are each sensitive to different analytical choices, and require different types and quantities of input data, and so can complement each other. This thesis provides a proof-of-concept for methods which are both internally consistent and falsifiable.|000|phoneme inventory, database, transcription systems, dissertation, 5381|Eden2018|Two languages can therefore be compared using the Kullback-Leibler divergence even if we do not know the true probability distribution of character sequences for them, provided we have a reasonable estimate for each. For example, we can derive two encodings for English: the first based on our reasonable estimate of the probability distribution of English, and the second based instead on German. We shall label the distribution derived from English the ‘true’ distribution P, and the one derived from German an approximate distribution Q. The cross-entropy of these estimates will be called H(English, German).|108|entropy, entropy estimation, cross-entropy, Kullback-Leibler divergence 5382|Hu2018|People infer the personalities of others from their facial appearance. Whether they do so from body shapes is less studied. We explored personality inferences made from body shapes. Participants rated personality traits for male and female bodies generated with a three-dimensional body model. Multivariate spaces created from these ratings indicated that people evaluate bodies on valence and agency in ways that directly contrast positive and negative traits from the Big Five domains. 
Body-trait stereotypes based on the trait ratings revealed a myriad of diverse body shapes that typify individual traits. Personality-trait profiles were predicted reliably from a subset of the body-shape features used to specify the three-dimensional bodies. Body features related to extraversion and conscientiousness were predicted with the highest consensus, followed by openness traits. This study provides the first comprehensive look at the range, diversity, and reliability of personality inferences that people make from body shapes.|000|personality, psychology, body shape, human body perception, multiple correspondence analysis, 5383|Alekseev1985|Процент лексических совпадений, полученный при попарном сопоставлении (табл. 2), позволяет сделать следующие выводы: наиболее близкими друг к другу, как это и принято считать в специальной литературе, оказываются табасаранский и агульский языки и крызский и будухский. Более отдаленно родство лезгинского языка с агульским и табасаранским, а также рутульского с цахурским. Все перечисленные языки образуют более широкую группу с возможным включением в нее арчинского языка, дающего чуть меньший процент совпадений по сравнению с цахурским. Вне этой группы стоит удинский язык, но по отношению к хиналугскому, аварскому и лакскому языкам его принадлежность к лезгинским языкам очевидна. Особняком стоит хиналугский язык, имеющий сопоставимый процент совпадений как с лезгинскими, так и с аварским и лакским языками. Таким образом, по данным лексикостатистики, классификация лезгинских языков приведена на схеме: [a graphic of a very schematic tree of the Lezgian languages]|23|Lezgian languages, genetic classification, subgrouping, history of science, lexicostatistics, phylogenetic reconstruction, distance matrix 5384|Pulleyblank1968|There has been surprisingly little detailed study of the rhyming of T'ang poetry. No doubt the sheer bulk has been a deterrent. 
Another factor has undoubtedly been that when so much is known from other sources about T'ang Chinese there has been comparatively little incentive to study poetic rhyming whereas for earlier periods, it may be almost the only means available to investigate pronunciation. Moreover the uniformity of ``regulated poetry'' (lüshī) and the way in which it set the standard for later centuries of imitative poets have given the false impression that departures from this norm in ``old style poetry'' (gǔtóushī) were simply due to greater license and did not necessarily reflect genuine linguistic differences. |000|rhyming practice, Táng poems, rhyme analysis, Chinese, 5385|Bond2019|Canonical Typology is a methodological framework for conducting typological research in which descriptive categories and theoretical concepts are deconstructed into fine-grained parameters of typological variation. Like other multivariate approaches to cross-linguistic research (Haspelmath 2007, Hyman 2009, Bickel 2010, 2011), Canonical Typology utilizes observations on a large number of empirically motivated variables to gauge the similarities and differences between linguistic structures (within or across languages). The method is distinguished from other contemporary approaches to typology by its appeal to the notion of the canon, a logically motivated archetype from which attested and unattested patterns are calibrated.|000|comparative concept, construct validity, linguistic typology, tertium comparationis, methodology 5386|Strauss2009|Measures of psychological constructs are validated by testing whether they relate to measures of other constructs as specified by theory. Each test of relations between measures reflects on the validity of both the measures and the theory driving the test. Construct validation concerns the simultaneous process of measure and theory validation. 
In this article, we review the recent history of validation efforts in clinical psychological science that has led to this perspective, and we review the following recent advances in validation theory and methodology of importance for clinical researchers. These are: the emergence of nonjustificationist philosophy of science; an increasing appreciation for theory and the need for informative tests of construct validity; valid construct representation in experimental psychopathology; the need to avoid representing multidimensional constructs with a single score; and the emergence of effective new statistical tools for the evaluation of convergent and discriminant validity.|000|construct, construct validity, philosophy of science, objectivity, reliability, psychological constructs 5387|Machery2006|The operationalization of scientific notions is instrumental in enabling experimental evidence to bear on scientific propositions. Conceptual change should thus translate into operationalization change. This article describes some important experimental works in the psychology of concepts since the beginning of the twentieth century. It is argued that since the early days of this field, psychologists’ theoretical understanding of concepts has been modified several times. However, in all cases but one, these theoretical changes did not translate into changes in the operationalization of the notion of concept learning.|000|operationalization, construct validity, philosophy of science, 5388|Teng2020|Ancient Chinese poetry is constituted by structured language that deviates from ordinary language usage [1, 2]; its poetic genres impose unique combinatory constraints on linguistic elements [3]. How does the constrained poetic structure facilitate speech segmentation when common linguistic [4, 5, 6, 7, 8] and statistical cues [5, 9] are unreliable to listeners in poems? 
We generated artificial Jueju, which arguably has the most constrained structure in ancient Chinese poetry, and presented each poem twice as an isochronous sequence of syllables to native Mandarin speakers while conducting magnetoencephalography (MEG) recording. We found that listeners deployed their prior knowledge of Jueju to build the line structure and to establish the conceptual flow of Jueju. Unprecedentedly, we found a phase precession phenomenon indicating predictive processes of speech segmentation—the neural phase advanced faster after listeners acquired knowledge of incoming speech. The statistical co-occurrence of monosyllabic words in Jueju negatively correlated with speech segmentation, which provides an alternative perspective on how statistical cues facilitate speech segmentation. Our findings suggest that constrained poetic structures serve as a temporal map for listeners to group speech contents and to predict incoming speech signals. Listeners can parse speech streams by using not only grammatical and statistical cues but also their prior knowledge of the form of language. |000|parsing, understanding, Ancient Chinese, Chinese poetry, neurolinguistics, speech segmentation, speech processing 5389|Muecke2020| To assess a phonological theory, we often compare its predictions to phonetic observations. This can be complicated, however, because it requires a theoretical model that maps from phonological representations to articulatory and acoustic observations. In this study we are concerned with the question of how phonetic observations are interpreted in relation to phonological theories. Specifically, we argue that deviations of observations from theoretical predictions do not necessitate the rejection of the theoretical assumptions. We critically discuss the problem of overinterpretation of phonetic measures by using syllable coordination for different speaker groups within Articulatory Phonology. 
It is shown that surface variation can be explained without necessitating substantial revision of the underlying phonological theory. These results are discussed with respect to two types of interpretational errors in the literature. The first involves the proliferation of phonological categories in order to accommodate variation, and the second the rejection of a phonological theory because the model which generates its predictions is overly simplified. |000|phonological theory, phonetics, articulatory 5390|Muecke2020|Within Articulatory Phonology, it is assumed that distinct phonological syllable parses such as simple (non-branching) and complex (branching) onsets correspond to different organisations of consonants and vowels in the articulatory domain (e.g. Browman & Goldstein 2000, Shaw et al. 2009, 2011, Gafos et al. 2010, Marin & Pouplier 2010, Hermes et al. 2013, Hermes et al. 2017).|136|articulatory phonology, definition, introduction 5391|Muecke2020|Depending on their position in the syllable, consonantal and vocalic gestures are coordinated differently with respect to one another, and it is hypothesised that the underlying phonological organisation varies with syllable complexity, for example between CV and CCV. Furthermore, Articulatory Phonology claims that there are two distinct phonological forms of organisation for word-initial consonant clusters: (i) complex organisation, in which both consonants are associated with the same syllable, and (ii) simplex organisation, in which the initial consonant is extrasyllabic, i.e. less closely associated with the syllable projected by the following vowel. Hermes et al. (2013) provide evidence from Italian consonant clusters, which show a complex organisation for obstruent-liquid clusters (e.g. /pr/ in prima ‘first’) and a simplex organisation for sibilant-obstruent clusters (e.g. 
/sp/ in spina ‘thorn’).|136|articulatory phonology, complex syllable, simplex syllable, phonological theory, definition 5392|Muecke2020|Empirically, it has been observed that when a consonant is added to the beginning of a word to form a complex onset, the prevocalic consonant is shifted towards the vowel to make room for the added consonant. This is the empirical pattern referred to as the C-centre effect, and has been taken to provide phonetic evidence for complex organisation in phonological theory.|136|C-center effect, consonant clusters, phonology, phonetics, 5393|Muecke2020|We note here that our immediate aims in pursuing the above analyses are neither to argue for a particular phonological theory nor to argue for a particular linking model. Rather, our principal aim is to demonstrate how interpretation of empirical data necessitates critical examination of the model that links a theory to its predictions.|139|phonological theory, phonetics, modeling, 5394|Muecke2020|a. CV syllables are associated with in-phase coupling. The consonantal and vocalic gestures start at approximately the same time, but the vocalic movement is executed more slowly. This results in a consonant–vowel sequence on the acoustic surface. b. In a VC syllable, the gestures are associated with anti-phase coupling, resulting in sequential activation, i.e. staggered initiation of timing. c. In a CCV syllable with a branching onset, the two consonants are in-phase with the vowel, but anti-phase with each other. This leads to a leftward shift of the initial C away from the following V, and a rightward shift of the prevocalic C towards the V. This pattern is referred to as the C-centre effect. It is usually assumed that the C-centre effect exhibits symmetrical shift patterns of C1 and C2, implying balanced coupling forces. The occurrence of this pattern is language-specific. d. An alternative organisation is a C.CV structure, with a non-branching onset and extraprosodic initial consonant. 
Only the immediately prevocalic consonantal gesture is in-phase coupled to the vocalic gesture in this case. This leads to a timing pattern in which the initiation of the immediately prevocalic gesture is synchronised with the initiation of the vocalic gesture, just as in a simple CV syllable.|142|consonant cluster, coupled oscillator model, phonological theory, syllable onset, syllable structure, 5395|Mayer2020b| An important question in phonology is to what degree the learner uses distributional information rather than substantive properties of speech sounds when learning phonological structure. This paper presents an algorithm that learns phonological classes from only distributional information: the contexts in which sounds occur. The input is a segmental corpus, and the output is a set of phonological classes. The algorithm is first tested on an artificial language, with both overlapping and nested classes reflected in the distribution, and retrieves the expected classes, performing well as distributional noise is added. It is then tested on four natural languages. It distinguishes between consonants and vowels in all cases, and finds more detailed, language-specific structure. These results improve on past approaches, and are encouraging, given the paucity of the input. More refined models may provide additional insight into which phonological classes are apparent from the distributions of sounds in natural languages. |000|phonotactics, language model, phonological structure, corpus studies, sound classes, algorithm 5396|Wu2020|Historical language comparison opens windows onto a human past, long before the availability of written records. 
Since traditional language comparison within the framework of the comparative method is largely based on manual data comparison, requiring the meticulous sifting through dictionaries, word lists, and grammars, the framework is difficult to apply, especially in times where more and more data have become available in digital form. Unfortunately, it is not possible to simply automate the process of historical language comparison, not only because computational solutions lag behind human judgments in historical linguistics, but also because they lack the flexibility that would allow them to integrate various types of information from various kinds of sources. A more promising approach is to integrate computational and classical approaches within a computer-assisted framework, “neither completely computer-driven nor ignorant of the assistance computers afford” [1, p. 4]. In this paper, we will illustrate what we consider the current state of the art of computer-assisted language comparison by presenting a workflow that starts with raw data and leads up to a stage where sound correspondence patterns across multiple languages have been identified and can be readily presented, inspected, and discussed. We illustrate this workflow with the help of a newly prepared dataset on Hmong-Mien languages. Our illustration is accompanied by Python code and instructions on how to use additional web-based tools we developed so that users can apply our workflow for their own purposes.|000|historical language comparison, computer-assisted language comparison, tutorial, Hmong-Mien, partial cognate detection, cross-semantic cognate detection 5397|NelsonSathi2011|Language evolution is traditionally described in terms of family trees with ancestral languages splitting into descendent languages. However, it has long been recognized that language evolution also entails horizontal components, most commonly through lexical borrowing. 
For example, the English language was heavily influenced by Old Norse and Old French; eight per cent of its basic vocabulary is borrowed. Borrowing is a distinctly non-tree-like process—akin to horizontal gene transfer in genome evolution—that cannot be recovered by phylogenetic trees. Here, we infer the frequency of hidden borrowing among 2346 cognates (etymologically related words) of basic vocabulary distributed across 84 Indo-European languages. The dataset includes 124 (5%) known borrowings. Applying the uniformitarian principle to inventory dynamics in past and present basic vocabularies, we find that 1373 (61%) of the cognates have been affected by borrowing during their history. Our approach correctly identified 117 (94%) known borrowings. Reconstructed phylogenetic networks that capture both vertical and horizontal components of evolutionary history reveal that, on average, eight per cent of the words of basic vocabulary in each Indo-European language were involved in borrowing during evolution. Basic vocabulary is often assumed to be relatively resistant to borrowing. Our results indicate that the impact of borrowing is far more widespread than previously thought.|000|reference tree, borrowing detection, lateral gene transfer, borrowing, automated borrowing detection 5398|Schuster2020|Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing their stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. Whereas humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. 
Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs, utilized in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometry approaches in detecting machine-generated misinformation, and open up the discussion on the desired evaluation benchmarks.|000|fake news, stylometry, language model, misinformation, fake news detection, machine learning 5399|Nissim2020|Analogies such as man is to king as woman is to X are often used to illustrate the amazing power of word embeddings. Concurrently, they have also been used to expose how strongly human biases are encoded in vector spaces trained on natural language, with examples like man is to computer programmer as woman is to homemaker. Recent work has shown that analogies are in fact not an accurate diagnostic for bias, but this does not mean that they are not used anymore, or that their legacy is fading. Instead of focusing on the intrinsic problems of the analogy task as a bias detection tool, we discuss a series of issues involving implementation as well as subjective choices that might have yielded a distorted picture of bias in word embeddings. We stand by the truth that human biases are present in word embeddings, and of course, the need to address them. But analogies are not an accurate tool to do so, and the way they have been most often used has exacerbated some possibly non-existing biases and perhaps hidden others. 
Because they are still widely popular, and some of them have become classics within and outside the NLP community, we deem it important to provide a series of clarifications that should put well-known, and potentially new analogies, into the right perspective.|000|word embeddings, analogy, critics, scientific practice 5400|CostaJussa2020|We introduce the Computational Linguistics special issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing. We situate the special issue’s five articles in the context of our fast-changing field, explaining our motivation for this project. We offer a brief summary of the work in the issue, which includes developments on lexical and sentential semantic representations, from symbolic and neural perspectives.|000|special issue, semantics, NLP, semantic representations, 5401|Forkel2020|While the amount of cross-linguistic data is constantly increasing, most datasets produced today and in the past cannot be considered FAIR (findable, accessible, interoperable, and reproducible). To remedy this and to increase the comparability of cross-linguistic resources, it is not enough to set up standards and best practices for data to be collected in the future. We also need consistent workflows for the “retro-standardization” of data that has been published during the past decades and centuries. With the Cross-Linguistic Data Formats initiative, first standards for cross-linguistic data have been presented and successfully tested. So far, however, CLDF creation was hampered by the fact that it required a considerable degree of computational proficiency. With cldfbench, we introduce a framework for the retro-standardization of legacy data and the curation of new datasets that drastically simplifies the creation of CLDF by providing a consistent, reproducible workflow that rigorously supports version control and long term archiving of research data and code. 
The framework is distributed in form of a Python package along with usage information and examples for best practice. This study introduces the new framework and illustrates how it can be applied by showing how a resource containing structural and lexical data for Sinitic languages can be efficiently retro-standardized and analyzed.|000|fairness, FAIR data, data lifting, retrostandardization, retro-standardization, cross-linguistic data, standards 5402|McNicol1972|In the example there were two relevant things that could happen. These were state s (the subject had his shoes on) and state n (the subject had his shoes off). To decide which of these had occurred, the observer was given some evidence in the form of the height, x, of the subject. The task of the observer was to decide whether the evidence favoured hypothesis s or hypothesis n. As you can see we denote evidence by the symbol x. Thus x is called the evidence variable.|3|evidence, evidence variable, signal detection theory, psychology, 5403|McNicol1972|**Conditional Probabilities** In the example, given a particular value of the evidence variable, say x = 66 in., Table 1.1 can be used to calculate two probabilities: (a) P(x|s): that is, the probability that the evidence variable will take the value x given that state s has occurred. In terms of the example, P(x|s) is the probability that a subject is 66 in. tall given that he is wearing shoes. From Table 1.1 it can be seen that for x = 66 in., P(x|s) = 3/16. (b) P(x|n): the probability that the evidence variable will take the value x given that state n has occurred. 
Table 1.1 shows that for x = 66 in., P(x|n) = 4/16.|4|signal detection theory, conditional probability, psychology, definition, introduction 5404|McNicol1972|P(x|s) and P(x|n) are called *conditional probabilities* because they represent the probability of one event occurring conditional on another event having occurred.|4|conditional probabilities, signal detection theory, definition, introduction 5405|McNicol1972|*The likelihood ratio* It was suggested that one way of deciding whether state s or state n had occurred was to first calculate the odds favouring s. In signal detection theory, instead of speaking of 'odds' we use the term likelihood ratio. 'Odds' and 'likelihood ratio' are synonymous. The likelihood ratio is represented symbolically as l(x). From the foregoing discussion it can be seen that in this example the likelihood ratio is obtained from the formula [pb] l(x) = P(x|s) / P(x|n). Thus from Table 1.1 we can see that l(x=64) = (1/16)/(2/16), l(x=66) = (3/16)/(4/16), etc. |4f|likelihood ratio, probability, signal detection theory, definition 5406|McNicol1972|*Hits, misses, false alarms and correct rejections* We now come to four conditional probabilities which will be often referred to in the following chapters. They will be defined by referring to Table 1.1. First, however, let us adopt a convenient convention for denoting the observer's decision. The two possible stimulus events have been called s and n. Corresponding to them are two possible responses that an observer might make; observer says 's occurred' and observer says 'n occurred'. As we use the lower case letters s and n to refer to stimulus events, we will use the upper case letters S and N to designate the corresponding response events. 
There are thus four combinations of stimulus and response events.|5|signal detection theory, stimulus event, response event, definition, introduction 5407|McNicol1972|**Decision Rules and the Criterion** *The meaning of b[eta]* In discussing the example it has been implied that the observer should respond N if the value of the evidence variable is less than or equal to 66 in. If the height is greater than or equal to 67 in. he should respond S. This is the observer's decision rule and we can state it in terms of likelihood ratios in the following manner: * If l(x) < 1, respond N; if l(x) > 1, respond S. |6|signal detection theory, decision rule, definition, introduction 5408|McNicol1972|(a) Maximizing gains and minimizing losses. Rewards and penalties may be attached to certain types of response so that * V{_s}S value of making a hit, * C{_s}N cost of making a miss, * C{_n}S cost of making a false alarm, * V{_n}N value of making a correct rejection. |9|signal detection theory, introduction, penalty 5409|McNicol1972|(b) Keeping false alarms at a minimum: Under some circumstances an observer may wish to avoid making mistakes of a particular kind. [...] * Type I error: accepting H_1 when H_0 was true, and * Type II error: accepting H_0 when H_1 was true|9|signal detection theory, error types, type I error, type II error, false positives 5410|Hill2019a|This paper proposes the use of network techniques in the exploration of Old Chinese phonology as reflected in the phonophoric determinatives of xiéshēng 諧聲 characters. We use the approach to examine five specific proposals in Chinese historical phonology, and whether the distinctions suggested by these proposals can be said to be recoverable on the basis of phonophoric choice. 
The major finding is that the type A versus type B distinction is in some cases encoded in the choice of phonophoric determinative, while other distinctions are only spuriously if at all reflected in the phonophoric subseries.|000|Chinese characters, Chinese character formation, network approaches,