Kiel South Asian Typological Database

Peterson, John; Chevallier, Lennart; Ivani, Jessica

doi:10.5281/zenodo.13769431

Published September 17, 2024 | Version v3

Dataset Open

Legend for interpreting the data of the Kiel South Asian Typological Database

This corpus was originally compiled under the direction of John Peterson by Jessica Katiuscia Ivani, with contributions by Netra Prasad Paudyal, Nikita König, Lennart Chevallier, Anika Besser, Nellia Bleyer, Sarah Anders and Josephine Hennig in the project “Towards a linguistic prehistory of eastern central South Asia (and beyond)”, financed by the German Research Council (DFG, Project Grant 326697274), which we gratefully acknowledge.

The database has since been considerably expanded: 22 features pertaining to negation strategies and 6 new languages, especially from the Indo-Iranian stock, have been added. These Indo-Iranian languages could be included thanks to a generous grant from the Subcluster “ROOTS of Inequalities: Social, Economic, and Environmental Developments” in the Cluster of Excellence “ROOTS – Social, Environmental, and Cultural Connectivity in Past Societies”, also funded by the DFG, which we also gratefully acknowledge here.

The data have been re-checked and corrected, where necessary, by John Peterson, Lennart Chevallier, Shane Arzbach, Sebastian Dudek, Martina Riege, Swantje Fuhrken, and Luna Hemmerling.

This database includes information on up to 230 features (described below) for 42 languages from the Indo-Aryan, Indo-Iranian, Munda and Dravidian stocks, as well as the isolates Kusunda and Nihali. Of these 230 features, 98 derive from the Grambank database of the Glottobank research consortium and were compiled by the members of our project in cooperation with that project. We include here only those 98 features from that database which we felt are of particular relevance for South Asia.

All future updates will be documented in detail with respect to the changes made, together with the date of the respective update.

Feature values: The features are encoded as follows for all languages:

1 – the respective feature is found in this language

0 – the respective feature is not found in this language

? – it is not clear from the available data sources whether this feature is found in the respective language or not

NA – this section of the data has not yet been completed for the respective language
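The four codes above can be decoded mechanically. The following is a minimal sketch in Python and is not part of the database itself: the names `FeatureStatus` and `decode_value` are hypothetical, and it assumes that the codes appear in the data file as the literal strings `1`, `0`, `?` and `NA`. Any other value is rejected rather than guessed at.

```python
from enum import Enum


class FeatureStatus(Enum):
    """Hypothetical names for the four codes described in the legend."""
    PRESENT = "feature found in the language"
    ABSENT = "feature not found in the language"
    UNCLEAR = "unclear from the available data sources"
    NOT_CODED = "not yet completed"


# Assumption: raw cells contain exactly these strings.
_CODES = {
    "1": FeatureStatus.PRESENT,
    "0": FeatureStatus.ABSENT,
    "?": FeatureStatus.UNCLEAR,
    "NA": FeatureStatus.NOT_CODED,
}


def decode_value(raw: str) -> FeatureStatus:
    """Map one raw cell to a FeatureStatus; raise ValueError on unknown codes."""
    code = raw.strip()
    try:
        return _CODES[code]
    except KeyError:
        raise ValueError(f"unexpected feature value: {raw!r}") from None


if __name__ == "__main__":
    for cell in ["1", "0", "?", "NA"]:
        print(cell, "->", decode_value(cell).name)
```

If the data file is read with pandas, note that `read_csv` treats the string `NA` as a missing value by default; passing `keep_default_na=False` should keep it as text so that `NA` and `?` remain distinguishable.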

The values of the multistate features (GB024, GB025, GB065 and GB193) state whether the adnominal element precedes the noun (1), follows the noun (2), or whether both orders occur (3).
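As an illustration of how the multistate values might be handled separately from the binary ones, here is a small Python sketch. The function name `decode_order` and the label strings are hypothetical. It also assumes that any other code on these features (for instance `?` or `NA`) carries no word-order information and therefore maps to `None`.

```python
# The four multistate features named in the legend.
MULTISTATE_FEATURES = frozenset({"GB024", "GB025", "GB065", "GB193"})

# Order codes as described in the legend; the label wording is ours.
ORDER_LABELS = {
    "1": "adnominal element precedes the noun",
    "2": "adnominal element follows the noun",
    "3": "both orders occur",
}


def decode_order(feature_id: str, raw: str):
    """Return a word-order label for a multistate feature value.

    Returns None for any code other than 1, 2 or 3 (assumption: such
    codes say nothing about order). Raises ValueError when the feature
    is not one of the four multistate features.
    """
    if feature_id not in MULTISTATE_FEATURES:
        raise ValueError(f"{feature_id} is not a multistate feature")
    return ORDER_LABELS.get(raw.strip())


if __name__ == "__main__":
    print(decode_order("GB024", "1"))
    print(decode_order("GB193", "3"))
```

Keeping this lookup apart from the binary decoding avoids misreading a `1` on these four features as plain presence of the feature.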

Features labeled “GB” are features from the original Grambank database compiled by members of our project. As the labeling of features in that database may have changed somewhat since that time, the labels found here may no longer correlate one-to-one with those features. We hope to synchronize these labels in the near future, but until then users of these features will have to check these on their own.

The feature labels “NGB”, “JPP” and “SA” refer only to different stages during the compilation of the data in our own project and are not relevant to the analysis of the data themselves.
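For users who want to separate the Grambank-derived features from the rest, the label prefix can be extracted from each feature ID. The sketch below is only illustrative: `label_prefix` and `count_by_prefix` are hypothetical helpers, and apart from GB024 and GB193 the IDs in the example are made-up placeholders, since the exact ID format of the NGB, JPP and SA features is assumed here to be "letters followed by digits".

```python
import re
from collections import Counter

# Assumption: every feature ID starts with a run of letters (the label).
_PREFIX = re.compile(r"[A-Za-z]+")


def label_prefix(feature_id: str) -> str:
    """Return the leading letters of a feature ID, e.g. 'GB' for 'GB024'."""
    match = _PREFIX.match(feature_id.strip())
    return match.group(0).upper() if match else ""


def count_by_prefix(feature_ids):
    """Count how many feature IDs carry each label prefix."""
    return Counter(label_prefix(fid) for fid in feature_ids)


if __name__ == "__main__":
    # GB024 and GB193 occur in the legend; the other IDs are placeholders.
    sample = ["GB024", "GB193", "NGB001", "JPP01", "SA05"]
    print(count_by_prefix(sample))
```

Matching the whole leading run of letters, rather than testing for the substring "GB", keeps "NGB" features from being counted as Grambank features.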

The primary areas of grammar covered by the relevant features (e.g., ergativity, classifiers, negation, number, etc.) have been indicated on the right-hand side of the features list for many of these features. This list is not exhaustive and is only intended to serve as an initial orientation.

Use of these data

These data may be freely used in scientific research under the following two conditions:

That you properly cite this database, including the following information:

Peterson, John, Chevallier, Lennart & Ivani, Jessica Katiuscia. 2024. The Kiel South Asian Typological Database. https://doi.org/10.5281/zenodo.13769431. [Date of last access].

That you inform us in the event of incorrect data in the table, should you find any, so that we can recheck these ourselves.

We would also be grateful if you would send us a copy of your work using these data.

We expressly welcome input on the data from experts in the languages contained in this database and on further languages of the subcontinent!

Files

Files (85.8 kB)

Name	MD5	Size
Kiel_Corpus_bibliography.pdf	97919128aeb9554cd64b7b867e31cf5d	33.5 kB
Kiel_Corpus_data.csv	44345f44b9fc56a79ec1ef6a823ac42c	28.3 kB
Kiel_Corpus_features.csv	ca6da1787257445e9218da8cd3634f23	24.0 kB
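The MD5 checksums listed for the three files can be used to confirm that a download is intact. The following Python sketch shows one way this might be done. The helper names `md5_of_file` and `verify_downloads` are hypothetical, and the sketch assumes the files were saved under their original names in a single local directory.

```python
import hashlib
from pathlib import Path

# Checksums as listed for version v3 of the record.
EXPECTED_MD5 = {
    "Kiel_Corpus_bibliography.pdf": "97919128aeb9554cd64b7b867e31cf5d",
    "Kiel_Corpus_data.csv": "44345f44b9fc56a79ec1ef6a823ac42c",
    "Kiel_Corpus_features.csv": "ca6da1787257445e9218da8cd3634f23",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {filename: True, False or None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    for name, ok in verify_downloads(".").items():
        print(name, {True: "OK", False: "MISMATCH", None: "missing"}[ok])
```

A mismatch may simply mean that a different version of the record was downloaded, since the checksums above belong to this version only.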
