# CLDF dataset derived from Lieberherr and Bodt's "Comparative Wordlists of Kho-Bwa" from 2017

[![CLDF validation](https://github.com/lexibank/lieberherrkhobwa/workflows/CLDF-validation/badge.svg)](https://github.com/lexibank/lieberherrkhobwa/actions?query=workflow%3ACLDF-validation)

## How to cite

If you use these data, please cite:
- the original source
  > Lieberherr, Ismail and Bodt, Timotheus Adrianus (2017): Sub-grouping Kho-Bwa based on shared core vocabulary. Himalayan Linguistics 16(2). 26-63. URL: https://escholarship.org/uc/item/4t27h5fg
- the derived dataset, using the DOI of the [particular released version](../../releases/) you used

## Description


This dataset is licensed under a CC-BY-4.0 license.

Available online at https://doi.org/10.5281/zenodo.1154518


Conceptlists in Concepticon:
- [Lieberherr-2017-100](https://concepticon.clld.org/contributions/Lieberherr-2017-100)

## Notes

This dataset consists of lexical entries for one hundred concepts, based on the concept lists of [Haspelmath and Tadmor (2009)](https://concepticon.clld.org/contributions/Haspelmath-2009-1460) and [Swadesh (1971)](https://concepticon.clld.org/contributions/Swadesh-1971-100).
The entries were translated into twenty-two languages of the Kho-Bwa subgroup of the Sino-Tibetan language family and annotated for cognacy.

A tutorial that accompanies this dataset and provides first steps towards an analysis is available in the [phylogenetics data management tutorial](https://github.com/lexibank/phylogenetics-data-management-tutorial/blob/master/Tutorial.md).



## Statistics


![Glottolog: 100%](https://img.shields.io/badge/Glottolog-100%25-brightgreen.svg "Glottolog: 100%")
![Concepticon: 100%](https://img.shields.io/badge/Concepticon-100%25-brightgreen.svg "Concepticon: 100%")
![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%")
![BIPA: 100%](https://img.shields.io/badge/BIPA-100%25-brightgreen.svg "BIPA: 100%")
![CLTS SoundClass: 100%](https://img.shields.io/badge/CLTS%20SoundClass-100%25-brightgreen.svg "CLTS SoundClass: 100%")

- **Varieties:** 22 (linked to 21 different Glottocodes)
- **Concepts:** 100 (linked to 100 different Concepticon concept sets)
- **Lexemes:** 2,144
- **Sources:** 3
- **Synonymy:** 1.01
- **Cognacy:** 2,144 cognates in 310 cognate sets (67 singletons)
- **Cognate Diversity:** 0.10
- **Invalid lexemes:** 0
- **Tokens:** 9,146
- **Segments:** 164 (0 BIPA errors, 0 CLTS sound class errors, 164 CLTS modified)
- **Inventory size (avg):** 49.41

## Contributors

Name | GitHub user | Description | Role
--- | --- | --- | ---
Tiago Tresoldi | @tresoldi | orthography | Other
Johann-Mattis List | @lingulist | code, orthography, concepts | Editor
Robert Forkel | @xrotwang | code, integration | Editor
Christoph Rzymski | @chrzyki | code, integration | Editor
Ismail Lieberherr | | | DataCurator, Distributor, Author
Timotheus Adrianus Bodt | | | DataCurator, Distributor, Author




## CLDF Datasets

The following CLDF datasets are available in [cldf](cldf):

- CLDF [Wordlist](https://github.com/cldf/cldf/tree/master/modules/Wordlist) at [cldf/cldf-metadata.json](cldf/cldf-metadata.json)
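
As a first look at the wordlist, the forms can be read with Python's standard `csv` module. This is a minimal sketch, not part of the dataset's tooling: it assumes the form table is stored as `cldf/forms.csv` and has a `Language_ID` column (the CLDF defaults), and the helper name `count_forms_per_language` is hypothetical. The authoritative file and column names are declared in `cldf/cldf-metadata.json`.

```python
import csv
from collections import Counter
from pathlib import Path


def count_forms_per_language(forms_csv, language_column="Language_ID"):
    """Return a Counter mapping each language ID to its number of forms."""
    with open(forms_csv, newline="", encoding="utf-8") as handle:
        return Counter(row[language_column] for row in csv.DictReader(handle))


if __name__ == "__main__":
    # Assumed location of the form table; check cldf/cldf-metadata.json.
    forms_path = Path("cldf") / "forms.csv"
    if forms_path.exists():
        counts = count_forms_per_language(forms_path)
        print(f"{len(counts)} varieties, {sum(counts.values())} forms")
        for language_id, n_forms in sorted(counts.items()):
            print(language_id, n_forms)
    else:
        print(f"{forms_path} not found; run this from the repository root")
```

If the assumptions hold, the totals printed should agree with the variety and lexeme counts given in the statistics. For anything beyond a quick check, the `pycldf` package reads the metadata file directly and resolves table and column names for you.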