Published January 5, 2023
Version 1.0.0
Dataset (open access)
Parallel text typology dataset
Authors/Creators
- Department of Linguistics, Stockholm University
Description
This repository contains data accompanying the following paper:
Neural models can sometimes discover typological generalizations. Computational Linguistics (2023) 49 (4): 1003–1051. https://doi.org/10.1162/coli_a_00491
The repository contains the following data for 1295 languages:
- language vector representations from a range of neural models
- automatically derived lists of affixes
- automatically derived lists of inflectional paradigms
- typological features derived from annotation projection, and statistics on dependency relations
- typological features derived from classifiers trained on language vectors and typological databases
- automatically derived word lists
- data needed for automatic evaluation of language representations (the evaluation code is in a separate repository)
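The classifier-based typological features listed above come from training classifiers on language vectors, with labels taken from typological databases. As a rough illustration only, the sketch below fits a binary logistic-regression probe by gradient descent and uses it to predict a feature value for unseen languages. The classifier actually used in the paper may differ, and the language names, two-dimensional vectors, labels, and function names (`train_probe`, `predict`) are hypothetical toy values, not data from this repository.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(vectors: Sequence[Sequence[float]], labels: Sequence[int],
                epochs: int = 500, lr: float = 0.1) -> Tuple[List[float], float]:
    """Fit a binary logistic-regression probe with full-batch gradient descent.

    Hypothetical helper: each vector stands in for one language's representation,
    each label (0/1) for the value of one typological feature in that language.
    """
    dim = len(vectors[0])
    w = [0.0] * dim
    b = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            # Prediction error for the log-loss gradient: p(y=1 | x) - y.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Average the gradient over the training languages and take one step.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return the predicted feature value (0 or 1) for one language vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


# Toy (hypothetical) language vectors and binary feature labels.
train_vectors = {
    "lang_a": [1.0, 0.2],
    "lang_b": [0.9, -0.1],
    "lang_c": [-1.0, 0.1],
    "lang_d": [-0.8, -0.3],
}
train_labels = {"lang_a": 1, "lang_b": 1, "lang_c": 0, "lang_d": 0}

langs = sorted(train_vectors)
w, b = train_probe([train_vectors[lang] for lang in langs],
                   [train_labels[lang] for lang in langs])
print(predict(w, b, [0.7, 0.0]), predict(w, b, [-0.6, 0.2]))
```

A linear probe is chosen here only because it is the simplest way to ask whether a feature is recoverable from a vector space; a real setup would also hold out related languages when evaluating, so that the probe cannot succeed merely by memorizing genealogical neighbours.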
Note that the multilingual word embeddings described in the paper are very large and are therefore distributed through a separate public repository.
Files
| Size | MD5 checksum |
|---|---|
| 58.1 MB | md5:38e65961aec6b0213b38cb2f989045e6 |
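After downloading, the archive can be checked against the MD5 checksum listed for it. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so the whole archive never has to sit in memory. The function names `md5_of_file` and `verify_md5` are hypothetical helpers, and since the archive's file name is not shown on this page, the path must be supplied by the reader.

```python
import hashlib

# Checksum as listed on this page; an optional "md5:" prefix is stripped before comparing.
EXPECTED = "md5:38e65961aec6b0213b38cb2f989045e6"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str = EXPECTED) -> bool:
    """Compare a file's MD5 digest with an expected value such as 'md5:<hex>'."""
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `verify_md5("path/to/downloaded/archive")` returns `True` only if the download is byte-for-byte identical to the published file. MD5 is adequate for detecting accidental corruption, though not for guarding against deliberate tampering.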