Published January 5, 2023 | Version 1.0.0
Dataset
Open
Parallel text typology dataset
Description
This repository contains data accompanying the paper "Neural models can sometimes discover typological generalizations", which is currently under submission for publication. It contains the following information for 1295 different languages:
- language vector representations from a range of neural models
- automatically derived lists of affixes
- automatically derived lists of inflectional paradigms
- typological features derived from annotation projection, and statistics on dependency relations
- typological features derived from classifiers trained on language vectors and typological databases
- automatically derived word lists
- data needed for automatic evaluation of language representations (code in separate repository)
Note that the multilingual word embeddings described in the paper are very large and are therefore distributed in a separate public repository.
Files
Size | Checksum |
---|---|
58.1 MB | md5:38e65961aec6b0213b38cb2f989045e6 |
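The file listing gives an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch of such a check using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are illustrative assumptions and are not part of the dataset or its accompanying code.

```python
import hashlib
from pathlib import Path

# Checksum as shown in the file listing, without the "md5:" prefix.
EXPECTED_MD5 = "38e65961aec6b0213b38cb2f989045e6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that it does not have to fit in
    memory all at once.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

To use it, call `matches_checksum(path)` with the location where the download was saved. The file name is not shown in the listing, so the path has to be supplied by the reader. A result of `False` suggests that the file is incomplete or was altered in transit and should be downloaded again.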