HeLI-OTS
Description
HeLI-OTS is an off-the-shelf HeLI language identifier with language models for 200 languages. The program reads <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, to <outfile>. It identifies about 3,000 sentences per second using one core of a 2021 laptop, and it needs around 3 gigabytes of memory.
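Because the output holds exactly one ISO 639-3 code per input line, the two files can be read side by side. The following is a minimal post-processing sketch, not part of HeLI-OTS itself: the function names are hypothetical, UTF-8 encoding is an assumption, and the default rate of 3,000 lines per second is simply the approximate figure quoted above.

```python
from collections import Counter
from itertools import zip_longest


def pair_lines_with_codes(input_lines, code_lines):
    """Pair each input line with the ISO 639-3 code on the same output line."""
    pairs = []
    for text, code in zip_longest(input_lines, code_lines):
        # zip_longest pads the shorter side with None, which exposes a mismatch.
        if text is None or code is None:
            raise ValueError("input and output have different numbers of lines")
        pairs.append((text.rstrip("\n"), code.strip()))
    return pairs


def load_pairs(infile, outfile, encoding="utf-8"):
    """Read both files from disk. UTF-8 is an assumption, not a documented fact."""
    with open(infile, encoding=encoding) as fin, \
            open(outfile, encoding=encoding) as fout:
        return pair_lines_with_codes(fin, fout)


def tally_languages(pairs):
    """Count how many lines were assigned to each language code."""
    return Counter(code for _, code in pairs)


def estimated_seconds(n_lines, lines_per_second=3000.0):
    """Rough single-core run time, using the approximate rate quoted above."""
    return n_lines / lines_per_second


if __name__ == "__main__":
    # Hypothetical sample data standing in for <infile> and <outfile>.
    sample_input = ["Hyvää huomenta\n", "Good morning\n"]
    sample_codes = ["fin\n", "eng\n"]
    pairs = pair_lines_with_codes(sample_input, sample_codes)
    print(tally_languages(pairs))
    print(estimated_seconds(1_000_000))
```

Raising an error on a line-count mismatch is a deliberate choice here: a silent misalignment would attach every later code to the wrong sentence.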
Producing and publishing this software has been partly supported by the Finnish Research Impact Foundation's Tandem Industry Academia funding, in cooperation with Lingsoft.
Files
LanguageModels.zip
Total size of all files: 83.1 MB
Checksum | Size |
---|---|
md5:7b8f834d96f83e66c633a6ecb4a67200 | 9.8 kB |
md5:1c7d8ff9697d36be1aa17cc9039b8a33 | 41.4 MB |
md5:c2887a199ed7657549c49584da480e01 | 13.7 kB |
md5:c0056c71d042e82b4d5d2370f384418e | 39 Bytes |
md5:11faca6aa7115d6f7866f1b30df5c174 | 834 Bytes |
md5:6506b58b1a99ce2fa5f1c2ce59e56692 | 41.6 MB |
md5:832f4714255dd27f33aa314be43184a7 | 11.4 kB |
md5:1bf83645b58d436378e14c8f9979b122 | 2.0 kB |
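The MD5 checksums listed above can be used to confirm that a download arrived intact. The sketch below is one way to do that with Python's standard `hashlib` module; the function names and the example path are hypothetical, and the checksum to compare against is whichever one the record lists for the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for the larger files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed checksum, with or without the 'md5:' prefix."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python verify.py <downloaded-file> <listed-checksum>
    path, listed = sys.argv[1], sys.argv[2]
    print("OK" if matches_listed_checksum(path, listed) else "MISMATCH")
```

MD5 is adequate for spotting a corrupted or truncated download, but it is not a safeguard against deliberate tampering.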
Additional details
References
- Jauhiainen, Tommi et al. (2017). Evaluation of language identification methods using 285 languages. https://www.aclweb.org/anthology/W17-0221