Software Open Access

HeLI-OTS

Jauhiainen, Tommi; Jauhiainen, Heidi

HeLI off-the-shelf language identifier with language models for 200 languages. The program reads <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, to <outfile>. Using one core of a 2021 laptop and around 3 gigabytes of memory, it can identify about 3,000 sentences per second.
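Because the output holds one ISO 639-3 code per input line, the two files can be aligned by line number. The following is a minimal sketch of that post-processing step, not part of HeLI-OTS itself: the class name `HeliOutputSummary` and its methods are hypothetical, and the runtime estimate simply assumes the roughly 3,000 sentences per second quoted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not shipped with HeLI-OTS): summarizes an <outfile>
// that contains one ISO 639-3 code per line of the corresponding <infile>.
class HeliOutputSummary {

    // Count how often each language code occurs, ignoring blank lines.
    static Map<String, Integer> countCodes(List<String> codes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String code : codes) {
            String trimmed = code.trim();
            if (!trimmed.isEmpty()) {
                counts.merge(trimmed, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pair each input line with the code found on the same line number.
    // If the files differ in length, only the common prefix is paired.
    static List<String> pairLines(List<String> sentences, List<String> codes) {
        int n = Math.min(sentences.size(), codes.size());
        List<String> paired = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            paired.add(codes.get(i).trim() + "\t" + sentences.get(i));
        }
        return paired;
    }

    // Rough runtime estimate; linesPerSecond is an assumption (about 3000
    // on one core of a 2021 laptop, according to the description).
    static double estimateSeconds(long lines, double linesPerSecond) {
        return lines / linesPerSecond;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HeliOutputSummary <infile> <outfile>");
            return;
        }
        List<String> sentences = Files.readAllLines(Paths.get(args[0]));
        List<String> codes = Files.readAllLines(Paths.get(args[1]));

        // Show the first few labelled lines, then the per-language totals.
        pairLines(sentences, codes).stream().limit(5).forEach(System.out::println);
        countCodes(codes).forEach((code, n) -> System.out.println(code + "\t" + n));
        System.out.printf("Estimated identification time: %.1f s%n",
                estimateSeconds(sentences.size(), 3000.0));
    }
}
```

Run it on the same <infile> and <outfile> that were given to the identifier. A mismatch in line counts between the two files would suggest the output is incomplete.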

Producing and publishing this software has been partly supported by the Finnish Research Impact Foundation's Tandem Industry Academia funding, in cooperation with Lingsoft.

Files (83.1 MB)
Name                 Size        Checksum
HeLI.class           9.8 kB      md5:7b8f834d96f83e66c633a6ecb4a67200
HeLI.jar             41.4 MB     md5:1c7d8ff9697d36be1aa17cc9039b8a33
HeLI.java            13.7 kB     md5:c2887a199ed7657549c49584da480e01
HeLI.mf              39 Bytes    md5:c0056c71d042e82b4d5d2370f384418e
languagelist         834 Bytes   md5:11faca6aa7115d6f7866f1b30df5c174
LanguageModels.zip   41.6 MB     md5:6506b58b1a99ce2fa5f1c2ce59e56692
LICENSE              11.4 kB     md5:832f4714255dd27f33aa314be43184a7
README.md            2.0 kB      md5:1bf83645b58d436378e14c8f9979b122
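
The md5 values listed with the files can be used to check that a download arrived intact. The sketch below is one way to do so with the standard `java.security.MessageDigest` class. The class name `Md5Check` and its methods are hypothetical and not part of the distribution; note also that MD5 only guards against accidental corruption, not deliberate tampering.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not shipped with HeLI-OTS): compares a file's MD5
// digest with a listed value such as "md5:1c7d8ff9697d36be1aa17cc9039b8a33".
class Md5Check {

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Render digest bytes as lowercase hexadecimal.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Digest of an in-memory byte array.
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    // Digest of a file, streamed so that large archives need little memory.
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return toHex(md.digest());
    }

    // Accept the listed value with or without its "md5:" prefix.
    static boolean matchesListed(String actualHex, String listed) {
        String expected = listed.trim();
        if (expected.startsWith("md5:")) {
            expected = expected.substring(4);
        }
        return expected.equalsIgnoreCase(actualHex);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Md5Check <file> <listed-md5>");
            return;
        }
        String actual = md5Hex(Paths.get(args[0]));
        System.out.println(matchesListed(actual, args[1]) ? "OK" : "MISMATCH: " + actual);
    }
}
```

For example, after downloading HeLI.jar, pass its path and the listed checksum string as the two arguments.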
  • Jauhiainen, Tommi et al. (2017). Evaluation of language identification methods using 285 languages. https://www.aclweb.org/anthology/W17-0221
