Software Open Access
Jauhiainen, Tommi;
Jauhiainen, Heidi
HeLI-OTS 1.2 with Python examples

Version: 1.2
Published: 2022-01-15
DOI: https://doi.org/10.5281/zenodo.5853116
Record: https://zenodo.org/record/5853116
Type: SoftwareSourceCode
Language: English (eng)
Keywords: language identification
License: https://creativecommons.org/licenses/by/4.0/legalcode

Creators:
Jauhiainen, Tommi (University of Helsinki), https://orcid.org/0000-0002-6474-3570
Jauhiainen, Heidi (University of Helsinki), https://orcid.org/0000-0002-8227-5627

HeLI is an off-the-shelf language identifier with language models for 200 languages.

Usage:

    java -jar HeLI.jar -r <infile> -w <outfile>

The program reads <infile>, classifies the language of each line as one of the 200 languages it knows, and writes the results, one ISO 639-3 code per line, to <outfile>.

Use the -c option to print a confidence score for the identification after each language code.

Usage:

    java -jar HeLI.jar -c -r <infile> -w <outfile>

To restrict identification to the relevant languages, give a comma-separated list of their ISO 639-3 identifiers after the -l option.

Usage:

    java -jar HeLI.jar -r <infile> -w <outfile> -l fin,swe,eng

To print several top-scored languages, give their number after the -t option (this overrides the confidence option).

Usage:

    java -jar HeLI.jar -r <infile> -w <outfile> -l fin,swe,eng -t 2

If you omit both filenames, the program reads standard input one line at a time and writes the results to standard output.

It can identify about 3000 sentences per second using one core on a 2021 laptop and around 3 gigabytes of memory.

If you use this program in producing scientific publications, please refer to:

    @inproceedings{jauhiainen-etal-2017-evaluation,
      title = "Evaluation of language identification methods using 285 languages",
      author = "Jauhiainen, Tommi and
        Lind{\'e}n, Krister and
        Jauhiainen, Heidi",
      booktitle = "Proceedings of the 21st Nordic Conference on Computational Linguistics",
      month = may,
      year = "2017",
      address = "Gothenburg, Sweden",
      publisher = "Association for Computational Linguistics",
      url = "https://www.aclweb.org/anthology/W17-0221",
      pages = "183--191",
    }

Producing and publishing this software has been partly supported by The Finnish Research Impact Foundation's Tandem Industry Academia funding, in cooperation with Lingsoft.
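The command-line options described above can be assembled from Python. The following is a minimal sketch, not the Python examples shipped with the record: the function names `build_heli_command` and `identify_lines`, the default jar path, and the UTF-8 encoding are assumptions made for illustration. Only the flags `-r`, `-w`, `-c`, `-l`, and `-t` come from the description.

```python
import subprocess
from typing import List, Optional, Sequence


def build_heli_command(
    jar_path: str = "HeLI.jar",      # assumed location of the jar
    infile: Optional[str] = None,
    outfile: Optional[str] = None,
    confidence: bool = False,
    languages: Optional[Sequence[str]] = None,
    top: Optional[int] = None,
) -> List[str]:
    """Build the argument list for a HeLI invocation (hypothetical helper)."""
    # The description only covers giving both filenames or omitting both.
    if (infile is None) != (outfile is None):
        raise ValueError("give both infile and outfile, or neither")
    cmd = ["java", "-jar", jar_path]
    if confidence:
        cmd.append("-c")                      # confidence score after each code
    if infile is not None:
        cmd += ["-r", infile, "-w", outfile]  # read from / write to files
    if languages:
        cmd += ["-l", ",".join(languages)]    # comma-separated ISO 639-3 codes
    if top is not None:
        cmd += ["-t", str(top)]               # number of top-scored languages
    return cmd


def identify_lines(lines: Sequence[str], **options) -> List[str]:
    """Pipe lines through HeLI via standard input and output (hypothetical helper).

    Requires Java and the jar to be available. UTF-8 is an assumption here;
    the description does not state the expected encoding.
    """
    cmd = build_heli_command(**options)
    result = subprocess.run(
        cmd,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.splitlines()
```

For example, `build_heli_command(infile="in.txt", outfile="out.txt", languages=["fin", "swe", "eng"], top=2)` yields the same arguments as the last usage line in the description. Since loading the language models takes around 3 gigabytes of memory, sending many lines through one process is likely preferable to starting the program once per sentence.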
Metric | All versions | This version |
---|---|---|
Views | 977 | 123 |
Downloads | 236 | 58 |
Data volume | 3.0 GB | 573.2 MB |
Unique views | 726 | 99 |
Unique downloads | 114 | 24 |