There is a newer version of this record available.

Software Open Access

HeLI-OTS 1.2 with Python examples

Jauhiainen, Tommi; Jauhiainen, Heidi


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>HeLI off-the-shelf language identifier with language models for 200 languages.</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -r &lt;infile&gt; -w &lt;outfile&gt;</p>\n\n<p>The program will read the &lt;infile&gt; and classify the language of each line as one of the 200 languages it knows<br>\nand write the results, one ISO 639-3 code per line, into the file &lt;outfile&gt;.</p>\n\n<p>You can use the -c option to make the program print a confidence score for the identification after each language code.</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -c -r &lt;infile&gt; -w &lt;outfile&gt;</p>\n\n<p>You can give a comma-separated list of ISO 639-3 identifiers for the relevant languages after the -l option.</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -r &lt;infile&gt; -w &lt;outfile&gt; -l fin,swe,eng</p>\n\n<p>You can give the number of top-scored languages to print after the -t option. (overrides confidence)</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -r &lt;infile&gt; -w &lt;outfile&gt; -l fin,swe,eng -t 2</p>\n\n<p>If you omit both filenames, the program will read the standard input one line at a time and write the result to standard output.</p>\n\n<p>It can identify c. 3000 sentences per second using one core on a 2021 laptop and around 3 gigabytes of memory.</p>\n\n<p>If you use this program in producing scientific publications, please refer to:&nbsp;<br>\n&nbsp;@inproceedings{jauhiainen-etal-2017-evaluation,<br>\n&nbsp; &nbsp; &nbsp;title = &quot;Evaluation of language identification methods using 285 languages&quot;,<br>\n&nbsp; &nbsp; &nbsp;author = &quot;Jauhiainen, Tommi &nbsp;and<br>\n&nbsp; &nbsp; &nbsp; &nbsp;Lind{\\&#39;e}n, Krister &nbsp;and<br>\n&nbsp; &nbsp; &nbsp; &nbsp;Jauhiainen, Heidi&quot;,<br>\n&nbsp; &nbsp; &nbsp;booktitle = &quot;Proceedings of the 21st Nordic Conference on Computational Linguistics&quot;,<br>\n&nbsp; &nbsp; &nbsp;month = may,<br>\n&nbsp; &nbsp; &nbsp;year = &quot;2017&quot;,<br>\n&nbsp; &nbsp; &nbsp;address = &quot;Gothenburg, Sweden&quot;,<br>\n&nbsp; &nbsp; &nbsp;publisher = &quot;Association for Computational Linguistics&quot;,<br>\n&nbsp; &nbsp; &nbsp;url = &quot;https://www.aclweb.org/anthology/W17-0221&quot;,<br>\n&nbsp; &nbsp; &nbsp;pages = &quot;183--191&quot;,<br>\n&nbsp;}</p>\n\n<p>Producing and publishing this software has been partly supported by The Finnish Research Impact Foundation Tandem Industry Academia funding in cooperation with Lingsoft.</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "University of Helsinki", 
      "@id": "https://orcid.org/0000-0002-6474-3570", 
      "@type": "Person", 
      "name": "Jauhiainen, Tommi"
    }, 
    {
      "affiliation": "University of Helsinki", 
      "@id": "https://orcid.org/0000-0002-8227-5627", 
      "@type": "Person", 
      "name": "Jauhiainen, Heidi"
    }
  ], 
  "url": "https://zenodo.org/record/5853116", 
  "datePublished": "2022-01-15", 
  "version": "1.2", 
  "keywords": [
    "language identification"
  ], 
  "@context": "https://schema.org/", 
  "identifier": "https://doi.org/10.5281/zenodo.5853116", 
  "@id": "https://doi.org/10.5281/zenodo.5853116", 
  "@type": "SoftwareSourceCode", 
  "name": "HeLI-OTS 1.2 with Python examples"
}
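
The command-line usage given in the description can be driven from Python. The following is a minimal sketch, not the authors' bundled Python examples: the function names `build_heli_command` and `identify_lines` are hypothetical, and the only flags used are the ones named in the description (`-r`, `-w`, `-c`, `-l`, `-t`). It assumes that `java` is on the PATH, that `HeLI.jar` is in the working directory, that the program reads and writes UTF-8, and that it exits when standard input is closed.

```python
import subprocess
from typing import List, Optional, Sequence


def build_heli_command(
    jar_path: str = "HeLI.jar",          # assumed location of the jar
    infile: Optional[str] = None,        # passed after -r
    outfile: Optional[str] = None,       # passed after -w
    confidence: bool = False,            # adds -c
    languages: Optional[Sequence[str]] = None,  # ISO 639-3 codes, passed after -l
    top_n: Optional[int] = None,         # passed after -t
) -> List[str]:
    """Assemble the argument list from the flags named in the description."""
    cmd = ["java", "-jar", jar_path]
    if confidence:
        cmd.append("-c")
    if infile is not None:
        cmd += ["-r", infile]
    if outfile is not None:
        cmd += ["-w", outfile]
    if languages:
        cmd += ["-l", ",".join(languages)]
    if top_n is not None:
        cmd += ["-t", str(top_n)]
    return cmd


def identify_lines(lines: Sequence[str], jar_path: str = "HeLI.jar", **options) -> List[str]:
    """Pipe text through standard input and return one result string per input line.

    Both filenames are omitted so that the program uses stdin and stdout.
    Requires Java and the jar; raises CalledProcessError on a non-zero exit.
    """
    cmd = build_heli_command(jar_path=jar_path, **options)
    completed = subprocess.run(
        cmd,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        encoding="utf-8",  # assumption: the program reads and writes UTF-8
        check=True,
    )
    return completed.stdout.splitlines()
```

For example, `build_heli_command(infile="in.txt", outfile="out.txt", languages=["fin", "swe", "eng"], top_n=2)` yields the same argument order as the last usage line in the description. The results returned by `identify_lines` are left as raw strings because the description does not specify how the confidence score is separated from the language code.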