There is a newer version of this record available.

Software Open Access

HeLI-OTS 1.2 with Python examples

Jauhiainen, Tommi; Jauhiainen, Heidi


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/HeLI.class"
      }, 
      "checksum": "md5:95657280ee492a6ab4844eeb4454a5c0", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "HeLI.class", 
      "type": "class", 
      "size": 13674
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/HeLI.jar"
      }, 
      "checksum": "md5:8537531e1e6f74f67a58fcfc0ac302e3", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "HeLI.jar", 
      "type": "jar", 
      "size": 44050741
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/HeLI.java"
      }, 
      "checksum": "md5:c71b0f3cd044bf424908d905fa0a7a97", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "HeLI.java", 
      "type": "java", 
      "size": 22452
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/HeLI.mf"
      }, 
      "checksum": "md5:bb91c0c41fd40f3fb8a7c4f98c9a7c87", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "HeLI.mf", 
      "type": "mf", 
      "size": 39
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/languagelist"
      }, 
      "checksum": "md5:f44bcfe8a8a8108095b6bc35cea8e31d", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "languagelist", 
      "type": "", 
      "size": 884
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/LanguageModels.zip"
      }, 
      "checksum": "md5:efd3371472a6b3a93133773a6c09d87b", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "LanguageModels.zip", 
      "type": "zip", 
      "size": 44132999
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/LICENSE"
      }, 
      "checksum": "md5:bb0ae3b700049fd806e2a043e01265d6", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "LICENSE", 
      "type": "", 
      "size": 11419
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/README.md"
      }, 
      "checksum": "md5:e6f06930e25726624e53eb7a901e0874", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "README.md", 
      "type": "md", 
      "size": 2734
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/run_HeLI.py"
      }, 
      "checksum": "md5:fa3de39cf2e93085759e3f97cb9f4d0f", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "run_HeLI.py", 
      "type": "py", 
      "size": 1003
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae/supporting_functions.py"
      }, 
      "checksum": "md5:5d551dcb80653aaac5ecebae98842826", 
      "bucket": "4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
      "key": "supporting_functions.py", 
      "type": "py", 
      "size": 745
    }
  ], 
  "owners": [
    189271
  ], 
  "doi": "10.5281/zenodo.5853116", 
  "stats": {
    "version_unique_downloads": 114.0, 
    "unique_views": 99.0, 
    "views": 123.0, 
    "version_views": 977.0, 
    "unique_downloads": 24.0, 
    "version_unique_views": 726.0, 
    "volume": 573190696.0, 
    "version_downloads": 236.0, 
    "downloads": 58.0, 
    "version_volume": 2982382740.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.5853116", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.4780897", 
    "bucket": "https://zenodo.org/api/files/4e7ea973-bc01-43ce-8a4e-804c1b4a45ae", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.4780897.svg", 
    "html": "https://zenodo.org/record/5853116", 
    "latest_html": "https://zenodo.org/record/6077089", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.5853116.svg", 
    "latest": "https://zenodo.org/api/records/6077089"
  }, 
  "conceptdoi": "10.5281/zenodo.4780897", 
  "created": "2022-01-15T09:03:00.937617+00:00", 
  "updated": "2022-02-15T08:18:38.092283+00:00", 
  "conceptrecid": "4780897", 
  "revision": 6, 
  "id": 5853116, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.5853116", 
    "description": "<p>HeLI off-the-shelf language identifier with language models for 200 languages.</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -r &lt;infile&gt; -w &lt;outfile&gt;</p>\n\n<p>The program will read the &lt;infile&gt; and classify the language of each line as one of the 200 languages it knows<br>\nand writes the results, one ISO 639-3 code per line, into file &lt;outfile&gt;.</p>\n\n<p>You can use the -c option to make the program print a confidence score for the identification after each language code.</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -c -r &lt;infile&gt; -w &lt;outfile&gt;</p>\n\n<p>You can give the list of comma-separated ISO 639-3 identifiers for relevant languages after -l option.</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -r &lt;infile&gt; -w &lt;outfile&gt; -l fin,swe,eng</p>\n\n<p>You can give the number of top-scored languages to print after the -t option. (overrides confidence)</p>\n\n<p>Usage:<br>\njava -jar HeLI.jar -r &lt;infile&gt; -w &lt;outfile&gt; -l fin,swe,eng -t 2</p>\n\n<p>If you omit both of the filenames, the program will read the standard input one line at a time and write the result to standard output.</p>\n\n<p>It can identify c. 3000 sentences per second using one core on a 2021 laptop and around 3 gigabytes of memory.</p>\n\n<p>If you use this program in producing scientific publications, please refer to:&nbsp;<br>\n&nbsp;@inproceedings{jauhiainen-etal-2017-evaluation,<br>\n&nbsp; &nbsp; &nbsp;title = &quot;Evaluation of language identification methods using 285 languages&quot;,<br>\n&nbsp; &nbsp; &nbsp;author = &quot;Jauhiainen, Tommi &nbsp;and<br>\n&nbsp; &nbsp; &nbsp; &nbsp;Lind{\\&#39;e}n, Krister &nbsp;and<br>\n&nbsp; &nbsp; &nbsp; &nbsp;Jauhiainen, Heidi&quot;,<br>\n&nbsp; &nbsp; &nbsp;booktitle = &quot;Proceedings of the 21st Nordic Conference on Computational Linguistics&quot;,<br>\n&nbsp; &nbsp; &nbsp;month = may,<br>\n&nbsp; &nbsp; &nbsp;year = &quot;2017&quot;,<br>\n&nbsp; &nbsp; &nbsp;address = &quot;Gothenburg, Sweden&quot;,<br>\n&nbsp; &nbsp; &nbsp;publisher = &quot;Association for Computational Linguistics&quot;,<br>\n&nbsp; &nbsp; &nbsp;url = &quot;https://www.aclweb.org/anthology/W17-0221&quot;,<br>\n&nbsp; &nbsp; &nbsp;pages = &quot;183--191&quot;,<br>\n&nbsp;}</p>\n\n<p>Producing and publishing this software has been partly supported by The Finnish Research Impact Foundation Tandem Industry Academia -funding in cooperation with Lingsoft.</p>", 
    "language": "eng", 
    "title": "HeLI-OTS 1.2 with Python examples", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "relations": {
      "version": [
        {
          "count": 6, 
          "index": 3, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "4780897"
          }, 
          "is_last": false, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "6077089"
          }
        }
      ]
    }, 
    "version": "1.2", 
    "references": [
      "Jauhiainen, Tommi et al. (2017). Evaluation of language identification methods using 285 languages. https://www.aclweb.org/anthology/W17-0221"
    ], 
    "keywords": [
      "language identification"
    ], 
    "publication_date": "2022-01-15", 
    "creators": [
      {
        "orcid": "0000-0002-6474-3570", 
        "affiliation": "University of Helsinki", 
        "name": "Jauhiainen, Tommi"
      }, 
      {
        "orcid": "0000-0002-8227-5627", 
        "affiliation": "University of Helsinki", 
        "name": "Jauhiainen, Heidi"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "software", 
      "title": "Software"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4780897", 
        "relation": "isVersionOf"
      }
    ]
  }
}
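
The usage lines in the description above can be assembled from Python. The following is only a minimal sketch, not the contents of the bundled run_HeLI.py or supporting_functions.py: the names build_heli_command and identify_lines are hypothetical, and it assumes java is on the PATH and HeLI.jar is in the working directory. Only the options documented in the description (-r, -w, -c, -l, -t) are used.

```python
import subprocess
from typing import List, Optional, Sequence


def build_heli_command(
    infile: Optional[str] = None,
    outfile: Optional[str] = None,
    languages: Optional[Sequence[str]] = None,
    top: Optional[int] = None,
    confidence: bool = False,
    jar_path: str = "HeLI.jar",  # assumption: the jar sits in the working directory
) -> List[str]:
    """Assemble an argument list from the options listed in the record description."""
    cmd = ["java", "-jar", jar_path]
    if confidence:
        cmd.append("-c")  # print a confidence score after each language code
    if infile is not None:
        cmd += ["-r", infile]  # file to read, one text per line
    if outfile is not None:
        cmd += ["-w", outfile]  # file to write, one result per line
    if languages:
        cmd += ["-l", ",".join(languages)]  # comma-separated ISO 639-3 codes
    if top is not None:
        cmd += ["-t", str(top)]  # number of top-scored languages (overrides confidence)
    return cmd


def identify_lines(lines: Sequence[str], **options) -> List[str]:
    """Hypothetical helper: send lines through standard input and collect standard output.

    Relies on the documented behaviour that omitting both filenames makes the
    program read standard input and write to standard output. The raw output
    lines are returned unparsed, because their exact layout is not specified here.
    """
    cmd = build_heli_command(**options)
    result = subprocess.run(
        cmd,
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.splitlines()
```

For example, build_heli_command("in.txt", "out.txt", languages=["fin", "swe", "eng"], top=2) yields the same arguments as the last usage line in the description. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.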
                  All versions   This version
Views             977            123
Downloads         236            58
Data volume       3.0 GB         573.2 MB
Unique views      726            99
Unique downloads  114            24
