Software Open Access

Buddhist Sanskrit Segmenter

Ligeia Lugli


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, as well as the data necessary to use and evaluate the Segmenter, and explanatory materials.</p>\n\n<p>The segmenter has been tested on&nbsp;639 sentences from 13 Buddhist texts (9 s\u016btras, 4 \u015b\u0101stras) and has been evaluated as achieving 97% accuracy.</p>\n\n<p>The code and materials contained in this folder have been developed as part of a Newton&nbsp;International Fellowship at King&#39;s College London, funded by the British Academy (NF161436).</p>\n\n<p>&nbsp;</p>\n\n<p><strong>Contents</strong></p>\n\n<p>R code for segmentation, lemmatisation, normalization and evaluation (includes instructions to run code)</p>\n\n<p>PowerPoint presentation with background and explanation of project</p>\n\n<p>Wordlists and Wordlists documentation</p>\n\n<p>ngrams and stems frequency tables necessary for segmentation</p>\n\n<p>gold standard set of manually segmented and stemmed sentences for evaluation</p>\n\n<p>set of raw sentences for evaluation</p>\n\n<p>evaluation of&nbsp;Krisha et al. seq2seq segmenter on Buddhist sentences for reference purposes</p>\n\n<p>&nbsp;</p>\n\n<p>This segmenter has been used to prepare the Sanskrit Corpus at DOI&nbsp;10.5281/zenodo.3457822 and&nbsp;its later version at 10.5281/zenodo.3526035</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "King's College London", 
      "@id": "https://orcid.org/0000-0003-0473-4290", 
      "@type": "Person", 
      "name": "Ligeia Lugli"
    }
  ], 
  "url": "https://zenodo.org/record/3526469", 
  "datePublished": "2019-09-24", 
  "version": "1", 
  "keywords": [
    "Buddhist Sanskrit", 
    "Natural Language Processing"
  ], 
  "@context": "https://schema.org/", 
  "identifier": "https://doi.org/10.5281/zenodo.3526469", 
  "@id": "https://doi.org/10.5281/zenodo.3526469", 
  "@type": "SoftwareSourceCode", 
  "name": "Buddhist Sanskrit Segmenter"
}
                   All versions   This version
Views              109            76
Downloads          549            85
Data volume        548.8 MB       164.6 MB
Unique views       99             70
Unique downloads   445            38
