There is a newer version of this record available.

Software Open Access

Buddhist Sanskrit Segmenter

Ligeia Lugli

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3459219", 
  "language": "eng", 
  "title": "Buddhist Sanskrit Segmenter", 
  "issued": {
    "date-parts": [
  "abstract": "<p>This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, as well as the data necessary to use and evaluate the Segmenter and explanatory materials.</p>\n\n<p>The segmenter has been tested on&nbsp;639 sentences from 13 Buddhist texts (9 s\u016btras, 4 \u015b\u0101stras) and has been evaluated as achieving 97% accuracy.</p>\n\n<p>The code and materials contained in this folder have been developed as part of a Newton&nbsp;International Fellowship at King&#39;s College London, funded by the British Academy (NF161436).</p>\n\n<p>&nbsp;</p>\n\n<p><strong>Contents</strong></p>\n\n<p>R code for segmentation, lemmatisation and evaluation (includes instructions to run the code)</p>\n\n<p>PowerPoint presentation with background and explanation of the project</p>\n\n<p>Wordlists and wordlist documentation</p>\n\n<p>n-gram and stem frequency tables necessary for segmentation</p>\n\n<p>gold-standard set of manually segmented and stemmed sentences for evaluation</p>\n\n<p>set of raw sentences for evaluation</p>\n\n<p>evaluation of the&nbsp;Krisha et al. seq2seq segmenter on Buddhist sentences, for reference purposes</p>\n\n<p>&nbsp;</p>\n\n<p>This segmenter has been used to prepare the Sanskrit Corpus at DOI&nbsp;10.5281/zenodo.3457822</p>", 
  "author": [
    {
      "family": "Ligeia Lugli"
    }
  ], 
  "version": "1", 
  "type": "article", 
  "id": "3459219"
                  All versions   This version
Views             111            33
Downloads         557            463
Data volume       577.2 MB       383.3 MB
Unique views      101            30
Unique downloads  448            406


Cite as