Ligeia Lugli
2019-09-24
<p>This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, as well as data necessary to use and evaluate the Segmenter and explanatory materials.</p>
<p>The segmenter was tested on 639 sentences from 13 Buddhist texts (9 sūtras, 4 śāstras) and achieved 97% accuracy in evaluation.</p>
<p>The code and materials in this folder were developed as part of a Newton International Fellowship at King's College London, funded by the British Academy (NF161436).</p>
<p><strong>Contents</strong></p>
<p>R code for segmentation, lemmatisation, normalisation and evaluation (includes instructions for running the code)</p>
<p>PowerPoint presentation with background and an explanation of the project</p>
<p>Wordlists and wordlist documentation</p>
<p>N-gram and stem frequency tables necessary for segmentation</p>
<p>Gold-standard set of manually segmented and stemmed sentences for evaluation</p>
<p>Set of raw sentences for evaluation</p>
<p>Evaluation of the Krishna et al. seq2seq segmenter on Buddhist sentences, for reference purposes</p>
<p>This segmenter was used to prepare the Sanskrit Corpus at DOI 10.5281/zenodo.3457822 and its later version at DOI 10.5281/zenodo.3526035.</p>
https://doi.org/10.5281/zenodo.3526469
oai:zenodo.org:3526469
eng
Zenodo
https://doi.org/10.5281/zenodo.3459218
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Buddhist Sanskrit
Natural Language Processing
Buddhist Sanskrit Segmenter
info:eu-repo/semantics/other