Buddhist Sanskrit Segmenter
Description
This folder contains R code for a rule-based Buddhist Sanskrit segmenter and lemmatiser, together with the data needed to run and evaluate the segmenter, and explanatory materials.
The segmenter has been tested on 639 sentences from 13 Buddhist texts (9 sūtras, 4 śāstras), on which it achieved 97% accuracy.
The code and materials in this folder were developed as part of a Newton International Fellowship at King's College London, funded by the British Academy (NF161436).
Contents
R code for segmentation, lemmatisation, normalisation and evaluation (includes instructions to run the code)
PowerPoint presentation with background and explanation of the project
Wordlists and wordlist documentation
N-gram and stem frequency tables necessary for segmentation
Gold standard set of manually segmented and stemmed sentences for evaluation
Set of raw sentences for evaluation
Evaluation of the Krishna et al. seq2seq segmenter on Buddhist sentences, for reference purposes
This segmenter has been used to prepare the Sanskrit Corpus at DOI 10.5281/zenodo.3457822 and its later version at DOI 10.5281/zenodo.3526035.
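How the 97% accuracy figure was computed is not stated here, so the following is only a minimal sketch of how a predicted segmentation could be compared against a gold standard set. It is written in Python for illustration, not in the R of the released code. The function names, the two metrics (sentence-level exact match and token-level recall) and the toy sentences are all assumptions, not the project's evaluation procedure.

```python
from collections import Counter


def sentence_accuracy(gold, predicted):
    """Share of sentences whose predicted segmentation equals the gold one exactly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of sentences")
    if not gold:
        return 0.0
    matches = sum(g.split() == p.split() for g, p in zip(gold, predicted))
    return matches / len(gold)


def token_recall(gold, predicted):
    """Share of gold tokens that also occur in the prediction for the same sentence."""
    found = total = 0
    for g, p in zip(gold, predicted):
        gold_counts = Counter(g.split())
        pred_counts = Counter(p.split())
        # Multiset intersection counts each gold token at most as often as predicted.
        found += sum((gold_counts & pred_counts).values())
        total += sum(gold_counts.values())
    return found / total if total else 0.0


# Toy data invented for illustration (assumption: tokens are space-separated).
gold = ["evaṃ mayā śrutam", "tena khalu punaḥ samayena"]
predicted = ["evaṃ mayā śrutam", "tena khalu punaḥsamayena"]

sent_acc = sentence_accuracy(gold, predicted)
tok_recall = token_recall(gold, predicted)
print(sent_acc)    # 0.5: one of the two sentences matches exactly
print(tok_recall)  # 5 of the 7 gold tokens are recovered
```

Sentence-level exact match is the stricter of the two measures, since a single missed split makes the whole sentence count as wrong, while token-level recall gives partial credit.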
Files
Lugli_BuddhFoundCorpusNgramsRedux.csv
Total size: 20.5 MB
Checksum | Size |
---|---|
md5:9e2098321870bda7a82c0fc314449795 | 4.9 kB |
md5:ebeceb54230207b55968c113486c979f | 192.8 kB |
md5:432313287a5a2d084ac64b70b14b9a2a | 244.6 kB |
md5:394107767ce92f6be6b64e9c8cec9923 | 9.2 MB |
md5:37ce1893b1f8a9f0aa55b3f6a850e3f0 | 212.5 kB |
md5:5e169775fc20db5ea684bb35015fe11a | 473.7 kB |
md5:63a22b8b1d18c08c6b12506d90c3fc16 | 311.9 kB |
md5:baee76cc1ec672d92cdb8deb6ba52a51 | 3.0 MB |
md5:56a7ab6ba81ceac3954c38c5ad6a7525 | 75.3 kB |
md5:4aa0a4e3672b4ce21e927a7420b07e5f | 34.6 kB |
md5:dc9996dd5b97530e194f64add6f913e1 | 1.0 MB |
md5:a48598508f02a794ee6fd021c937962c | 62.9 kB |
md5:d64b2a4b9e10dd9e45e95d4f2f701648 | 1.1 MB |
md5:fb8504f60355452420e87d2f9953fa1c | 4.5 MB |
md5:5d507c0ac8219998e5150944db8461e5 | 25.3 kB |
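The md5 values listed above can be used to check that a downloaded file is intact. The following is a minimal sketch in Python using only the standard library. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the demonstration runs on a temporary file rather than on one of the files in this record, since the listing does not pair the checksums with file names.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Self-contained demonstration on a temporary file (not a file from this record).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

ok = matches_listing(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
mismatch = matches_listing(demo_path, "md5:9e2098321870bda7a82c0fc314449795")
os.remove(demo_path)

print(ok)        # True: the digest of b"hello" matches
print(mismatch)  # False: the file differs from the listed checksum
```

To check a real download, pass its local path together with the md5 value shown for that file on the record page.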