* Record ID: 1035413
* DOI: 10.5281/zenodo.1035413
* OAI identifier: oai:zenodo.org:1035413
* Community: user-cnerg
* Title: Word Segmentation in Sanskrit Using Energy Based Models
* Authors: Bishal Santra (IIT Kharagpur), Sasi Prasanth Bandaru (IIT Kharagpur), Gaurav Sahu (IIT Kharagpur), Vishnu Dutt Sharma (American Express), Pavankumar Satuluri (Chinmya Visvavidyapeeth), Pawan Goyal (IIT Kharagpur), Amrith Krishna (IIT Kharagpur)
* Access rights: info:eu-repo/semantics/openAccess
* License: Creative Commons Attribution 4.0 International (cc-by-4.0, spdx), https://creativecommons.org/licenses/by/4.0/legalcode

This is the repository for word segmentation in Sanskrit using energy based models.

# Word Segmentation in Sanskrit Using Energy Based Models

## Getting Started

Download the two compressed files 'dir.zip' and 'wordsegmentation.rar' to your working directory and extract them into folders named 'dir' and 'wordsegmentation' respectively.

Your working directory should then look as follows:

* Working Directory
  * wordsegmentation
    * skt_dcs_DS.bz2_4K_bigram_mir_10K
    * skt_dcs_DS.bz2_4K_bigram_mir_heldout
  * dir

## Prerequisites

* Python3
  * scipy
  * numpy
  * csv
  * pickle
  * multiprocessing
  * bz2

Of these, csv, pickle, multiprocessing and bz2 are part of the Python standard library; scipy and numpy must be installed separately.

## Instructions for Training

Change your current directory to 'dir' and run the file Train_clique.py with the following command:

* `python Train_clique.py`

To train on a different input feature (BM2, BM3, BR2, BR3, PM2, PM3, PR2 or PR3), modify the bz2_input_folder value in the main function before starting the training.

Feature | bz2_input_folder
------------- | -------------
BM2 | wordsegmentation/skt_dcs_DS.bz2_4K_bigram_mir_10K/
BM3 | wordsegmentation/skt_dcs_DS.bz2_1L_bigram_mir_10K
BR2 | wordsegmentation/skt_dcs_DS.bz2_4K_bigram_rfe_10K/
BR3 | wordsegmentation/skt_dcs_DS.bz2_1L_bigram_rfe_10K/
PM2 | wordsegmentation/skt_dcs_DS.bz2_4K_pmi_mir_10K/
PM3 | wordsegmentation/skt_dcs_DS.bz2_1L_pmi_mir_10K2/
PR2 | wordsegmentation/skt_dcs_DS.bz2_4K_pmi_rfe_10K/
PR3 | wordsegmentation/skt_dcs_DS.bz2_1L_pmi_rfe_10K/

## Instructions for Testing

After training, modify the 'modelList' dictionary in 'test_clique.py' with the name of the neural network that was saved during training. When testing a feature, provide the name of the neural net that was trained for that same feature.

We only provide the trained model for the feature BM2, which was our best performing feature. If the name of the neural net is not changed, testing is performed on the pre-trained BM2 model provided in outputs/train_t7978754709018.

To test with a particular feature vector, pass the tag of that feature on the command line:

* `python test_clique.py -t <tag>`

For example:

* `python test_clique.py -t BM2`

After testing has finished, run the following command to see the precision and recall values for both the word and word++ prediction tasks:

* `python evaluate.py <tag>`

For example:

* `python evaluate.py BM2`

* Publisher: Zenodo
* Publication date: 2018-08-23
* Community: user-cnerg
* Resource type: info:eu-repo/semantics/other
* Record timestamp: 20200125072505.0
* Access: open
* Related identifier: 10.5281/zenodo.1035412 (isVersionOf, doi)

File | Size (bytes) | Checksum
------------- | ------------- | -------------
https://zenodo.org/records/1035413/files/wordsegmentation.rar | 41733267455 | md5:6339b68e76df5aab37d2850fccf68c98
https://zenodo.org/records/1035413/files/README.md | 2418 | md5:c0163b57ec0ab0013603d017556e2f2b
https://zenodo.org/records/1035413/files/dir.zip | 453229783 | md5:016462cbd311404a6c9fb9af950d38a5
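
The MD5 checksums published with the record's files can be used to check each download before extracting it. The following is a minimal sketch, not part of the repository: it assumes the three files were saved under their original names in one directory, and the helper names `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list of the Zenodo record.
EXPECTED_MD5 = {
    "wordsegmentation.rar": "6339b68e76df5aab37d2850fccf68c98",
    "README.md": "c0163b57ec0ab0013603d017556e2f2b",
    "dir.zip": "016462cbd311404a6c9fb9af950d38a5",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory=".", expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the downloads sit in the current working directory.
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

Reading in fixed-size chunks keeps memory use constant, which matters here because 'wordsegmentation.rar' is listed at 41733267455 bytes; hashing a file of that size can take several minutes.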