There is a newer version of the record available.

Published December 28, 2021 | Version v2
Dataset | Open access

Dataset and additional files/software required for the paper "LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents"

  • 1. Indian Institute of Technology, Kharagpur


This dump contains all the files and software required to run the code for the paper "LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents". The code itself is available at

LeSICiN is a deep neural network for the task of Legal Statute Identification that also exploits the graph structure of the document-statute citation network during training and prediction.

There are three dataset splits: train, dev and test. Each is a .jsonl file with one instance dict per line; each instance dict contains the unique id of the instance, its list of sentences and the labels (statutes) it cites. A fourth file, secs.jsonl, stores the text of all the statutes in the same format.
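A minimal sketch for reading one of the .jsonl splits and counting how often each statute is cited. The field names "id", "text" and "labels" and file names such as "train.jsonl" are hypothetical; check the actual keys and names in the downloaded files.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List

def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one instance dict per non-empty line (works on a file object or a list of str)."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)

def read_jsonl(path: str) -> List[dict]:
    """Load a whole split (e.g. a hypothetical "train.jsonl") into memory."""
    with open(path, encoding="utf-8") as f:
        return list(parse_jsonl(f))

def label_counts(instances: Iterable[dict], label_key: str = "labels") -> Counter:
    """Count citations per statute; label_key is a hypothetical field name."""
    counts: Counter = Counter()
    for inst in instances:
        counts.update(inst.get(label_key, []))
    return counts
```

For example, `label_counts(read_jsonl("train.jsonl")).most_common(10)` would list the ten most frequently cited statutes in the training split, assuming those file and key names.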

schemas.json lists the metapath schemas for the Fact and Section node types, while type_map.json maps the id of each node to its type (Act/Chapter/Topic/Section/Fact).
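As a sketch, assuming type_map.json is a flat JSON object of the form {node_id: type_name} with the type names spelled exactly as above, the mapping can be inverted to get the node ids of each type. The helper names are hypothetical; adjust NODE_TYPES if the file uses different spellings.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Node types named in the dataset description (spelling is an assumption).
NODE_TYPES = {"Act", "Chapter", "Topic", "Section", "Fact"}

def load_type_map(path: str = "type_map.json") -> Dict[str, str]:
    """Read the id -> type mapping (assumed to be a flat JSON object)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def group_by_type(type_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert id -> type into type -> sorted list of ids, rejecting unknown types."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node_id, node_type in type_map.items():
        if node_type not in NODE_TYPES:
            raise ValueError(f"unexpected type {node_type!r} for node {node_id!r}")
        groups[node_type].append(node_id)
    return {t: sorted(ids) for t, ids in groups.items()}
```

Failing loudly on an unknown type is a deliberate choice here: it surfaces a mismatch between the assumed and actual type names early instead of silently dropping nodes.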

label_tree.json and citation_network.json list the edges of the two parts of the network, each edge given as a 3-tuple ('source id', 'relationship type', 'target id').
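A minimal sketch of loading such triples, assuming each file is a JSON list of 3-element lists, and of following a metapath expressed as a sequence of relationship types. The relationship names "cites" and "cited_by" are hypothetical, and this relation-sequence representation is only an illustration; it is not necessarily how schemas.json encodes its metapath schemas.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[str, str, str]  # (source id, relationship type, target id)

def load_edges(*paths: str) -> List[Edge]:
    """Concatenate edge lists, e.g. from label_tree.json and citation_network.json."""
    edges: List[Edge] = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            edges.extend(tuple(e) for e in json.load(f))  # JSON arrays -> tuples
    return edges

def build_index(edges: Iterable[Sequence[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Index edges as (source id, relationship type) -> set of target ids."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for src, rel, dst in edges:
        index[(src, rel)].add(dst)
    return index

def follow_metapath(index: Dict[Tuple[str, str], Set[str]], start: str,
                    relations: Sequence[str]) -> Set[str]:
    """Return all nodes reachable from start by following relations in order."""
    frontier = {start}
    for rel in relations:
        # .get does not insert keys, so the defaultdict is not mutated here.
        frontier = {dst for node in frontier for dst in index.get((node, rel), ())}
    return frontier
```

With hypothetical edges f1 cites s1, f2 cites s1 and s2, and the matching cited_by inverses, following ["cites", "cited_by"] from f1 reaches the facts that share a cited statute with f1, namely f1 and f2.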

"ils2v.bin" is a pretrained sent2vec model that generates a 200-dimensional vector for each sentence.
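A sketch of turning a document's sentence list into an n_sentences x 200 matrix. To keep it independent of any particular installation, the embedder is passed in as a callable (the parameter name embed_many and the helper embed_document are hypothetical); the function only checks that one 200-dimensional vector comes back per sentence.

```python
from typing import Callable, List, Sequence

EMBED_DIM = 200  # dimensionality stated for ils2v.bin

def embed_document(
    sentences: Sequence[str],
    embed_many: Callable[[List[str]], Sequence[Sequence[float]]],
    dim: int = EMBED_DIM,
) -> List[List[float]]:
    """Return one dim-length vector per sentence (an n_sentences x dim matrix)."""
    if not sentences:
        return []
    vectors = embed_many(list(sentences))
    matrix = [[float(x) for x in vec] for vec in vectors]
    if len(matrix) != len(sentences):
        raise ValueError("embedder returned a different number of vectors")
    for row in matrix:
        if len(row) != dim:
            raise ValueError(f"expected {dim}-dim vectors, got {len(row)}")
    return matrix
```

With the Python bindings from the epfml/sent2vec repository, one would expect to create `model = sent2vec.Sent2vecModel()`, call `model.load_model("ils2v.bin")`, and pass `model.embed_sentences` as `embed_many`; verify this against the version of the bindings you install.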



Files (2.8 GB)

  • 40.0 MB
  • 31.3 MB
  • 85.3 MB
  • 2.3 GB
  • 36.0 kB
  • 3.8 kB
  • 46.2 kB
  • 106.1 MB
  • 319.8 MB
  • 1.0 MB