An annotated dataset of Central Acts enacted by the Indian Parliament for legal research
Description
This dataset consists of 858 annotated Central Acts enacted by the Indian Parliament from 1838 to 2020. The Central Acts are available in Portable Document Format (PDF) on the India Code website. The website was developed by the Legislative Department under the Ministry of Law and Justice, Government of India, and is a digital repository of all Central and State Acts currently in force.
We used pdfminer.six, a Python library for extracting text from PDF documents, to convert these unstructured PDFs into a structured JSON format. We then used regular expressions to remove noisy text and to extract meta-information (e.g., the act title, enactment date, and other details found in the initial portion of each document). The result is a dataset of 858 structured JSON files, one per Central Act, each with its relevant metadata.
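The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used to build the dataset: the regular expressions, the output key names, and the helper names (`clean_text`, `parse_header`, `pdf_to_record`) are all assumptions, and the header layout they expect (a title line, an "ACT NO. n OF yyyy" line, and a bracketed date) is a guess about how the source PDFs typically begin. Only `pdfminer.high_level.extract_text` is a real pdfminer.six call.

```python
import json
import re

# Assumed patterns; the expressions used to build the dataset are not published here.
ACT_NO_RE = re.compile(r"ACT\s+NO\.\s*(\d+)\s+OF\s+(\d{4})", re.IGNORECASE)
DATE_RE = re.compile(
    r"\[\s*(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+),?\s+(\d{4})\.?\s*\]"
)
PAGE_NO_RE = re.compile(r"^[ \t]*\d+[ \t]*$", re.MULTILINE)


def clean_text(raw):
    """Blank out lines holding only a page number and collapse runs of blank lines."""
    text = PAGE_NO_RE.sub("", raw)
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def parse_header(text):
    """Pull title, act ID, and enactment date from the start of an act (hypothetical keys)."""
    meta = {"act_title": None, "act_id": None, "enactment_date": None}
    for line in text.splitlines():
        if line.strip():
            meta["act_title"] = line.strip()  # first non-empty line, assumed to be the title
            break
    number = ACT_NO_RE.search(text)
    if number:
        meta["act_id"] = f"ACT NO. {number.group(1)} OF {number.group(2)}"
    date = DATE_RE.search(text)
    if date:
        meta["enactment_date"] = " ".join(date.groups())
    return meta


def pdf_to_record(pdf_path):
    """Extract text with pdfminer.six and return the parsed metadata as a JSON string."""
    from pdfminer.high_level import extract_text  # requires: pip install pdfminer.six

    return json.dumps(parse_header(clean_text(extract_text(pdf_path))), indent=2)
```

Calling `pdf_to_record` on a downloaded act PDF would yield a small JSON object; the body of the act (chapters, sections, footnotes) would need further parsing that this sketch does not attempt.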
The annotation schema for the Central Act dataset consists of the following fields:
- Act Title: The title, usually called the "short title," is the name by which an act is known. The short title is the term by which an act or resolution may be cited and often references the date of commencement.
- Act ID: It includes the number of the act and the year of its enactment.
- Enactment Date: Specifies the date on which the bill becomes an act through approval.
- Act Definition: Also known as the "long title," this is a summary of the act's purpose. In some cases it is only a few words long, while in others it can run to several pages.
- Chapters and Parts: Chapters and Parts are subdivisions used to arrange the information in an act or other piece of legislation. There is no hard-and-fast rule for how they are used in Indian legislation: some acts have only Chapters, some have only Parts, some have unlabeled headings, and in some cases Parts are divided into Chapters. For each Chapter (or Part) we record an ID (for example, CHAPTER IV or PART I) and a Name, which is its heading. If the Chapter (or Part) has been omitted or repealed within the act, the ID field will contain [Omitted] or [Repealed], respectively.
- Sections: An arrangement of Sections (this is not part of the law, but assists in navigating through an Act or other piece of legislation, especially lengthy documents).
- Subheadings: This optional field is contained within Chapters or Parts and, in turn, contains Sections.
- Schedule, Annexure, Appendix and Forms: Many acts have schedules attached at the end, which generally add more details (such as maps or fees), give examples of forms or sets of rules to be used under the act, or list sections of other acts amended by this one. The majority of them will contain amendments and/or repeals of legislation. However, the Schedule to an act always contains supplementary information of importance in meeting the act's objectives.
- Footnotes: Footnotes are notes placed at the bottom of a page. They cite references or comment on a designated part of the text above them. This field consists of key-value pairs, where the key is the page number and the value is the footnote text. It can be empty if the act document has no footnotes.
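To make the nesting in this schema concrete, here is a small sketch of what one record might look like and how it could be traversed. The key names, the exact nesting, and the sample values are hypothetical: the JSON files in the archive may use different keys, so inspect a real file before relying on any of this. The helpers `iter_sections` and `is_struck` are likewise illustrative names.

```python
import json

# Hypothetical record; key names and values are invented for illustration only.
record = {
    "Act Title": "The Example Act, 1999",
    "Act ID": "ACT NO. 7 OF 1999",
    "Enactment Date": "1st April, 1999",
    "Act Definition": "An Act to illustrate the annotation schema.",
    "Chapters": [
        {
            "ID": "CHAPTER I",
            "Name": "PRELIMINARY",
            "Sections": ["1. Short title and commencement."],
            # Optional level between a Chapter (or Part) and its Sections.
            "Subheadings": [{"Name": "General", "Sections": ["2. Definitions."]}],
        },
        # Assumed form of a repealed chapter: the marker appears in the ID field.
        {"ID": "CHAPTER II [Repealed]", "Name": "", "Sections": [], "Subheadings": []},
    ],
    "Schedules": [],
    # Page number -> footnote text; JSON object keys are always strings.
    "Footnotes": {"3": "Example footnote text."},
}


def iter_sections(rec):
    """Yield every Section, whether it sits directly in a Chapter or under a Subheading."""
    for chapter in rec.get("Chapters", []):
        yield from chapter.get("Sections", [])
        for sub in chapter.get("Subheadings", []):
            yield from sub.get("Sections", [])


def is_struck(chapter):
    """True if the Chapter (or Part) ID carries an [Omitted] or [Repealed] marker."""
    return any(tag in chapter.get("ID", "") for tag in ("[Omitted]", "[Repealed]"))


live_chapters = [c["ID"] for c in record["Chapters"] if not is_struck(c)]
print(live_chapters, list(iter_sections(record)))
print(json.dumps(record["Footnotes"]))
```

Checking for the marker with `in` rather than equality keeps the filter working whether the ID is the bare marker or the marker appended to a chapter number.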
Files
| Name | Size | Checksum |
|---|---|---|
| annotatedCentralActs.zip | 13.3 MB | md5:e382eec2d08728be2bd19b02ba442b8e |
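After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the function names `md5_of` and `verify` and the local file path are assumptions.

```python
import hashlib

# Checksum as shown in the file listing for annotatedCentralActs.zip.
EXPECTED = "md5:e382eec2d08728be2bd19b02ba442b8e"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Compare a file's MD5 with an expected value, accepting an optional 'md5:' prefix."""
    return md5_of(path) == expected.split(":", 1)[-1].lower()


# Example (assumes the archive sits in the current directory):
# print(verify("annotatedCentralActs.zip"))
```

A mismatch usually indicates an incomplete or corrupted download rather than a problem with the dataset itself.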