Published October 8, 2025 | Version v7
Dataset
Open
A new, comprehensive database of all proceedings of the Australian Parliamentary Debates (1998-2025)
Description
This database contains the proceedings of each sitting day of the House of Representatives of the Australian Parliament from 02 March 1998 to 31 July 2025, in Parquet format. The data were parsed entirely from the XML Hansard transcripts available on the Australian Parliament website.
The database is stored in the folder corpus_1998_to_2025.parquet, which contains the full Hansard corpus.
Since the last version released on 12 August 2025, we have made the following updates:
- Standardized and validated the electorate column.
- Standardized and validated the party abbreviation column.
- Added a column with the full party name.
- Improved question and answer flagging: rows beginning with phrases such as "My question is to" or "My question goes to" that had not been flagged as questions are now flagged correctly.
- Fixed many instances of incorrectly flagged interjections.
- Split detected interjections that were not on their own row into separate rows.
- Validated that every MP recorded as present on a sitting day was actually a Member of Parliament on that day.
- Identified and fixed all rows with a null body.
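The question-flagging update in the list above can be illustrated with a minimal Python sketch. This is not the authors' implementation (the project itself is written in R). The two opening phrases come from the list above; the `question` field name, the dictionary row layout, and the helper names are hypothetical and chosen only for illustration.

```python
import re

# Only the two phrases named in the changelog are matched here; the
# changelog says "etc.", so the real rule set is assumed to be broader.
QUESTION_OPENERS = re.compile(r"^\s*my question (is|goes) to\b", re.IGNORECASE)


def looks_like_question(body):
    """Return True when a row's text opens with a known question phrase."""
    if body is None:
        return False
    return bool(QUESTION_OPENERS.match(body))


def flag_questions(rows):
    """Return copies of the rows, setting `question` to True where the opener matches.

    `rows` is a list of dicts with a `body` key and a hypothetical
    `question` key; rows that do not match are copied unchanged.
    """
    flagged = []
    for row in rows:
        updated = dict(row)  # copy so the input rows are not mutated
        if looks_like_question(updated.get("body")):
            updated["question"] = True
        flagged.append(updated)
    return flagged
```

Anchoring the pattern at the start of the text mirrors the "rows starting with" wording: a speech that merely mentions a question partway through is left unflagged.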
Files
Name | Size |
---|---|
md5:4910153c4c555ea7811887b1803f39e0 | 579.6 MB |
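The MD5 checksum listed for the download can be used to confirm that the file arrived intact. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, and the path passed in must be wherever the download was saved, since this page does not state the file name.

```python
import hashlib

# Checksum as listed in the Files table on this page.
EXPECTED_MD5 = "4910153c4c555ea7811887b1803f39e0"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for a download of this size.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

A mismatch usually points to an interrupted or corrupted download, in which case downloading the file again is the simplest remedy.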
Additional details
Software
- Repository URL: https://github.com/lindsaykatz/hansard-proj
- Programming language: R