Published July 6, 2023 | Version v5
Dataset
Open
A new, comprehensive database of all proceedings of the Australian Parliamentary Debates (1998-2022)
Description
This database contains the proceedings of every sitting day of the House of Representatives of the Australian Parliament from 02 March 1998 to 08 September 2022, in both CSV and parquet form. The data were parsed entirely from the XML Hansard transcripts available on the Australian Parliament website.
The database is organized as follows:
- hansard-daily-csv.zip contains all individual Hansard sitting day files in CSV form.
- hansard-daily-parquet.zip contains all individual Hansard sitting day files in parquet form.
- hansard-corpus.zip contains the full Hansard corpus in CSV form and in parquet form.
- hansard-code.zip contains all the R scripts we used to build our database, along with the CSV files those scripts need in order to run. The README.md file in this folder describes each script in detail, outlines our workflow, and provides some example R code for users of our database.
- hansard-supplementary-data.zip contains data on Hansard debate topics, and data on divisions in the House that were transcribed during our time frame. This folder also contains the CSV we used to correctly map PartyFacts IDs to the party abbreviations found in our database.
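As an illustration of working with the archives listed above, the following is a minimal Python sketch that lists the CSV members of a downloaded archive and reads one of them without extracting the whole zip. It is not part of the database's own code, which is written in R. The helper names `list_members` and `read_csv_member` are hypothetical, the internal file names of the archive are not assumed (they are discovered at run time), and UTF-8 encoding is an assumption.

```python
import csv
import io
import zipfile


def list_members(archive, suffix=".csv"):
    """Return the sorted names of archive members ending in `suffix`."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(suffix))


def read_csv_member(archive, member, limit=None):
    """Read one CSV member into a list of dicts keyed by its header row.

    `limit` caps the number of data rows read, which is useful for a
    quick look at a large file. UTF-8 encoding is assumed here.
    """
    rows = []
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for i, row in enumerate(csv.DictReader(text)):
                if limit is not None and i >= limit:
                    break
                rows.append(row)
    return rows


# Example usage (assumes hansard-daily-csv.zip is in the working directory):
#   names = list_members("hansard-daily-csv.zip")
#   first_rows = read_csv_member("hansard-daily-csv.zip", names[0], limit=5)
```

Reading directly from the zip avoids unpacking several hundred megabytes when only a few sitting days are needed. For the full corpus, a dataframe library with parquet support is likely a better fit than the standard-library `csv` module.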
Files
Total size: 1.6 GB
Checksum | Size |
---|---|
md5:b9480104b6c80a81fcf6e7f4824b8138 | 511.7 kB |
md5:cc43c8a2945bc5a303955fac62d23e5f | 787.5 MB |
md5:b7d42809cc13a786a817eba56bed0445 | 329.3 MB |
md5:d3b9b0e0f7ab7c6849c82617ebd43919 | 471.7 MB |
md5:0abdfd55d806898fd8d0a50812e36469 | 4.3 MB |
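The md5 checksums listed above can be used to confirm that an archive downloaded intact. The following is a minimal Python sketch, assuming the archive has already been saved locally. The helper names `file_md5` and `matches_listed` are hypothetical, and the pairing of a checksum with a particular file name must be taken from the record page itself.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:b948...'.

    The optional 'md5:' prefix is stripped and the comparison ignores case.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return file_md5(path) == expected


# Example usage (the path is an assumption about where the file was saved,
# and <checksum> stands for the value listed for that file):
#   ok = matches_listed("hansard-code.zip", "md5:<checksum>")
```

A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again.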