There is a newer version of the record available.

Published July 6, 2023 | Version v5
Dataset Open

A new, comprehensive database of all proceedings of the Australian Parliamentary Debates (1998-2022)

  • 1. University of Toronto

Description

This database contains the proceedings of each sitting day of the House of Representatives of the Australian Parliament from 02 March 1998 to 08 September 2022, in both CSV and Parquet formats. The data were parsed entirely from the XML Hansard transcripts available on the Australian Parliament website.
The database is organized as follows:

  • hansard-daily-csv.zip contains all individual Hansard sitting day files in CSV form.
  • hansard-daily-parquet.zip contains all individual Hansard sitting day files in parquet form.
  • hansard-corpus.zip contains the full Hansard corpus in CSV form and in parquet form.
  • hansard-code.zip contains all the R scripts we used to build our database, along with the CSV files needed to run them. The README.md file in this folder describes each script in detail, outlines our workflow, and provides example R code for users of our database.
  • hansard-supplementary-data.zip contains data on Hansard debate topics, and data on divisions in the House that were transcribed during our time frame. This folder also contains the CSV we used to correctly map PartyFacts IDs to the party abbreviations found in our database.
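As a first look at the daily CSV archive, the following minimal sketch lists each CSV member of a zip file with its header row and number of data rows, without extracting anything to disk. It uses only the Python standard library. The function name `summarize_csv_members` is hypothetical, and the sketch assumes the CSV files are UTF-8 encoded; the internal folder layout and column names of the archives are not stated above, so nothing here depends on them.

```python
import csv
import io
import zipfile

# Assumption: transcript text fields may be long, so raise the csv module's
# default per-field size limit (131072 characters) to a generous value.
csv.field_size_limit(10**9)


def summarize_csv_members(zip_path):
    """Return a list of (member name, header row, data-row count) tuples,
    one per CSV file found in the zip archive at zip_path."""
    summaries = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            # Skip directories, non-CSV files, and macOS metadata entries.
            if not name.lower().endswith(".csv") or name.startswith("__MACOSX/"):
                continue
            with archive.open(name) as raw:
                # newline="" lets the csv module handle line breaks that
                # occur inside quoted fields.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                n_rows = sum(1 for _ in reader)
            summaries.append((name, header, n_rows))
    return summaries
```

With hansard-daily-csv.zip downloaded to the working directory, `summarize_csv_members("hansard-daily-csv.zip")` would return one tuple per sitting-day file. The example R code in the README.md remains the authors' own guide to working with the data.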

Files

Files (1.6 GB)

Checksum                               Size
md5:b9480104b6c80a81fcf6e7f4824b8138   511.7 kB
md5:cc43c8a2945bc5a303955fac62d23e5f   787.5 MB
md5:b7d42809cc13a786a817eba56bed0445   329.3 MB
md5:d3b9b0e0f7ab7c6849c82617ebd43919   471.7 MB
md5:0abdfd55d806898fd8d0a50812e36469   4.3 MB
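
Each file in this record is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the pairing of each checksum with a file name has to be taken from the record page itself, since the listing here does not show it.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at path,
    reading it in chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:<hex digest>'.
    The 'md5:' prefix is optional and the comparison ignores case."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listed_checksum("hansard-code.zip", "md5:<value listed for that file>")` returns `True` only when the local copy matches the published checksum.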