There is a newer version of the record available.

Published April 4, 2023 | Version v3
Dataset Open

A new, comprehensive database of all proceedings of the Australian Parliamentary Debates (1998-2022)

  • 1. University of Toronto

Description

This database contains one file for each sitting day of the House of Representatives in the Australian Parliament from 02 March 1998 to 08 September 2022. The files are distributed as two ZIP folders, one with the data in CSV format and one with the data in Parquet format. These data were parsed entirely from the XML Hansard transcripts available on the Australian Parliament website. We developed four R scripts to parse and clean all of these XML files, and ran each file through two additional scripts: one to fill in missing speaker details, and one to validate each file using a suite of seven automated tests. All scripts used to build this database are available at https://github.com/lindsaykatz/hansard-proj. Version 3 covers Hansard from 1998 to 2022, and all data in this version were parsed using a slightly different approach from the one used in version 1, which better preserves the correct chronological ordering of statements.
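As a rough illustration of working with the per-sitting-day CSV files, the sketch below reads one file and recovers its sitting date from the file name. It is not part of the authors' R pipeline. The naming pattern `YYYY-MM-DD-main-v2.csv` is an assumption inferred from the single file name shown on this page, UTF-8 encoding is assumed, and the helper names `sitting_date` and `read_sitting_day` are hypothetical. No column names are assumed; rows are keyed by whatever header the file carries.

```python
import csv
import re
from datetime import date
from pathlib import Path

# Date range stated in the description.
FIRST_SITTING = date(1998, 3, 2)
LAST_SITTING = date(2022, 9, 8)

# Assumed naming pattern, inferred from one example file name on this page.
_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-main-v2\.csv$")


def sitting_date(filename):
    """Return the date encoded in a file name, or None if it does not fit.

    None is returned when the name does not match the assumed pattern, when
    the digits are not a real calendar date, or when the date falls outside
    the range covered by the database.
    """
    match = _NAME.match(Path(filename).name)
    if match is None:
        return None
    try:
        parsed = date(*map(int, match.groups()))
    except ValueError:
        return None
    return parsed if FIRST_SITTING <= parsed <= LAST_SITTING else None


def read_sitting_day(path):
    """Read one CSV file into a list of dicts keyed by the file's own header.

    UTF-8 encoding is an assumption, not something stated in the description.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `sitting_date("2022-09-07-main-v2.csv")` gives `date(2022, 9, 7)`, while a name outside the stated 1998-2022 range gives `None`.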

Files

2022-09-07-main-v2.csv

Files (1.6 GB)

MD5 checksum                            Size
md5:6e9012beb30f0ca83c5959f961ad451e    947.2 kB
md5:f4bdbb68b8982b962745631be159b0aa    944.1 kB
md5:ff8006b8b0eaf39383c047fa68e74228    517.1 kB
md5:f9955e0958079058dadf65aa736f554a    513.7 kB
md5:0e09303b3d2ac53be703923a182dd65a    326.2 MB
md5:6665d7f48227dc2c90c3d0b7e1732132    467.1 MB
md5:9d5142d59808422816291ca6ac8afe84    326.2 MB
md5:d4ca4b4b51f67fc93f1dd2ce6b4a2a34    467.1 MB
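The MD5 values listed above can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the path passed in is whatever local name the downloaded file was saved under, since the listing does not pair each checksum with a file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that the larger archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a listed value such as 'md5:6e90...'.

    The 'md5:' prefix used in the listing is optional here.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:6e9012beb30f0ca83c5959f961ad451e")` returns `True` only when the local file's digest equals the listed one. MD5 is adequate for detecting a corrupted transfer, but it is not a safeguard against deliberate tampering.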