A new, comprehensive database of all proceedings of the Australian Parliamentary Debates (1998-2022)
Description
This database contains one file for each sitting day of the House of Representatives in the Australian Parliament from 02 March 1998 to 08 September 2022. It is distributed as two ZIP archives, one with the data in CSV format and one with the data in Parquet format. The data were parsed entirely from the XML Hansard transcripts available on the Australian Parliament website. We developed four R scripts to parse and clean these XML files, and ran each file through two additional scripts: one to fill in missing speaker details, and one to validate each file using a suite of seven automated tests. All scripts used to build this database are available at https://github.com/lindsaykatz/hansard-proj. Version 3 contains Hansard data from 1998 to 2022. All data in this version were parsed using a slightly different approach from the one used in version 1, which better preserves the correct chronological ordering of statements.
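Because the data ship as ZIP archives with one file per sitting day, a single day can be read without unpacking the whole archive. The following is a minimal sketch using only the Python standard library. The function names `list_csv_members` and `read_csv_member` are hypothetical helpers introduced here, and the UTF-8 encoding is an assumption; neither comes from the database documentation.

```python
import csv
import io
import zipfile


def list_csv_members(zip_path):
    """Return the sorted names of all CSV files inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".csv")
        )


def read_csv_member(zip_path, member):
    """Read one CSV from the archive into a list of dicts keyed by column header.

    Assumption: the CSV files are UTF-8 encoded.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so the csv module receives decoded text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

Calling `list_csv_members` on the downloaded CSV archive shows which sitting days are present, and `read_csv_member` then loads one of them. For the Parquet archive, a Parquet-capable library would be needed in place of the `csv` module.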
Files
(1.6 GB)
| MD5 checksum | Size |
|---|---|
| 6e9012beb30f0ca83c5959f961ad451e | 947.2 kB |
| f4bdbb68b8982b962745631be159b0aa | 944.1 kB |
| ff8006b8b0eaf39383c047fa68e74228 | 517.1 kB |
| f9955e0958079058dadf65aa736f554a | 513.7 kB |
| 0e09303b3d2ac53be703923a182dd65a | 326.2 MB |
| 6665d7f48227dc2c90c3d0b7e1732132 | 467.1 MB |
| 9d5142d59808422816291ca6ac8afe84 | 326.2 MB |
| d4ca4b4b51f67fc93f1dd2ce6b4a2a34 | 467.1 MB |
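The MD5 values in the file listing can be used to confirm that a download completed without corruption. The sketch below streams a file through `hashlib.md5` in chunks, so large archives do not need to fit in memory. The helper names `md5_of_file` and `matches_listed_md5` are hypothetical and introduced only for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value, with or without an 'md5:' prefix."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A return value of `False` from `matches_listed_md5` suggests the file should be downloaded again before use.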