Hansard Speeches and Sentiment V2.2

doi:10.5281/zenodo.832176

Published July 19, 2017 | Version v3

Dataset Open


Odell, Evan¹

1. Disability Rights UK

Full details are available at https://evanodell.com/projects/datasets/hansard-data

Summary

A public dataset of speeches from Hansard, stored as tibbles in RDS files for the R programming language and also available as CSV files. The dataset covers every speech made in the House of Commons from the parliament returned at the 1979 general election to the dissolution of parliament for the 2017 general election, with information on the speaking MP: their party, gender, birthdate, starting and finishing dates as an MP, and age at the time of the speech. It also includes all speeches made from 1936 to the dissolution of parliament for the 1979 general election. The post-1979 election dataset is labelled hansard_senti_post_V22 and the pre-1979 election dataset is labelled hansard_senti_pre_V22.

The hansard_senti_post_V22 dataset contains 2,230,357 speeches and 398,815,027 words. The hansard_senti_pre_V22 dataset contains 2,977,461 speeches and 406,103,015 words. It is distributed under a Creative Commons 4.0 BY-SA licence.
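Because the CSV files are several gigabytes each, it can help to stream them rather than load them whole. The following is a minimal Python sketch of that idea, not the author's own tooling: it counts rows and whitespace-delimited words from any CSV handle. The column name `speech` is an assumption made for illustration; check the header of the real file before relying on it.

```python
import csv
import io


def count_speeches_and_words(handle, text_column="speech"):
    """Stream a CSV and return (row count, whitespace-delimited word count).

    `text_column` is a hypothetical column name, not taken from the dataset
    documentation.
    """
    # Speeches can be long, so raise the per-field limit above the default.
    csv.field_size_limit(2**31 - 1)
    reader = csv.DictReader(handle)
    speeches = 0
    words = 0
    for row in reader:
        speeches += 1
        # Rows without the column (or with an empty value) count as zero words.
        words += len((row.get(text_column) or "").split())
    return speeches, words


# Tiny in-memory stand-in for the real file.
sample = io.StringIO(
    'speech,party\n"I beg to move.",Labour\n"Order, order.",Speaker\n'
)
print(count_speeches_and_words(sample))  # (2, 6)
```

For the real data, the same function could be given a handle opened with `open("hansard_senti_post_V22.csv", newline="", encoding="utf-8")`, assuming the file is in the working directory and is UTF-8 encoded. Totals computed this way may differ from the published word counts, since the tokenisation used to produce those figures is not stated here.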

Changes in V2.2

* Improvements to file encoding, as mojibake was showing up on some platforms.

* Improvements to spacing to ensure punctuation was followed by a space.

* Dropping the sentiwords library from lexical polarity calculations, as there was very little overlap between the language used in parliament and the Sentiwords dataset, and it takes a very long time to process.

* Added the speaker_office variable, which lists the government or opposition position, if any, held by a speaker.

* Changed the name of the hu lexicon to huliu.

* Added UK spellings to the afinn, jockers, nrc and huliu lexicons, to improve compatibility and consistency with the house style used by the Hansard.
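The spacing change listed above can be illustrated with a short Python sketch. This is an assumption about what such a clean-up might look like, not the code used to build the dataset: it inserts a space after a punctuation mark only when a letter follows immediately, so figures such as 3.50 are left alone.

```python
import re

# Punctuation followed directly by a letter, e.g. "order.The".
# The lookahead keeps decimals such as "3.50" untouched.
_MISSING_SPACE = re.compile(r"([,;:!?.])(?=[A-Za-z])")


def space_after_punctuation(text):
    """Insert a single space after punctuation that runs into the next word."""
    return _MISSING_SPACE.sub(r"\1 ", text)


print(space_after_punctuation("Order,order.The House will come to order."))
# Order, order. The House will come to order.
```

A rule this simple also splits abbreviations written without spaces (for example "e.g." becomes "e. g."), so a production clean-up would need an exception list.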

Notes

The code and matching data used to generate this dataset are available on GitHub.

The data used to create this dataset was taken from the parlparse project operated by They Work For You and supported by mySociety.

The dataset is licensed under a Creative Commons Attribution 4.0 International License.

The code used to create this dataset is licensed under an MIT license.

Please contact me if you find any errors in the dataset. The integrity of the public Hansard record is questionable at times, and while I have improved it, the data is presented 'as is'.

This release is an update of previously released datasets. See full documentation for details.

Files (12.8 GB)

Name	MD5	Size
gender-senti-mean-V22.csv	c1d36a945034eb3f7e4c51fc29965b70	1.2 kB
gov-senti-mean-V22.csv	458e3b06aa41f27448e6d274d6e1ad1b	1.3 kB
hansard-summary-stats-V22.xlsx	567546ec81f4e6b8038894ec56c10812	703.8 kB
hansard_senti_post_V22.csv	e1e19fc546c7639ebca165356af9c738	3.3 GB
hansard_senti_post_V22.rds	4383ee0e6762eb344817ac49d271c486	3.3 GB
hansard_senti_pre_V22.csv	8f9ca03624e9052187c73e9b594d5569	3.1 GB
hansard_senti_pre_V22.rds	0b46678c357cbe6ed85f8e22e638dddf	3.1 GB
ministry_senti_mean-V22.csv	fa600d86ad58110ad349fe5207621f1c	5.6 kB
mp-senti-mean-V22.csv	1007f4ccd150229feb63f0c62cfd2db7	889.4 kB
party-group-senti-mean-V22.csv	0918adbc51438ec7bb1b4e9dbfa80467	2.1 kB
party-senti-mean-V22.csv	9de2a998fa1f949b46fa44d67a93a65a	14.1 kB
year_senti_mean-V22.csv	ea0b86c535ae8b6ce487de00c4b1033d	17.2 kB
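
The MD5 values in the listing above can be used to confirm that a download completed intact, which is worth doing for the multi-gigabyte files. A minimal Python sketch follows; the function names are illustrative, and the commented usage line assumes the file has been saved in the working directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a file's digest with the checksum shown in the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example (assumes the file has already been downloaded):
# matches_listing("gender-senti-mean-V22.csv", "c1d36a945034eb3f7e4c51fc29965b70")
```

A result of `False` usually indicates a truncated or corrupted download, in which case the file should be fetched again.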

	All versions	This version
Views	4,582	156
Downloads	2,901	232
Data volume	6.1 TB	575.0 GB
