Published August 28, 2025 | Version v1
Dataset Open

Dataset of Spanish Parliamentary Interventions by Legislature (2000–2023)

  • 1. Consejo Superior de Investigaciones Científicas
  • 2. Universitat de València
  • 3. Universitat de Valencia
  • 4. University of Sussex

Description

This repository contains datasets of parliamentary records from the Spanish Congress, covering legislatures 7 to 14 (2000–2023). The material was collected, cleaned, and pre-processed as part of a study on ideological and affective polarisation in the Spanish parliament, presented in the paper "Analyzing polarization among Spanish political elites using Machine Learning techniques".

Each file corresponds to one legislature and is provided in Parquet format (compressed with gzip). The files include:

  • Full text of parliamentary interventions.

  • Metadata of each speech (date, speaker, party, session).

  • Pre-processed text fields for Natural Language Processing (NLP) applications.

These datasets allow replication of the analyses presented in the article and provide a resource for further research on Spanish political discourse, sentiment analysis, and ideology mapping.

Files

Files (781.9 MB)

Checksum Size
md5:c89e320bf7d58d4cb9f72ffb6b776dc5 148.7 MB
md5:ade97480e89f4d398122026dee9ab992 145.8 MB
md5:889814fa54b5ea5344fedacba028b98e 100.5 MB
md5:958f5d4040ff8de26227cbe796733cdd 144.5 MB
md5:028c7aff2fa7fbd8a077bdffe353ff25 4.5 MB
md5:f12cc506a8a783af8b22f5ee0851cb78 104.0 MB
md5:f8c8a67a5a785dc6b8889ef17457f2be 2.2 MB
md5:453f66257b72b4f5fc262046d3c47a09 131.8 MB
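
The MD5 checksums listed for each file can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever name the downloaded file was saved under.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(Path("my_download.parquet"), "md5:c89e320bf7d58d4cb9f72ffb6b776dc5")` returns `True` only if the local file is byte-for-byte identical to the published one with that checksum.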

Additional details

Related works

Has part
Publication: 10.31235/osf.io/ry4g2 (DOI)

Dates

Available
2025-08-28
Parliamentary corpus interventions

Software

Repository URL
https://github.com/dibuja/polarisation-nlp
Programming language
Python
Development Status
Active