Published February 16, 2021 | Version v1
Dataset | Open access

Dataset Snickars Scandia

  • Umeå University

Description

Data for the article "Från chiffer till klartext? Temamodellering av statliga offentliga utredningar 1945–1989" ("From cipher to plain text? Topic modeling of Swedish Government Official Reports, 1945–1989"), Scandia 2021, forthcoming.

In 2015 the National Library of Sweden finished digitising all Governmental Official Reports (SOU) from 1922 to 1999. Traditionally, SOU reports, and the work carried out within various governmental committees, were meant to prepare the Swedish government for well-founded and rational decision-making. The subjects covered by governmental committees and SOU reports span virtually every area of the Swedish welfare state, from migration and the environment to cultural policy and media politics.

The article is based on an analysis of all SOU reports from 1945–89, treated as one massive dataset: 3,154 reports containing 87 million tokens in all. The research was carried out in a JupyterLab environment, a web application for writing and running Python code for data analysis. The environment was developed at Humlab, the digital humanities hub at Umeå University, and the research is part of the project "Welfare State Analytics. Text Mining and Modeling Swedish Politics, Media & Culture, 1945–89". This digital humanities and digital history project will digitise literature, curate already digitised collections, and conduct research using probabilistic methods and text-mining models.

If all SOU reports are treated as one single text written by the state, what themes in this vast text can software detect? Such a broad question can be approached through topic modeling, a computational method for studying themes in texts by identifying words that tend to co-occur and together form topics. Each topic is a cluster of words that frequently appear together, and a given word may belong to several topics with different probabilities. Topics are also related to one another, and the resulting clusters and networks can be visualised with software such as Gephi.
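To make the idea of topics as probabilistic word clusters concrete, here is a minimal sketch of latent Dirichlet allocation (LDA), one widely used topic model, fitted with collapsed Gibbs sampling in plain Python. The description does not state which algorithm or software the article used, so this is an illustration of the general technique only. The function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the toy English corpus are hypothetical. A pure-Python sampler like this is far too slow for 87 million tokens; real work would rely on an optimised implementation such as MALLET or gensim.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (topic_word, doc_topic), where
    topic_word[k] maps each word to p(word | topic k) and doc_topic[d][k]
    is p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current topic from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimates: every word has some probability in every topic.
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


def top_words(topic_word, n=5):
    """Return the n most probable words for each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]


if __name__ == "__main__":
    # Hypothetical toy corpus, not drawn from the SOU data.
    docs = [
        "film censorship film cinema film".split(),
        "press subsidy newspaper press daily".split(),
        "archive film archive cinema".split(),
        "newspaper press subsidy daily press".split(),
    ]
    topic_word, doc_topic = lda_gibbs(docs, num_topics=2, iterations=300)
    for k, words in enumerate(top_words(topic_word, 3)):
        print(k, words)
```

Because `topic_word` assigns every word a probability in every topic, a term such as "film" can rank highly in more than one topic. Raising `num_topics` tends to split broad topics into narrower ones, which corresponds to the difference between smaller and larger models discussed in the article.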

The article focuses on topics related to media and media policy. Depending on the number of topics in a model (the article uses models with 50, 100, 200 and 500 topics), different media topics can be detected. The 50-topic model contains a single media topic, whereas the 500-topic model contains several more specific ones, for example on film censorship or subsidies for the daily press. One finding is that film was the medium to which the SOU genre devoted the most attention between 1945 and 1989; another is that archival issues were closely linked to media topics during the same period. Governmental committees and SOU reports on media focused primarily on future-oriented policy, above all on how media should be supported or regulated. Yet the state also showed a recurring interest in archiving these same media forms.

In sum, the article explains what topic modeling is, how the method can be used in digital historical research (not least in relation to close reading), and how statistical analysis of word distributions, in the form of topics, can generate interesting results. The SOU data is rich, and topics on many different themes can be traced. As a researcher, however, one must learn to work with data: to load different models in the JupyterLab environment, compute various input values, change parameters and often curate the results in ways that differ from traditional historical research practices.

Keywords: digital humanities, digital history, topic modeling, media history, Swedish Governmental Official Reports (SOU)

Files

snickars_scandia_2021_data.zip (3.1 GB)
md5: 4de3fb5786d798c39c4984a57079f393
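The archive is listed with an MD5 checksum. A minimal sketch for checking a downloaded copy against it, assuming the file has been saved under its listed name in the current working directory; the helper names `md5_of_file` and `verify_download` are hypothetical. The file is read in chunks so that the 3.1 GB archive never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksum as listed for snickars_scandia_2021_data.zip.
EXPECTED_MD5 = "4de3fb5786d798c39c4984a57079f393"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local download location.
    archive = Path("snickars_scandia_2021_data.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```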