
Dataset Open Access

Dataset Snickars Scandia

Pelle Snickars


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <controlfield tag="005">20210216122719.0</controlfield>
  <controlfield tag="001">4542733</controlfield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">3071220679</subfield>
    <subfield code="z">md5:4de3fb5786d798c39c4984a57079f393</subfield>
    <subfield code="u">https://zenodo.org/record/4542733/files/snickars_scandia_2021_data.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-02-16</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:4542733</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Umeå University</subfield>
    <subfield code="0">(orcid)0000-0001-5122-1549</subfield>
    <subfield code="a">Pelle Snickars</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Dataset Snickars Scandia</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Data for the article &amp;quot;Fr&amp;aring;n chiffer till klartext? Temamodellering av statliga offentliga utredningar 1945&amp;ndash;1989&amp;quot;,&amp;nbsp;&lt;em&gt;Scandia&lt;/em&gt; 2021, forthcoming.&lt;/p&gt;

&lt;p&gt;In 2015 the National Library of Sweden finished digitising all Governmental Official Reports (SOU) from 1922 to 1999. Traditionally, SOU reports &amp;ndash; and work performed within different governmental committees &amp;ndash; had the task of preparing the Swedish government for apt and rational decision-making. The range of subjects covered by governmental committees and SOU reports basically includes every area of the Swedish welfare state, from issues centered on migration and the environment to cultural policy and media politics.&lt;/p&gt;

&lt;p&gt;The article departs from an analysis of all SOU reports from 1945&amp;ndash;89 as one massive dataset: in all, 3,154 SOU reports containing 87 million tokens. Research was performed within a Jupyter Lab environment, a web application in which executable Python code can be run to perform data analysis. The Jupyter Lab environment was developed at Humlab, the digital humanities hub at Ume&amp;aring; University, and the research is related to the project Welfare State Analytics. Text Mining and Modeling Swedish Politics, Media &amp;amp; Culture, 1945&amp;ndash;89. It is a digital humanities and digital history project that will digitise literature, curate already digitised collections, and perform research via probabilistic methods and text mining models.&lt;/p&gt;

&lt;p&gt;If all SOU reports are considered as one single text written by the state, what themes in this vast text can software read and perceive? It is possible to answer such a broad question by way of topic modeling, a computational method to study themes in texts by accentuating words that tend to co-occur and together create different topics. Via co-occurrence, topic modeling creates topics in the form of clusters of similar words; a term or a word may be part of several topics with different degrees of probability. Topics also occur in relation to each other, and clusters and networks can be visualised using software such as Gephi.&lt;/p&gt;

&lt;p&gt;The article focuses on topics related to media and media policy. Depending on how many topics a topic model displays &amp;ndash; in the article, models of 50, 100, 200 and 500 topics are used &amp;ndash; different media topics can be detected. In the 50-topic model, one media topic was found, whereas in the 500-topic model there were several, with more specific traits such as film censorship or daily press subsidies. One finding is that film was the single medium to which the SOU genre devoted most attention in 1945&amp;ndash;89; another is that archival issues were closely linked to media topics during the same period. Governmental committees and SOU reports on media were primarily focused on future-oriented policies, above all how media should be supported or regulated. Yet archiving the same media forms was also something that the state was repeatedly interested in.&lt;/p&gt;

&lt;p&gt;In conclusion, the article explains in general terms what topic modeling is, how the method can be used in digital historical research &amp;ndash; not least in relation to close reading &amp;ndash; and how statistical analysis of the distribution of words in the form of topics can generate interesting results. The SOU data is rich; topics can be traced across many different themes. As a researcher, however, one must learn to work with data: to load different models in the Jupyter Lab environment, to compute various input values, to change parameters and often to curate outcomes in a way that differs from traditional historical research practices.&lt;/p&gt;

&lt;p&gt;Keywords: digital humanities, digital history, topic modeling, media history, Swedish Governmental Official Reports (SOU)&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4542732</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4542733</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
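
The record above can be read programmatically. The following is a minimal sketch, assuming Python's standard library only; the helper names `subfield`, `summarize` and `md5_matches` are hypothetical and not part of any Zenodo tooling, and `SAMPLE` is a trimmed copy of the record shown above. It extracts the title, DOI, file URL and checksum from a MARC21 slim record, and shows one way to compare a downloaded file against the `md5:` value in field 856.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace used by the MARC21 "slim" XML schema (taken from the record above).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Trimmed copy of the exported record, kept only for illustration.
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="z">md5:4de3fb5786d798c39c4984a57079f393</subfield>
    <subfield code="u">https://zenodo.org/record/4542733/files/snickars_scandia_2021_data.zip</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Dataset Snickars Scandia</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4542733</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
</record>"""


def subfield(record, tag, code):
    """Return the text of the first subfield with this tag and code, or None."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


def summarize(marc_xml):
    """Collect a few fields of interest from a MARC21 slim record."""
    if isinstance(marc_xml, str):
        # Parse from bytes so the XML encoding declaration is honoured.
        marc_xml = marc_xml.encode("utf-8")
    record = ET.fromstring(marc_xml)
    return {
        "title": subfield(record, "245", "a"),
        "doi": subfield(record, "024", "a"),
        "url": subfield(record, "856", "u"),
        "checksum": subfield(record, "856", "z"),
    }


def md5_matches(path, checksum):
    """Compare a local file against a checksum of the form 'md5:<hex digest>'."""
    algorithm, _, expected = checksum.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported checksum type: {algorithm}")
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

After downloading the archive from the URL in field 856, a call such as `md5_matches("snickars_scandia_2021_data.zip", info["checksum"])` would check it against the recorded checksum; the local file name here is only an assumption about where the download was saved.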
                   All versions   This version
Views              54             54
Downloads          9              9
Data volume        27.6 GB        27.6 GB
Unique views       45             45
Unique downloads   6              6
