Published June 18, 2021 | Version v1
Data Study Group Final Report: The National Archives, UK


Data Study Groups are week-long events at The Alan Turing Institute bringing together some of the country’s top talent from data science, artificial intelligence, and wider fields, to analyse real-world data science challenges.

Discovering topics and trends in the UK government web archive

The challenge addressed in this report is twofold: to take steps towards improving search and discovery of resources within the vast UK Government Web Archive (UKGWA) for future archive users, and to explore how the UKGWA collection could begin to be unlocked for research and experimentation by approaching it as data (i.e. as a dataset at scale). The UKGWA has independently begun to examine the usefulness of modelling the hyperlinked structure of its collection for advanced corpus exploration; the aim of this collaboration is to test algorithms capable of searching for documents via the topics that they cover (e.g. ‘climate change’), envisioning a future convergence of these two research frameworks. The archive is a diachronic corpus, ideal for studying how topics emerge and feature across government websites over time, which in turn can indicate engagement priorities and how they change.


