Preserving digital news by partnering with newspapers and their platforms
Description
After researching the digital news preservation landscape in the USA, Portico identified a potential gap. While libraries and archives have strategies for preserving print newspapers, hyper-local digital newspapers are less likely to be preserved. This gap comes at a time of rapid loss of hyper-local print news and increased dependence on digital-only news.
One mechanism for preserving digital news is web archiving, but because stories turn over rapidly on news websites, it can be difficult to crawl a site frequently enough to capture every article. Some newspapers, though not all, provide RSS feeds, and it can be difficult to detect corrected articles in these feeds. Other newspapers use a subscription model and cannot be harvested without a special arrangement. Some newspapers are aggregated into larger databases, but these often exclude the smallest digital-only platforms, and because the databases are for-profit subscription services, they may not be preserved by a third party.
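To illustrate why corrected articles are hard to spot in a feed, the following is a minimal sketch, not part of Portico's workflow, of one way a harvester might compare two snapshots of an RSS feed. It assumes each item carries a stable `guid` or `link`, and it fingerprints only the title and description. The function names, sample feeds, and example.org URLs are hypothetical. A correction made to the article body but not to the feed summary would go unnoticed by this approach.

```python
import hashlib
import xml.etree.ElementTree as ET


def fingerprint_items(rss_xml: str) -> dict[str, str]:
    """Map each item's guid (or link) to a SHA-256 hash of its title and description."""
    root = ET.fromstring(rss_xml)
    fingerprints = {}
    for item in root.iter("item"):
        key = item.findtext("guid") or item.findtext("link")
        if not key:
            continue  # no stable identifier, so the item cannot be tracked
        title = item.findtext("title") or ""
        description = item.findtext("description") or ""
        text = title + "\n" + description
        fingerprints[key.strip()] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return fingerprints


def diff_snapshots(old: dict[str, str], new: dict[str, str]):
    """Return (added, changed) item identifiers between two feed snapshots."""
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return added, changed


# Hypothetical feed snapshots taken on two consecutive days.
FEED_DAY_1 = """<rss version="2.0"><channel>
<item><guid>https://example.org/news/1</guid>
<title>Council approves budget</title>
<description>The council voted 5-2.</description></item>
</channel></rss>"""

FEED_DAY_2 = """<rss version="2.0"><channel>
<item><guid>https://example.org/news/1</guid>
<title>Council approves budget</title>
<description>Correction: the council voted 4-3.</description></item>
<item><guid>https://example.org/news/2</guid>
<title>Library extends hours</title>
<description>New weekend hours begin in May.</description></item>
</channel></rss>"""

added, changed = diff_snapshots(
    fingerprint_items(FEED_DAY_1), fingerprint_items(FEED_DAY_2)
)
print("added:", added)
print("changed:", changed)
```

A full export from the content management system, as used in the pilot described here, avoids this guesswork because the publisher supplies every article directly.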
Portico is a community-supported dark archive for scholarly material that forms agreements with publishers and works with them to preserve their content. Based on this research, Portico initiated a pilot to determine whether digital news articles could be managed in a similar way to journal articles. Portico partnered with a single newspaper and worked with its content management system provider to retrieve an XML export of every article. The XML and supporting files (photos, etc.) were successfully ingested into the archive, and the content proved similar to journal articles. To confirm that the process was repeatable, Portico worked with another newspaper on the same platform and reused the workflow with few changes.
Portico is repeating this experiment with two more newspapers on different platforms. If the content from each can be archived, Portico will seek to expand the work and develop a business model to support a broader effort in digital news preservation. An early step will be to reach out to the approximately 3,000 newspapers on the platforms that have already been configured.
For the poster, the author will share details of the process used for this project and seek feedback from the community about the value of this approach for preserving digital news.
Files
20240916_Hanson_iPRES_poster.png
Total size: 5.4 MB
| Checksum | Size |
|---|---|
| md5:a011b0067b4379dbba5f97aaf19bcdf9 | 902.1 kB |
| md5:179a6017b566ffb38c50f83b2a29487f | 4.5 MB |
Additional details
Dates
- Created: 2024-08-30
References
- McCain, Edward, Neil Mara, Kara Van Malssen, Dorothy Carner, Bernard Reilly, Kerri Willette, Sandy Schiefer, Joe Askins and Sarah Buchanan. Endangered But Not Too Late: The State of Digital News Preservation. Columbia, MO: University of Missouri, 2021. https://doi.org/10.32469/10355/80931