There is a newer version of the record available.

Published May 6, 2024 | Version v1.0.0
Software | Open

GLAM-Workbench/trove-web-archives

Description

CURRENT VERSION: v1.0.0

This repository includes information on finding, understanding, and using Pandora's collections of archived web pages.

Pandora has been selecting websites and online resources for preservation since 1996. It has assembled a collection of more than 80,000 titles, organised into subjects and collections. The archived websites are now part of the Australian Web Archive (AWA), which combines the selected titles with broader domain harvests, and is searchable through Trove. However, Pandora's curated collections offer a useful entry point for researchers trying to find websites relating to particular topics or events.

The Web Archives section of the GLAM Workbench provides documentation, tools, and examples to help you work with data from a range of web archives, including the Australian Web Archive. The title URLs obtained through Pandora can be used to retrieve additional data from the AWA for analysis.

For more information and documentation see the Trove web archive collections (Pandora) section of the GLAM Workbench.

Notebooks

  • Create title datasets from collections and subjects
  • Harvest Pandora subjects and collections
  • Harvest the full collection of Pandora titles

Associated datasets

Created by Tim Sherratt for the GLAM Workbench

Files

GLAM-Workbench/trove-web-archives-v1.0.0.zip

Files (60.7 kB)

Name: GLAM-Workbench/trove-web-archives-v1.0.0.zip
Size: 60.7 kB
md5:4f8d11853e28897f2dabde9278c76b68
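
The md5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using only Python's standard library. The local filename `trove-web-archives-v1.0.0.zip` and the helper names `md5_of_file` and `matches_checksum` are assumptions made for illustration and are not part of the repository.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for the v1.0.0 zip file.
EXPECTED_MD5 = "4f8d11853e28897f2dabde9278c76b68"


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local filename; adjust to wherever the download was saved.
    archive = Path("trove-web-archives-v1.0.0.zip")
    if archive.exists():
        print("checksum OK" if matches_checksum(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest fix.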

Additional details

Related works

Is derived from
Software: https://github.com/GLAM-Workbench/trove-web-archives/tree/v1.0.0 (URL)
Is documented by
Software documentation: https://glam-workbench.net/trove-web-archives/ (URL)
Is part of
Other: https://glam-workbench.net/ (URL)