Published July 6, 2022 | Version v1.1
Dataset Open

Public tags added to resources in Trove, 2008 to 2022

Description

This dataset contains details of 2,201,090 unique public tags added to 9,370,614 resources in Trove between August 2008 and July 2022. I harvested the data using the Trove API and saved it as a CSV file with the following columns:

  • `tag` – lower-cased text tag
  • `date` – date the tag was added
  • `zone` – API zone containing the tagged resource
  • `record_id` – the identifier of the tagged resource
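
As a minimal sketch of reading these columns with Python's standard `csv` module: it assumes the CSV has a header row using the four column names above. The function name `read_tags` and the sample row (including its date format) are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def read_tags(handle):
    """Yield one dict per tag row, keyed by the four documented column names."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield {
            "tag": row["tag"],
            "date": row["date"],  # kept as text; no date format is assumed here
            "zone": row["zone"],
            "record_id": row["record_id"],
        }


# A tiny, invented sample standing in for the real file (assumption: header row present).
sample = io.StringIO(
    "tag,date,zone,record_id\n"
    "example tag,2009-11-01,book,12345\n"
)
rows = list(read_tags(sample))
```

To read the extracted file instead, you could pass an open file handle, for example one created with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever you unzipped the CSV. Because the function yields rows one at a time, the whole file does not need to fit in memory.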

I've documented the method used to harvest the tags in this notebook.

Using the `zone` and `record_id`, you can find more information about a tagged item. To create URLs for the resources in Trove:

  • for resources in the 'book', 'article', 'picture', 'music', 'map', and 'collection' zones, append the `record_id` to `https://trove.nla.gov.au/work/`
  • for resources in the 'newspaper' and 'gazette' zones, append the `record_id` to `https://trove.nla.gov.au/article/`
  • for resources in the 'list' zone, append the `record_id` to `https://trove.nla.gov.au/list/`
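
The rules above can be sketched as a small Python function. The name `trove_url` and the record identifiers in the usage lines are hypothetical; only the zone names and base URLs come from the list above.

```python
# Zones whose resources are addressed as works.
WORK_ZONES = {"book", "article", "picture", "music", "map", "collection"}
# Zones whose resources are addressed as articles.
ARTICLE_ZONES = {"newspaper", "gazette"}


def trove_url(zone, record_id):
    """Build a Trove URL from a row's zone and record_id."""
    if zone in WORK_ZONES:
        base = "https://trove.nla.gov.au/work/"
    elif zone in ARTICLE_ZONES:
        base = "https://trove.nla.gov.au/article/"
    elif zone == "list":
        base = "https://trove.nla.gov.au/list/"
    else:
        # Fail loudly rather than guess a URL for a zone not listed above.
        raise ValueError(f"Unrecognised zone: {zone!r}")
    return f"{base}{record_id}"


# Invented identifiers, used only to show the call.
book_url = trove_url("book", "12345")
news_url = trove_url("newspaper", "67890")
```

Raising an error for an unknown zone is a design choice in this sketch; you might prefer to return `None` if you expect unexpected values in the data.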

Notes:

  • Works (such as books) in Trove can have tags attached at either work or version level. This dataset aggregates all tags at the work level, removing any duplicates.
  • A single resource in Trove can appear in multiple zones – for example, a book that includes maps and illustrations might appear in the 'book', 'picture', and 'map' zones. This means that some of the tags will essentially be duplicates – harvested from different zones, but relating to the same resource. Depending on your needs, you might want to remove these duplicates.
  • While most of the tags were added by Trove users, more than 500,000 tags were added by Trove itself in November 2009. I think these tags were automatically generated from related Wikipedia pages. Depending on your needs, you might want to exclude these by limiting the date range or zones.
  • User content added to Trove, including tags, is available for reuse under a CC-BY-NC licence.
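
If you want to act on the notes about cross-zone duplicates and the November 2009 tags, one possible approach is sketched below. It rests on two assumptions that are not confirmed here: that rows from the work-level zones sharing a `record_id` refer to the same resource, and that the `date` column is text beginning with `YYYY-MM`. The function names and sample rows are made up for illustration.

```python
# Zones that all resolve to https://trove.nla.gov.au/work/ URLs.
WORK_ZONES = {"book", "article", "picture", "music", "map", "collection"}


def drop_cross_zone_duplicates(rows):
    """Keep the first row for each (tag, resource) pair, merging work-level zones."""
    seen = set()
    kept = []
    for row in rows:
        # Assumption: work-level zones share one identifier space, so the
        # zone name is ignored for them when building the duplicate key.
        group = "work" if row["zone"] in WORK_ZONES else row["zone"]
        key = (row["tag"], group, row["record_id"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def exclude_month(rows, month="2009-11"):
    """Drop rows whose date text starts with the given YYYY-MM prefix (assumed format)."""
    return [row for row in rows if not row["date"].startswith(month)]


# Invented sample rows.
sample_rows = [
    {"tag": "maps", "date": "2010-03-02", "zone": "book", "record_id": "1"},
    {"tag": "maps", "date": "2010-03-02", "zone": "map", "record_id": "1"},
    {"tag": "auto", "date": "2009-11-15", "zone": "book", "record_id": "2"},
]
deduped = drop_cross_zone_duplicates(sample_rows)
filtered = exclude_month(deduped)
```

Note that filtering by month alone would also discard any tags that users added in November 2009, so you may want to combine it with a zone filter.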

See this notebook for some examples of how you can manipulate, analyse, and visualise the tag data.

Files

trove_tags_20220706.zip (139.4 MB)
md5:d0aeea9d423794ef253defd6b830f2bd
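
To check a downloaded copy against the md5 checksum listed above, a short sketch using Python's `hashlib` follows. The function name `md5_of_file` is made up, and the sketch assumes the zip file has been saved locally under its original name.

```python
import hashlib

# Checksum as published on this page.
EXPECTED_MD5 = "d0aeea9d423794ef253defd6b830f2bd"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("trove_tags_20220706.zip") == EXPECTED_MD5` should be true for an intact download.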

Additional details

Related works

Is compiled by
Software: https://github.com/GLAM-Workbench/trove-lists (URL)
Is documented by
Software documentation: https://glam-workbench.net/trove-lists/ (URL)