Published December 1, 2015 | Version v1
Software Open

Farms to Freeways Corpus Tools

Contributors

  • 1. University of Queensland

Description

This repository documents how to build a language corpus from the Farms to Freeways history project data.

The data is published on its own domain as an Omeka Classic site. This Omeka repository is considered the published version of the collection.

The data are archived at Western Sydney University. The archived record does not appear to have a persistent ID, and its web page is "orphaned" in that it does not link to the data repository (which appears to be an instance of ReDBox, maintained by QCIF).

The transcripts in the Omeka repository are in PDF format, and speaker turns are indicated only by bold-face text.

Some plain-text versions are available, but they do not indicate speaker turns.

This repository contains scripts to:

  • Download the published version of Farms to Freeways as an RO-Crate
  • Derive CSV-formatted transcripts from the PDF versions, which have been formatted to indicate which speaker is speaking in each turn (the interviewer is in bold text). These transcripts don't have the IDs of the speakers but can be used to distinguish interviewer from interviewee.
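The bold-text convention described above can be illustrated with a minimal Python sketch. It is not the repository's actual script. It assumes a PDF library has already produced a sequence of `(text, is_bold)` spans (for example by checking whether each span's font name contains "Bold"), and the names `Turn`, `spans_to_turns`, `turns_to_csv` and the CSV column names are hypothetical, chosen only for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Turn(NamedTuple):
    role: str  # "interviewer" (bold text) or "interviewee" (regular text)
    text: str


def _role(is_bold: bool) -> str:
    # Per the description above, the interviewer is set in bold.
    return "interviewer" if is_bold else "interviewee"


def spans_to_turns(spans: Iterable[Tuple[str, bool]]) -> Iterator[Turn]:
    """Merge consecutive spans that share a bold flag into one speaker turn."""
    current_bold: Optional[bool] = None
    parts: List[str] = []
    for text, is_bold in spans:
        text = text.strip()
        if not text:
            continue  # whitespace-only spans should not start a new turn
        if current_bold is not None and is_bold != current_bold:
            yield Turn(_role(current_bold), " ".join(parts))
            parts = []
        current_bold = is_bold
        parts.append(text)
    if parts and current_bold is not None:
        yield Turn(_role(current_bold), " ".join(parts))


def turns_to_csv(turns: Iterable[Turn]) -> str:
    """Serialise turns as CSV text (column names are assumptions)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["turn", "role", "text"])
    for number, turn in enumerate(turns, start=1):
        writer.writerow([number, turn.role, turn.text])
    return buf.getvalue()


# Made-up spans standing in for text extracted from a transcript PDF.
example_spans = [
    ("First question?", True),
    ("First part of the answer,", False),
    ("second part.", False),
    ("  ", True),
    ("Second question?", True),
]
example_csv = turns_to_csv(spans_to_turns(example_spans))
```

As in the derived transcripts, the output only distinguishes interviewer from interviewee; it carries no speaker IDs.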

If you downloaded this dataset from Zenodo, the data is already included in the download.

Files

f2f-archive.zip (3.1 GB)
md5:71797cad86aa8a0f13774d949c011685
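A downloaded copy of the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the function names and the assumption that the file is saved locally as `f2f-archive.zip` are illustrative, not part of the repository.

```python
import hashlib
from pathlib import Path

# Checksum as listed for f2f-archive.zip on this record.
EXPECTED_MD5 = "71797cad86aa8a0f13774d949c011685"


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that a multi-gigabyte archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("f2f-archive.zip")  # assumed local file name
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
```

MD5 is used here only because it is the checksum published with the file; it detects accidental corruption during download, not deliberate tampering.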

Additional details

Related works

Is variant form of
Dataset: https://research-data.westernsydney.edu.au/published/31f45ab0519411ecb15399911543e199/ (Other)