Farms to Freeways Corpus Tools
Authors/Creators
Description
This repository documents how to build a language corpus from the Farms to Freeways history project data.
The collection is published on its own domain as an Omeka Classic site; this Omeka repository is considered the published version of the collection.
The data are also archived at Western Sydney University. The archived record does not appear to have a persistent ID, and its web page is "orphaned" in that it has no links to the data repository (which appears to be an instance of ReDBox, maintained by QCIF).
The transcripts in the Omeka repository are in PDF format, and speaker turns are indicated only by bold-face text.
Some plain text versions are available, but they do not indicate speaker turns.
This repository contains scripts to:
- Download the published version of Farms to Freeways as an RO-Crate
- Derive CSV-formatted transcripts from the PDF versions, which are formatted to indicate who is speaking in each turn (the interviewer is in bold text). These transcripts do not carry speaker IDs, but they can be used to distinguish interviewer from interviewee.
If you downloaded this dataset from Zenodo, the data is already included in the download.
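The bold-face convention described above can be turned into speaker turns roughly as follows. This is a minimal sketch, not the repository's actual script: it assumes some PDF library has already produced a list of `(text, is_bold)` spans, and the function names and the `speaker`/`text` column names are hypothetical.

```python
import csv
import io


def spans_to_turns(spans):
    """Merge consecutive (text, is_bold) spans into speaker turns.

    Assumption: bold text belongs to the interviewer and everything
    else to the interviewee, as described for the PDF transcripts.
    """
    turns = []
    for text, is_bold in spans:
        text = text.strip()
        if not text:
            continue  # skip whitespace-only spans
        speaker = "interviewer" if is_bold else "interviewee"
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous span: extend the current turn.
            turns[-1][1] = turns[-1][1] + " " + text
        else:
            turns.append([speaker, text])
    return turns


def turns_to_csv(turns):
    """Serialise turns as CSV text with hypothetical column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["speaker", "text"])
    writer.writerows(turns)
    return buffer.getvalue()


# Made-up spans for illustration only; not taken from the collection.
example_spans = [
    ("Where did you grow up?", True),
    ("On a farm", False),
    ("near the river.", False),
    ("  ", False),
    ("And your parents?", True),
]
example_turns = spans_to_turns(example_spans)
print(turns_to_csv(example_turns))
```

Because the source PDFs carry no speaker IDs, the sketch labels turns only as `interviewer` or `interviewee`.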
Files
| Name | Size | Checksum |
|---|---|---|
| f2f-archive.zip | 3.1 GB | md5:71797cad86aa8a0f13774d949c011685 |
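The MD5 checksum listed for f2f-archive.zip can be used to check that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the function names and the local file path are assumptions, and the file is read in chunks because the archive is about 3.1 GB.

```python
import hashlib

# Checksum as listed for f2f-archive.zip on this record.
EXPECTED_MD5 = "71797cad86aa8a0f13774d949c011685"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected


# Usage (assumes the archive was saved in the current directory):
#   matches_published_checksum("f2f-archive.zip")
```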
Additional details
Related works
- Is variant form of
- Dataset: https://research-data.westernsydney.edu.au/published/31f45ab0519411ecb15399911543e199/ (Other)
Software
- Repository URL
- https://github.com/Language-Research-Technology/corpus-tools-farms-to-freeways
- Programming language
- Python, JavaScript
- Development Status
- Active