tvarchive Dataset
Authors/Creators
Description
The tvarchive dataset contains word-frequency and other non-consumptive-use data about 1,205,844 English-language transcriptions of U.S. television news broadcasts. The documents were scraped from the Internet Archive's TV News Archive, which includes automatic captions of select U.S. news broadcasts since 2009. While the complete TV News Archive contains over 2.2 million transcripts, WE1S researchers were only able to collect about 1.2 million documents containing complete transcripts. The full TV News Archive includes transcripts from 33 networks and hundreds of shows. Unlike other WE1S datasets, the tvarchive dataset was not collected using keyword searches for specific terms (i.e., documents containing the word "humanities"). (See WE1S Research Materials Overview for the relation between the project's "datasets" and "collections.")
Notes
Files
Files
(32.1 GB)
| Name | Size | |
|---|---|---|
|
md5:48c843b98370b4bd34f292c41c95a60a
|
32.1 GB | Download |