Published July 4, 2021 | Version v1
Dataset Open

comparison_sciences Dataset


The comparison_sciences dataset contains word-frequency and other non-consumptive-use data about 553,699 unique English-language news documents (no duplicate or close-variant documents) that contain the words "science" or "sciences." The documents came from U.S. mainstream and student news sources published during 1977-2019 (though mostly from 1985-2019). WE1S researchers use this data to understand how public discourse about the humanities compares to public discourse about science.

We gathered this data using keyword searches for "science," which found articles containing either or both of the words "science" and "sciences." We took data from the 10 highest-circulation newspapers in the U.S. and from University Wire sources (student newspapers). Documents in this dataset may also contain the word "humanities," just as documents in the humanities_keyword dataset may contain the words "science" or "sciences."
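As an illustration of the inclusion criterion described above, the sketch below flags text containing "science" or "sciences" as a whole word, matched case-insensitively. This is an assumption made for illustration only: the searches were run through the source databases' own keyword search, whose tokenization and matching rules may differ. The function name `mentions_science` is hypothetical.

```python
import re

# Hypothetical approximation of the keyword criterion: "science" or "sciences"
# as a whole word, case-insensitive. The source databases' actual search
# behavior is not documented here and may differ.
SCIENCE_PATTERN = re.compile(r"\bsciences?\b", re.IGNORECASE)


def mentions_science(text: str) -> bool:
    """Return True if text contains the word 'science' or 'sciences'."""
    return SCIENCE_PATTERN.search(text) is not None


if __name__ == "__main__":
    print(mentions_science("The Sciences section ran long."))  # True
    print(mentions_science("Neuroscience funding rose."))  # False: not a whole word
```

Whole-word matching is the design choice here: under this assumption, compounds such as "neuroscience" would not count as matches on their own.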

(See WE1S Research Materials Overview for the relation between the project's "datasets" and "collections.")


WE1S makes word-frequency data available only for "non-consumptive use." This dataset cannot be used to access, read, or reconstruct the original texts.

The data has been archived in JSON Lines (jsonl) format: each JSON document occupies its own line, and documents are separated by line breaks.
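Because the archive is large (33.7 GB), it is usually best read one line at a time rather than loaded whole. The following is a minimal sketch of a streaming JSON Lines reader using only the Python standard library. The function names are hypothetical, and the sketch makes no assumptions about which fields each document contains, since those are not described here.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines input: one JSON document per line, blank lines skipped."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"Malformed JSON on line {line_number}: {err}") from err


def iter_jsonl(path: str) -> Iterator[dict]:
    """Stream documents from a .jsonl file without loading it all into memory."""
    with open(path, encoding="utf-8") as f:
        yield from parse_jsonl_lines(f)
```

A typical loop would be `for doc in iter_jsonl("comparison_sciences.jsonl"): ...`, where the file name is a placeholder; substitute the path of the downloaded file.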


Files (33.7 GB)
