These files include the raw data and some analysis for the paper "The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles".

The remainder of the analysis can be run using the R scripts here:
https://github.com/Impactstory/oadoi-paper1

For more details on oaDOI, including using its API to get more data and updated data, see http://oadoi.org

Analysis files:
  accuracy_analysis.xlsx
  wos_analysis.xlsx

Raw data files:
  crossref_100k.csv.gz
  wos_100k.csv
  unpaywall_100k.csv.gz

Columns for raw data files:
  doi: the DOI, from Crossref
  evidence: the response from oaDOI
  oa_color_long: the OA "color" of the open copy we found. See paper for details.
  best_open_url: the URL of the open copy we found
  year: the year of the article, from Crossref
  found_green: true if we found a green copy, even if we also found a hybrid, gold, or bronze copy. See paper for details.
  journal: the journal of the article, from Crossref
  publisher: the publisher of the article, from Crossref
  license: the license of the paper, when we found one
  random: a random number
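
As a quick way to inspect the raw data without R, the sketch below tallies the values of the oa_color_long column in one of the raw data files. It is only an illustration, not part of the paper's analysis. It assumes the files are comma-separated, UTF-8 encoded, and start with a header row using the column names listed above; the helper names open_text and count_oa_colors are hypothetical.

```python
import csv
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode.

    Assumption: the files are UTF-8 encoded.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt", encoding="utf-8", newline="")
    return open(path, mode="r", encoding="utf-8", newline="")


def count_oa_colors(path, column="oa_color_long"):
    """Count how many rows carry each distinct value of `column`.

    Assumption: the first row is a header containing `column`.
    """
    counts = Counter()
    with open_text(path) as handle:
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Assumes crossref_100k.csv.gz has been downloaded to the working directory.
    for color, n in count_oa_colors("crossref_100k.csv.gz").most_common():
        print(color, n)
```

The same function can be pointed at wos_100k.csv or unpaywall_100k.csv.gz, or at another column such as found_green or publisher by passing a different column argument.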