Dataset Open Access
On 2019-01-01, for the first time in 20 years, the USA saw an expansion of the public domain: all works published in 1923 returned to it. In this dataset we briefly explore the scholarly works published in 1923, as indexed by CrossRef, and their availability.
In the Unpaywall dump of 2018-09-24, 73,484 DOIs are known for publications of 1923 (*1923.dois.txt). For all of these, we extracted the availability status according to Unpaywall as of May 2019 using our previous methods (Leva 2017), finding only 10,767 open access works (about 15 %; *dois.oa.txt), with the rest presumably toll-access (*dois.notoa.txt).
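The split into open access and presumably toll-access DOIs can be illustrated with a minimal sketch. It is not the procedure of Leva (2017): it assumes the input is a series of JSON Lines records carrying the `doi`, `year` and `is_oa` fields of the Unpaywall data format, and the function name `split_by_oa` is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def split_by_oa(lines: Iterable[str], year: int = 1923) -> Tuple[List[str], List[str]]:
    """Partition the DOIs of one publication year by open access status.

    Assumption: each non-empty line is a JSON object with "doi", "year"
    and "is_oa" keys, as in Unpaywall records.
    """
    oa: List[str] = []
    not_oa: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("year") != year:
            continue  # keep only the requested publication year
        doi = record.get("doi")
        if not doi:
            continue  # skip records without a DOI
        # DOIs are case-insensitive, so normalise before storing.
        target = oa if record.get("is_oa") else not_oa
        target.append(doi.lower())
    return sorted(oa), sorted(not_oa)
```

Writing the two returned lists to separate text files, one DOI per line, would give files shaped like the *dois.oa.txt and *dois.notoa.txt lists described above.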
A breakdown by DOI prefix, and therefore by publisher (*dois.prefixes.ods), shows that the 20 biggest providers (with more than about 1000 works each) typically have an OA rate of 4–8 %: most notable are Springer (10.1007) with over 9700 restricted works and JSTOR (10.2307) with over 6900. CUP (10.1017) does marginally better with 16 %, while Wiley (10.1002) reaches 14 % mostly by assigning a DOI to thousands of "mastheads" (frontispieces) which typically have no meaningful content. About 40 % of works are available for Nature (10.1038), almost exclusively items from its "News" section, and for BMJ (10.1136). The Smithsonian is the only provider with a majority of OA works (66 %), thanks to the Biodiversity Heritage Library (BHL), which provides over 1400 works on the Internet Archive.
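A per-prefix breakdown of this kind can be approximated as follows. This is only a sketch under stated assumptions: the two inputs are plain lists of DOIs (such as the contents of the OA and non-OA lists), the prefix is taken to be everything before the first slash, and the helper names `doi_prefix` and `oa_rate_by_prefix` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix of a DOI, e.g. '10.1007' for '10.1007/xyz'."""
    return doi.strip().lower().split("/", 1)[0]


def oa_rate_by_prefix(
    oa_dois: Iterable[str], not_oa_dois: Iterable[str]
) -> List[Tuple[str, int, int, float]]:
    """Return (prefix, total works, OA works, OA rate) rows, largest prefix first."""
    oa_counts = Counter(doi_prefix(d) for d in oa_dois)
    closed_counts = Counter(doi_prefix(d) for d in not_oa_dois)
    rows = []
    for prefix in set(oa_counts) | set(closed_counts):
        total = oa_counts[prefix] + closed_counts[prefix]
        rows.append((prefix, total, oa_counts[prefix], oa_counts[prefix] / total))
    # Sort by number of works (descending), then by prefix for stable output.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Placeholder DOIs for illustration only, not taken from the dataset.
    oa = ["10.1038/a", "10.1038/b", "10.1007/x"]
    not_oa = ["10.1007/y", "10.1007/z", "10.1007/w", "10.1038/c"]
    for prefix, total, n_oa, rate in oa_rate_by_prefix(oa, not_oa):
        print(f"{prefix}\t{total}\t{n_oa}\t{rate:.0%}")
```

Mapping a prefix to a publisher name is left out on purpose, since one publisher may hold several prefixes and a prefix may change hands over time.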
Our list does not distinguish between "gold" and "green" open access: a portion of the available works is provided by open archives rather than the publishers. Moreover, some works are freely available and not detected as such (or vice versa) due to Unpaywall's unavoidable limitations.
As illustrated by the Hirtle chart (Hirtle 2019), works published before 1924 are nearly automatically in the public domain in the USA, although exceptions may exist for works first published outside the USA. Therefore, nearly all of the toll-access works found above represent an enclosure of the public domain and potentially a case of copyfraud if false statements of copyright ownership are attached. Similar considerations may be extended to works first published outside of the USA, which may be considered simultaneously published in the USA (within 30 days) due to the very nature of international journals (which were necessarily distributed in the USA and the mailing of which could benefit, in the 1920s, from a transatlantic crossing taking less than a week).
We conclude that there is a significant opportunity for academic institutions, libraries and open archives to legally and easily expand access to tens of thousands of historical papers, as long as the publishers choose not to.
Hirtle, Peter B. (2019). Copyright Term and the Public Domain in the United States. Copyright, Fair Use, Scholarly Communication, etc. 106. https://digitalcommons.unl.edu/scholcom/106
Leva, Federico (2017). DOIs linked by the English Wikipedia which could be made available in green Open Access [Data set]. Zenodo. http://doi.org/10.5281/zenodo.997222
Piwowar H, Priem J, Larivière V, Alperin JP, Matthias L, Norlander B, Farley A, West J, Haustein S. (2018). The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles. PeerJ 6:e4375 https://doi.org/10.7717/peerj.4375