Dataset Open Access

Reliance on Science in Patenting

Marx, Matt; Aaron Fuegi

This dataset contains both front-page and in-text citations from patents to scientific articles through 2020.  If you use the data, please cite these two articles:

1. M. Marx & A. Fuegi, "Reliance on Science by Inventors: Hybrid Extraction of In-text Patent-to-Article Citations."  forthcoming in Journal of Economics and Management Strategy. (http://doi.org/10.1111/jems.12455)

2. M. Marx & A. Fuegi, "Reliance on Science: Worldwide Front-Page Patent Citations to Scientific Articles" (2020), Strategic Management Journal 41(9):1572-1594. (https://onlinelibrary.wiley.com/doi/full/10.1002/smj.3145)

 

The datafile containing the citations is _pcs_mag_doi_pmid.tsv. DOIs and PMIDs are provided where available. Each citation has the applicant/examiner flag, a confidence score (1-10), and an indicator of whether the reference appeared a) only on the front page, b) only in the body text, or c) in both. Each paper-patent citation also includes a preview release (think: alpha, not beta) of the temporal gap (in months) and three related measures of self-citation (i.e., whether one or more of the inventors on the citing patent was also an author of the cited paper). _data_description.pdf has full details. bodytextknowngood.tsv contains the known-good references for calculating recall.
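Because the citation file is several gigabytes, it may be more practical to stream it than to load it whole. The sketch below filters rows by confidence score using only the Python standard library. It assumes the file has a header row and is tab-delimited; the column names used here (`patent`, `paperid`, `confscore`) are hypothetical placeholders, so check the data description PDF for the real ones.

```python
import csv


def filter_by_confidence(lines, confidence_col, min_confidence):
    """Yield rows (as dicts) whose confidence score is >= min_confidence.

    `lines` is any iterable of tab-delimited text lines whose first line
    is a header. `confidence_col` is the header name of the score column
    (an assumption: look it up in the data description).
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or confidence_col not in reader.fieldnames:
        raise ValueError(f"column {confidence_col!r} not found in header")
    for row in reader:
        try:
            score = int(row[confidence_col])
        except (TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric score
        if score >= min_confidence:
            yield row


# Tiny in-memory demo with made-up rows and hypothetical column names.
demo_lines = [
    "patent\tpaperid\tconfscore\n",
    "US1\t10\t9\n",
    "US2\t11\t3\n",
    "US3\t12\tbad\n",
]
demo_rows = list(filter_by_confidence(demo_lines, "confscore", 8))
```

To run this against the real file, pass an open file handle instead of the demo list, for example `open("_pcs_mag_doi_pmid.tsv", newline="", encoding="utf-8")`, and substitute the actual column name.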

The remaining files redistribute much of the *final* edition of the Microsoft Academic Graph (12/20/2021). Please also cite Sinha, A, et al. 2015. Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246. Note that jif.zip, jcif.zip, and the OECD/wos-category crosswalks are derivatives of MAG and may not be updated through the end of 2021.

These data are under an Open Data Commons Attribution license (ODC-By); use them for anything as long as you cite us! Source code for front-page matches is at https://github.com/mattmarx/reliance_on_science and for in-text is at https://github.com/mattmarx/intextcitations. Questions & feedback to support@relianceonscience.org.

This work is sponsored by the Alfred P. Sloan Foundation grant #G-2021-16822.

Files (47.0 GB)

Name                              Size       MD5
__datadescription.pdf             221.0 kB   112bf587b7f19d3d5ab82654b91996dc
_pcs_mag_doi_pmid.tsv             3.7 GB     718c1d0e9e78fe4ef2229af7189f94f0
authoridname_normalized.zip       3.1 GB     3a35d65f9241074976b1083bca7fd96e
bodytextknowngood.tsv             272.1 kB   0d20284aadeb443ad48eac1d00ae503f
conferenceidname.zip              82.8 kB    d671dffead5994cfad1fa88848a1049c
intlpatfamily.zip                 1.0 GB     5bb26fd59a0f9b9e2a44a4a124d44b6c
jcif.zip                          5.2 MB     c2f351238565d2216136aeaacdf55914
jif.zip                           8.1 MB     7c66b0a4d51721179ce103ce9fdb35c9
journalidnameissn.zip             1.5 MB     12a865c40b44735fe82557bd42ff2152
magfield_oecd_wos_crosswalk.zip   2.2 GB     bbe297e3f6a71b79d3b754ab00c3eba0
paperauthoridaffiliationname.zip  9.3 GB     4de658f319d6243f182fa4f34f3f2669
paperauthororder.zip              4.4 GB     ae79bbdfc7820c2f4841ab8f3f965449
papercitations.zip                10.9 GB    2c3434f1ca91478901fa79bea665370b
paperconferenceid.zip             550.9 MB   5434339c22fda4ae7b03a34ad496fd55
paperjournalid.zip                973.4 MB   6874e40f9e0f868e39501d9d8ed3fc74
papertitle.zip                    8.7 GB     2b2466e4cda4e82184f067e5fede6cc5
papervolisspages.zip              1.4 GB     f272c7ac3db9f98f7b5d757c2efd5d3d
paperyear.zip                     752.2 MB   1153ec5319607a6dff643952a5393f12
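
The MD5 checksums in the file listing can be used to confirm that a large download arrived intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `file_md5` and `matches_listing` are illustrative and not part of the dataset or its source code. The file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:112bf5...'.

    Accepts the value with or without the 'md5:' prefix.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return file_md5(path) == expected
```

For example, after downloading jif.zip, `matches_listing("jif.zip", "md5:7c66b0a4d51721179ce103ce9fdb35c9")` should return `True` if the file is complete and uncorrupted.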
  • Marx, Matt and Aaron Fuegi, "Reliance on Science in Patenting: USPTO Front-Page Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686)

  • Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246
