There is a newer version of the record available.

Published August 31, 2019 | Version v17
Dataset Open

Reliance on Science in Patenting

  • 1. Boston University

Description

This dataset contains citations from USPTO patents granted 1947-2018 to scientific articles published 1800-2018 and captured by the Microsoft Academic Graph (MAG). If you use the data, please cite these two papers:

For the dataset of citations: Marx, Matt and Aaron Fuegi, "Reliance on Science in Patenting: USPTO Front-Page Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686).

For the underlying dataset of papers: Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246.

The main file, pcs.tsv, contains the resolved citations as tab-separated fields. Each match has the patent number, the MAG ID, the original citation text from the patent, an indicator of whether the citation was supplied by the applicant or the examiner (or whether this is unknown), and a confidence score from 1 to 10 indicating how likely the match is to be correct. This distribution excludes matches with confidence 1 or 2, so the scores in pcs.tsv range from 3 to 10.
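A minimal Python sketch for loading pcs.tsv and dropping low-confidence matches. It assumes the five columns appear in the order listed above and that the file has no header row; neither is confirmed here, so check _relianceonscience.pdf before relying on it. The names `read_matches`, `Match`, and the field labels are hypothetical, and the applicant/examiner indicator is kept as a raw string because its exact encoding is not described here.

```python
import csv
from dataclasses import dataclass
from typing import Iterator, TextIO

# ASSUMPTION: column order follows the description (patent number, MAG ID,
# original citation, applicant/examiner indicator, confidence score).
ASSUMED_COLUMNS = ("patent", "magid", "citation", "source", "confidence")


@dataclass
class Match:
    patent: str
    magid: str
    citation: str
    source: str      # applicant / examiner / unknown; raw value, encoding not assumed
    confidence: int  # 3-10 in this distribution (1 and 2 are excluded)


def read_matches(f: TextIO, min_confidence: int = 3,
                 has_header: bool = False) -> Iterator[Match]:
    """Yield matches whose confidence is at least min_confidence."""
    # QUOTE_NONE keeps quote characters inside citation text literal.
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip a header row if the file turns out to have one
    for row in reader:
        if len(row) < len(ASSUMED_COLUMNS):
            continue  # skip blank or malformed lines
        rec = dict(zip(ASSUMED_COLUMNS, row))
        # Raises ValueError if a header row is present but has_header is False.
        conf = int(rec["confidence"])
        if conf >= min_confidence:
            yield Match(rec["patent"], rec["magid"], rec["citation"],
                        rec["source"], conf)
```

For example, open the file with `open("pcs.tsv", encoding="utf-8", newline="")` and iterate over `read_matches(f, min_confidence=8)` to keep only high-confidence matches; the UTF-8 encoding and the cutoff of 8 are illustrative assumptions, not recommendations from the dataset authors.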

A separate, PubMed-specific set of matches is provided in pcs-pubmed.tsv.

The remaining files redistribute the 1 January 2019 release of the Microsoft Academic Graph. All of them are ZIP-compressed (created under CentOS 5). The original files, documented at https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema, can be downloaded from https://aka.ms/msracad; this redistribution splits the original files into smaller, variable-specific files that can be loaded individually (see _relianceonscience.pdf for full details).
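Because every redistributed MAG file is a ZIP archive, one way to read it without unpacking to disk is to stream the archive member through Python's standard `zipfile` and `csv` modules. This sketch assumes each archive holds a single tab-separated, UTF-8 text file (pass `member` to pick a specific one otherwise); `iter_zip_tsv` is a hypothetical helper name, and the actual member names and layouts are described in _relianceonscience.pdf.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Iterator, List, Optional, Union


def iter_zip_tsv(source: Union[str, BinaryIO], member: Optional[str] = None,
                 encoding: str = "utf-8") -> Iterator[List[str]]:
    """Stream rows of a tab-separated file stored inside a ZIP archive."""
    with zipfile.ZipFile(source) as zf:
        # ASSUMPTION: if no member is named, the archive's first entry is the data file.
        name = member if member is not None else zf.namelist()[0]
        with zf.open(name) as raw:
            # Decode on the fly so multi-GB members never sit fully in memory.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE):
                yield row
```

Streaming row by row keeps memory flat even for the multi-gigabyte archives; `zipfile` handles ZIP64 archives larger than 4 GB transparently.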

Source code for generating the patent citations to science in pcs.tsv is available at https://github.com/mattmarx/reliance_on_science. Source code for generating jif.zip and jcif.zip (Journal Impact Factor and Journal Commercial Impact Factor) is at https://github.com/mattmarx/jcif.

Although MAG contains authors and affiliations for each paper, it does not contain affiliation locations. We have created a dataset of locations for affiliations appearing at least 100 times, using Bing Maps and Google Maps; however, it is unclear to us whether the API licensing terms allow us to repost that data. The source code we used to generate it is available at https://github.com/ksjiaxian/api-requester-locations.

MAG extracts field keywords for each paper (paperfieldid.zip and fieldidname.zip): more than 200,000 fields in all, which can be overwhelming when studying industries or technical areas. We therefore mapped the MAG fields to six OECD fields and 39 subfields, defined at http://www.oecd.org/science/inno/38235147.pdf. Because Clarivate provides a crosswalk between the OECD classifications and Web of Science (WoS) fields, we include WoS fields as well. The resulting crosswalk is in magfield_oecd_wos_crosswalk.zip.
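As a minimal sketch of how the crosswalk might be used, the function below rolls each paper's MAG fields up to a set of OECD fields. It assumes the inputs have already been parsed: `paper_fields` as (paper ID, MAG field ID) pairs from paperfieldid.zip, and `crosswalk` as a mapping from MAG field ID to one or more OECD labels from magfield_oecd_wos_crosswalk.zip. The real column layouts are not described here, and `oecd_fields_per_paper` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Set, Tuple


def oecd_fields_per_paper(
    paper_fields: Iterable[Tuple[str, str]],
    crosswalk: Mapping[str, Sequence[str]],
) -> Dict[str, Set[str]]:
    """Map each paper ID to the set of OECD fields its MAG fields fall under."""
    # ASSUMPTION: crosswalk values are lists of OECD labels (one MAG field may
    # map to more than one); check magfield_oecd_wos_crosswalk.zip for the real shape.
    out: Dict[str, Set[str]] = defaultdict(set)
    for paper_id, field_id in paper_fields:
        # MAG fields with no crosswalk entry are skipped rather than guessed.
        for label in crosswalk.get(field_id, ()):
            out[paper_id].add(label)
    return dict(out)
```

A set per paper is used because a paper usually carries several MAG fields, many of which roll up to the same OECD field; papers whose fields are all missing from the crosswalk do not appear in the result.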

Files

_relianceonscience.pdf

Files (45.4 GB)

Size and MD5 checksum per file:

  • 1.1 MB, md5:44fa128649af5452cc7e324205854752
  • 2.8 GB, md5:0917e7304059b52619782aa4a5f1f24a
  • 3.0 GB, md5:9e35a6df4f3f6b0fe525eed10afae3d3
  • 78.9 kB, md5:f8501b603ac284a7c168d72a1511ad36
  • 4.2 MB, md5:a68b721d656a7be3ca6efb677d0a39b0
  • 5.2 MB, md5:c2f351238565d2216136aeaacdf55914
  • 8.1 MB, md5:7c66b0a4d51721179ce103ce9fdb35c9
  • 1.3 MB, md5:4fb35d70897e46a5b3f1ac9a723c095a
  • 2.2 GB, md5:bbe297e3f6a71b79d3b754ab00c3eba0
  • 4.3 GB, md5:3d7dbb590fa0f834a938e3897b71f4f5
  • 3.5 GB, md5:9705a0dc6d517b2336ecc148ba591982
  • 7.8 GB, md5:84c293aba31f57bbb85d2e6d5f65dfce
  • 448.7 MB, md5:cfde2972be81f7db051edc37e903ac91
  • 1.3 GB, md5:ae6a01a43054910834667f6763c4b13e
  • 5.7 GB, md5:78e5e3e144a42e8b22bc1f85c2b8ed3e
  • 807.1 MB, md5:d9a425c7c183d3a12762d0bf1ced17f2
  • 6.9 GB, md5:95c371e6e21169c13e1c5b3e6b7b8aab
  • 1.1 GB, md5:43535c579a791b6f07d11b1c3c381c4f
  • 620.2 MB, md5:d0067ff44ce5aee7db1be8e51398f950
  • 1.5 GB, md5:c55c296aa57a98e543383c4b0a8b06cc
  • 3.4 GB, md5:486dead83a2c7f1a5f6e23e58960ed0e
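Each file in the listing comes with an MD5 checksum, which is worth checking after downloading files of this size. A minimal sketch using Python's standard `hashlib`, reading in chunks so multi-gigabyte archives need not fit in memory; `md5_of` and `verify` are hypothetical helper names, and matching each checksum to its file name is left to the download page, since the names are not reproduced in this listing.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected.strip().lower()
```

For example, `verify("pcs.tsv", "md5:...")` with the checksum listed for that file returns `True` only if the download is intact.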

Additional details

References

  • Marx, Matt and Aaron Fuegi, "Reliance on Science in Patenting: USPTO Front-Page Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686)
  • Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246