There is a newer version of the record available.

Published December 5, 2019 | Version v17
Dataset Open

Reliance on Science in Patenting

  • 1. Boston University

Description

This dataset contains citations from the front pages of worldwide patents to articles captured by the Microsoft Academic Graph (MAG) from 1800 to 2018. If you use the data, please cite these two papers:

For the dataset of citations: Marx, Matt and Aaron Fuegi, "Reliance on Science: Worldwide Front-Page Patent Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686).

For the underlying dataset of papers: Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246.

The main file, pcs.tsv, contains the resolved citations. Fields are tab-separated. Each match has the patent number, the MAG ID, an indicator of whether the citation was supplied by the applicant, by the examiner, or by an unknown source, and a confidence score (1-10) indicating how likely the match is to be correct. Note that this distribution does not contain matches with confidence 2 or 1. Non-USPTO patents have the country abbreviation at the start of the patent number, followed by a hyphen.
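As a minimal sketch of how rows in this layout might be read, the Python below parses tab-separated lines and separates the country prefix from the patent number. It rests on several assumptions that the description does not confirm: that the columns appear in the order listed above (patent number, MAG ID, source indicator, confidence), that unprefixed patent numbers can be labelled "US", and that any header row can be recognised by a non-numeric confidence field. The names Citation, split_patent_number, and read_citations are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class Citation(NamedTuple):
    country: str      # "US" is assumed when the patent number has no prefix
    patent: str
    mag_id: str
    source: str       # applicant/examiner/unknown indicator, kept verbatim
    confidence: int


def split_patent_number(raw: str) -> Tuple[str, str]:
    """Split 'EP-1000000' into ('EP', '1000000'); treat unprefixed numbers as USPTO."""
    cleaned = raw.strip()
    prefix, hyphen, rest = cleaned.partition("-")
    if hyphen and prefix.isalpha():
        return prefix.upper(), rest
    return "US", cleaned


def read_citations(handle: TextIO, min_confidence: int = 3) -> Iterator[Citation]:
    """Yield citations at or above min_confidence (assumed column order, see lead-in)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and a possible header row (non-numeric confidence).
        if len(row) < 4 or not row[3].strip().isdigit():
            continue
        confidence = int(row[3])
        if confidence < min_confidence:
            continue
        country, patent = split_patent_number(row[0])
        yield Citation(country, patent, row[1].strip(), row[2].strip(), confidence)


# Made-up rows; the real indicator values may be encoded differently.
sample = io.StringIO(
    "EP-1000000\t2000000001\tapplicant\t9\n"
    "7000000\t2000000002\texaminer\t4\n"
)
for citation in read_citations(sample, min_confidence=5):
    print(citation)
```

For the real file, the same function could be given an open text handle, for example open("pcs.tsv", newline="", encoding="utf-8"), where the encoding is again an assumption. Raising min_confidence trades coverage for precision.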

PubMed-specific matches are also provided in pcs-pubmed.tsv. This file is currently limited to citations from USPTO patents.

The remaining files are a redistribution of the 1 January 2019 release of the Microsoft Academic Graph. All of these files are compressed using ZIP compression under CentOS5. Original files, documented at https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema, can be downloaded from https://aka.ms/msracad; this redistribution carves up the original files into smaller, variable-specific files that can be loaded individually (see _relianceonscience.pdf for full details).
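Because several of these archives are multiple gigabytes, it may be convenient to stream rows directly from a ZIP file rather than extracting it first. The sketch below uses only Python's standard zipfile module. It assumes that the archive members are UTF-8 text with tab-separated fields, which is not stated here and should be checked against _relianceonscience.pdf; iter_zip_rows is a hypothetical helper name.

```python
import io
import zipfile
from typing import Iterator, List


def iter_zip_rows(path_or_file, delimiter: str = "\t",
                  encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield the fields of each line from every file inside a ZIP archive.

    The delimiter and encoding are assumptions; adjust them to the actual files.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to read
            with archive.open(name) as raw:
                # Decode the compressed byte stream lazily, one line at a time.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                for line in text:
                    yield line.rstrip("\r\n").split(delimiter)


# Self-contained demonstration on an in-memory archive with made-up content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as demo:
    demo.writestr("example.tsv", "1\talpha\n2\tbeta\n")
buffer.seek(0)
for fields in iter_zip_rows(buffer):
    print(fields)
```

The generator holds only one line in memory at a time, so the same call should work on a large archive by passing its path, for example iter_zip_rows("paperfieldid.zip").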

Source code for generating the patent citations to science in pcs.tsv is available at https://github.com/mattmarx/reliance_on_science. Source code for generating jif.zip and jcif.zip (Journal Impact Factor and Journal Commercial Impact Factor) is at https://github.com/mattmarx/jcif.

MAG extracts field keywords for each paper (paperfieldid.zip and fieldidname.zip), more than 200,000 fields in all. This can be overwhelming when studying industries or technical areas, so we mapped the MAG subjects to six OECD fields and 39 subfields, defined here: http://www.oecd.org/science/inno/38235147.pdf. Clarivate provides a crosswalk between the OECD classifications and Web of Science fields, so we include WoS fields as well. This file is magfield_oecd_wos_crosswalk.zip.

Files (42.3 GB)

Checksum                              Size
md5:019c978ea8c12fcb78ef0d78a9520da2  1.1 MB
md5:0917e7304059b52619782aa4a5f1f24a  2.8 GB
md5:9e35a6df4f3f6b0fe525eed10afae3d3  3.0 GB
md5:f8501b603ac284a7c168d72a1511ad36  78.9 kB
md5:a68b721d656a7be3ca6efb677d0a39b0  4.2 MB
md5:c2f351238565d2216136aeaacdf55914  5.2 MB
md5:7c66b0a4d51721179ce103ce9fdb35c9  8.1 MB
md5:4fb35d70897e46a5b3f1ac9a723c095a  1.3 MB
md5:bbe297e3f6a71b79d3b754ab00c3eba0  2.2 GB
md5:3d7dbb590fa0f834a938e3897b71f4f5  4.3 GB
md5:9705a0dc6d517b2336ecc148ba591982  3.5 GB
md5:84c293aba31f57bbb85d2e6d5f65dfce  7.8 GB
md5:cfde2972be81f7db051edc37e903ac91  448.7 MB
md5:ae6a01a43054910834667f6763c4b13e  1.3 GB
md5:78e5e3e144a42e8b22bc1f85c2b8ed3e  5.7 GB
md5:d9a425c7c183d3a12762d0bf1ced17f2  807.1 MB
md5:95c371e6e21169c13e1c5b3e6b7b8aab  6.9 GB
md5:43535c579a791b6f07d11b1c3c381c4f  1.1 GB
md5:d0067ff44ce5aee7db1be8e51398f950  620.2 MB
md5:c55c296aa57a98e543383c4b0a8b06cc  1.5 GB
md5:c951c01569247023737e9858aae76792  154.4 MB
md5:f6fa79f9c3f9e37f7528a0aca2263d07  212.4 MB

Additional details

References

  • Marx, Matt and Aaron Fuegi, "Reliance on Science in Patenting: USPTO Front-Page Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686)
  • Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246