Examining Patented Artefact Knowledge Graphs to understand Linguistic and Structural Basis

Siddharth, L.

doi:10.5281/zenodo.13328258

Published August 16, 2024 | Version v1

Dataset Open

Siddharth, L. (Researcher)¹

1. Singapore University of Technology and Design

Contributors

Supervisor:

Jianxi, Luo¹

1. City University of Hong Kong

Introduction

This resource supports our research on examining knowledge graphs of patented artefacts to understand the linguistic and structural basis of engineering design knowledge. The research is detailed in the following paper.

https://arxiv.org/abs/2312.06355

The resource is split into multiple pandas dataframes stored in pickled format (".pkl"). To access any dataframe, use the following Python code.

import pandas as pd

data = pd.read_pickle("PATH TO FILE.pkl")
print(data.head())

The individual datasets are described as follows.

Series information (English)

Patent Data

The original patent data is given in "engineering-design-knowledge/1-patent-data/"

For our analysis, we sampled 33,881 patents from the USPTO using PatentsView, stratified by CPC subclass.

Patent List

In the dataframe "1-patent-list.pkl", we provide the introductory information about these patents as follows.

FIELD	EXAMPLE
patent_id	7745779
patent_date	29/6/2010
patent_title	Color pixel arrays having common color filters for multiple adjacent pixels for use in CMOS imagers
patent_abstract	Image sensors and methods of operating image sensors. An image sensor includes an array of pixels and an array of color filters …
patent_classifications	['H04N', 'G01J']
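As a minimal sketch of how the stratification by CPC subclass could be inspected, the snippet below counts patents per subclass. It uses a toy dataframe in place of "1-patent-list.pkl" and assumes that "patent_classifications" holds Python lists, as the example value suggests; the second row is invented for illustration.

```python
import pandas as pd

# Toy stand-in for "1-patent-list.pkl". The first row mirrors the example
# above; the second row is invented for illustration only.
patents = pd.DataFrame(
    {
        "patent_id": [7745779, 1234567],
        "patent_classifications": [["H04N", "G01J"], ["H04N"]],
    }
)

# Assumption: each cell is a Python list of CPC subclasses.
# explode() yields one row per (patent, subclass) pair.
per_subclass = (
    patents.explode("patent_classifications")
    .groupby("patent_classifications")["patent_id"]
    .nunique()
    .sort_values(ascending=False)
)
print(per_subclass)
```

If the column turns out to be stored as strings rather than lists, the values would need to be parsed (for example with ast.literal_eval) before calling explode().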

Patent Sentences

For each patent in the dataset described above, we acquired the full text and processed its sentences. The dataframe "2-patent-sentences.pkl" lists the 7,566,829 formatted sentences along with the patent ID and the sentence length in tokens.

FIELD	EXAMPLE
patent_id	10716120
sentence_id	10716120_725
sentence	In certain examples, aspects of the operations of block 1915 may be performed by an uplink component as described with reference to FIGS 14 through 17.
length	28
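The fields above lend themselves to simple per-patent summaries. The following is a sketch on a toy dataframe that stands in for "2-patent-sentences.pkl"; only the length of 28 comes from the example above, and the remaining rows are invented.

```python
import pandas as pd

# Toy stand-in for "2-patent-sentences.pkl"; values other than the first
# row's length are invented for illustration.
sentences = pd.DataFrame(
    {
        "patent_id": [10716120, 10716120, 10075499],
        "sentence_id": ["10716120_725", "10716120_726", "10075499_98"],
        "length": [28, 12, 20],
    }
)

# Number of sentences and mean token count per patent.
summary = sentences.groupby("patent_id")["length"].agg(
    sentence_count="count", mean_length="mean"
)
print(summary)
```

On the full dataframe of about 7.5 million rows the same call should work unchanged, although memory use depends on the machine.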

Patent Knowledge Graphs

From the sentences described above, we extracted 24,537,587 facts of the form "head entity :: relationship :: tail entity" using a method described in our prior work: https://arxiv.org/abs/2307.06985

Combining the facts within a patent forms a patent knowledge graph, which we examined in the current work to understand the basis of design knowledge. The dataframe "3-patent-knowledge-graphs.pkl" provides the individual facts as follows. The sentence ID in each row is the same as the one in the previous dataframe, "2-patent-sentences.pkl".

FIELD	EXAMPLE
patent_id	10075499
sentence_id	10075499_98
head	the host facility
relation	with
tail	the highest average and aggregate weighting value

We combine the facts within each patent, as described above, to obtain the knowledge graph examined in our current work. The results of our analysis are compiled into the dataframes explained below.
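To illustrate how facts could be combined into one graph per patent, here is a minimal sketch using only the Python standard library. The function name build_graphs and the adjacency layout are hypothetical choices for illustration, not the procedure used in the paper, and all rows except the first are invented.

```python
from collections import defaultdict

# Rows shaped like "3-patent-knowledge-graphs.pkl". With the real dataframe,
# such records could be obtained via data.to_dict("records").
facts = [
    {"patent_id": 10075499, "head": "the host facility", "relation": "with",
     "tail": "the highest average and aggregate weighting value"},
    # The two rows below are invented for illustration.
    {"patent_id": 10075499, "head": "the host facility", "relation": "includes",
     "tail": "a server"},
    {"patent_id": 9139018, "head": "a nozzle", "relation": "ejects",
     "tail": "ink"},
]


def build_graphs(rows):
    """Group facts by patent: {patent_id: {head: [(relation, tail), ...]}}."""
    graphs = defaultdict(lambda: defaultdict(list))
    for row in rows:
        graphs[row["patent_id"]][row["head"]].append(
            (row["relation"], row["tail"])
        )
    return graphs


graphs = build_graphs(facts)
graph = graphs[10075499]

# Nodes are all heads plus all tails; edges are the individual facts.
nodes = set(graph) | {tail for edges in graph.values() for _, tail in edges}
edge_count = sum(len(edges) for edges in graph.values())
print(len(nodes), edge_count)
```

This sketch treats each fact as a directed, labelled edge and keeps repeated facts. Whether entities are normalised or duplicate edges merged before analysis is not stated here, so those choices are left open.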

Linguistic Basis

We analysed the frequencies of entities and relationships in the knowledge graphs populated for each patent in the sample. In the dataframes provided under "engineering-design-knowledge/2-linguistic-basis/", we provide information for 5,015,681 entities, 845,303 relationships, and 165 hierarchical relationships regarding their frequencies and linguistic syntaxes. In our work, we fit the proportions of the syntaxes to a Zipf distribution to visualise these at different percentiles.

Entity Syntaxes

FIELD	EXAMPLE
entity	the upper connecting member
frequency	27
syntax	the JJ VBG NN

Relation Syntaxes

FIELD	EXAMPLE
entity	are suitable for recovering
frequency	2
syntax	are JJ for VBG

Hierarchical Relation Syntaxes

FIELD	EXAMPLE
hierarchical_relation	comprises determining
count	1237
syntax	compris* VBG
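The Zipf fit mentioned above can be approximated with a rank-frequency regression. The sketch below aggregates frequencies by syntax, converts them to proportions, and estimates the exponent by least squares on the log-log rank-proportion curve. This is an assumed, simplified estimator rather than the fitting procedure used in the paper; the function name zipf_exponent and the input rows are hypothetical.

```python
import math
from collections import Counter

# Hypothetical (syntax, frequency) rows shaped like the tables above.
# Several entities can share one syntax, so frequencies are summed first.
rows = [
    ("the NN", 40), ("the NN", 20), ("the JJ NN", 30),
    ("NN", 20), ("the NN NN", 15), ("the JJ VBG NN", 12),
]


def zipf_exponent(rows):
    """Estimate s in proportion ~ rank**(-s) by log-log least squares."""
    totals = Counter()
    for syntax, frequency in rows:
        totals[syntax] += frequency
    grand_total = sum(totals.values())
    # Proportion of each syntax, ordered from most to least common.
    proportions = sorted(
        (count / grand_total for count in totals.values()), reverse=True
    )
    xs = [math.log(rank) for rank in range(1, len(proportions) + 1)]
    ys = [math.log(p) for p in proportions]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return -slope


exponent = zipf_exponent(rows)
print(round(exponent, 3))
```

The toy frequencies are chosen to follow 60/rank exactly, so the estimate should be very close to 1. Log-log regression is known to be a rough estimator on real data, where a maximum-likelihood fit may be preferable.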

Structural Basis

We performed motif analysis on the network structures of the patent knowledge graphs to identify statistically recurrent 3-node and 4-node patterns that are the building blocks of each patent knowledge graph. In the dataframe "1-patent-motifs.pkl" under "engineering-design-knowledge/3-structural-basis/", we list the network size and the motifs for each patent in the sample.

FIELD	EXAMPLE
patent_id	9139018
node_count	85
edge_count	146
motifs	[5, 61, 80]

The pattern numbers listed in the motifs field above can be looked up using the images included in the folder "pattern-figures".
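As a sketch of how the motifs field could be summarised across patents, the snippet below counts the number of patents in which each pattern number occurs. It assumes that "motifs" holds a list of integer pattern numbers, as the example suggests; every row except the first is invented.

```python
from collections import Counter

# Rows shaped like "1-patent-motifs.pkl". Only the first row comes from the
# example above; the others are invented for illustration.
patent_motifs = [
    {"patent_id": 9139018, "node_count": 85, "edge_count": 146,
     "motifs": [5, 61, 80]},
    {"patent_id": 1111111, "node_count": 40, "edge_count": 52,
     "motifs": [5, 80]},
    {"patent_id": 2222222, "node_count": 12, "edge_count": 11,
     "motifs": []},
]

# Count each pattern at most once per patent.
motif_patent_counts = Counter(
    pattern for row in patent_motifs for pattern in set(row["motifs"])
)

# Share of patents in which each pattern is a motif.
motif_share = {
    pattern: count / len(patent_motifs)
    for pattern, count in motif_patent_counts.items()
}
print(motif_patent_counts.most_common())
```

The resulting pattern numbers can then be matched against the images in "pattern-figures" to see which structures they denote.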

Files

engineering-design-knowledge.zip

Files (610.0 MB)

Name	Size
engineering-design-knowledge.zip (md5:9c53764978ebd607df260cb3777f72a8)	610.0 MB

Additional details

arXiv: arXiv:2312.06355

Requires: Preprint: arXiv:2307.06985 (arXiv)
