CPC Patent Classification: USPTO-70k-enriched

Pujari, Subhash Chandra

doi:10.5281/zenodo.6992298

Published August 15, 2022 | Version v1

Dataset Open

Pujari, Subhash Chandra¹

1. Bosch Center for AI

Patent classification is a hierarchical multi-label classification task. A patent document contains four textual fields: `title`, `abstract`, `claims` and `description`. Because the full text is long, most previous work has focused on the title, abstract and claims. In the paper, we also make use of the description, a more elaborate patent field. For evaluation, we create a new dataset (USPTO-70k-enriched) from the previously released USPTO-70k dataset, which contains the title and abstract as patent fields.
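To illustrate what "hierarchical multi-label" means for CPC labels, here is a minimal sketch that expands each label into its ancestors in the CPC scheme (section, class, subclass). It assumes labels are standard CPC symbols such as `H04L`; the label format actually stored in the dataset is not stated here, and the function names `cpc_ancestors` and `expand_labels` are hypothetical.

```python
def cpc_ancestors(code: str) -> list[str]:
    """Return the section, class, and subclass prefixes of a CPC symbol."""
    symbol = code.strip().upper()
    if len(symbol) < 4:
        raise ValueError(f"expected at least a subclass-level symbol, got {code!r}")
    # A CPC symbol starts with a section letter, two class digits,
    # and a subclass letter, e.g. H -> H04 -> H04L.
    return [symbol[:1], symbol[:3], symbol[:4]]


def expand_labels(codes: list[str]) -> set[str]:
    """Build a hierarchical multi-label target: each label plus its ancestors."""
    expanded: set[str] = set()
    for code in codes:
        expanded.update(cpc_ancestors(code))
    return expanded


# Assumed example: a patent assigned to two subclasses.
print(sorted(expand_labels(["H04L", "G06F"])))
```

A document labeled this way is a positive example at every level of the hierarchy, which is what distinguishes the task from flat multi-label classification.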

The dataset is now enriched with four additional text columns: `claims`, `brief-summary`, `fig-desc` and `detail-desc`, where the latter three are subfields of the description. Both datasets are created from the bulk data dump provided by the United States Patent and Trademark Office (USPTO), released under CC-BY-4.0.
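As a rough sketch of how the three description subfields might be recombined into a single description text, the snippet below joins them in order. The column names come from the description above; the record layout (one dict per patent) and the helper name `build_description` are assumptions, since the file format inside the archive is not documented here.

```python
# Column names as listed in the dataset description.
DESCRIPTION_PARTS = ("brief-summary", "fig-desc", "detail-desc")


def build_description(record: dict) -> str:
    """Join the non-empty description subfields of one patent record."""
    parts = [(record.get(name) or "").strip() for name in DESCRIPTION_PARTS]
    # Skip missing or empty subfields so no stray blank paragraphs appear.
    return "\n\n".join(part for part in parts if part)


# Hypothetical record; real rows would come from the files in the archive.
example = {
    "title": "Example title",
    "brief-summary": "Summary of the invention.",
    "fig-desc": None,
    "detail-desc": "Detailed description of the embodiments.",
}
print(build_description(example))
```

Keeping the subfields separate until this step makes it easy to experiment with using only one of them, for example the brief summary, as the model input.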

We also release the dataset under the same license, CC-BY-4.0.

Files (871.1 MB)

Name	Size	Checksum
bir_dataset_2022.zip	871.1 MB	md5:2c3eb45f98a93d42b2f14227b303e81b
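The MD5 checksum listed for the archive can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library; it assumes the archive was saved under its original name in the current directory, and the helper names `md5_of_file` and `verify_archive` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum as listed for bir_dataset_2022.zip on the record page.
EXPECTED_MD5 = "2c3eb45f98a93d42b2f14227b303e81b"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads avoid loading the 871.1 MB archive into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Assumed local path; adjust to wherever the archive was downloaded.
archive = Path("bir_dataset_2022.zip")
if archive.exists():
    print("checksum ok" if verify_archive(archive) else "checksum mismatch")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a guarantee against deliberate tampering.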

	All versions	This version
Views	347	345
Downloads	82	82
Data volume	90.6 GB	90.6 GB

DOI: 10.5281/zenodo.6992298

Resource type: Dataset

Publisher: Zenodo

Conference: BIR 2022: 12th International Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2022 (BIR 2022), April 10, 2022

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: August 15, 2022
Modified: August 15, 2022
