PatFigVQA Dataset - Patent Figure Visual Question Answering Dataset

Awale, Sushil; Müller-Budack, Eric; Ewerth, Ralph

doi:10.5281/zenodo.14907473

Published February 21, 2025 | Version v1

Dataset Open


1. Technische Informationsbibliothek (TIB)
2. L3S Research Center

Dataset Summary

The PatFigVQA dataset was introduced in the paper "Patent Figure Classification using Large Vision-language Models", accepted at ECIR 2025. It is designed for fine-tuning and evaluating Large Vision-language Models (LVLMs) in a few-shot learning setting across multiple aspects of a patent figure: type, projection, objects and USPC class.

The PatFigVQA dataset is used alongside another dataset called PatFigCLS, which is intended for patent figure classification and evaluation.

The dataset is sourced from two existing datasets:

Extended CLEF-IP 2011, and
DeepPatent2

Data Format

The dataset is stored in .tar files for fast and efficient read access.
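As a rough illustration of this layout: in a WebDataset-style shard, files that share a base name belong to one sample, and the remainder of the file name is the field name. The sketch below groups the members of a single shard using only the standard library. The helper name `iter_samples` is hypothetical, and the code assumes that the files of one sample are stored consecutively and that the key is everything before the first dot of the file name.

```python
import posixpath
import tarfile


def iter_samples(fileobj):
    """Yield one dict per sample from a WebDataset-style tar stream.

    Assumption: members of a sample are adjacent in the archive and the
    sample key is the file name up to its first dot.
    """
    current = None
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            directory, filename = posixpath.split(member.name)
            stem, _, field = filename.partition(".")
            key = posixpath.join(directory, stem)
            # A new key means the previous sample is complete.
            if current is None or current["__key__"] != key:
                if current is not None:
                    yield current
                current = {"__key__": key}
            current[field] = tar.extractfile(member).read()
    if current is not None:
        yield current
```

Passing a shard opened in binary mode (`open(path, "rb")`) to `iter_samples` yields dicts whose keys match the field names listed under Data Fields, with the values still as raw bytes.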

Data Fields

__key__: unique sample id
image.png: patent figure file
task.txt: type of question (e.g. binary, multiple-choice or open-ended)
question.txt: natural language question about the concept depicted in the figure
answer.txt: textual answer to the question
concept.txt: concept depicted in the patent figure for the given aspect
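To make the field list concrete, the following sketch turns one raw sample (a dict mapping field names to bytes, with `__key__` as a string, which is how `webdataset` hands over undecoded samples) into a typed record. The class `VQASample` and the function `decode_sample` are hypothetical names, and stripping surrounding whitespace from the text fields is an assumption, not something the dataset description specifies.

```python
from dataclasses import dataclass


@dataclass
class VQASample:
    key: str            # __key__
    image_bytes: bytes  # image.png, still PNG-encoded
    task: str           # task.txt
    question: str       # question.txt
    answer: str         # answer.txt
    concept: str        # concept.txt


def decode_sample(raw: dict) -> VQASample:
    """Decode the text fields of a raw sample as UTF-8."""

    def text(field: str) -> str:
        # Assumption: leading/trailing whitespace carries no meaning.
        return raw[field].decode("utf-8").strip()

    return VQASample(
        key=raw["__key__"],
        image_bytes=raw["image.png"],
        task=text("task.txt"),
        question=text("question.txt"),
        answer=text("answer.txt"),
        concept=text("concept.txt"),
    )
```

The image is left as bytes so that the choice of image library (for example `PIL.Image.open(io.BytesIO(sample.image_bytes))`) stays with the caller.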

Data Splits

For each aspect, multiple data splits exist: train_9, train_18, train_27, train_54, train_81, train_150, val and test.
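A small helper can enumerate the shards of a given split without hard-coding shard counts. This is only a sketch: the function name `list_shards` is hypothetical, and the directory layout `<root>/<aspect>/<split>/shard-*.tar` is inferred from the single example path `PatFigVQA/object/train_150/...` used in the loading example. Only the `object` directory name appears in that example; the names for the other aspects are not confirmed here.

```python
from pathlib import Path

# Split names as listed above.
TRAIN_SIZES = (9, 18, 27, 54, 81, 150)
SPLITS = tuple(f"train_{n}" for n in TRAIN_SIZES) + ("val", "test")


def list_shards(root, aspect, split):
    """Return the sorted shard paths for one aspect and split.

    Assumes the layout <root>/<aspect>/<split>/shard-*.tar.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    split_dir = Path(root) / aspect / split
    return sorted(str(path) for path in split_dir.glob("shard-*.tar"))
```

For example, `list_shards("PatFigVQA", "object", "train_150")` returns a list of paths that can be passed to `wds.WebDataset` in place of a brace-expanded pattern.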

How to Use

The recommended way to load the dataset is with the Python library `webdataset`, as in the following example.

import io

import webdataset as wds
from braceexpand import braceexpand
from PIL import Image
from torchvision.transforms import Compose, ToTensor


def transform(image):
    # Convert a PIL image to a tensor.
    return Compose([ToTensor()])(image)


dataset = (
    wds.WebDataset(
        # Shards of the train_150 split for the object aspect.
        braceexpand('PatFigVQA/object/train_150/shard-{000000..000042}.tar'),
        shardshuffle=1000,
    )
    # Shuffle samples within a buffer of 1000.
    .shuffle(1000)
    # Select the fields of each sample, in this order.
    .to_tuple(
        '__key__',
        'image.png',
        'concept.txt',
        'task.txt',
        'question.txt',
        'answer.txt',
    )
    # Fields arrive as raw bytes: decode the image and the text fields.
    .map_tuple(
        lambda key: key,
        lambda image: transform(Image.open(io.BytesIO(image))),
        lambda concept: concept.decode('utf-8'),
        lambda task: task.decode('utf-8'),
        lambda question: question.decode('utf-8'),
        lambda answer: answer.decode('utf-8'),
    )
)

dataloader = wds.WebLoader(dataset)

Source Code

The source code used to produce this dataset can be found at https://github.com/TIBHannover/patent-figure-classification

Licensing Information

The PatFigVQA dataset is released under the GNU General Public License v3.0.

Files

Total size: 15.6 GB

Name	Size	MD5
PatFigVQA.zip	15.6 GB	e41b59bad855858c6a0d7bfd1d6497a3
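The MD5 checksum listed for PatFigVQA.zip can be used to check that a download completed intact. The sketch below hashes a file in chunks so that the 15.6 GB archive does not have to fit in memory; the function name `md5_of_file` and the chunk size are illustrative choices.

```python
import hashlib

# Checksum as given in the file listing for PatFigVQA.zip.
EXPECTED_MD5 = "e41b59bad855858c6a0d7bfd1d6497a3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading, `md5_of_file("PatFigVQA.zip") == EXPECTED_MD5` should evaluate to `True`; a mismatch suggests a corrupted or incomplete download.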

Additional details

Is derived from: Dataset: 10.5281/zenodo.10019328 (DOI); Dataset: 10.7910/DVN/UG4SBD (DOI)
Is supplement to: Dataset: 10.5281/zenodo.14905551 (DOI)

European Patent Organisation
Academic Research Programme of the European Patent Office (project "ViP@Scale: Visual and multimodal patent search at scale")

Repository URL: https://github.com/TIBHannover/patent-figure-classification
Programming language: Python

