PatFigVQA Dataset - Patent Figure Visual Question Answering Dataset
Description
Dataset Summary
The PatFigVQA dataset was introduced in the paper "Patent Figure Classification using Large Vision-language Models", accepted at ECIR 2025. It is designed for fine-tuning and evaluating Large Vision-language Models (LVLMs) in a few-shot learning setting across multiple aspects of a patent figure: type, projection, objects and USPC class.
The PatFigVQA dataset is used alongside another dataset called PatFigCLS, which is intended for patent figure classification and evaluation.
The dataset is sourced from two existing datasets, listed under "Is derived from" in the Related works section.
Data Format
The dataset is stored in .tar files for fast and efficient read access.
Data Fields
- `__key__`: unique sample ID
- `image.png`: patent figure file
- `task.txt`: type of question (e.g. binary, multiple-choice or open-ended)
- `question.txt`: natural-language question about the concept depicted in the figure
- `answer.txt`: textual answer to the question
- `concept.txt`: concept depicted in the patent figure for the given aspect
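To make the field layout concrete, here is a minimal sketch of how files inside a shard can be grouped into samples by their shared key, using only the Python standard library. It assumes each tar member is named `<key>.<field>` (for example `sample0.image.png`), which is the usual WebDataset convention and matches the field names above; the helper names `group_samples` and `make_demo_shard`, the key `sample0`, and the placeholder contents are hypothetical and not taken from the dataset.

```python
import io
import posixpath
import tarfile
from collections import defaultdict


def group_samples(tar_bytes: bytes) -> dict:
    """Group tar members into samples; the key is the name up to the first dot of the basename."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            directory, base = posixpath.split(member.name)
            stem, _, field = base.partition(".")
            key = posixpath.join(directory, stem)
            samples[key][field] = tar.extractfile(member).read()
    return dict(samples)


def make_demo_shard() -> bytes:
    """Build a tiny in-memory shard holding one hypothetical sample with placeholder contents."""
    files = {
        "sample0.image.png": b"<png bytes>",
        "sample0.task.txt": b"binary",
        "sample0.question.txt": b"Is this figure a flowchart?",
        "sample0.answer.txt": b"yes",
        "sample0.concept.txt": b"flowchart",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


samples = group_samples(make_demo_shard())
for key, fields in samples.items():
    print(key, sorted(fields))
```

In practice the `webdataset` library performs this grouping itself; the sketch is only meant to show how the field names relate to the files stored in each shard.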
Data Splits
train_9, train_18, train_27, train_54, train_81, train_150, val and test.
How to Use
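Each split is stored as its own set of shards. The following sketch collects the shard files for a given split without hard-coding the number of shards. It assumes a directory layout of `<root>/<aspect>/<split>/shard-*.tar`, inferred from the single example path `PatFigVQA/object/train_150/...`; the helper `shard_paths` is hypothetical, and the directory names for aspects other than `object` are not confirmed by this page.

```python
from pathlib import Path

# Split names as listed in the Data Splits section.
SPLITS = (
    "train_9", "train_18", "train_27", "train_54",
    "train_81", "train_150", "val", "test",
)


def shard_paths(root: str, aspect: str, split: str) -> list:
    """Return the sorted shard paths for one aspect and split (assumed layout)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    split_dir = Path(root) / aspect / split
    return sorted(str(path) for path in split_dir.glob("shard-*.tar"))
```

The returned list can be passed to `wds.WebDataset` in place of a brace-expanded pattern, for example `shard_paths("PatFigVQA", "object", "train_9")`.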
The recommended approach is to use the Python library `webdataset`, as in the following example.
    import io

    from braceexpand import braceexpand
    from PIL import Image
    from torchvision.transforms import Compose, ToTensor
    import webdataset as wds


    def transform(image):
        # Convert a PIL image to a tensor.
        return Compose([ToTensor()])(image)


    dataset = (
        wds.WebDataset(
            braceexpand('PatFigVQA/object/train_150/shard-{000000..000042}.tar'),
            shardshuffle=1000,
        )
        .shuffle(1000)
        # Select the fields of each sample, in this order.
        .to_tuple(
            '__key__',
            'image.png',
            'concept.txt',
            'task.txt',
            'question.txt',
            'answer.txt',
        )
        # Decode each field: the image to a tensor, the text fields to str.
        .map_tuple(
            lambda key: key,
            lambda image: transform(Image.open(io.BytesIO(image))),
            lambda concept: concept.decode('utf-8'),
            lambda task: task.decode('utf-8'),
            lambda question: question.decode('utf-8'),
            lambda answer: answer.decode('utf-8'),
        )
    )

    dataloader = wds.WebLoader(dataset)

Source Code
The source code used to produce this dataset can be found at https://github.com/TIBHannover/patent-figure-classification
Licensing Information
The PatFigVQA dataset is released under the GNU General Public License v3.0.
Files
| Name | Size | MD5 |
|---|---|---|
| PatFigVQA.zip | 15.6 GB | e41b59bad855858c6a0d7bfd1d6497a3 |
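Given the size of the archive, it may be worth checking the download against the published MD5 checksum before extracting it. The sketch below uses only the standard library; the function names and the local file path in the usage note are illustrative assumptions.

```python
import hashlib

# MD5 checksum published for PatFigVQA.zip on this page.
EXPECTED_MD5 = "e41b59bad855858c6a0d7bfd1d6497a3"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("PatFigVQA.zip")` should return `True` for an intact download, assuming the archive was saved under that name in the current directory.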
Additional details
Related works
- Is derived from
  - Dataset: 10.5281/zenodo.10019328 (DOI)
  - Dataset: 10.7910/DVN/UG4SBD (DOI)
- Is supplement to
  - Dataset: 10.5281/zenodo.14905551 (DOI)
Funding
- European Patent Organisation
  - Academic Research Programme of the European Patent Office (project "ViP@Scale: Visual and multimodal patent search at scale")
Software
- Repository URL: https://github.com/TIBHannover/patent-figure-classification
- Programming language: Python