Published January 30, 2023 | Version 1.0.8
Dataset Open

DUDE competition train - validation - test splits ground truth

Authors/Creators

  • 1. KU Leuven

Description

This JSON file contains the ground-truth annotations for the train and validation sets of the DUDE competition (https://rrc.cvc.uab.es/?ch=23&com=tasks) at ICDAR 2023 (https://icdar2023.org/), together with the blind test set, for which no ground-truth answers are provided.

V1.0.7 release: 41454 annotations for 4974 documents (train-validation-test)

DatasetDict({
    train: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 23728
    })
    val: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 6315
    })
    test: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 11402
    })
})
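
As a quick sanity check on the figures above, the sketch below sums the three split sizes and compares the total with the headline count of 41454 annotations. The helper `count_by_split` is hypothetical and assumes the annotations are available as a list of dicts that each carry the `data_split` field listed in the features; the actual top-level layout of the JSON file is not described here.

```python
from collections import Counter

# Split sizes as printed in the DatasetDict summary above.
SPLIT_SIZES = {"train": 23728, "val": 6315, "test": 11402}
HEADLINE_TOTAL = 41454  # figure quoted in the V1.0.7 release line


def count_by_split(records):
    """Count records per value of their 'data_split' field.

    Assumption: `records` is an iterable of dicts with a 'data_split' key.
    """
    return Counter(record["data_split"] for record in records)


total = sum(SPLIT_SIZES.values())
difference = HEADLINE_TOTAL - total
print(total)       # 41445
print(difference)  # 9
```

The split sizes add up to 41445, nine fewer than the headline figure. This may simply reflect the nine duplicates later removed from the test set, but the record does not state that explicitly.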

++ update on answer_type
+++ formatting change to answers_variants
++++ stricter check on answers_variants; annotations file renamed

+ blind test set (no ground truth answers provided)
++ removed duplicates from test set:

    "92bd5c758bda9bdceb5f67c17009207b_ac6964cbdf483e765b6668e27b3d0bc4",
    "6ee71a16d4e4d1dbd7c1f569a92d4e08_549f2a163f8ff3e9f0293cf59fdd98bc",
    "e6f3855472231a7ca6aada2f8e85fe5a_827c03a72f2552c722f2c872fd7f74c3",
    "e3eecd7cca5de11f1d17cd94ae6a8d77_6300df64e4cf6ba0600ac81278f68de2",
    "107b4037df8127a92ee4b6ae9b5df8fb_d7a60e7a9fc0b27487ea39cd7f56f98e",
    "300cc3900080064d308983f958141232_6a7cf1aad908d58a75ab8e02ddc856f4",
    "fdd3308efacddb88d4aa6e2073f481d4_138cb868ecc804a63cc7a4502c0009b2",
    "1f7de256ff1743d329a8402ba0d132e7_95b6e8758533a9817b9f20a958e7b776",
    "4f399b8c526ffb6a2fd585a18d4ed5ec_51097231bc327c26c59a4fd8d3ff3069"
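
For anyone working from an older copy of the annotations, the listed identifiers can be filtered out locally. The following is a minimal sketch, not the maintainers' procedure: `drop_by_id` is a hypothetical helper, and it assumes the identifiers above correspond to the `questionId` field of each record, which this record does not confirm. Pass a different `key` if they match another field.

```python
def drop_by_id(records, ids_to_drop, key="questionId"):
    """Return the records whose `key` value is not in `ids_to_drop`.

    Assumption: each record is a dict and the removed identifiers are
    stored under `key` (default 'questionId', unverified).
    """
    blocked = set(ids_to_drop)
    return [record for record in records if record.get(key) not in blocked]


# Toy usage with made-up records; only the first ID comes from the list above.
removed = ["92bd5c758bda9bdceb5f67c17009207b_ac6964cbdf483e765b6668e27b3d0bc4"]
records = [
    {"questionId": removed[0], "question": "duplicate"},
    {"questionId": "example_id", "question": "kept"},
]
kept = drop_by_id(records, removed)
print(len(kept))  # 1
```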

Notes

Binaries are hosted elsewhere for now, see https://huggingface.co/datasets/jordyvl/DUDE_loader/tree/main/data

Files

2023-03-23_DUDE_gt_test_PUBLIC.json (14.1 MB)
md5:bf252b2dc57d501b2d3110a4e7e6e9c5
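
After downloading, the file can be checked against the md5 checksum listed for it. The sketch below uses only Python's standard `hashlib`; the function names and the local path are illustrative assumptions.

```python
import hashlib

# Checksum as listed for 2023-03-23_DUDE_gt_test_PUBLIC.json.
EXPECTED_MD5 = "bf252b2dc57d501b2d3110a4e7e6e9c5"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected md5 digest."""
    return md5_of_file(path) == expected
```

For example, `matches_record("2023-03-23_DUDE_gt_test_PUBLIC.json")` should return `True` for an intact download saved under that name in the current directory.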