DUDE competition train - validation - test splits ground truth
Description
This JSON file contains the ground-truth annotations for the train and validation sets of the DUDE competition (https://rrc.cvc.uab.es/?ch=23&com=tasks) of ICDAR 2023 (https://icdar2023.org/).
V1.0.7 release: 41454 annotations for 4974 documents (train-validation-test)
DatasetDict({
    train: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 23728
    })
    val: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 6315
    })
    test: Dataset({
        features: ['docId', 'questionId', 'question', 'answers', 'answers_page_bounding_boxes', 'answers_variants', 'answer_type', 'data_split', 'document', 'OCR'],
        num_rows: 11402
    })
})
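As a sanity check on the split sizes above, the following minimal sketch sums the listed row counts and counts records per split. It assumes (this is not confirmed by the record) that the annotations can be read as a list of dictionaries carrying the `data_split` field; `SPLIT_SIZES`, `count_by_split` and `total` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Row counts copied from the DatasetDict summary above.
SPLIT_SIZES = {"train": 23728, "val": 6315, "test": 11402}


def count_by_split(records):
    """Count annotation records per value of their 'data_split' field.

    Assumption: `records` is an iterable of dicts, each with a
    'data_split' key, as suggested by the feature list.
    """
    return Counter(record["data_split"] for record in records)


# Total number of rows across the three splits.
total = sum(SPLIT_SIZES.values())
print(total)
```

The three splits sum to 41445, nine fewer than the 41454 annotations quoted for the release. That gap may correspond to the nine duplicates removed from the test set, although the record does not say so explicitly.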
++ update on answer_type
+++ formatting change to answers_variants
++++ stricter check on answers_variants & rename annotations file
+ blind test set (no ground truth answers provided)
++ removed duplicates from test set:
"92bd5c758bda9bdceb5f67c17009207b_ac6964cbdf483e765b6668e27b3d0bc4",
"6ee71a16d4e4d1dbd7c1f569a92d4e08_549f2a163f8ff3e9f0293cf59fdd98bc",
"e6f3855472231a7ca6aada2f8e85fe5a_827c03a72f2552c722f2c872fd7f74c3",
"e3eecd7cca5de11f1d17cd94ae6a8d77_6300df64e4cf6ba0600ac81278f68de2",
"107b4037df8127a92ee4b6ae9b5df8fb_d7a60e7a9fc0b27487ea39cd7f56f98e",
"300cc3900080064d308983f958141232_6a7cf1aad908d58a75ab8e02ddc856f4",
"fdd3308efacddb88d4aa6e2073f481d4_138cb868ecc804a63cc7a4502c0009b2",
"1f7de256ff1743d329a8402ba0d132e7_95b6e8758533a9817b9f20a958e7b776",
"4f399b8c526ffb6a2fd585a18d4ed5ec_51097231bc327c26c59a4fd8d3ff3069",
Files
| Name | Size |
|---|---|
| 2023-03-23_DUDE_gt_test_PUBLIC.json (md5:bf252b2dc57d501b2d3110a4e7e6e9c5) | 14.1 MB |
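The published md5 checksum can be used to verify a download. The sketch below uses Python's standard `hashlib`; the helper names `md5_of_file` and `is_intact` are hypothetical, and the path passed in is assumed to be wherever the file was saved locally.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "bf252b2dc57d501b2d3110a4e7e6e9c5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Check whether the file at `path` matches the expected md5 digest."""
    return md5_of_file(path) == expected
```

For example, `is_intact("2023-03-23_DUDE_gt_test_PUBLIC.json")` should return `True` for an uncorrupted copy saved under that name in the working directory.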