Published February 7, 2022 | Version v1
Dataset Open

ir_metadata: An Extensible Metadata Schema for Information Retrieval Experiments

  • 1. TH Köln

Description

This dataset accompanies our work introducing a metadata schema for TREC run files based on the PRIMAD model. PRIMAD describes, on a conceptual level, the essential components of a computational experiment that can affect reproducibility. We propose aligning the metadata annotations with the PRIMAD components. To demonstrate the potential of metadata annotations, we curated a dataset of run files derived from experiments with different instantiations of the PRIMAD components and annotated them with the corresponding metadata. With this work, we hope to encourage IR researchers to annotate their run files and further improve the reuse value of experimental artifacts.

 

This archive contains the following data:

  • demo.tar.xz : Selected annotated run files that are used in the Colab demonstration.

  • metadata.zip : YAML files containing only the metadata annotations for each run.

  • runs.zip : The entire set of run files with annotations.
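
As a quick way to inspect the metadata annotations after downloading, the following minimal sketch lists the YAML files inside a ZIP archive using only the Python standard library. The helper name `list_yaml_members` and the assumption that the annotations use a `.yaml` or `.yml` extension are illustrative; the internal layout of metadata.zip is not described on this page.

```python
import zipfile


def list_yaml_members(archive):
    """Return the sorted names of YAML files in a ZIP archive.

    `archive` may be a path or a binary file-like object.
    Assumption: YAML files end in .yaml or .yml (not confirmed by this record).
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.lower().endswith((".yaml", ".yml"))
        )
```

For example, `list_yaml_members("metadata.zip")` would return the annotation file names, assuming the archive has been downloaded to the working directory.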

 

The annotated runs result from the following experiments:

Files (4.4 GB)

  • md5:90daab0fce2d1a32f63570c54e19dbd3 (534.2 MB)

  • md5:2e44c09977c46ac46252d4d90d88b512 (1.0 MB)

  • md5:7793ec9d49ced4c1d5a7b0b0e7bc9fae (3.8 GB)
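
The md5 checksums listed above can be used to check that a download completed without corruption. The sketch below is one possible way to do this in Python. The function names are hypothetical, and because this page does not state which checksum belongs to which file, the check only tests whether a file's digest matches any of the three published values.

```python
import hashlib

# md5 checksums as listed on this record page.
# Assumption: no file-to-checksum mapping is made here.
PUBLISHED_MD5 = {
    "90daab0fce2d1a32f63570c54e19dbd3",
    "2e44c09977c46ac46252d4d90d88b512",
    "7793ec9d49ced4c1d5a7b0b0e7bc9fae",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's md5 equals one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```

For example, `matches_published_checksum("runs.zip")` should return `True` for an intact download, assuming the file was saved under that name in the working directory.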