Published March 24, 2015 | Version v1
Conference paper Open

CITREC: An Evaluation Framework for Citation-Based Similarity Measures based on TREC Genomics and PubMed Central

  • 1. National Institute of Informatics, Tokyo


Citation-based similarity measures such as Bibliographic Coupling and Co-Citation are an integral component of many information retrieval systems. However, comparing the strengths and weaknesses of these measures is challenging because suitable test collections are lacking. This paper presents CITREC, an open evaluation framework for citation-based and text-based similarity measures. CITREC prepares the data from the PubMed Central Open Access Subset and the TREC Genomics collection for citation-based analysis and provides the tools necessary for evaluating similarity measures. To account for different evaluation purposes, CITREC implements 35 citation-based and text-based similarity measures and features two gold standards. The first gold standard uses the Medical Subject Headings (MeSH) thesaurus to gauge similarity; the second uses the expert relevance feedback that is part of the TREC Genomics collection. CITREC additionally offers a system for creating user-defined gold standards, so that the evaluation framework can be adapted to individual information needs and evaluation purposes.
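
For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of their textbook definitions and is not taken from the CITREC code base. Bibliographic Coupling counts the references two documents share; Co-Citation counts how often two documents are cited together. The toy citation graph, the document IDs, and the function names are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy citation graph: citing document -> set of cited documents.
citations = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "C": {"Z"},
}


def bibliographic_coupling(doc1, doc2, citations):
    """Return the number of references that doc1 and doc2 have in common."""
    return len(citations.get(doc1, set()) & citations.get(doc2, set()))


def co_citation_counts(citations):
    """Count, for each unordered pair of cited documents, how many documents cite both."""
    counts = defaultdict(int)
    for cited in citations.values():
        # Sorting gives every pair a canonical (smaller, larger) key.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)


coupling_ab = bibliographic_coupling("A", "B", citations)
cocitations = co_citation_counts(citations)
```

In this toy graph, documents A and B share the references Y and Z, and the pair (Y, Z) is cited together by both A and B. Raw counts like these are the simplest form of the two measures; normalized or weighted variants are not shown.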


New Project Website:



Files (616.3 kB)


Additional details

Related works

Is identical to
Conference paper: 2142/73680 (Handle)
Is supplemented by
Dataset: 10.5281/zenodo.3598421 (DOI)
Software: 10.5281/zenodo.3598367 (DOI)
Software: 10.5281/zenodo.3598371 (DOI)