Published October 6, 2023 | Version camera ready
Conference paper Open

A Dense Retrieval System and Evaluation Dataset for Scientific Computational Notebooks

  • 1. University of Amsterdam

Description

The discovery and reutilization of scientific codes are crucial in many research activities. Computational notebooks have emerged as a particularly effective medium for sharing and reusing scientific codes. Nevertheless, effectively locating relevant computational notebooks is a significant challenge. First, computational notebooks encompass multi-modal data comprising unstructured text, source code, and other media, posing complexities in representing such data for retrieval purposes. Second, the absence of evaluation datasets for the computational notebook search task hampers fair performance assessments within the research community. Prior studies have either treated computational notebook search as a code-snippet search problem or focused solely on content-based approaches for searching computational notebooks. To address the aforementioned difficulties, we present DeCNR, tackling the information needs of researchers in seeking computational notebooks. Our approach leverages a fused sparse-dense retrieval model to represent computational notebooks effectively. Additionally, we construct an evaluation dataset including actual scientific queries, computational notebooks, and relevance judgments for fair and objective performance assessment. Experimental results demonstrate that the proposed method surpasses baseline approaches in terms of F1@5 and NDCG@5. The proposed system has been implemented as a web service shipped with REST APIs, allowing seamless integration with other applications and web services.
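As an illustration of the ideas named in the description, the sketch below shows one common way to fuse sparse and dense retrieval scores and to compute F1@k and NDCG@k. It is not the DeCNR implementation: the description does not state how the two models are fused, so the min-max normalization, the weight `alpha`, and all function names here are assumptions chosen for illustration only.

```python
import math


def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1]; a constant map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(sparse, dense, alpha=0.5):
    """Rank documents by a weighted sum of normalized sparse and dense scores.

    alpha is a hypothetical mixing weight (1.0 = sparse only, 0.0 = dense only).
    A document missing from one result list contributes 0 for that model.
    """
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Sort by descending fused score; break ties by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def f1_at_k(ranking, relevant, k=5):
    """Harmonic mean of precision@k and recall@k for a set of relevant doc ids."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(ranking, gains, k=5):
    """NDCG@k with linear gains; gains maps doc id to a graded relevance value."""
    dcg = sum(gains.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranking[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Toy scores for four hypothetical notebooks, e.g. lexical and embedding-based.
sparse_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
dense_scores = {"b": 0.9, "c": 0.5, "d": 0.1}
ranking = fuse(sparse_scores, dense_scores, alpha=0.5)
```

Normalizing before mixing matters because lexical scores and embedding similarities usually live on different scales; other fusion schemes, such as reciprocal rank fusion, avoid normalization by combining ranks instead of scores.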

Files

2023.conference.escience.nali.camera.pdf

Files (4.3 MB)

md5:9bc220a00f8ee3243bab75e15c307e27

Additional details

Funding

European Commission
CLARIFY - CLoud ARtificial Intelligence For pathologY (grant 860627)
European Commission
ENVRI-FAIR - ENVironmental Research Infrastructures building Fair services Accessible for society, Innovation and Research (grant 824068)