Presentations: AI & Software Preservation

Pride, David; Ganguly, Raman

doi:10.5281/zenodo.20778591

Published June 9, 2026 | Version v1

Presentation Open

2 presentations:

Large Language Models for Software Mention Extraction
David Pride (1), Matteo Guenci (2), Martin Dočekal (3), Silvio Peroni (2), Petr Knoth (1)
(1) CORE, KMi, The Open University, United Kingdom; (2) University of Bologna; (3) Brno University of Technology

A large proportion of scientific studies now rely on software and data as an integral component of the research process. Significant time and resources are committed to the development of research software, yet too often these valuable assets languish, hidden in the original research paper that presented them. Ensuring the availability of software and data, and directly linking these assets to the research that first introduced them, is key to addressing the problems many scientists face when attempting to replicate earlier studies. In recent years there have been a number of efforts to develop methodologies for the extraction and classification of software mentions found in full-text scholarly documents. In this presentation, we will discuss how large language models can match current state-of-the-art (SotA) approaches to the problem using zero-shot methods that require no task-specific training.
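As a rough illustration of what a zero-shot setup for software mention extraction might look like, the sketch below builds an instruction-only prompt (no labelled examples) and filters a model reply. It is not the presenters' method: the function names `build_zero_shot_prompt` and `parse_mentions`, the prompt wording, and the JSON reply format are all assumptions made for this example, and the model call itself is replaced by a hard-coded reply.

```python
import json


def build_zero_shot_prompt(passage: str) -> str:
    """Build an instruction-only prompt (hypothetical wording, no labelled examples)."""
    return (
        "List every software name mentioned in the passage below.\n"
        "Answer with a JSON array of strings and nothing else; "
        "use [] if there are none.\n\n"
        f"Passage:\n{passage}"
    )


def parse_mentions(reply: str, passage: str) -> list[str]:
    """Parse a model reply and keep only names that literally occur in the passage.

    The literal-match check is an assumed safeguard against names the model
    invents; it also drops duplicates while preserving order.
    """
    try:
        candidates = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(candidates, list):
        return []
    mentions: list[str] = []
    for name in candidates:
        if isinstance(name, str) and name in passage and name not in mentions:
            mentions.append(name)
    return mentions


if __name__ == "__main__":
    passage = (
        "We fitted the models with scikit-learn and plotted results using Matplotlib."
    )
    prompt = build_zero_shot_prompt(passage)
    # Stand-in for a real model reply; "NumPy" does not occur in the passage.
    reply = '["scikit-learn", "Matplotlib", "NumPy"]'
    print(parse_mentions(reply, passage))  # prints ['scikit-learn', 'Matplotlib']
```

Checking each returned name against the passage is one simple way to keep the output grounded in the text, though it would miss mentions that the model normalises or abbreviates.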

From Proof of Concept to Practice: A Repository for Preserving and Managing Running Applications
Raman Ganguly, University of Vienna, Austria

The long-term preservation of research outputs increasingly depends on software preservation, as data in many disciplines are inseparable from the computational tools used to generate and interpret them. While repositories have established robust practices for preserving data and publications, software remains fragile due to short lifecycles, complex dependencies, and rapidly evolving technological environments. Archiving source code alone is often not enough.

This presentation reports on the progress of a repository-based approach to preserving running applications. Building on an earlier proof of concept, the project has evolved into an operational repository service. Recent developments include improved container ingestion workflows, clearer separation between preservation and execution layers, standardized packaging practices, and controlled, sandboxed execution environments that lower technical barriers while addressing security concerns.

In parallel, the project has focused on collaboration with researchers, particularly in the digital humanities, to better understand software development practices and to develop guidance, documentation, and support structures for long-term preservation. Rather than proposing a single solution, this work outlines a practical path forward and highlights open questions regarding standards, interoperability, and FAIR principles for research software, contributing to ongoing community discussions on the preservation of executable software.

Files

Files (345.0 MB)

Name	Checksum	Size
437.mp4	md5:a904d5504f3897248d61c4b9e3b604a1	344.9 MB
437.vtt	md5:f71535607db63d8abacd6b4a076b1381	51.7 kB

Additional details

Has part: 10.5281/zenodo.20789120 (DOI); 10.5281/zenodo.20788974 (DOI)

