"We share our code online": Why this is not enough to ensure reproducibility and progress in recommender systems research
Authors/Creators
Description
This record hosts the list of papers and the evaluation data that are part of the RecSys 2025 reproducibility analysis "We share our code online": Why this is not enough to ensure reproducibility and progress in recommender systems research. More information can also be found at the corresponding website.
Abstract:
Issues with reproducibility have been identified as a major factor hampering progress in recommender systems research. In response, researchers increasingly share the code of their models. However, the provision of only the code of the proposed model is usually not sufficient to ensure reproducibility. In many works, the central claim is that a new model is advancing the state-of-the-art. Thus, it is crucial that the entire experiment is reproducible, including the configuration and the results of the considered baselines. With this work, our goal is to gauge the level of reproducibility in algorithms research in recommender systems. We systematically analyzed the reproducibility level of 65 papers published at a top-ranked conference during the last three years. Our results are sobering. While the model code is shared in about two thirds of the papers, the code of the baselines is provided only in eight cases. The hyperparameters of the baselines are reported even less frequently, and how these were exactly determined is not explained in any paper. As a result, it is commonly not only impossible to reproduce the full result tables reported in the papers, it is also unclear if the claimed improvements over the state-of-the-art were actually achieved. Overall, we conclude that the research community has not reached the required level of reproducibility yet. We therefore call for more rigorous reproducibility standards to ensure progress in this field.