Reproducibility of LLM-based Recommender Systems Research: Worrying Observations from an Initial Analysis
Authors/Creators
Description
This record contains the list of papers and collected artifacts used in our paper titled “Reproducibility of LLM-based Recommender Systems Research: Worrying Observations from an Initial Analysis”. Visit https://faisalse.github.io/LLM-reproducibility-audit_/ for further information.
Abstract:
Large language models (LLMs) are disrupting recommender systems research and have become central to modern recommendation architectures. At the same time, LLMs introduce additional stochasticity and architectural complexity, exacerbating existing reproducibility challenges. In this study, we assess the reproducibility of recent LLM-based recommender systems research to highlight gaps in this rapidly evolving area. We define a set of LLM-specific reproducibility variables and systematically analyze a representative sample of 64 papers published at top-ranked conferences. Our findings reveal several concerns: while source code is shared in nearly two-thirds of the papers, none provides all the details required to fully reproduce the reported results. For example, fine-tuning strategies are used in 39 papers, yet only about 15% of those supply complete fine-tuning materials. Overall, we argue that the community should not overlook emerging reproducibility issues surrounding LLMs and should adopt more rigorous standards to ensure reliable progress.
Additional details
Software
- Repository URL
- https://faisalse.github.io/LLM-reproducibility-audit_/