Published April 15, 2026 | Version v2
Dataset Restricted

Data from: Machine learning models identify compounds that mimic alcoholic flavor perception in alcohol-free beverages

  • 1. KU Leuven
  • 2. Vlaams Instituut voor Biotechnologie
  • 3. Universidade Federal do Rio Grande do Sul
  • 4. Katholieke Universiteit Leuven
  • 5. Tianjin Institute for Industrial Biotechnology
  • 6. VIB
  • 7. University of Nottingham
  • 8. Leuven Institute for Beer Research (LIBR)
  • 9. VIB Center for Microbiology

Description

R Figure Generator.Rmd: The R Markdown script used to generate all main and supplemental figures of this work. It was written in R version 4.4.1 and executed with RStudio version 2023.06.1. It reads "FINAL_DATASET.CSV", "Validation tastings.xlsx", "SHAP XGBR for Body fullness.CSV", "SHAP XGBR for alcohol.CSV" and "Model predictions vs actuals XGBR.CSV" to calculate statistics and generate all figures in the manuscript.

run_models.py: The Python script used to train, evaluate and interpret (explainable AI) the machine learning models discussed in this paper. This script was run on Python version 3.9.0. The BorutaShap package had minor version-incompatibility issues, which were resolved by manually editing the package code. The script reads the "FINAL_DATASET.CSV" dataset and generates "my_nested_cv_regressor_XGBR.pkl", "SHAP XGBR for Body fullness.CSV", "SHAP XGBR for alcohol.CSV" and "Model predictions vs actuals XGBR.CSV". It can be customized to train similar machine learning models for other purposes.
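
For readers unfamiliar with nested cross-validation, the idea can be sketched as follows. This is not the code in run_models.py: it is a toy illustration using only the Python standard library, in which a one-feature ridge regressor stands in for the XGBoost model, the data are synthetic, and every function name is hypothetical. The inner loop selects a hyperparameter using training data only, and the outer loop scores the refitted model on folds that played no part in that selection.

```python
# Toy nested cross-validation, standard library only.
# Assumption: a one-feature ridge regressor replaces the XGBoost model,
# and the data are synthetic, not taken from the dataset.
import random
from statistics import mean


def k_folds(indices, k):
    """Split a list of indices into k roughly equal folds."""
    return [indices[i::k] for i in range(k)]


def fit_ridge(xs, ys, lam):
    """Closed-form slope of a no-intercept ridge regression."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(slope, xs, ys):
    """Mean squared error of the fitted slope on the given points."""
    return mean((slope * x - y) ** 2 for x, y in zip(xs, ys))


def cv_error(xs, ys, indices, lam, k):
    """Mean validation MSE of one hyperparameter value over k folds."""
    errors = []
    for fold in k_folds(indices, k):
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(mse(slope, [xs[i] for i in fold], [ys[i] for i in fold]))
    return mean(errors)


def nested_cv(xs, ys, lambdas, outer_k=5, inner_k=3, seed=0):
    """Return one test MSE per outer fold."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    outer_scores = []
    for test in k_folds(indices, outer_k):
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        # Inner loop: choose the hyperparameter using training data only.
        best_lam = min(lambdas, key=lambda lam: cv_error(xs, ys, train, lam, inner_k))
        # Refit on the whole outer-training set, score on the untouched test fold.
        slope = fit_ridge([xs[i] for i in train], [ys[i] for i in train], best_lam)
        outer_scores.append(mse(slope, [xs[i] for i in test], [ys[i] for i in test]))
    return outer_scores


# Synthetic data: y is roughly 2x plus a little noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
scores = nested_cv(xs, ys, lambdas=[0.0, 0.1, 1.0, 10.0])
print(f"mean outer-fold MSE: {mean(scores):.4f}")
```

Because the test fold of each outer split is never seen during hyperparameter selection, the averaged outer score is a less optimistic estimate of generalization error than a single cross-validation loop that both tunes and evaluates.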

Validation tastings.xlsx: Microsoft Excel file with the results from follow-up tastings performed to test the effectiveness of our mixture of compounds in various beverages.

SHAP XGBR for Body fullness.CSV: The results from a SHAP analysis used to explain the predictions of our best performing model for beer body.

SHAP XGBR for alcohol.CSV: The results from a SHAP analysis used to explain the predictions of our best performing model for alcoholic impression.

Model predictions vs actuals XGBR.CSV: A table with the predicted and actual scores of all beers, for the attributes modeled by our XGBR model.
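
A table of predicted and actual scores like this one is typically summarized with a goodness-of-fit statistic such as R². The sketch below shows one way to compute it with the Python standard library. The column names ("beer", "attribute", "actual", "predicted") and the four rows are invented for illustration; the real file's layout may differ and should be checked before adapting this.

```python
import csv
import io
import math

# Hypothetical excerpt: column names and values are invented for illustration
# and do not come from "Model predictions vs actuals XGBR.CSV".
SAMPLE = """beer,attribute,actual,predicted
A,alcohol,3.0,2.8
B,alcohol,4.0,4.1
C,alcohol,5.0,5.3
D,alcohol,2.0,2.2
"""


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted scores."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Replace io.StringIO(SAMPLE) with open(path, newline="") to read a real file.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
actual = [float(row["actual"]) for row in rows]
predicted = [float(row["predicted"]) for row in rows]

r2 = r_squared(actual, predicted)
error = rmse(actual, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {error:.3f}")
```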

my_nested_cv_regressor_XGBR.pkl: Python joblib pickle file containing the best-performing model, trained with nested cross-validation on "FINAL DATASET.CSV".

FINAL DATASET.CSV: The complete beer dataset of this work (chemical + sensory), except for the RateBeer scores, which are proprietary data owned by RateBeer. Contact the authors and RateBeer to gain full access.

 

Files

Restricted

The record is publicly accessible, but files are restricted.