Published January 25, 2024 | Version v1
Dataset Open

Using a Machine Learning Regression Approach to Predict the Aroma Partitioning in Dairy Matrices - Accompanying material

  • 1. University of Hohenheim
  • 2. Kempten University of Applied Sciences

Description

These files are the accompanying material for our submission "Using a Machine Learning Regression Approach to Predict the Aroma Partitioning in Dairy Matrices" to MDPI Processes:

Aroma partitioning in food is a challenging area of research because several physical and chemical factors affect the binding and release of aroma in food matrices. The Kmg value is the partition coefficient that describes how aroma compounds distribute between a matrix and a gas phase, for example between the components of a food matrix and air. This study introduces a regression approach to predict the Kmg value of aroma compounds with a wide range of physicochemical properties in dairy matrices representing products of different compositions and/or processing. The approach consists of data cleaning, grouping based on the temperature of the Kmg analysis, pre-processing (log transformation and normalization), and, finally, the development and evaluation of prediction models with regression methods. We compared linear regression (LR) with five machine-learning-based regression algorithms: Random Forest Regressor (RFR), Gradient Boosting Regression (GBR), Extreme Gradient Boosting (XGBoost, XGB), Support Vector Regression (SVR), and Artificial Neural Network Regression (NNR). Explainable AI (XAI) was used to calculate feature importance and thereby identify the features that contribute most to the prediction. The top three features identified are log P, specific gravity, and molecular weight. For the prediction of Kmg in dairy matrices, R² scores of up to 0.99 were reached. At 37.0 °C, which resembles the temperature of the mouth, RFR delivered the best results, and at the lower temperature of 7.0 °C, typical for a household fridge, XGB performed best. The results from the models serve as a proof of concept and show the applicability of a data-driven machine learning approach to predict the Kmg value of aroma compounds in different dairy matrices.
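The pre-processing and evaluation steps named in the abstract (log transformation, normalization, regression, R² score) can be illustrated with a minimal, standard-library-only sketch. It is not the code from the provided scripts. The function names, the choice to log-transform the target and min-max normalize the feature, and the synthetic numbers are all assumptions made for illustration; only the simple linear regression baseline is shown, not the machine learning models.

```python
import math


def log_transform(values):
    """Natural-log transform; assumes strictly positive inputs."""
    return [math.log(v) for v in values]


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, noise-free toy data (NOT measurements from the study):
# a hypothetical log P feature and a Kmg target built from an exact
# log-linear relation, so the linear fit is expected to be perfect.
log_p = [0.5, 1.0, 1.5, 2.0, 2.5]
kmg = [math.exp(-6.0 + 1.2 * lp) for lp in log_p]

x = min_max_normalize(log_p)   # normalized feature
y = log_transform(kmg)         # log-transformed target

slope, intercept = fit_simple_linear_regression(x, y)
predictions = [slope * xi + intercept for xi in x]
r2 = r2_score(y, predictions)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

On real, noisy data the score would be computed on held-out samples rather than on the data used for fitting.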

We provide two folders with the results and scripts, each described by accompanying documentation:

  • Results_Aroma_Regression (Feature_Importance, Histogram_Plots)
  • Scripts_Aroma_Regression (Pipeline.py, regr_simple.py, regr_preprocessed.py)

Files

Results_Aroma_Regression.zip

Files (955.6 kB)

md5:a1bf16664a2227219d7bc19ef686ab6c (949.7 kB)
md5:087f66acf23e6f907136a7e5625b4c91 (5.9 kB)
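
A downloaded archive can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names are hypothetical, and the local file path in the final comment is a placeholder, since the record listing does not state which checksum belongs to which file.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Compute the MD5 hex digest of a file on disk."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def matches_checksum(actual_hex, expected):
    """Compare a digest with a listed value such as 'md5:<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return actual_hex.lower() == expected_hex


# Self-contained demonstration on in-memory bytes (small chunk size
# chosen only to exercise the chunked reading).
demo_digest = md5_of_stream(io.BytesIO(b"hello"), chunk_size=2)
print(demo_digest)

# For a real download (placeholder path, adjust to your local file):
# matches_checksum(md5_of_file("path/to/archive.zip"), "md5:<listed value>")
```

MD5 is adequate here for detecting corrupted downloads; it is not a safeguard against deliberate tampering.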

Additional details

Related works

Continues
10.3390/ECP2023-14707 (DOI)