Published May 7, 2026 | Version 1.0
Standard Open

Experimental dataset, analysis code, and trained models for: Comparative Evaluation of Six Machine Learning Models for Multi-Fuel Variable Compression Ratio Diesel Engine Emission Prediction Under Leave-One-Out Cross-Validation

  • 1. SIMATS, Chennai

Description

This deposit contains the complete experimental dataset, reproducible analysis code, and serialised trained models supporting the journal article "Comparative Evaluation of Six Machine Learning Models for Multi-Fuel Variable Compression Ratio Diesel Engine Emission Prediction Under Leave-One-Out Cross-Validation" by Sathiyaseelan et al. (DOI to be inserted upon acceptance).

A reviewer or replicator should be able to download this archive, install the listed Python dependencies, and run a single command to reproduce every numerical result, table, and figure underlying the manuscript.

WHAT IS IN THIS DEPOSIT

* data/  — The 45-record steady-state emission dataset from a single-cylinder, four-stroke, water-cooled, direct-injection variable compression ratio (VCR) diesel engine (Kirloskar TV1, 661 cc, 5.2 kW at 1500 rpm), provided in both UTF-8 CSV and Microsoft Excel formats. Records cover a balanced 3 × 3 × 5 factorial design (45 operating conditions): three fuels (diesel, rubber seed oil biodiesel, Chlorella vulgaris algae oil biodiesel), three compression ratios (16:1, 17:1, 18:1), and five engine loads (0, 25, 50, 75, and 100 % of rated load). Six exhaust quantities were recorded with an AVL DiGas 444N five-gas analyser: the concentrations of CO, HC, CO2, O2, and NOx, plus the air-fuel equivalence ratio lambda. A detailed data dictionary documents each column, unit, instrument specification, and measurement protocol.

* src/  — Six Python modules implementing the full analysis pipeline: model definitions for linear regression, polynomial regression (degree 2), support vector regression with RBF kernel, random forest, gradient boosting, and a single-hidden-layer multilayer perceptron; leave-one-out and stratified 5-fold cross-validation; performance metrics (R², RMSE, MAE, MAPE); and a single-command entry point that reproduces every reported result.

* results/  — Canonical numerical results in JSON and CSV form: LOOCV performance for all six models on all six outputs (Table 6 of the manuscript); ANN-MLP architectural sensitivity analysis (Table 7); LOOCV-versus-5-fold validation comparison (Table 8); compression-ratio-stratified gradient boosting performance (Table 9); and per-record out-of-fold predictions for all 36 model-output combinations.

* trained_models/  — Eight serialised scikit-learn pipelines (joblib format), fitted on the full 45-sample dataset and ready for downstream prediction. The best-performing model for each output is included along with the runner-up gradient boosting variant for completeness.
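The design and validation scheme described above can be illustrated with a minimal sketch. It enumerates the 3 × 3 × 5 factorial grid and computes the four reported metrics from pooled leave-one-out predictions. The fuel labels, function names, and the target values are assumptions made for illustration: the target is a made-up function of compression ratio and load, not measured data, and the mean predictor is a stand-in baseline rather than one of the six models in src/.

```python
import itertools
import math

# Hypothetical labels; the deposit's actual column values may differ.
FUELS = ["diesel", "rubber_seed_biodiesel", "algae_biodiesel"]
COMPRESSION_RATIOS = [16, 17, 18]
LOADS_PCT = [0, 25, 50, 75, 100]


def factorial_design():
    """Enumerate every fuel x compression ratio x load cell exactly once."""
    return list(itertools.product(FUELS, COMPRESSION_RATIOS, LOADS_PCT))


def loocv_mean_predictions(y):
    """Predict each record from the mean of the other n - 1 records.

    This is a stand-in baseline, not one of the six models in the deposit.
    """
    total, n = sum(y), len(y)
    return [(total - y_i) / (n - 1) for y_i in y]


def metrics(y_true, y_pred):
    """R2, RMSE, MAE and MAPE pooled over all out-of-fold predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        # MAPE is undefined when any true value is zero.
        "mape": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


design = factorial_design()
# Placeholder target, NOT measured data: a made-up function of CR and load.
y = [cr + 0.1 * load for _, cr, load in design]
scores = metrics(y, loocv_mean_predictions(y))
print(len(design), scores)
```

Under leave-one-out, a mean predictor always yields an R² of 1 − (n/(n − 1))², which is slightly negative; for n = 45 this gives a simple floor against which any real model's pooled LOOCV score can be sanity-checked.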

REPRODUCIBILITY

All stochastic models use a fixed random seed. With the dependency versions pinned in requirements.txt, the analysis is bit-for-bit reproducible across machines. Total runtime is approximately 90 seconds on a single CPU thread; no GPU is required. Detailed reproduction instructions are in README.md.
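As a rough illustration of what a fixed seed buys, the sketch below deals the 45 record indices into five folds using a private seeded generator and confirms that two runs agree. The function name and the seed value are hypothetical, and this is a plain shuffled 5-fold split, not necessarily the stratified scheme implemented in src/.

```python
import random


def fold_assignments(n_records, n_folds, seed):
    """Shuffle record indices with a private seeded RNG and deal them into folds."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    indices = list(range(n_records))
    rng.shuffle(indices)
    # Every n_folds-th shuffled index goes to the same fold.
    return [sorted(indices[k::n_folds]) for k in range(n_folds)]


SEED = 42  # hypothetical value; the deposit's actual seed is not stated here
run_a = fold_assignments(45, 5, SEED)
run_b = fold_assignments(45, 5, SEED)
print(run_a == run_b)
```

Using a local generator rather than the module-level one keeps the split independent of any other code that draws random numbers in the same process.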

LIMITATIONS

The released dataset reports condition means only. The per-replicate raw measurements (three replicates per cell) are retained at the originating institution and are available from the corresponding author on reasonable request. Cylinder-pressure traces, brake-specific fuel consumption, brake thermal efficiency, exhaust-gas temperature, and particulate matter measurements were captured during the experimental campaign but are out of scope for this emission-focused study. The trained models were fitted on three fuels, three compression ratios, and five loads; predictions for fuels or operating points outside this calibration envelope constitute extrapolation and are not validated.
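A downstream user might guard predictions with a check along the following lines. This is only a sketch: the fuel labels, class, and function names are hypothetical, and treating values between the tested levels (for example a compression ratio of 16.5) as inside the envelope is an assumption, since the models were validated only at the tested levels.

```python
from dataclasses import dataclass

# Hypothetical fuel labels; the deposit's data dictionary defines the real ones.
CALIBRATED_FUELS = frozenset({"diesel", "rubber_seed_biodiesel", "algae_biodiesel"})
CR_RANGE = (16.0, 18.0)        # tested compression ratios, 16:1 to 18:1
LOAD_RANGE_PCT = (0.0, 100.0)  # tested loads, % of rated load


@dataclass(frozen=True)
class OperatingPoint:
    fuel: str
    compression_ratio: float
    load_pct: float


def envelope_violations(point):
    """List the reasons a point falls outside the calibration envelope."""
    reasons = []
    if point.fuel not in CALIBRATED_FUELS:
        reasons.append(f"fuel {point.fuel!r} is not one of the three calibrated fuels")
    if not CR_RANGE[0] <= point.compression_ratio <= CR_RANGE[1]:
        reasons.append(f"compression ratio {point.compression_ratio} is outside 16-18")
    if not LOAD_RANGE_PCT[0] <= point.load_pct <= LOAD_RANGE_PCT[1]:
        reasons.append(f"load {point.load_pct} % is outside 0-100 %")
    return reasons


inside = envelope_violations(OperatingPoint("diesel", 17.0, 50.0))
outside = envelope_violations(OperatingPoint("ethanol", 19.0, 120.0))
print(inside, outside)
```

Returning the list of reasons, rather than a bare boolean, lets a caller report exactly which input makes a prediction an extrapolation.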

Files (1.5 MB)

Checksum                                  Size
md5:ec7f54d79261c27474e13ca46215a370      1.8 kB
md5:a1ea794b7b86b48c5dd2360a2915449b      2.1 kB
md5:6e8717f252dd85997664cc16235ad6ff      5.4 kB
md5:471b3cf41bb238a5fce465d55dad1886      2.1 kB
md5:a3cfe86eaab1f8d29ffddb0ae3996989      63.1 kB
md5:2446292ac4b1588a5f37588e96db884e      2.7 kB
md5:dffd218bc53259a4c02b688bffe5fa04      143.6 kB
md5:18c169a1fe94610a0c360c983cef7bee      143.6 kB
md5:fac303aafa8b89597776d81d2f3bc435      143.6 kB
md5:3e543acb14ab4b22078c97c21a6b867f      143.6 kB
md5:99950e1d929b73a8eb8c2698c2c02933      143.6 kB
md5:1989e48d103b2715ae661e9e1ff9362b      143.6 kB
md5:3ff817c7f3c05f5cb660e5cbf260b6d7      4.3 kB
md5:e453372060dae74b6bb37c9488ad05f5      2.7 kB
md5:55b295497fc913a95626843001ea1596      31.2 kB
md5:c5cf5c92568670dfeb6bac5a63c11184      1.9 kB
md5:bc1fb66e11524c4790389009094648bc      5.9 kB
md5:394c3e22097f631a982fa65295e0a3ed      1.5 kB
md5:fe1210dacba3dbd388b6f38dd39c1c1e      1.1 kB
md5:04585e2338842402e35bdc7765bb48db      707 Bytes
md5:cb70ab2e7ab3b35a7fa5efdb6098a787      442.2 kB
md5:558633e31233d3586437d565c1b38279      7.2 kB
md5:bc1f6da6038a9fb54432ed605a8d750a      5.6 kB
md5:b2898f8eecf149c5c411d5f11f17e6ef      1.2 kB
md5:d2fcf79573db4fae13a75d596117ad15      1.2 kB
md5:8bdeea6d48a0e9adac6562b0f9829075      631 Bytes
md5:b68e9090b04beb253c89f3e9fc8e2626      3.9 kB
md5:a4356e4a4a2acf33918f63ccf12be913      3.7 kB
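The md5 values in the file listing above can be used to confirm that a download is intact. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and the path passed in is whichever file the checksum belongs to.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listing entry such as 'md5:ec7f54d7...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listing(downloaded_path, "md5:ec7f54d79261c27474e13ca46215a370")` would return True only if the downloaded file is byte-identical to the deposited one. MD5 is adequate for detecting accidental corruption in transit, though it is not a safeguard against deliberate tampering.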