Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model
Description
Diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging because price depends nonlinearly on important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study compares the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated: Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis covers data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics, eXtreme Gradient Boosting was the best-performing algorithm in both regression and classification, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting is recommended as the regressor and classifier of choice for forecasting the price of a diamond specimen.
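The two headline metrics in the description, R² for the regressors and accuracy for the classifiers, can be illustrated with a minimal sketch. This is not the code from `thesis.ipynb`: the function names `r2_score` and `accuracy` are defined here for illustration, the toy prices and class labels are hypothetical, and the use of price classes such as "low", "mid", and "high" is an assumption about how a classification target might look.

```python
from typing import Hashable, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


# Hypothetical toy values, NOT taken from the study's dataset.
true_prices = [326.0, 2757.0, 5000.0, 18000.0]
pred_prices = [400.0, 2600.0, 5200.0, 17500.0]

# Hypothetical price classes (assumed labels, for illustration only).
true_classes = ["low", "low", "mid", "high"]
pred_classes = ["low", "mid", "mid", "high"]

print(f"R2 = {r2_score(true_prices, pred_prices):.4f}")
print(f"Accuracy = {accuracy(true_classes, pred_classes):.2%}")
```

An R² reported as 97.45% corresponds to a value of 0.9745 from a function like the one above, meaning the model explains roughly that share of the variance in price on the evaluated data.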
Files
thesis.ipynb (2.4 MB)
| Name | Size |
|---|---|
| md5:13e9a4ebc037a2ba04291b9c57d86777 | 26.9 kB |
| md5:70d70277f196d03954a60b90f62f187b | 2.4 MB |
Additional details
Related works
- Is source of: 10.5061/dryad.wh70rxwrh (DOI)