Application of Machine Learning Models for Patients Health Insurance Cost Prediction
- 1. Assistant Professor, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
Contributors
Contact person:
- 1. Assistant Professor, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
- 2. Associate Professor, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
- 3. Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
Description
Abstract: This study examines the use of machine learning models to predict health insurance costs from personal characteristics. The dataset included variables such as age, sex, BMI, number of children, smoking status, and region. Several machine learning methods, including Linear Regression, Random Forest, and Gradient Boosting, were compared on how accurately they estimate insurance costs. The dataset was preprocessed by scaling numerical features and encoding categorical variables, and the regression models were then trained and evaluated using k-fold cross-validation. Performance was measured with the coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE). In the experiments, Gradient Boosting outperformed both Random Forest and Linear Regression.
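The evaluation protocol named in the abstract (k-fold cross-validation scored with MAE, RMSE, and R²) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The data are synthetic, the helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical, and a one-feature least-squares line stands in for the regression models. Random Forest and Gradient Boosting would normally come from a library such as scikit-learn, which is an assumption here since the record does not name the tooling used.

```python
import math
import random
import statistics


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(x, y, k=5, seed=0):
    """Return MAE, RMSE and R2 averaged over k held-out folds."""
    per_fold = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        x_train = [x[i] for i in train]
        # Scaling statistics come from the training fold only, so nothing
        # about the held-out fold leaks into preprocessing.
        mu, sigma = statistics.fmean(x_train), statistics.pstdev(x_train)
        slope, intercept = fit_line(
            [(v - mu) / sigma for v in x_train], [y[i] for i in train]
        )
        y_true = [y[i] for i in held_out]
        y_pred = [slope * (x[i] - mu) / sigma + intercept for i in held_out]
        per_fold.append((mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred)))
    names = ("MAE", "RMSE", "R2")
    return {name: statistics.fmean(col) for name, col in zip(names, zip(*per_fold))}


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the study's dataset): cost grows with age plus noise.
    rng = random.Random(42)
    ages = [rng.uniform(18, 64) for _ in range(200)]
    charges = [250.0 * a + rng.gauss(0, 1000) for a in ages]
    print(cross_validate(ages, charges, k=5))
```

Comparing several models under this scheme amounts to running the same folds for each one and comparing the averaged scores. Lower MAE and RMSE and higher R² indicate a better fit.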
Files
D368515040925.pdf (976.0 kB)
md5:dea1dcc2f2ad6523335b9c3c369f1c8f
Additional details
Identifiers
- DOI
- 10.35940/ijsce.D3685.15040925
- EISSN
- 2231-2307
Dates
- Accepted
- 2025-09-15
- Manuscript received on 05 August 2025 | Revised manuscript received on 06 September 2025 | Manuscript accepted on 15 September 2025 | Manuscript published on 30 September 2025.