Published October 18, 2021 | Version v1
Preprint Open

ChefBoost: A Lightweight Boosted Decision Tree Framework

  • Yapi Kredi Teknoloji

Description

Decision tree based models overwhelmingly outperform alternatives in applied machine learning studies. This paper first reviews decision tree algorithms such as ID3, C4.5, CART, CHAID and regression trees, together with bagging and boosting methods such as Gradient Boosting, AdaBoost and Random Forest, and then describes ChefBoost, the lightweight boosted decision tree framework developed in this work. Python was chosen for the implementation due to its widespread adoption as a machine learning programming language, and the framework is published as an open-source package under the MIT license. Moreover, the framework exports the decision trees it builds as regular if/else statements, so the resulting rules can be produced and consumed independently of any particular programming language.
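To make the if/else export idea concrete, the sketch below shows the general shape such an exported rule set could take: a plain Python function of nested conditionals that needs no library at prediction time. The feature names, thresholds, and labels here are hypothetical illustrations, not values taken from the paper or from ChefBoost's actual output.

```python
# Hypothetical example of a decision tree exported as plain if/else rules
# for a toy weather dataset. Attribute names and thresholds are illustrative.
def findDecision(outlook, humidity, wind):
    if outlook == "Sunny":
        # Sunny days split on humidity
        if humidity > 80:
            return "No"
        else:
            return "Yes"
    elif outlook == "Rain":
        # Rainy days split on wind strength
        if wind == "Strong":
            return "No"
        else:
            return "Yes"
    else:
        # Overcast days are always positive in this toy tree
        return "Yes"

print(findDecision("Sunny", 85, "Weak"))  # -> No
```

Because the rules are ordinary source code, they can be inspected, version-controlled, or translated to another language by hand, with no runtime dependency on the training framework.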

Files

ChefBoost_PrePrint.pdf — 264.4 kB (md5:d7b8bea0bffd905a5f098d32db8c0356)

Additional details

References

  • Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016.
  • LightGBM Authors. Machine learning challenge winning solutions. https://github.com/Microsoft/LightGBM/blob/master/examples/README.md, 2016. [Online; accessed Oct 12, 2021].
  • Jeff Reback, jbrockmendel, Wes McKinney, and et al. pandas-dev/pandas: Pandas 1.3.3, September 2021.
  • Charles R Harris, K Jarrod Millman, Stéfan J van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J Smith, et al. Array programming with numpy. Nature, 585(7825):357–362, 2020.
  • Olivier Grisel, Andreas Mueller, Lars, and et al. scikit-learn/scikit-learn: scikit-learn 1.0, sep 2021.
  • Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30:3146–3154, 2017.
  • J. Ross Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986.
  • Sefik Ilkin Serengil. A step by step id3 decision tree example. https://sefiks.com/2017/11/20/a-step-by-step-id3-decision-tree-example/, 2017. [Online; accessed Oct 12, 2021].
  • J Ross Quinlan. C4.5: Programs for machine learning. Elsevier, 2014.
  • Sefik Ilkin Serengil. A step by step c4.5 decision tree example. https://sefiks.com/2018/05/13/a-step-by-step-c4-5-decision-tree-example/, 2018. [Online; accessed Oct 12, 2021].
  • Leo Breiman, Jerome H Friedman, Richard A Olshen, and Charles J Stone. Classification and regression trees. Routledge, 2017.
  • Sefik Ilkin Serengil. A step by step cart decision tree example. https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/, 2018. [Online; accessed Oct 12, 2021].
  • Gordon V Kass. An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 29(2):119–127, 1980.
  • Sefik Ilkin Serengil. A step by step chaid decision tree example. https://sefiks.com/2020/03/18/a-step-by-step-chaid-decision-tree-example/, 2020. [Online; accessed Oct 12, 2021].
  • Sefik Ilkin Serengil. A step by step regression tree example. https://sefiks.com/2018/08/28/a-step-by-step-regression-decision-tree-example/, 2018. [Online; accessed Oct 12, 2021].
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144, 2016.
  • Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017.
  • Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
  • Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean. Boosting algorithms as gradient descent in function space. In Proc. NIPS, volume 12, pages 512–518, 1999.
  • Sefik Ilkin Serengil. A step by step gradient boosting decision tree example. https://sefiks.com/2018/10/04/a-step-by-step-gradient-boosting-decision-tree-example/, 2018. [Online; accessed Oct 12, 2021].
  • Sefik Ilkin Serengil. A step by step gradient boosting example for classification. https://sefiks.com/2018/10/29/a-step-by-step-gradient-boosting-example-for-classification/, 2018. [Online; accessed Oct 12, 2021].
  • Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1):119–139, 1997.
  • Sefik Ilkin Serengil. A step by step adaboost example. https://sefiks.com/2018/11/02/a-step-by-step-adaboost-example/, 2018. [Online; accessed Oct 12, 2021].
  • Tin Kam Ho. Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition, volume 1, pages 278–282. IEEE, 1995.
  • Sefik Ilkin Serengil. How random forests can keep you from decision tree. https://sefiks.com/2017/11/19/how-random-forests-can-keep-you-from-decision-tree/, 2017. [Online; accessed Oct 12, 2021].
  • Sefik Ilkin Serengil. A gentle introduction to feature importance in machine learning. https://sefiks.com/2019/12/20/a-gentle-introduction-to-feature-importance-in-machine-learning/, 2019. [Online; accessed Oct 12, 2021].
  • Sefik Ilkin Serengil. Feature importance in logistic regression for machine learning interpretability. https://sefiks.com/2021/01/06/feature-importance-in-logistic-regression/, 2021. [Online; accessed Oct 12, 2021].
  • Sefik Ilkin Serengil. Feature importance in decision trees. https://sefiks.com/2020/04/06/feature-importance-in-decision-trees/, 2020. [Online; accessed Oct 12, 2021].
  • Sefik Ilkin Serengil. A gentle introduction to xgboost for applied machine learning. https://sefiks.com/2019/11/03/a-gentle-introduction-to-xgboost-for-applied-machine-learning/, 2019. [Online; accessed Oct 12, 2021].
  • Sefik Ilkin Serengil. A gentle introduction to lightgbm for applied machine learning. https://sefiks.com/2018/10/13/a-gentle-introduction-to-lightgbm-for-applied-machine-learning/, 2018. [Online; accessed Oct 12, 2021].