TEXT-CLASSIFY: A COMPREHENSIVE COMPARATIVE STUDY OF LOGISTIC REGRESSION, RANDOM FOREST, AND KNN MODELS FOR ENHANCED TEXT CLASSIFICATION PERFORMANCE

Ravikant, Kholwal

doi:10.5281/zenodo.10148008

Published October 1, 2023 | Version v1

Ravikant, Kholwal (Researcher)¹

1. PDPM IIITDMJ, Jabalpur, India

Text classification is central to organizing the growing volume of text documents: it categorizes diverse content types and streamlines information retrieval. This research presents a text classification model for BBC news articles built on three machine learning algorithms: logistic regression, random forest, and K-nearest neighbour (KNN). The model is structured into four stages: text preprocessing, text representation, classifier implementation, and classification. The three classifiers were evaluated on the BBC news dataset using accuracy, precision, F1-score, support, and the confusion matrix. These measures also indicated which features carried the highest values in each class of the dataset, and they were used to assess how reliably each model categorizes text. Logistic regression with a TF-IDF vectorizer performed best, reaching 97% accuracy on the dataset, which suggests it is reliable even on smaller datasets. The random forest and K-nearest neighbour classifiers reached 93% and 92% accuracy, respectively.
The evaluations and comparisons conducted support more efficient organization and retrieval of information in news articles, and the approach may extend to other domains and content types. This document also explains the rationale for selecting these algorithms and, based on the analysis conducted, helps identify the most suitable of the three. The study aims to provide accessible, comprehensible solutions for text classification and to offer insight into the models' decision-making processes.
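
The stages named in the abstract (preprocessing, TF-IDF representation, classifier, evaluation) can be sketched with scikit-learn as follows. This is an illustrative sketch only, not the authors' implementation: the toy corpus, its three labels, the function name `evaluate`, and every hyperparameter (`max_iter`, `n_estimators`, `n_neighbors`, the English stop-word list) are assumptions, and the accuracies it prints say nothing about the figures reported for the BBC news dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the BBC news dataset (assumption).
TRAIN_TEXTS = [
    "The striker scored twice as the team won the cup final",
    "The coach praised the players after the football match",
    "The tennis champion won the tournament in straight sets",
    "Shares rose after the company reported strong quarterly profits",
    "The bank raised interest rates to curb inflation",
    "The firm announced a merger and its stock price jumped",
    "The new smartphone features a faster processor and camera",
    "Software developers released a security update for the browser",
    "The company unveiled a laptop with longer battery life",
]
TRAIN_LABELS = ["sport"] * 3 + ["business"] * 3 + ["tech"] * 3

TEST_TEXTS = [
    "The team won the match",
    "The bank reported higher profits",
    "A software update improves the smartphone camera",
]
TEST_LABELS = ["sport", "business", "tech"]


def build_classifiers():
    """Return the three classifier families compared in the study.

    Hyperparameters are illustrative assumptions, not the paper's settings.
    """
    return {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "knn": KNeighborsClassifier(n_neighbors=3),
    }


def evaluate(train_texts, train_labels, test_texts, test_labels):
    """Fit a TF-IDF + classifier pipeline per model and collect metrics."""
    labels = sorted(set(train_labels))
    results = {}
    for name, classifier in build_classifiers().items():
        # Preprocessing (lowercasing, stop-word removal) and TF-IDF
        # representation are both handled by the vectorizer.
        pipeline = make_pipeline(
            TfidfVectorizer(lowercase=True, stop_words="english"),
            classifier,
        )
        pipeline.fit(train_texts, train_labels)
        predicted = pipeline.predict(test_texts)
        results[name] = {
            "accuracy": accuracy_score(test_labels, predicted),
            # Per-class precision, recall, F1-score and support.
            "report": classification_report(
                test_labels, predicted, labels=labels, zero_division=0
            ),
            "confusion": confusion_matrix(test_labels, predicted, labels=labels),
        }
    return results


if __name__ == "__main__":
    for name, scores in evaluate(
        TRAIN_TEXTS, TRAIN_LABELS, TEST_TEXTS, TEST_LABELS
    ).items():
        print(name, "accuracy:", scores["accuracy"])
        print(scores["report"])
        print(scores["confusion"])
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training texts only, so no information from the test texts leaks into the representation.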

Series information (English)

Paper published in International Journal of Advances in Engineering & Technology (IJAET), Volume 16, Issue 5, pp. 415-433, October 2023. Available online at: https://www.ijaet.org/media/11I77-IJAET1605034-v16-i5-pp415-433.pdf
