Published December 15, 2022 | Version v1
Journal article Open

A STATISTICAL INDEX CALCULATED USING THE TF-IDF FOR TEXTS IN THE UZBEK LANGUAGE CORPUS

Description

TF-IDF is one of the most common methods for processing textual data. Google's search engine has for many years used the TF-IDF method to rank content relevant to user queries. According to the results of the conducted research, the Google system paid more attention to the frequency of terms than to the calculation of keywords. The value computed by the TF-IDF method represents the relevance of a keyword in the language corpus. Applying the method to the corpus produces a numeric vector for each document. This vector is a measure used in information retrieval (IR) and machine learning (ML) to represent the importance of string units (words, phrases, lemmas, etc.) to a document. In this article, we consider the process of ranking documents in the Uzbek language corpus by keyword using the TF-IDF method.
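The ranking described above can be illustrated with a minimal sketch. It assumes the textbook weighting tf(t, d) * ln(N / df(t)), a naive whitespace tokenizer, and three made-up Uzbek sample sentences; none of these are taken from the article, whose exact formula, preprocessing, and corpus may differ. The names `tokenize`, `tf_idf_vectors`, `rank_by_keyword`, and `DOCS` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample documents, not drawn from the article's corpus.
DOCS = [
    "kitob o'qish foydali",
    "kitob kitob do'koni",
    "bugun havo issiq",
]

def tokenize(text):
    # Naive lowercase whitespace split; real Uzbek text would likely
    # need proper tokenization and lemmatization (assumption).
    return text.lower().split()

def tf_idf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    # Textbook IDF, ln(N / df); other variants add smoothing.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is the count normalized by document length.
        vectors.append(
            [(counts[term] / total) * idf[term] if total else 0.0
             for term in vocab]
        )
    return vocab, vectors

def rank_by_keyword(docs, keyword):
    """Return (document index, score) pairs, highest score first."""
    vocab, vectors = tf_idf_vectors(docs)
    keyword = keyword.lower()
    if keyword not in vocab:
        return []
    column = vocab.index(keyword)
    scored = [(i, vec[column]) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_keyword(DOCS, "kitob")
for doc_index, score in ranking:
    print(doc_index, round(score, 4))
```

In this sketch the second sample document ranks first for the keyword "kitob" because the term makes up a larger share of its tokens, while the document that does not contain the keyword receives a score of zero.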

Files

B-363.pdf (1.0 MB)
md5:e3fc047281db1002be7ee814645903ee