Published December 15, 2022 | Version v1
Journal article | Open access

Extraction of Chinese Health News Using Computation of Noun Numbers

Description

Significant amounts of health information can be obtained from Chinese newspapers and magazines, but readers must spend considerable time studying it. Common methods of extracting information from articles include machine learning, text mining, word cloud sampling, and the use of algorithms. A high-quality machine learning model for information extraction must be trained on a large amount of good data. Before text mining can extract information with high precision and recall, many keywords must be collected to identify token sentences. Both machine learning and text mining therefore take significant amounts of time. Although word cloud systems can quickly identify which words are widely used in an article, the information they extract is often fragmented. Accordingly, the author has created an algorithm that extracts health information from Chinese news by computing the number of nouns in each sentence. First, the titles and subtitles of Chinese health news articles from websites were labeled. Second, the text was split into sentences at commas, periods, and question marks. Third, the text was segmented into words, and each word was tagged with its part of speech using natural language processing. Fourth, each sentence was scored by counting its nouns: a noun that also appears in the title scores 3 points, a noun that appears in the subtitle scores 2 points, and any other noun scores 1 point. Finally, the highest-scoring sentences were selected according to the user's query.
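The abstract describes the pipeline only in prose. The Python sketch below illustrates one possible reading of it and is not the author's implementation. Several pieces are assumptions: `toy_tagger` and `TOY_LEXICON` are hypothetical stand-ins for a real Chinese word segmenter and part-of-speech tagger (for example, jieba's `posseg` module, whose tag set marks nouns with tags beginning with `n`); a noun found in both the title and the subtitle is assumed to score 3 points; and "selected via the user's query" is interpreted as keeping sentences that contain the query string, then ranking them by score. The names `split_sentences`, `score_sentence`, `select_sentences`, and `top_k` are illustrative only.

```python
import re
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag) pairs

# Sentence boundaries named in the abstract: commas, periods, question marks.
# Full-width Chinese forms plus ASCII comma/question mark; the ASCII period is
# left out so decimals such as "7.0" are not split.
SPLIT_PATTERN = re.compile(r"[，。？,?]")

# Hypothetical toy lexicon standing in for a real segmenter/POS tagger.
TOY_LEXICON: Dict[str, str] = {
    "糖尿病": "n", "患者": "n", "血糖": "n", "饮食": "n", "医生": "n",
    "运动": "vn", "控制": "v", "建议": "v", "每天": "t",
}


def split_sentences(text: str) -> List[str]:
    """Split text at commas, periods and question marks; drop empty pieces."""
    return [s.strip() for s in SPLIT_PATTERN.split(text) if s.strip()]


def toy_tagger(text: str, lexicon: Dict[str, str] = TOY_LEXICON) -> Tagged:
    """Forward maximum matching against the lexicon; unknown chars get tag 'x'."""
    max_len = max(map(len, lexicon))
    out: Tagged = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                out.append((piece, lexicon[piece]))
                i += size
                break
        else:  # no lexicon word starts here: emit a single unknown character
            out.append((text[i], "x"))
            i += 1
    return out


def nouns(tagged: Tagged) -> List[str]:
    """Keep words whose tag starts with 'n' (so 'vn' is not counted as a noun)."""
    return [word for word, tag in tagged if tag.startswith("n")]


def score_sentence(sentence: str, title_nouns: set, subtitle_nouns: set,
                   tagger: Callable[[str], Tagged]) -> int:
    """Title noun = 3 points, subtitle noun = 2 points, any other noun = 1 point."""
    score = 0
    for noun in nouns(tagger(sentence)):
        if noun in title_nouns:  # checked first, so a noun in both scores 3
            score += 3
        elif noun in subtitle_nouns:
            score += 2
        else:
            score += 1
    return score


def select_sentences(body: str, title: str, subtitle: str, query: str,
                     tagger: Callable[[str], Tagged] = toy_tagger,
                     top_k: int = 3) -> List[str]:
    """Score every sentence, keep those containing the query, return the top_k."""
    title_nouns = set(nouns(tagger(title)))
    subtitle_nouns = set(nouns(tagger(subtitle)))
    scored = [(score_sentence(s, title_nouns, subtitle_nouns, tagger), s)
              for s in split_sentences(body)]
    # Assumption: a sentence matches the query if it contains the query string.
    matches = [(sc, s) for sc, s in scored if query in s]
    matches.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep order
    return [s for _, s in matches[:top_k]]


if __name__ == "__main__":
    title = "糖尿病患者如何控制血糖"
    subtitle = "医生建议饮食与运动"
    body = "糖尿病患者每天需要监测血糖，医生建议控制饮食。适当运动也有帮助？患者应定期复查。"
    print(select_sentences(body, title, subtitle, "患者"))
    # ['糖尿病患者每天需要监测血糖', '患者应定期复查']
```

In this toy example the first sentence contains three title nouns (糖尿病, 患者, 血糖) and scores 9, while 患者应定期复查 contains one title noun and scores 3, so both match the query and are returned in that order. Swapping `toy_tagger` for a real tagger only requires a callable that returns `(word, tag)` pairs.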

Files

IJISRT22NOV1368.pdf (517.6 kB)
md5:0b0b5fc66c8c07f5c9a620014fd89dc9