Extraction of Chinese Health News Using Computation of Noun Numbers
Creators
Description
Chinese newspapers and magazines contain a large amount of health information, but reading through it takes considerable time. Common methods of extracting information from articles include machine learning, text mining, word clouds, and purpose-built algorithms. A high-quality machine learning model for information extraction must be trained on a large amount of good data. Text mining reaches high precision and recall only after many keywords have been collected to identify token sentences. Both approaches therefore require significant amounts of time. Word cloud systems can quickly identify which words are widely used in an article, but the extracted information is often fragmented. Accordingly, the author has created an elegant algorithm that extracts health information from Chinese news by counting nouns. First, the title or subtitle of each Chinese health news article taken from websites was labeled. Second, the text was split into sentences at commas, periods, and question marks. Third, the word segments of the text were tagged with parts of speech using natural language processing. Fourth, each sentence was scored by its nouns: a noun also found in the title counted 3 points, a noun also found in the subtitle counted 2 points, and any other noun counted 1 point. Finally, high-scoring sentences were selected according to the user's query.
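The scoring steps described above can be illustrated with a minimal Python sketch. It is not the author's implementation. The names `split_sentences`, `score_sentence`, `rank_sentences`, and `toy_extract_nouns` are hypothetical, the delimiter set is an assumption, and the part-of-speech tagging step is replaced by a small hard-coded noun lexicon so that the example runs without an NLP library. The `top_k` parameter is likewise an assumed stand-in for the user's query, which the description does not specify.

```python
import re
from typing import Callable, List, Set, Tuple

# Sentence delimiters named in the description: commas, periods, question marks.
# Including both full-width and ASCII forms is an assumption of this sketch.
DELIMITERS = re.compile(r"[，。？,.?]")

# Hypothetical noun lexicon standing in for a real part-of-speech tagger.
TOY_NOUNS = ["糖尿病", "饮食", "医生", "患者", "运动"]


def split_sentences(text: str) -> List[str]:
    """Split text at the delimiters and drop empty fragments."""
    return [s.strip() for s in DELIMITERS.split(text) if s.strip()]


def toy_extract_nouns(sentence: str) -> List[str]:
    """Return every lexicon noun in the sentence, once per occurrence."""
    return [n for n in TOY_NOUNS for _ in range(sentence.count(n))]


def score_sentence(nouns: List[str], title_nouns: Set[str],
                   subtitle_nouns: Set[str]) -> int:
    """3 points per title noun, 2 per subtitle noun, 1 per other noun."""
    score = 0
    for noun in nouns:
        if noun in title_nouns:
            score += 3
        elif noun in subtitle_nouns:
            score += 2
        else:
            score += 1
    return score


def rank_sentences(text: str, title_nouns: Set[str], subtitle_nouns: Set[str],
                   extract_nouns: Callable[[str], List[str]],
                   top_k: int = 3) -> List[Tuple[int, str]]:
    """Score every sentence and return the top_k highest-scoring ones."""
    scored = [(score_sentence(extract_nouns(s), title_nouns, subtitle_nouns), s)
              for s in split_sentences(text)]
    # sorted() is stable, so equally scored sentences keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


if __name__ == "__main__":
    article = "医生提醒患者，糖尿病与饮食有关。运动也有帮助？"
    for score, sentence in rank_sentences(article, {"糖尿病"}, {"饮食"},
                                          toy_extract_nouns, top_k=2):
        print(score, sentence)
```

In practice the `extract_nouns` argument would wrap a Chinese word segmenter and part-of-speech tagger. Passing it in as a function keeps the scoring rule independent of whichever tagger is chosen.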
Files
Name | Size |
---|---|
IJISRT22NOV1368.pdf (md5:0b0b5fc66c8c07f5c9a620014fd89dc9) | 517.6 kB |