Dataset Open Access
Abstract (our paper)
Japanese scientific news articles describe research results clearly, but they often do not cite their sources, which makes it difficult for readers to find the details of the research. In this paper, we address the task of extracting journal names from Japanese scientific news articles. We hypothesize that a journal name is likely to occur in a specific context. To examine this hypothesis, we construct a character-based method that uses only the left and right context features of journal names, and we extract journal names with it. The extraction results suggest that the distributional hypothesis plays an important role in identifying journal names.
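To illustrate what "left and right context features" can mean for a character-based method, the following is a minimal sketch that collects the characters immediately before and after a candidate span. It is not the method from the paper: the function name `context_features`, the `width` parameter, and the example sentence are all assumptions made for illustration, and the paper's scoring of these contexts is not reproduced here.

```python
def context_features(text, start, end, width=3):
    """Return the `width` characters to the left and right of text[start:end].

    Hypothetical helper for illustration only; the window size is an assumption.
    """
    left = text[max(0, start - width):start]   # characters just before the span
    right = text[end:end + width]              # characters just after the span
    return left, right


# Invented example sentence; the candidate span is the journal name "Nature".
sentence = "The study was published in the journal Nature on Monday."
start = sentence.index("Nature")
end = start + len("Nature")

left, right = context_features(sentence, start, end, width=4)
print(repr(left), repr(right))
```

Because Python strings are indexed by code point, the same slicing applies unchanged to Japanese text, where no word segmentation is needed to take character windows.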
Each record has four columns: the first is the text extracted by our method (the journal name), the second is the cleaned text, the third is the news date, and the fourth is the news URL.
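The column layout above could be read as in the following sketch. The file format is not stated here, so the sketch assumes tab-separated values with no header row; the `Record` field names, the `read_records` function, and the sample row (including its date format and URL) are invented for illustration and are not taken from the data set.

```python
import csv
import io
from typing import NamedTuple


class Record(NamedTuple):
    extracted: str  # column 1: text extracted by the method (journal name)
    cleaned: str    # column 2: cleaned text
    date: str       # column 3: news date, kept as a string (format unknown)
    url: str        # column 4: news URL


def read_records(stream):
    """Parse four-column rows; assumes tab-separated input without a header."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if len(row) != 4:
            continue  # skip rows that do not match the described layout
        records.append(Record(*row))
    return records


# Invented sample row, used only to show the expected shape of the data.
sample = "the journal Nature\tNature\t2016-01-01\thttps://example.com/news/1\n"
records = read_records(io.StringIO(sample))
print(records[0].cleaned, records[0].url)
```

To read the actual file, pass an open file object instead of the `io.StringIO` sample, and adjust the delimiter if the file turns out to use a different separator.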
This data set is part of our experimental results. If you make use of this data set, please cite:
Masato Kikuchi, Kento Kawakami, Mitsuo Yoshida, Kyoji Umemura. Conservative Direct Estimation for Likelihood Ratios Based on Observed Frequencies. The IEICE Transactions on Information and Systems (Japanese edition). vol.J102-D, no.4, pp.289-301, 2019.
Masato Kikuchi, Mitsuo Yoshida, Kyoji Umemura. Journal Name Extraction from Japanese Scientific News Articles. Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2018. pp.143-148, 2018.