Published June 14, 2023 | Version v1
Presentation Open

The Japanese academic dataset integration based on PID and text processing.

  • 1. National Institute of Informatics
  • 2. The University of Tokyo

Description

Academic discovery infrastructure is becoming essential to researchers, both for demonstrating the transparency of research and for promoting data-driven science. In April 2021, we released a new Japanese academic discovery service, CiNii Research. Its index includes a variety of non-traditional research outputs. We processed the internal data with the following algorithm so that users can search the integrated academic information efficiently. Using these structured datasets, CiNii Research presents rich search results that bring various research outputs together on one page. In the data processing, we used PID identification, text matching, and ID mapping, together with our own prioritization rules, to remove duplicated academic information. Thirty-nine percent of instances were found to be duplicates and were eliminated by these processes. In addition, instances are connected to one another using information on citation relationships and parent-child relationships. Through this integration of academic resource data, we were able to add more than 60 million links to 30 million instances. These dataset improvements will contribute to richer search results.
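The deduplication idea in the description (match on a PID where one exists, fall back to text matching, and let a source priority decide which copy survives) can be illustrated with a minimal sketch. This is not the CiNii Research implementation: the record fields, the `SOURCE_PRIORITY` table, the source names, and the helper functions are all hypothetical, and a DOI stands in for PIDs in general.

```python
import re
import unicodedata

# Hypothetical source priority: the lower number wins when duplicates collide.
SOURCE_PRIORITY = {"source_a": 0, "source_b": 1, "source_c": 2}


def normalize_title(title):
    """Fold character width and case, then drop punctuation and spaces."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    return re.sub(r"[\W_]+", "", folded)


def match_key(record):
    """Prefer a PID (here, a DOI); fall back to normalized title plus year."""
    doi = record.get("doi")
    if doi:
        return ("pid", doi.strip().lower())
    return ("text", normalize_title(record["title"]), record.get("year"))


def deduplicate(records):
    """Keep one record per match key, choosing the highest-priority source."""
    best = {}
    for rec in records:
        key = match_key(rec)
        kept = best.get(key)
        if kept is None or SOURCE_PRIORITY[rec["source"]] < SOURCE_PRIORITY[kept["source"]]:
            best[key] = rec
    return list(best.values())


# Illustrative records only; these are not real data.
records = [
    {"source": "source_b", "doi": "10.1234/ABC", "title": "Data Integration", "year": 2021},
    {"source": "source_a", "doi": "10.1234/abc", "title": "Data integration.", "year": 2021},
    {"source": "source_c", "doi": None, "title": "Ｏｐｅｎ Science", "year": 2020},
    {"source": "source_b", "doi": None, "title": "open science", "year": 2020},
]
unique = deduplicate(records)
```

In this sketch the four sample records collapse to two: the DOI pair keeps the `source_a` copy, and the two titles that differ only in character width and case keep the `source_b` copy. A record with a PID and a record without one never share a key here; bridging that gap is the kind of case an ID mapping step would need to cover.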

Files

6_Jun-ichi_Onami_The_Japanese_academic_dataset.pdf

Files (2.3 MB)