Enhancing Knowledge Graph Extraction and Validation From Scholarly Publications Using Bibliographic Metadata
Description
Fully structured semantic resources that represent facts as triples (i.e., knowledge graphs) play a major role in driving computer applications, particularly those related to biomedicine, library and information science, and digital humanities (Haslhofer et al., 2018; Sargsyan et al., 2020). They can be easily processed using Application Programming Interfaces (APIs, such as REST APIs) and query languages (mainly SPARQL) to assess reference semantic information and to generate accurate and precise interpretations and predictions, particularly when the analyzed data is multifactorial and ever-changing, such as knowledge about COVID-19 (Turki et al., 2021c), information about the laureates of the Nobel Prize in Literature (Lebuda and Karwowski, 2016), and the findings of scholarly publications (Fathalla et al., 2017). In particular, the role of open knowledge graphs in facilitating scientific collaboration has been stressed against the backdrop of the COVID-19 pandemic (Anteghini et al., 2020; Colavizza et al., 2021; Turki et al., 2021a). Effectively, the information included in textual or semi-structured resources such as electronic health records, scholarly publications, encyclopedic entries, and citation indexes can be converted into fully structured Resource Description Framework (RDF) triples, included in knowledge graphs, and then processed in near real-time using computational methods to obtain evolving research outputs that are automatically updated as the knowledge graphs feeding them are regularly curated. These living research outputs include systematic reviews (Wang and Lo, 2021), clinical trials (Servant et al., 2014), scientometric studies (Nielsen et al., 2017), and epidemiological studies (Turki et al., 2021b).
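To make the idea of facts stored as triples and queried by pattern more concrete, the following is a minimal sketch in plain Python. It is not SPARQL and not a real RDF store; the identifiers (`paper:1`, `hasTopic`, and so on) and the `match` helper are hypothetical and chosen only for illustration.

```python
# Toy knowledge graph: each fact is a (subject, predicate, object) triple.
# All identifiers below are hypothetical, not taken from a real graph.
TRIPLES = {
    ("paper:1", "hasTopic", "topic:COVID-19"),
    ("paper:1", "publishedIn", "venue:A"),
    ("paper:2", "hasTopic", "topic:COVID-19"),
    ("paper:2", "cites", "paper:1"),
    ("paper:3", "hasTopic", "topic:Scientometrics"),
}


def match(triples, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which papers are about COVID-19? Roughly analogous to a SPARQL
# basic graph pattern such as: ?paper hasTopic topic:COVID-19
covid_papers = [s for s, _, _ in match(TRIPLES, p="hasTopic", o="topic:COVID-19")]
print(covid_papers)  # ['paper:1', 'paper:2']
```

Adding or correcting a triple changes the answer of the same query on the next run, which is the sense in which outputs built on a curated graph can stay up to date.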
However, the construction of knowledge graphs is a complex effort that includes the recognition of scholarly publications related to the scope of the semantic resource (Turki, 2018), the retrieval of abbreviations and terms for every concept (Turki et al., 2021a), and the extraction and validation of semantic relations (Turki et al., 2018a). Many projects depend on advanced neural network-driven machine-learning techniques to perform these tasks, as these methods contribute to higher quality (Asada et al., 2021; Fei et al., 2021). However, these techniques are considered black boxes and cannot be debugged to identify the reasons behind false results and consequently to address these limitations in a transparent way (Turki et al., 2021b). What is more, the quality of these techniques is considered imperfect in some cases, and they can require more time to achieve the same results as specific well-defined algorithms (Turki et al., 2021b). Here, Bibliometric-Enhanced Information Retrieval (BIR) has evolved as a novel field that utilizes bibliographic metadata to efficiently drive the extraction and refinement of semantic data from scholarly publications (Cabanac et al., 2018). This field has contributed to the development of many intuitive and explainable algorithms for knowledge engineering. On the one hand, this has been achieved by restricting the analysis of full texts to the publications that share a particular metadata value, in order to reveal the bibliographic settings in which the assessed algorithms perform well or poorly (Safder and Hassan, 2019).
On the other hand, this can be done by analyzing the bibliographic information using taxonomies such as MeSH and the Wikipedia Category Graph (Hadj Taieb et al., 2020), or using probabilistic heuristics and constraints that are either inferred from publications using statistical models including TF*IDF (Ramos, 2003) or extracted from knowledge graphs using inference engines, particularly HySpirit (Fuhr and Rölleke, 1998) and F-OWL (Zou et al., 2004).
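As an illustration of the kind of statistical weighting mentioned above, the following is a minimal sketch of TF*IDF computed over a few publication titles. The titles are invented, the tokenization is a plain whitespace split, and the formula (term frequency normalized by document length, multiplied by log(N / df)) is one common variant assumed here for illustration; it is not necessarily the one used in the cited works.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    which is only one of several TF*IDF variants.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical titles standing in for bibliographic metadata.
titles = [
    "knowledge graph extraction from publications",
    "knowledge graph validation with metadata",
    "citation analysis of publications",
]
scores = tf_idf(titles)
for title, weights in zip(titles, scores):
    ranked = sorted(weights, key=weights.get, reverse=True)
    print(title, "->", ranked[:2])
```

Terms that occur in only one title (such as "validation") receive a higher weight than terms shared across titles (such as "knowledge"), which is what makes the score usable as a simple, inspectable heuristic.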
In this opinion article, we explain how each type of bibliographic metadata can provide useful insights to enhance the automatic enrichment and fact-checking of knowledge graphs from scholarly publications based on the outcomes of research efforts about BIR.
Files
frma-06-694307.pdf