Published May 26, 2024 | Version v2
Software Open

A materials terminology knowledge graph automatically constructed from text corpus

Description

by Yuwei Zhang, Fangyi Chen, Zeyi Liu, Yunzhuo Ju, Dongliang Cui, Jinyi Zhu, Xue Jiang, Xi Guo, Jie He, Lei Zhang, Xiaotong Zhang, Yanjing Su.

This paper has been submitted to Scientific Data.

Abstract

A scalable, reusable, broad-coverage unified representation of materials knowledge is important for data sharing among materials communities. A knowledge graph (KG) for materials terminology, which is a formal collection of term entities and relationships, is conceptually important for achieving this goal. In this work, we propose a KG for materials terminology, named the Materials Genome Engineering Database Knowledge Graph (MGED-KG), which is automatically constructed from a text corpus via natural language processing. MGED-KG is the most comprehensive KG for materials terminology in both Chinese and English, consisting of 8,660 terms and their explanations. It encompasses 11 principal categories, such as Metals, Composites, and Nanomaterials, each with two or three levels of subcategories, resulting in a total of 235 distinct category labels. For further application, a knowledge web system based on MGED-KG has been developed; it improves data-sharing efficiency through query expansion, term recommendation, and data recommendation.

Demo

You can use MGED-KG here.

Getting the code

You can download a copy of all the files in this repository by cloning the git repository:

git clone https://gitee.com/ustb-mge_1/mged-kg.git
 

Dependencies

You'll need a working Python environment to run the code. The recommended way to set up your environment is through the Anaconda Python distribution, which provides the conda package manager. Anaconda can be installed in your user directory and does not interfere with the system Python installation. The required dependencies are specified in the file environment.yml.

We use conda virtual environments to manage the project dependencies in isolation. Thus, you can install our dependencies without causing conflicts with your setup (even with different Python versions).

Run the following command in the repository folder (where environment.yml is located) to create a separate environment and install all required dependencies in it:

conda env create
 

Reproducing the results

Before running any code you must activate the conda environment:

conda activate MGED-KG
 

This will enable the environment for your current terminal session. Any subsequent commands will use software that is installed in the environment.

Ensure that MySQL and Elasticsearch are installed, and configure the addresses, ports, usernames, passwords, and databases for MySQL and Elasticsearch in djangoProject/settings.py.
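As a rough sketch of what that configuration could look like, the fragment below uses Django's standard DATABASES setting with the built-in MySQL backend. The database name, credentials, and the ELASTICSEARCH_HOSTS variable are placeholders assumed for illustration; the names actually used in djangoProject/settings.py may differ, so check the file in the repository before editing.

```python
# Hypothetical excerpt of djangoProject/settings.py; every value is a placeholder.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "mged_kg",        # assumed database name
        "USER": "mged_user",      # assumed MySQL user
        "PASSWORD": "change-me",  # replace with your own password
        "HOST": "127.0.0.1",
        "PORT": "3306",           # MySQL default port
    }
}

# Assumed setting name; the repository may configure Elasticsearch differently.
ELASTICSEARCH_HOSTS = ["http://127.0.0.1:9200"]  # Elasticsearch default HTTP port
```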

To rebuild the MGED-KG database, run this in the top level of the repository:

python3 manage.py makemigrations
python3 manage.py migrate
 

Then switch to the sql_restore directory from the top level of the repository:

cd sql_restore
 

Connect to the MySQL database, then run the following in the MySQL client to restore the data:

source ./catalog.sql
source ./es_material.sql
source ./main_term.sql
source ./template.sql
source ./term_catalog.sql
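If you would rather script the restore than type each source command, the sketch below feeds the same five dump files to the mysql command-line client, in the same order. The helper names, the database name, and the user are hypothetical; it assumes the mysql client is on your PATH and that you run it from the top level of the repository.

```python
import subprocess
from pathlib import Path

# Dump files from the sql_restore directory, in the order listed above.
DUMPS = ["catalog.sql", "es_material.sql", "main_term.sql",
         "template.sql", "term_catalog.sql"]

def restore_commands(database, user, host="127.0.0.1", port=3306):
    """Pair one mysql invocation with each dump file.

    --password without a value makes the client prompt for it,
    so the password never appears on the command line.
    """
    base = ["mysql", f"--host={host}", f"--port={port}",
            f"--user={user}", "--password", database]
    return [(base, Path(name)) for name in DUMPS]

def restore_all(database, user, dump_dir="sql_restore"):
    """Feed each dump to the mysql client on stdin; stop at the first failure."""
    for cmd, dump in restore_commands(database, user):
        with open(Path(dump_dir) / dump, "rb") as fh:
            subprocess.run(cmd, stdin=fh, check=True)
```

Calling restore_all("mged_kg", "mged_user") would then restore all five files, where both arguments are placeholders for your own database and user.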
 

Add this code at the top of the file material/models.py:

import os
import django
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'djangoProject.settings')
django.setup()
 

Then run this to transfer the data from the MySQL database to Elasticsearch:

python3 ./material/function.py
 

After you finish, remove the code that was added just now.

To run the MGED-KG web system, run this in the top level of the repository:

python3 manage.py runserver 127.0.0.1:8000
 

If all goes well, this will start the MGED-KG web server. You can visit it at http://127.0.0.1:8000.
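To confirm from a script that the server is answering, a minimal check along the following lines may help. The function name is hypothetical and it uses only the Python standard library; it treats any HTTP response, including an error status, as a sign that the server is running.

```python
import urllib.error
import urllib.request

def server_is_up(url="http://127.0.0.1:8000/", timeout=3.0):
    """Return True if something answers HTTP at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status (for example 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening.
        return False
```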

License

All source code is made available under a GPL-2.0 license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE.md for the full license text.

The manuscript text is not open source for now.

Files

MGED-KG Web System Source Code.zip

Files (61.1 MB)

md5:5baf4adf53da36fccf88bc49a44176e9 (4.5 MB)
md5:aee95d23c92e911da35de128a90bf45e (15.3 MB)
md5:9c6e0841151620dd881ca4d72283718e (41.3 MB)
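
The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using Python's hashlib; the function names are illustrative, and you need to pair each downloaded file with its checksum as shown on the record page.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected_hex
```

For example, matches(downloaded_path, "md5:5baf4adf53da36fccf88bc49a44176e9") returns True only if the file at downloaded_path has that checksum.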

Additional details

Dates

Created: 2024-03-04

Software

Repository URL: https://gitee.com/ustb-mge_1/mged-kg
Programming language: Python
Development status: Active