A materials terminology knowledge graph automatically constructed from text corpus
Description
by Yuwei Zhang, Fangyi Chen, Zeyi Liu, Yunzhuo Ju, Dongliang Cui, Jinyi Zhu, Xue Jiang, Xi Guo, Jie He, Lei Zhang, Xiaotong Zhang, Yanjing Su.
This paper has been submitted to Scientific Data.
Abstract
A scalable, reusable, broad-coverage unified representation of materials knowledge is important for data sharing among materials communities. A knowledge graph (KG) for materials terminology, a formal collection of term entities and their relationships, is a key step toward this goal. In this work, we propose a KG for materials terminology, named Materials Genome Engineering Database Knowledge Graph (MGED-KG), which is automatically constructed from a text corpus via natural language processing. MGED-KG is the most comprehensive KG for materials terminology in both Chinese and English, consisting of 8,660 terms and their explanations. It encompasses 11 principal categories, such as Metals, Composites, and Nanomaterials, each with two or three levels of subcategories, resulting in a total of 235 distinct category labels. For further application, a knowledge web system based on MGED-KG has been developed; it improves data sharing efficiency through query expansion, term recommendation, and data recommendation.
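To make the idea of category-based query expansion concrete, here is a minimal sketch of a bilingual terminology graph. It is not the MGED-KG implementation: the `TermGraph` class, the "Steels" subcategory, and the example terms are all made up for illustration; only the top-level category name "Metals" comes from the abstract.

```python
from collections import defaultdict


class TermGraph:
    """Toy terminology graph: a category tree plus bilingual terms (illustrative only)."""

    def __init__(self):
        self.parent = {}                           # category -> parent category (None for roots)
        self.terms_by_category = defaultdict(set)  # category -> English terms filed under it
        self.category_of = {}                      # English term -> its category
        self.translation = {}                      # English term -> Chinese term

    def add_category(self, name, parent=None):
        self.parent[name] = parent

    def add_term(self, term, category, translation=None):
        if category not in self.parent:
            raise KeyError(f"unknown category: {category}")
        self.category_of[term] = category
        self.terms_by_category[category].add(term)
        if translation is not None:
            self.translation[term] = translation

    def category_path(self, term):
        """Return the categories from the root down to the term's own category."""
        path = []
        node = self.category_of[term]
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path[::-1]

    def expand_query(self, term):
        """Expand a query with sibling terms in the same category and their translations."""
        expanded = set()
        for sibling in self.terms_by_category[self.category_of[term]]:
            expanded.add(sibling)
            if sibling in self.translation:
                expanded.add(self.translation[sibling])
        return expanded


# Hypothetical sample data, not taken from MGED-KG.
graph = TermGraph()
graph.add_category("Metals")
graph.add_category("Steels", parent="Metals")
graph.add_term("stainless steel", "Steels", translation="不锈钢")
graph.add_term("carbon steel", "Steels", translation="碳钢")

expanded_terms = graph.expand_query("stainless steel")
path = graph.category_path("stainless steel")
```

In this sketch a search for one term also matches its sibling terms and their Chinese equivalents, which is one simple way a terminology graph can widen a query.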
Demo
You can use MGED-KG here.
Getting the code
You can download a copy of all the files in this repository by cloning the git repository:
git clone https://gitee.com/ustb-mge_1/mged-kg.git
Dependencies
You'll need a working Python environment to run the code. The recommended way to set up your environment is through the Anaconda Python distribution, which provides the conda package manager. Anaconda can be installed in your user directory and does not interfere with the system Python installation. The required dependencies are specified in the file environment.yml.
We use conda virtual environments to manage the project dependencies in isolation. Thus, you can install our dependencies without causing conflicts with your setup (even with different Python versions).
Run the following command in the repository folder (where environment.yml is located) to create a separate environment and install all required dependencies in it:
conda env create
Reproducing the results
Before running any code you must activate the conda environment:
conda activate MGED-KG
This will enable the environment for your current terminal session. Any subsequent commands will use software that is installed in the environment.
Ensure that MySQL and Elasticsearch are installed, then configure the addresses, ports, usernames, passwords, and databases for both in djangoProject/settings.py.
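As a rough guide, the MySQL part of a Django settings file usually looks like the sketch below. The `DATABASES` structure and the `django.db.backends.mysql` engine are standard Django; the environment variable names, default values, and the `ELASTICSEARCH_HOST` setting name are assumptions for illustration, so check the actual names used in djangoProject/settings.py before editing.

```python
import os

# Standard Django database block; every value here is a placeholder.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": os.environ.get("MGED_DB_NAME", "mged_kg"),      # hypothetical database name
        "USER": os.environ.get("MGED_DB_USER", "root"),         # hypothetical user
        "PASSWORD": os.environ.get("MGED_DB_PASSWORD", ""),     # keep real passwords out of git
        "HOST": os.environ.get("MGED_DB_HOST", "127.0.0.1"),
        "PORT": os.environ.get("MGED_DB_PORT", "3306"),         # MySQL default port
    }
}

# Assumed setting name: the project may store the Elasticsearch address differently.
ELASTICSEARCH_HOST = os.environ.get("MGED_ES_HOST", "http://127.0.0.1:9200")
```

Reading credentials from environment variables is a design choice of this sketch, not something the repository is known to do; hard-coded values work the same way.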
To rebuild the MGED-KG database, run this in the top level of the repository:
python3 manage.py makemigrations
python3 manage.py migrate
Then, from the top level of the repository, switch to the sql_restore directory:
cd sql_restore
Connect to the MySQL database, then run the following commands to restore its data:
source ./catalog.sql
source ./es_material.sql
source ./main_term.sql
source ./template.sql
source ./term_catalog.sql
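If you prefer to script the restore instead of typing the source commands in an interactive session, the following sketch feeds the same files to the mysql command-line client in the same order. The helper names, the user name, and the database name are hypothetical, and the sketch assumes the password is supplied through a MySQL option file such as ~/.my.cnf rather than on the command line.

```python
import subprocess
from pathlib import Path

# Same files, same order as the source commands in this README.
RESTORE_ORDER = [
    "catalog.sql",
    "es_material.sql",
    "main_term.sql",
    "template.sql",
    "term_catalog.sql",
]


def build_mysql_command(user, database, host="127.0.0.1", port=3306):
    """Build the argument list for the mysql client (password comes from an option file)."""
    return ["mysql", f"--host={host}", f"--port={port}", f"--user={user}", database]


def restore_all(sql_dir, user, database, host="127.0.0.1", port=3306):
    """Pipe each SQL file into the mysql client, stopping at the first failure."""
    command = build_mysql_command(user, database, host, port)
    for name in RESTORE_ORDER:
        with open(Path(sql_dir) / name, "rb") as sql_file:
            subprocess.run(command, stdin=sql_file, check=True)


# Example call (placeholder credentials), run from the top level of the repository:
# restore_all("sql_restore", user="kg_user", database="mged_kg")
```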
Add the following code at the top of the file material/models.py:
import os
import django
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'djangoProject.settings')
django.setup()
Then run this to transform data from MySQL database to Elasticsearch:
python3 ./material/function.py
After you finish, remove the code that was added just now.
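The contents of material/function.py are not reproduced here, but a MySQL-to-Elasticsearch transfer generally amounts to turning table rows into documents for the Elasticsearch bulk API. The sketch below shows only that formatting step, under the assumption that rows arrive as dictionaries; the function name, index name, and field names are hypothetical.

```python
import json


def to_bulk_ndjson(rows, index_name, id_field="id"):
    """Format rows as an Elasticsearch bulk request body (newline-delimited JSON).

    Each row becomes two lines: an "index" action line, then the document itself.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        document = {key: value for key, value in row.items() if key != id_field}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(document, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical row, standing in for a record read from MySQL.
sample_rows = [{"id": 1, "term_en": "alloy", "term_zh": "合金"}]
payload = to_bulk_ndjson(sample_rows, index_name="terms")
```

The resulting string can be sent to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header; `ensure_ascii=False` keeps Chinese terms readable in the payload.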
To run the MGED-KG web system, run this in the top level of the repository:
python3 manage.py runserver 127.0.0.1:8000
If all goes well, this will start the MGED-KG web server. You can visit it at http://127.0.0.1:8000.
License
All source code is made available under a GPL-2.0 license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE.md for the full license text.
The manuscript text is not open source for now.
Files
MGED-KG Web System Source Code.zip
Total size: 61.1 MB
Checksum | Size |
---|---|
md5:5baf4adf53da36fccf88bc49a44176e9 | 4.5 MB |
md5:aee95d23c92e911da35de128a90bf45e | 15.3 MB |
md5:9c6e0841151620dd881ca4d72283718e | 41.3 MB |
Additional details
Dates
- Created: 2024-03-04
Software
- Repository URL: https://gitee.com/ustb-mge_1/mged-kg
- Programming language: Python
- Development Status: Active