Helping Ontology Extension with Natural Language Processing for Catalysis
Description
This contribution introduces a workflow for processing scientific text corpora from catalysis research. Natural language processing (NLP) techniques are used to vectorize the textual data, which allows concepts to be clustered hierarchically and also yields concept names. In addition, a database is searched for ontologies that contain the resulting concept names. When matches are found, the existing definitions of those concepts are returned as well, so that domain experts can validate whether the ontology classes were identified correctly. A subsequent hierarchical clustering of the concept names, based on the text corpora, prepares the retrieved data for ontology matching and thereby assists ontology extension. Previously undefined concepts and unstructured relations can thus be introduced into existing ontologies more easily, based on the scientific texts that describe them. NLP-supported, structured extension of ontologies thereby becomes possible, facilitating FAIR data management workflows. The contribution shows successful applications and also highlights remaining hurdles.
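The vectorization and hierarchical clustering steps described above can be illustrated with a minimal sketch. It is not the workflow from the presentation: the description does not name the vectorization method, so simple sentence-level co-occurrence counts stand in for it here, and average-linkage agglomerative clustering with cosine distance is an assumed choice. The toy corpus, the concept list, and all function names are hypothetical and serve only as illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus and concept list, invented for illustration only.
CORPUS = [
    "nickel catalyst promotes methanation of carbon dioxide",
    "ruthenium catalyst promotes methanation of carbon dioxide",
    "fixed bed reactor operates at high temperature",
    "fluidized bed reactor operates at high temperature",
]
CONCEPTS = ["nickel", "ruthenium", "fixed", "fluidized"]


def context_vector(term, sentences):
    """Vectorize a term by counting the words that share a sentence with it."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if term in tokens:
            counts.update(t for t in tokens if t != term)
    return counts


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def agglomerate(terms, vectors):
    """Average-linkage agglomerative clustering.

    Returns the merge history as (left_cluster, right_cluster, distance),
    which encodes the resulting hierarchy from the bottom up.
    """

    def linkage(pair):
        left, right = pair
        dists = [cosine_distance(vectors[x], vectors[y])
                 for x in left for y in right]
        return sum(dists) / len(dists)

    clusters = [(t,) for t in terms]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        left, right = min(combinations(clusters, 2), key=linkage)
        merges.append((left, right, round(linkage((left, right)), 3)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


vectors = {term: context_vector(term, CORPUS) for term in CONCEPTS}
for left, right, distance in agglomerate(CONCEPTS, vectors):
    print(left, "+", right, "at distance", distance)
```

In this toy setting the two metal terms are merged first and the two reactor terms second, because each pair shares identical sentence contexts; the two groups are joined only in the final step. A real corpus would need proper tokenization and a more robust embedding before the hierarchy becomes useful for ontology matching.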
Files
| Name | Size | Checksum |
|---|---|---|
| Mardi-Workshop-26-10-2022_Behr.pdf | 1.1 MB | md5:5467a68d60f02a716dbe251b47983540 |