Construct a lemma graph, then perform entity linking based on: `spaCy`, `transformers`, `SpanMarkerNER`, `spaCy-DBpedia-Spotlight`, `REBEL`, `OpenNRE`, `qwikidata`, `pulp`
In other words, this hybrid approach integrates NLP parsing, LLMs, graph algorithms, semantic inference, and operations research, and also provides UX affordances for human-in-the-loop practices. The following demo illustrates a small problem, though the approach addresses a much broader class of AI problems in industry.

This step is a prelude to leveraging topological transforms, large language models, graph representation learning, plus human-in-the-loop domain expertise to infer the nodes, edges, properties, and probabilities needed for the semi-automated construction of a knowledge graph from raw unstructured text sources.
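As a rough sketch of the first step, a lemma graph can be modeled as lemma nodes connected by co-occurrence edges within a sentence. The whitespace tokenization below is a simplistic stand-in for the `spaCy` parse the library actually uses, so treat this as illustrative only:

```python
from collections import defaultdict
from itertools import combinations

def build_lemma_graph (sentences):
    """
Build a simple lemma co-occurrence graph: nodes are lowercased
tokens (standing in for spaCy lemmas), and each edge counts how
often two lemmas appear within the same sentence.
    """
    edges = defaultdict(int)

    for sent in sentences:
        lemmas = sorted(set(sent.lower().split()))

        for pair in combinations(lemmas, 2):
            edges[pair] += 1

    return edges

graph = build_lemma_graph([
    "Werner Herzog is a remarkable filmmaker",
    "Werner Herzog is a German author",
])
```

In the library, edge weights like these feed the downstream graph analytics, while entity linking replaces the bare token nodes with linked entities.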
In addition to providing a library for production use cases, `TextGraphs` creates a "playground" or "gym" in which to prototype and evaluate abstractions based on "Graph Levels Of Detail".
  * `spaCy` to parse a document, with `SpanMarkerNER` LLM assist
  * `spaCy-DBpedia-Spotlight`, WikiMedia API for entity linking
  * `REBEL`, `OpenNRE`, `qwikidata` for relation extraction
  * `NetworkX` to construct a graph from the parse results
  * `textrank` algorithm plus graph analytics
  * `pulp`, `PyVis`
...
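To ground the `textrank` step in the pipeline above, here is a minimal, dependency-free power-iteration PageRank over an adjacency list. The library itself runs this over a `NetworkX` graph built from `spaCy` parses, so the toy graph and plain-dict representation here are assumptions for illustration:

```python
def pagerank (adj, damping = 0.85, iters = 50):
    """
Minimal power-iteration PageRank over an adjacency list
`adj` (node -> list of neighbors), the centrality measure
TextRank uses to rank lemmas.
    """
    nodes = list(adj.keys())
    rank = { n: 1.0 / len(nodes) for n in nodes }

    for _ in range(iters):
        new_rank = {}

        for n in nodes:
            # mass received from every node m that links to n
            incoming = sum(rank[m] / len(adj[m]) for m in adj if n in adj[m])
            new_rank[n] = (1.0 - damping) / len(nodes) + damping * incoming

        rank = new_rank

    return rank

# toy lemma graph: "graph" is the most connected lemma
adj = {
    "graph": ["lemma", "rank", "text"],
    "lemma": ["graph", "text"],
    "rank": ["graph"],
    "text": ["graph", "lemma"],
}

ranks = pagerank(adj)
top = max(ranks, key=ranks.get)
```

The highest-ranked node is the most central lemma, which is how the demo surfaces the salient phrases of a document.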
Implementation of an LLM-augmented `textgraph` algorithm for constructing a lemma graph from raw, unstructured text sources.
The `TextGraphs` library is based on work developed by Derwen in 2023 Q2 for customer apps, and is used in our Cysoni product.
This demo integrates code from:
For more details about this approach, see these talks:
Other good tutorials (during 2023) which include related material:
"Automatic generation of hypertext knowledge bases"
Udo Hahn, Ulrich Reimer
ACM SIGOIS 9:2 (1988-04-01)
https://doi.org/10.1145/966861.45429
The condensation process transforms the text representation structures resulting from the text parse into a more abstract thematic description of what the text is about, filtering out irrelevant knowledge structures and preserving only the most salient concepts.
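In graph terms, that condensation process can be approximated by keeping only the highest-scoring nodes and the edges among them. A minimal sketch, assuming node salience scores have already been computed (e.g., by the textrank step); the function name and threshold scheme are hypothetical, not the paper's algorithm:

```python
def condense (scores, edges, keep_ratio = 0.5):
    """
Keep only the most salient nodes (top fraction by score) and
the edges among them, loosely mimicking condensation: filtering
out irrelevant structures while preserving salient concepts.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = set(ranked[: max(1, int(len(ranked) * keep_ratio))])

    kept_edges = [
        (a, b) for a, b in edges
        if a in kept and b in kept
    ]

    return kept, kept_edges

scores = { "knowledge": 0.4, "graph": 0.3, "the": 0.2, "of": 0.1 }
edges = [("knowledge", "graph"), ("the", "of"), ("graph", "the")]

kept, kept_edges = condense(scores, edges)
```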
Graph Representation Learning
William Hamilton
Morgan and Claypool (pre-print 2020)
https://www.cs.mcgill.ca/~wlh/grl_book/
A brief but comprehensive introduction to graph representation learning, including methods for embedding graph data, graph neural networks, and deep generative models of graphs.
"REDFM: a Filtered and Multilingual Relation Extraction Dataset"
Pere-Lluís Huguet Cabot, Simone Tedeschi, Axel-Cyrille Ngonga Ngomo, Roberto Navigli
ACL (2023-06-19)
https://arxiv.org/abs/2306.09802
Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other than English. In this paper, we address the above issue and provide two new resources that enable the training and evaluation of multilingual RE systems.
"InGram: Inductive Knowledge Graph Embedding via Relation Graphs"
Jaejun Lee, Chanyoung Chung, Joyce Jiyoung Whang
ICML (2023-08-17)
https://arxiv.org/abs/2305.19987
In this paper, we propose an INductive knowledge GRAph eMbedding method, InGram, that can generate embeddings of new relations as well as new entities at inference time.
"TextRank: Bringing Order into Text"
Rada Mihalcea, Paul Tarau
EMNLP (2004-07-25)
https://aclanthology.org/W04-3252
In this paper, the authors introduce TextRank, a graph-based ranking model for text processing, and show how this model can be successfully used in natural language applications.