Build knowledge graphs with LLM-driven entity extraction
Extract and build knowledge with LLMs and Knowledge Graphs
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling, retrieval augmented generation and more.
Up until txtai 7.0, semantic graphs only supported automatic relationship detection. Now relationships can be loaded directly into a txtai database. This article will demonstrate how to work with this new feature.
Install dependencies
Install txtai
and all dependencies.
# Install txtai
pip install txtai[graph] autoawq
Load Wikipedia database
The txtai-wikipedia database stores all Wikipedia article abstracts as of January 2024. This database is a great way to explore a wide variety of topics. It also has the number of page views integrated in, which enables pulling frequently viewed or popular articles on a topic.
For this example, we'll work with a couple articles related to Viking raids in France
.
from txtai import Embeddings
# Load dataset
wikipedia = Embeddings()
wikipedia.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")
query = """
SELECT id, text FROM txtai WHERE similar('Viking raids in France') and percentile >= 0.5
"""
results = wikipedia.search(query, 5)
for x in results:
print(x)
LLM-driven entity extraction
Now that we have a couple relevant articles, let's go through and run an entity extraction process. For this task, we'll have a LLM prompt to do the work.
import json
from txtai import LLM
# Load LLM
llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")
data = []
for result in results:
prompt = f"""<|im_start|>system
You are a friendly assistant. You answer questions from users.<|im_end|>
<|im_start|>user
Extract an entity relationship graph from the following text. Output as JSON
Nodes must have label and type attributes. Edges must have source, target and relationship attributes.
text: {result['text']} <|im_end|>
<|im_start|>assistant
"""
try:
data.append(json.loads(llm(prompt, maxlength=4096)))
except:
pass
Build an embeddings database
Now that we've extracted entities from the documents, next we'll review and load a graph network using these entity-relationships.
def stream():
nodes = {}
for row in data.copy():
# Create nodes
for node in row["nodes"]:
if node["label"] not in nodes:
node["id"] = len(nodes)
nodes[node["label"]] = node
for edge in row["edges"]:
source = nodes.get(edge["source"])
target = nodes.get(edge["target"])
if source and target:
if "relationships" not in source:
source["relationships"] = []
source["relationships"].append({"id": target["id"], "relationship": edge["relationship"]})
return nodes.values()
# Create embeddings instance with a semantic graph
embeddings = Embeddings(
autoid = "uuid5",
path = "intfloat/e5-base",
instructions = {
"query": "query: ",
"data": "passage: "
},
columns = {
"text": "label"
},
content = True,
graph = {
"approximate": False,
"topics": {}
}
)
embeddings.index(stream())
Show the network
Now let's visualize the entity-relationship network. This period might not be that familar to most, unless you've watched the Vikings TV series, in which case it should make sense.
import matplotlib.pyplot as plt
import networkx as nx
def plot(graph):
labels = {x: f"{graph.attribute(x, 'text')} ({x})" for x in graph.scan()}
lookup = {
"Person": "#d32f2f",
"Location": "#0277bd",
"Event": "#e64980",
"Role": "#757575"
}
colors = []
for x in graph.scan():
value = embeddings.search("select type from txtai where id = :x", parameters={"x": x})[0]["type"]
colors.append(lookup.get(value, "#7e57c2"))
options = {
"node_size": 2000,
"node_color": colors,
"edge_color": "#454545",
"font_color": "#efefef",
"font_size": 11,
"alpha": 1.0
}
fig, ax = plt.subplots(figsize=(20, 9))
pos = nx.spring_layout(graph.backend, seed=0, k=3, iterations=250)
nx.draw_networkx(graph.backend, pos=pos, labels=labels, **options)
ax.set_facecolor("#303030")
ax.axis("off")
fig.set_facecolor("#303030")
plot(embeddings.graph)
Graph traversal
The last thing we'll cover is extracting a specific path from the graph. Let's show a path from node 8 to node 5 requiring a specific relationship type to start.
A new feature of txtai 7.0 is the ability to return a graph of search results. This is a powerful addition as not only do we get the search results but we get how the search results relate to each other.
# Traverse graph looking for certain nodes and edge values
g = embeddings.graph.search("""
MATCH P=(A{id: 8})-[R1]->()-[*1..3]->(D{id:5})
WHERE
R1.relationship == "has_object"
RETURN P
""", graph=True)
plot(g)
Wrapping up
This article showed how knowledge graphs can be created with LLM-driven entity extraction. Automatically derived relationships via semantic similarity and manually specified ones are a powerful combination!