Smart RAG System with Graph Databases & LLMs

Building an advanced retrieval-augmented generation system with graph databases and large language models

Large language models (LLMs) have become essential tools for processing, storing, and generalizing knowledge at scale. Their ability to flexibly analyze complex data makes them powerful components of decision-support systems. To realize this potential, however, a system must be designed so that the context it retrieves is highly relevant and the relationships between data entities are explicitly structured.

Business Challenge

Modern data-driven applications often face a significant gap between the structured world of databases and the unstructured reality of human language. Our business goal was to develop a system that could intelligently process incoming documents, capture meaningful relationships between entities, and respond to user queries using a Retrieval-Augmented Generation (RAG) pipeline.

A key use case involved the legal domain, where massive collections of legal documents, court cases, and precedents must be analyzed and cross-referenced. Legal texts are typically unstructured, verbose, and context-dependent, making them difficult to process with traditional keyword search or rigid database queries.

A major challenge was ensuring that relevant context is selected not just by keyword or similarity search, but also through an understanding of the connections between entities – whether by cluster, semantic proximity, or graph relationships. Without a meaningful way to link and reuse data, the system would risk generating inconsistent responses and duplicating entities, ultimately degrading the performance and interpretability of the knowledge graph.

Solution Overview

To address this challenge, we designed a hybrid system combining Neo4j – a graph database optimized for storing and querying relationships between entities – with Anthropic’s Claude Sonnet, a language model particularly adept at processing structured XML-like content.

Key to our approach was avoiding direct Cypher query generation by the LLM, as this often led to syntactic errors or uncontrolled entity duplication. Instead, we leveraged Claude's strength in structured generation by prompting it to produce XML-like representations of extracted knowledge, which were then deterministically parsed and inserted into the graph database using custom logic.
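The parse-and-insert step described above can be sketched roughly as follows. This is a minimal illustration, not the production logic: the tag names (`knowledge`, `entity`, `relation`), the attribute names, the whitelists, and the function `xml_to_cypher` are all assumptions made for the example. The idea it shows is that the LLM only emits markup, while fixed code decides which Cypher is run, using parameterized MERGE statements so that repeated entities are matched rather than duplicated.

```python
import xml.etree.ElementTree as ET

# Hypothetical LLM output; tag and attribute names are assumptions for this sketch.
SAMPLE_XML = """
<knowledge>
  <entity id="case_1" type="Case" name="Smith v. Jones"/>
  <entity id="court_1" type="Court" name="Court of Appeal"/>
  <relation source="case_1" target="court_1" type="DECIDED_BY"/>
</knowledge>
"""

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are checked against fixed whitelists before being interpolated.
ALLOWED_LABELS = {"Case", "Court", "Statute"}
ALLOWED_RELATIONS = {"DECIDED_BY", "CITES"}


def xml_to_cypher(xml_text):
    """Convert XML-like LLM output into (query, params) pairs.

    MERGE is used instead of CREATE so that an entity seen twice
    maps to the same node instead of producing a duplicate.
    """
    root = ET.fromstring(xml_text.strip())
    statements = []

    for entity in root.iter("entity"):
        label = entity.get("type")
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected entity type: {label!r}")
        statements.append((
            f"MERGE (n:{label} {{id: $id}}) SET n.name = $name",
            {"id": entity.get("id"), "name": entity.get("name")},
        ))

    for relation in root.iter("relation"):
        rel_type = relation.get("type")
        if rel_type not in ALLOWED_RELATIONS:
            raise ValueError(f"unexpected relation type: {rel_type!r}")
        statements.append((
            "MATCH (a {id: $source}), (b {id: $target}) "
            f"MERGE (a)-[:{rel_type}]->(b)",
            {"source": relation.get("source"), "target": relation.get("target")},
        ))

    return statements


if __name__ == "__main__":
    for query, params in xml_to_cypher(SAMPLE_XML):
        print(query, params)
```

Each (query, params) pair could then be handed to a Neo4j driver session. Because the model never writes Cypher itself, a malformed or unexpected tag fails at parse time instead of reaching the database.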

We also solved a second major challenge – bridging the gap between highly structured database records and unstructured, noisy user input. To do this, we introduced symbolic fields: short, unified representations, produced by dedicated processing steps, to which both internal records and external text are reduced. These symbolic fields serve as reliable anchors for embedding-based similarity searches, ensuring that the RAG component operates on semantically consistent and format-aligned data.
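To make the role of symbolic fields concrete, here is a small sketch under strong simplifying assumptions. In the system described here the reduction is done with language model prompts and the comparison with learned embeddings; below, a rule-based `to_symbolic` function and a bag-of-words vector stand in for both, purely so the example runs without external services. All names, the stopword list, and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter

# Toy stopword list; a stand-in for what an LLM prompt would strip out.
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "and", "is",
    "can", "you", "tell", "me", "about", "please", "what",
}


def to_symbolic(text):
    """Reduce text to a short canonical phrase (stand-in for LLM normalization)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = sorted({t for t in tokens if t not in STOPWORDS})
    return " ".join(kept)


def embed(symbolic):
    """Bag-of-words vector (stand-in for a real embedding model)."""
    return Counter(symbolic.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    record = "Breach of contract; damages"
    query = "Can you tell me about damages for a breach of the contract?"
    print(to_symbolic(record))
    print(to_symbolic(query))
    print(cosine(embed(to_symbolic(record)), embed(to_symbolic(query))))
```

The point of the sketch is only that the stored record and the verbose user question end up in the same compact form before they are embedded, which is what makes the similarity comparison meaningful.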

Technical Details

Our architecture builds on three core components:

  1. Graph Database Layer (Neo4j). Entity relationships are stored in Neo4j, taking advantage of its ability to define named, directed edges between objects. This structure allows us to model clusters of related data points and query them contextually. However, directly prompting LLMs to generate Cypher code proved unreliable – especially as the knowledge base grew. Instead, the LLM outputs an intermediate XML-like format, which is easier for Claude Sonnet to produce and can be parsed deterministically.
  2. Preprocessing Pipeline. Texts received from users differ in length, style, and formality from the compact, formal records stored in the graph. To close this gap, we introduced a “symbolic normalization” process: using language model prompts to extract key entities, reduce verbosity, and represent both user input and internal data in a shared, simplified format. These symbolic representations are short, canonicalized phrases that can be directly embedded and compared using vector similarity.
  3. RAG System Integration. The embeddings of symbolic fields are used in a RAG pipeline to fetch the most relevant graph-connected data. This ensures that query results are not only similar in content but also contextually and relationally aligned. By combining embedding similarity with graph-based filtering, we achieve higher-quality responses while preserving the interpretability and maintainability of the data.
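
The combination of embedding similarity and graph-based filtering described in the third component can be sketched as follows. This is an illustrative in-memory version only: the node names, vectors, edges, and the `retrieve` function are hypothetical, and a real deployment would presumably run the equivalent steps against Neo4j rather than over Python dictionaries. The filtering rule used here (keep the best match and its direct neighbors) is one simple possibility, not necessarily the rule used in the system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0


def retrieve(query_vec, vectors, edges, k=3):
    """Rank nodes by similarity, then keep only the top hit and its graph neighbors."""
    ranked = sorted(vectors, key=lambda n: cosine(query_vec, vectors[n]), reverse=True)
    if not ranked:
        return []
    seed = ranked[0]
    connected = {seed}
    for src, dst in edges:  # edges are treated as undirected for filtering
        if src == seed:
            connected.add(dst)
        if dst == seed:
            connected.add(src)
    return [n for n in ranked if n in connected][:k]


# Hypothetical embeddings of symbolic fields and graph edges.
VECTORS = {
    "case_1": [1.0, 0.0, 0.1],
    "case_2": [0.7, 0.7, 0.0],
    "case_3": [0.8, 0.3, 0.0],
    "statute_7": [0.2, 0.9, 0.0],
}
EDGES = [("case_1", "statute_7"), ("case_1", "case_3")]

if __name__ == "__main__":
    print(retrieve([1.0, 0.0, 0.0], VECTORS, EDGES))
```

In this toy data, `case_2` is more similar to the query than `statute_7`, yet it is dropped because it has no edge to the best match, while `statute_7` is kept because it is linked to it. That is the sense in which results are relationally aligned and not merely similar in content.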

This multi-layered system leverages the complementary strengths of symbolic modeling, graph relationships, and neural representations. It is both scalable and adaptable – designed to grow with data volume while preserving accuracy and semantic clarity.

Technology Stack

Neo4j

Claude

LangChain