Graph RAG in Practice: A Complete Implementation of the Knowledge-Graph-Enhanced Retrieval Paradigm

Lao Zhang · 2026/4/30 · About 8 min

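Audience: developers with 1-5 years of Java experience who want to move toward AI engineering. Reading time: about 20 minutes. What you'll get: ① an understanding of Graph RAG's core advantages over plain RAG ② the full workflow for building a knowledge graph with Neo4j ③ a production-ready graph-enhanced retrieval system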

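Xiao Li posted a screenshot in our internal group chat, captioned with two words: "I'm furious."

The screenshot showed a failed retrieval in their RAG system. A user had asked "Which projects have Mr. Zhang and Mr. Wang worked on together?", and the system answered "The knowledge base cannot answer this question at the moment."

But the knowledge base clearly had the answer! Project contracts, personnel files, and meeting minutes were full of information about both people. It was just scattered across dozens of documents, and plain vector retrieval had no way to connect the pieces.

After looking at the screenshot, I replied in the group: "What you need isn't RAG, it's Graph RAG."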

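The Fundamental Flaw of Plain RAG

Everyone has used RAG. The basic flow is: documents → chunking → embedding → storage → vector-similarity matching at query time → hand the results to the LLM.

This flow works very well for single-document Q&A. But there is one class of question it is inherently bad at: questions that require reasoning across documents and across entities.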

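For example:

"Who led project A and also took part in project B?" (relationship query)
"What is the dependency chain behind this technical decision?" (path query)
"What indirect connections exist between customer X and supplier Y?" (association discovery)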

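The answer to this kind of question does not live in any single chunk. It lives in the network of relationships between entities. Vector similarity measures semantic distance, not relationship paths.

Graph RAG takes a different approach: first extract entities and relations from the documents and build a knowledge graph; at query time, run a graph retrieval to find the relevant subgraph, then use that subgraph as context to augment LLM generation.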
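To make the contrast concrete, here is a minimal in-memory sketch of why the "worked together" question is a graph question. It is not part of the system built in this article: `SharedProjectDemo`, `addWorksOn`, and `sharedProjects` are hypothetical names, and a plain `Map` stands in for the WORKS_ON edges that Neo4j will hold later. Each fact can come from a different document, yet the answer falls out of intersecting two neighbor sets.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a graph store: person -> projects (WORKS_ON edges).
class SharedProjectDemo {

    private final Map<String, Set<String>> worksOn = new HashMap<>();

    // Record one WORKS_ON fact; each call may originate from a different document.
    void addWorksOn(String person, String project) {
        worksOn.computeIfAbsent(person, k -> new TreeSet<>()).add(project);
    }

    // Projects linked to both people: the intersection of their neighbor sets.
    Set<String> sharedProjects(String a, String b) {
        Set<String> shared = new TreeSet<>(worksOn.getOrDefault(a, Set.of()));
        shared.retainAll(worksOn.getOrDefault(b, Set.of()));
        return shared;
    }

    public static void main(String[] args) {
        SharedProjectDemo graph = new SharedProjectDemo();
        graph.addWorksOn("Mr. Zhang", "Intelligent Customer Service System"); // from a contract
        graph.addWorksOn("Mr. Wang", "Intelligent Customer Service System");  // from meeting minutes
        graph.addWorksOn("Mr. Wang", "Data Platform");                        // from a personnel file
        System.out.println(graph.sharedProjects("Mr. Zhang", "Mr. Wang"));
    }
}
```

No single chunk mentions both people together, so a similarity search has nothing to rank; the graph only needs the two edges to exist.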

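Overall Technology Choices

Component	Choice	Rationale
Graph database	Neo4j 5.x	The most mature graph database, with solid Spring Data Neo4j support
Vector database	PgVector	Builds on PostgreSQL, which reduces infrastructure complexity
Entity extraction	Spring AI + GPT-4o	Uses an LLM for NER, with high accuracy
Framework	Spring AI 1.0 + Spring Boot 3.x	A single entry point for AI calls
Graph visualization	Neo4j Bloom	Convenient for debugging and inspecting the graph structure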

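Entity and Relationship Modeling

Start by designing the graph's data model. The rest of the system is built on it.

Implementation

Step 1: Neo4j Entity Definitions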

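// Person node
import java.util.ArrayList;
import java.util.List;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.core.schema.Relationship;

@Node("Person")
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class PersonNode {

    @Id
    private String name;

    private String position;
    private String department;
    private String email;

    // WORKS_ON carries properties (role, joinDate, isLeader), so it is mapped through
    // WorksOnRelationship instead of pointing at ProjectNode directly.
    // @Builder.Default keeps the initializer when the Lombok builder is used.
    @Builder.Default
    @Relationship(type = "WORKS_ON", direction = Relationship.Direction.OUTGOING)
    private List<WorksOnRelationship> projects = new ArrayList<>();

    @Builder.Default
    @Relationship(type = "KNOWS", direction = Relationship.Direction.OUTGOING)
    private List<PersonNode> colleagues = new ArrayList<>();

    @Relationship(type = "BELONGS_TO", direction = Relationship.Direction.OUTGOING)
    private OrganizationNode organization;
}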

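// Project node
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.core.schema.Relationship;

@Node("Project")
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class ProjectNode {

    @Id
    private String name;

    private String status;        // ACTIVE/COMPLETED/PLANNED
    private String description;
    private LocalDate startDate;
    private LocalDate endDate;

    // @Builder.Default keeps the initializer when the Lombok builder is used
    @Builder.Default
    @Relationship(type = "USES", direction = Relationship.Direction.OUTGOING)
    private List<TechnologyNode> technologies = new ArrayList<>();
}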

// WORKS_ON relationship (a relationship that carries properties)
@RelationshipProperties
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class WorksOnRelationship {

    @RelationshipId
    private Long id;

    private String role;          // role within the project
    private LocalDate joinDate;
    private Boolean isLeader;

    @TargetNode
    private ProjectNode project;
}

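Step 2: Entity and Relation Extraction Service

This is the core of Graph RAG, and also the part most easily overlooked: using an LLM to extract structured graph data from unstructured documents.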
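The service in this step returns a `GraphExtractionResult` and iterates over `EntityDTO` and `RelationDTO`, but the article does not show those classes. The following is a minimal sketch of what they might look like, assuming their fields mirror the JSON keys in the extraction prompt and their getters match the calls the service makes. The field layout and the null handling are assumptions, not the author's actual definitions. They are plain JavaBeans, which Jackson's default `ObjectMapper` can populate through setters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical DTOs: field names follow the JSON keys used in the extraction prompt.
class EntityDTO {
    private String type;
    private String name;
    private Map<String, Object> properties = new HashMap<>();

    public String getType() { return type; }
    public void setType(String type) { this.type = type; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Map<String, Object> getProperties() { return properties; }
    // Treat an explicit null as an empty map so callers never see null
    public void setProperties(Map<String, Object> properties) {
        this.properties = (properties == null) ? new HashMap<>() : properties;
    }
}

class RelationDTO {
    private String from;
    private String fromType;
    private String to;
    private String toType;
    private String type;
    private Map<String, Object> properties = new HashMap<>();

    public String getFrom() { return from; }
    public void setFrom(String from) { this.from = from; }
    public String getFromType() { return fromType; }
    public void setFromType(String fromType) { this.fromType = fromType; }
    public String getTo() { return to; }
    public void setTo(String to) { this.to = to; }
    public String getToType() { return toType; }
    public void setToType(String toType) { this.toType = toType; }
    public String getType() { return type; }
    public void setType(String type) { this.type = type; }
    public Map<String, Object> getProperties() { return properties; }
    public void setProperties(Map<String, Object> properties) {
        this.properties = (properties == null) ? new HashMap<>() : properties;
    }
}

class GraphExtractionResult {
    private List<EntityDTO> entities = new ArrayList<>();
    private List<RelationDTO> relations = new ArrayList<>();

    public List<EntityDTO> getEntities() { return entities; }
    public void setEntities(List<EntityDTO> entities) { this.entities = entities; }
    public List<RelationDTO> getRelations() { return relations; }
    public void setRelations(List<RelationDTO> relations) { this.relations = relations; }
}
```

Defaulting the collections to empty means a response that omits "relations" still yields a result whose `getRelations().size()` is safe to call.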

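import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.data.neo4j.core.Neo4jClient;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
@Slf4j
public class GraphExtractionService {

    private final ChatClient chatClient;
    // Neo4jClient runs raw Cypher with bound parameters, which is what the writes below need
    private final Neo4jClient neo4jClient;
    private final ObjectMapper objectMapper;

    private static final String EXTRACTION_PROMPT = """
            You are an expert in knowledge graph construction. Extract entities and relations from the text below.

            [Extraction rules]
            1. Entity types: Person, Project, Organization, Technology
            2. Relation types: WORKS_ON (Person -> Project), BELONGS_TO (Person -> Organization), USES (Project -> Technology), KNOWS (Person -> Person)
            3. Extract only what the text states explicitly; do not infer
            4. Use person names exactly as they appear in the text

            [Output format] Return strict JSON and nothing else:
            {
                "entities": [
                    {"type": "Person", "name": "Mr. Zhang", "properties": {"position": "Technical Director"}},
                    {"type": "Project", "name": "Intelligent Customer Service System", "properties": {"status": "ACTIVE"}}
                ],
                "relations": [
                    {"from": "Mr. Zhang", "fromType": "Person", "to": "Intelligent Customer Service System", "toType": "Project",
                     "type": "WORKS_ON", "properties": {"role": "Lead"}}
                ]
            }

            [Text to extract from]
            %s
            """;

    /**
     * Extracts graph data from a document and writes it to Neo4j.
     */
    public GraphExtractionResult extractAndStore(String documentContent, String documentId) {
        log.info("Starting graph extraction, documentId={}", documentId);

        try {
            // 1. Call the LLM to extract entities and relations
            String extractionResult = chatClient.prompt()
                    .user(String.format(EXTRACTION_PROMPT, documentContent))
                    .call()
                    .content();

            // 2. Parse the JSON
            GraphExtractionResult result = parseExtractionResult(extractionResult);

            // 3. Write to Neo4j (MERGE keeps the write idempotent: no duplicate nodes)
            storeToNeo4j(result, documentId);

            log.info("Graph extraction finished, entities={}, relations={}",
                    result.getEntities().size(), result.getRelations().size());
            return result;

        } catch (Exception e) {
            log.error("Graph extraction failed, documentId={}", documentId, e);
            throw new GraphExtractionException("Extraction failed: " + e.getMessage(), e);
        }
    }

    private GraphExtractionResult parseExtractionResult(String jsonStr) {
        try {
            // Strip the markdown code-fence markers the LLM may wrap around its output
            String cleaned = jsonStr.replaceAll("```json\\s*", "").replaceAll("```\\s*", "").trim();
            return objectMapper.readValue(cleaned, GraphExtractionResult.class);
        } catch (JsonProcessingException e) {
            throw new RuntimeException("JSON parsing failed: " + jsonStr, e);
        }
    }

    private void storeToNeo4j(GraphExtractionResult result, String documentId) {
        // Labels and relationship types cannot be bound as Cypher parameters, so they are
        // interpolated into the query text. They come from LLM output, so validate them
        // against the allowed types before running this in production.
        for (EntityDTO entity : result.getEntities()) {
            String cypher = String.format(
                    "MERGE (n:%s {name: $name}) SET n += $props",
                    entity.getType()
            );
            neo4jClient.query(cypher)
                    .bindAll(Map.of(
                            "name", entity.getName(),
                            "props", propsOrEmpty(entity.getProperties())))
                    .run();
        }

        for (RelationDTO rel : result.getRelations()) {
            String cypher = String.format("""
                    MATCH (a:%s {name: $fromName})
                    MATCH (b:%s {name: $toName})
                    MERGE (a)-[r:%s]->(b)
                    SET r += $props, r.documentId = $docId
                    """,
                    rel.getFromType(), rel.getToType(), rel.getType()
            );
            neo4jClient.query(cypher)
                    .bindAll(Map.of(
                            "fromName", rel.getFrom(),
                            "toName", rel.getTo(),
                            "props", propsOrEmpty(rel.getProperties()),
                            "docId", documentId))
                    .run();
        }
    }

    // Map.of rejects null values, so a missing "properties" object becomes an empty map
    private static Map<String, Object> propsOrEmpty(Map<String, Object> props) {
        return props == null ? Map.of() : props;
    }
}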
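One thing to watch in `storeToNeo4j`: the entity type and relation type from the LLM output are formatted straight into the Cypher text, because Cypher does not accept labels or relationship types as query parameters. A malformed or unexpected type would end up in the query as-is. A small allow-list check, sketched below, is one way to guard against that. `GraphSchema`, `requireLabel`, and `requireRelType` are hypothetical names; the allowed values are the ones listed in the extraction prompt.

```java
import java.util.Set;

// Hypothetical guard: only labels and relationship types named in the extraction
// prompt may be interpolated into Cypher text.
final class GraphSchema {

    static final Set<String> NODE_LABELS =
            Set.of("Person", "Project", "Organization", "Technology");
    static final Set<String> REL_TYPES =
            Set.of("WORKS_ON", "BELONGS_TO", "USES", "KNOWS");

    private GraphSchema() {
    }

    // Returns the label unchanged if allowed, otherwise rejects it.
    static String requireLabel(String label) {
        // Check null first: immutable sets throw NullPointerException on contains(null)
        if (label == null || !NODE_LABELS.contains(label)) {
            throw new IllegalArgumentException("Unsupported node label: " + label);
        }
        return label;
    }

    // Returns the relationship type unchanged if allowed, otherwise rejects it.
    static String requireRelType(String type) {
        if (type == null || !REL_TYPES.contains(type)) {
            throw new IllegalArgumentException("Unsupported relationship type: " + type);
        }
        return type;
    }
}
```

In the service, the calls would wrap the interpolated arguments, for example `String.format(..., GraphSchema.requireLabel(entity.getType()))`, so that an unexpected type fails fast instead of reaching the database.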

Step 3: Graph-Enhanced Retrieval Service

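import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.neo4j.driver.types.Node;
import org.neo4j.driver.types.Relationship;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.data.neo4j.core.Neo4jClient;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
@Slf4j
public class GraphRagService {

    private final ChatClient chatClient;
    private final Neo4jClient neo4jClient;
    private final VectorStore vectorStore;
    private final ObjectMapper objectMapper;

    /**
     * Core Graph RAG query: identify the entities in the question, retrieve along two
     * paths (graph and vector), then fuse the results for generation.
     */
    public String query(String userQuestion) {
        log.info("Graph RAG query: {}", userQuestion);

        // 1. Identify the key entities in the question
        List<String> entities = extractQueryEntities(userQuestion);
        log.info("Entities identified in the query: {}", entities);

        // 2. Graph retrieval: fetch the relevant subgraph
        String graphContext = querySubGraph(entities, userQuestion);

        // 3. Vector retrieval: fetch the relevant document chunks
        String vectorContext = queryVectorStore(userQuestion);

        // 4. Fuse both results and let the LLM generate the final answer
        return generateAnswer(userQuestion, graphContext, vectorContext);
    }

    private List<String> extractQueryEntities(String question) {
        String extractPrompt = """
                Extract every entity name (person, project, organization, technology) from the question below.
                Return only a JSON array, for example: ["Mr. Zhang", "Intelligent Customer Service System"]

                Question: %s
                """.formatted(question);

        String result = chatClient.prompt()
                .user(extractPrompt)
                .call()
                .content();

        try {
            String cleaned = result.replaceAll("```json\\s*|```", "").trim();
            return objectMapper.readValue(cleaned, new TypeReference<List<String>>() {});
        } catch (Exception e) {
            log.warn("Failed to parse extracted entities, returning an empty list", e);
            return List.of();
        }
    }

    private String querySubGraph(List<String> entities, String question) {
        if (entities.isEmpty()) return "";

        // Fetch the 2-hop subgraph around the query entities
        String cypher = """
                MATCH (n)
                WHERE n.name IN $entityNames
                OPTIONAL MATCH (n)-[r1]-(m)
                OPTIONAL MATCH (m)-[r2]-(k)
                WHERE NOT k.name IN $entityNames
                RETURN n, r1, m, r2, k
                LIMIT 50
                """;

        Collection<Map<String, Object>> results = neo4jClient
                .query(cypher)
                .bind(entities).to("entityNames")
                .fetch()
                .all();

        // Turn the subgraph rows into a textual description
        return formatGraphContext(results);
    }

    private String formatGraphContext(Collection<Map<String, Object>> graphData) {
        if (graphData.isEmpty()) return "No related entities were found in the knowledge graph.";

        StringBuilder sb = new StringBuilder("[Related information from the knowledge graph]\n");
        Set<String> described = new HashSet<>();

        for (Map<String, Object> row : graphData) {
            // Render each hop as one line, for example:
            // Mr. Zhang -[WORKS_ON {role=Lead}]-> Intelligent Customer Service System
            appendTriple(sb, described, row.get("n"), row.get("r1"), row.get("m"));
            appendTriple(sb, described, row.get("m"), row.get("r2"), row.get("k"));
        }
        return sb.toString();
    }

    // Appends one triple, skipping incomplete hops and lines already emitted
    private void appendTriple(StringBuilder sb, Set<String> described,
                              Object a, Object rel, Object b) {
        if (a == null || rel == null || b == null) return;
        String desc = formatTriple(a, rel, b);
        if (described.add(desc)) {
            sb.append("- ").append(desc).append("\n");
        }
    }

    // Assumes the row values are Neo4j driver Node/Relationship objects, which is what
    // Neo4jClient's map-based fetch() is expected to return for RETURN n, r1, m
    private String formatTriple(Object a, Object r, Object b) {
        Node first = (Node) a;
        Relationship rel = (Relationship) r;
        Node second = (Node) b;
        // The MATCH is undirected, so restore the stored direction before printing
        boolean forward = rel.startNodeElementId().equals(first.elementId());
        Node from = forward ? first : second;
        Node to = forward ? second : first;
        return "%s -[%s %s]-> %s".formatted(
                from.get("name").asString("?"), rel.type(), rel.asMap(),
                to.get("name").asString("?"));
    }

    private String queryVectorStore(String question) {
        // Spring AI 1.0 builds search requests through SearchRequest.builder()
        List<Document> docs = vectorStore.similaritySearch(
                SearchRequest.builder().query(question).topK(5).build()
        );
        if (docs == null || docs.isEmpty()) return "";

        StringBuilder sb = new StringBuilder("[Related document chunks]\n");
        for (Document doc : docs) {
            // Keep at most the first 300 characters of each chunk
            String text = doc.getText();
            sb.append("- ").append(text, 0, Math.min(300, text.length())).append("...\n");
        }
        return sb.toString();
    }

    private String generateAnswer(String question, String graphContext, String vectorContext) {
        String systemPrompt = """
                You are a professional enterprise knowledge base assistant.
                Answer the user's question accurately, based on the knowledge graph information and document chunks provided.
                If the information is insufficient, say so explicitly; do not make anything up.
                Keep the answer concise and structured, and present key information as a list.
                """;

        String userPrompt = """
                [User question]
                %s

                %s

                %s

                Combine the information above and give an accurate, well-organized answer.
                """.formatted(question, graphContext, vectorContext);

        return chatClient.prompt()
                .system(systemPrompt)
                .user(userPrompt)
                .call()
                .content();
    }
}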
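The 2-hop Cypher query in `querySubGraph` is the heart of the graph retrieval step. To show what "the k-hop subgraph around a set of seed entities" means without a running database, here is a small in-memory sketch of the same idea using breadth-first expansion over undirected edges. `KHopDemo`, `Edge`, and `kHopEdges` are hypothetical names for illustration; Neo4j does the equivalent work inside the query engine.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory illustration of k-hop subgraph retrieval.
final class KHopDemo {

    // A stored, directed edge; traversal below ignores direction, like (n)-[r]-(m)
    record Edge(String from, String type, String to) {
        @Override
        public String toString() {
            return from + " -[" + type + "]-> " + to;
        }
    }

    // Collects every edge reachable within `hops` steps of any seed node.
    static Set<Edge> kHopEdges(List<Edge> edges, Collection<String> seeds, int hops) {
        // Index each edge under both endpoints so it can be walked either way
        Map<String, List<Edge>> adjacency = new HashMap<>();
        for (Edge e : edges) {
            adjacency.computeIfAbsent(e.from(), k -> new ArrayList<>()).add(e);
            adjacency.computeIfAbsent(e.to(), k -> new ArrayList<>()).add(e);
        }

        Set<Edge> result = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>(seeds);
        Set<String> frontier = new LinkedHashSet<>(seeds);

        for (int depth = 0; depth < hops && !frontier.isEmpty(); depth++) {
            Set<String> next = new LinkedHashSet<>();
            for (String node : frontier) {
                for (Edge e : adjacency.getOrDefault(node, List.of())) {
                    result.add(e);
                    String other = e.from().equals(node) ? e.to() : e.from();
                    if (visited.add(other)) {
                        next.add(other); // first time seen: expand it on the next hop
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

With two hops from "Mr. Zhang", the result covers his projects plus everything directly attached to those projects, which is how a colleague on the same project surfaces. Raising the hop count widens the context quickly, which is why the Cypher query caps its result with LIMIT 50.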

Step 4: The Complete Document Ingestion Pipeline

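import java.util.List;
import java.util.Map;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
@RequiredArgsConstructor
@Slf4j
public class DocumentIngestionService {

    private final GraphExtractionService graphExtractionService;
    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    /**
     * Full document ingestion: builds the graph and the vector index together.
     * Note: a single @Transactional does not make the Neo4j write and the PgVector
     * write atomic across both stores; the MERGE-based graph write is idempotent,
     * so re-running a failed ingestion is safe on the graph side.
     */
    @Transactional
    public IngestionResult ingestDocument(String content, String documentId, String title) {
        log.info("Starting document ingestion, documentId={}, title={}", documentId, title);

        // 1. Build the knowledge graph (extract entities and relations)
        GraphExtractionResult graphResult = graphExtractionService
                .extractAndStore(content, documentId);

        // 2. Build the vector index (chunk, then embed)
        List<Document> chunks = textSplitter.apply(List.of(
                new Document(content, Map.of(
                        "documentId", documentId,
                        "title", title,
                        "source", "upload"
                ))
        ));
        vectorStore.add(chunks);

        log.info("Document ingestion finished, chunks={}, entities={}, relations={}",
                chunks.size(),
                graphResult.getEntities().size(),
                graphResult.getRelations().size());

        return IngestionResult.builder()
                .documentId(documentId)
                .chunkCount(chunks.size())
                .entityCount(graphResult.getEntities().size())
                .relationCount(graphResult.getRelations().size())
                .build();
    }
}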

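Results

After Xiao Li reworked the system along these lines, he posted a message in the group:

"Lao Zhang, that question from before, 'Which projects have Mr. Zhang and Mr. Wang worked on together', gets a direct answer now: Mr. Zhang (Technical Director) and Mr. Wang (Product Director) are both deeply involved in the Intelligent Customer Service System project. Mr. Zhang is the technical lead, and Mr. Wang is responsible for product planning. The project started in Q2 2023 and its current status is ACTIVE."

The graph tied together relationships that were scattered across a dozen or so documents. This is what knowledge management is supposed to look like.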