知识图谱构建实战:用Java+Neo4j打造企业智慧大脑
知识图谱构建实战:用Java+Neo4j打造企业智慧大脑
小王的故事:客服每天问"同一个问题",为什么找不到?
小王是某电信公司的Java工程师,工作4年,负责客服系统的后端开发。
公司有4000多款产品套餐,客户经常问的问题是"我用了5G套餐,为什么还不能用VoLTE通话?"。这个问题背后涉及:5G套餐→需要开通VoLTE→VoLTE需要终端支持→终端需要符合Cat.XX标准→用户手机型号是否符合……这是一条长达6跳的依赖链。
老的关键词搜索系统根本找不到这层关系。客服每天要查4个系统才能回答这个问题,平均处理时长8.7分钟,客户等待时间平均12.4分钟,投诉率居高不下。
2025年6月,公司用Neo4j构建了产品知识图谱,把4000多款产品、设备型号、业务依赖关系全部建模成图数据。上线后,同样的问题,一条Cypher查询语句0.3秒出结果,客服处理时长降至2.1分钟,效率提升40%,月度投诉率下降了31%。
这就是知识图谱的威力:它不存储数据,它存储关系。
一、知识图谱基础概念
在动手之前,先把概念搞清楚。
三个核心概念:
- 节点(Node):图中的实体,对应现实中的"人/物/事"。带标签(Label)和属性(Properties)。
- 关系(Relationship):节点之间的有向连接,有类型(Type)和属性。关系是图数据库最核心的价值所在。
- 属性(Properties):节点或关系上的键值对,类似JSON字段。
知识图谱 vs 关系型数据库 vs 向量数据库:
| 维度 | 关系型数据库 | 向量数据库 | 知识图谱(Neo4j) |
|---|---|---|---|
| 核心能力 | 结构化数据存储 | 语义相似度搜索 | 复杂关系推理 |
| 查询方式 | SQL | 向量相似度 | 图遍历(Cypher) |
| 擅长场景 | 事务、报表 | RAG语义搜索 | 多跳关系查询 |
| 不擅长场景 | 多跳关系查询 | 精确查询、聚合 | 海量数据聚合统计 |
| 典型问题 | "查用户的订单" | "找相似的问题" | "找和产品A存在间接竞品关系的产品" |
黄金法则:三者互补,不互斥。 大多数AI应用需要同时用到这三种数据库。
二、项目依赖:pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.2</version>
</parent>
<groupId>com.laozhang.ai</groupId>
<artifactId>knowledge-graph</artifactId>
<version>1.0.0</version>
<properties>
<java.version>21</java.version>
<spring-ai.version>1.0.0-M1</spring-ai.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Spring Data Neo4j -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-neo4j</artifactId>
</dependency>
<!-- Spring AI -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
<version>${spring-ai.version}</version>
</dependency>
<!-- Neo4j Java Driver(直接执行Cypher) -->
<dependency>
<groupId>org.neo4j.driver</groupId>
<artifactId>neo4j-java-driver</artifactId>
</dependency>
<!-- Jackson -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-json</artifactId>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>spring-milestones</id>
<url>https://repo.spring.io/milestone</url>
</repository>
</repositories>
</project>三、application.yml
spring:
application:
name: knowledge-graph
neo4j:
uri: bolt://localhost:7687
authentication:
username: neo4j
password: ${NEO4J_PASSWORD:neo4j123456}
connection-timeout: 30s
max-transaction-retry-time: 15s
ai:
openai:
api-key: ${OPENAI_API_KEY}
chat:
options:
model: gpt-4o-mini
temperature: 0.1 # 实体抽取需要低温度,保证一致性
max-tokens: 2000
# 知识图谱配置
knowledge-graph:
extraction:
entity-types:
- PRODUCT # 产品/套餐
- SERVICE # 业务/服务
- DEVICE # 终端设备
- STANDARD # 技术标准
- POLICY # 政策规定
- ORGANIZATION # 部门/组织
relation-types:
- REQUIRES # A需要B
- SUPPORTS # A支持B
- CONFLICTS # A与B冲突
- INCLUDES # A包含B
- UPGRADES_TO # A升级到B
- BELONGS_TO # A属于B
batch-size: 10 # 每批处理文本数
max-retries: 3
logging:
level:
com.laozhang.ai: DEBUG
org.springframework.data.neo4j: INFO
org.neo4j.driver: WARN四、图谱节点与关系定义
4.1 产品节点
package com.laozhang.ai.kg.model;
import lombok.Data;
import org.springframework.data.neo4j.core.schema.*;
import java.util.ArrayList;
import java.util.List;
/**
* 产品节点
* 对应Neo4j中的 :Product 标签
*/
@Data
@Node("Product")
public class ProductNode {
@Id
@GeneratedValue
private Long id;
/** 产品唯一业务ID */
@Property("productCode")
private String productCode;
@Property("name")
private String name;
@Property("category")
private String category; // 套餐/增值业务/终端
@Property("price")
private Double price;
@Property("description")
private String description;
@Property("status")
private String status; // active/discontinued
/** 产品需要哪些服务 */
@Relationship(type = "REQUIRES", direction = Relationship.Direction.OUTGOING)
private List<RequiresRelation> requiredServices = new ArrayList<>();
/** 产品包含哪些子产品 */
@Relationship(type = "INCLUDES", direction = Relationship.Direction.OUTGOING)
private List<ProductNode> includedProducts = new ArrayList<>();
/** 产品与哪些产品冲突 */
@Relationship(type = "CONFLICTS", direction = Relationship.Direction.OUTGOING)
private List<ProductNode> conflictingProducts = new ArrayList<>();
}package com.laozhang.ai.kg.model;
import lombok.Data;
import org.springframework.data.neo4j.core.schema.*;
/**
* 服务节点
* 对应Neo4j中的 :Service 标签
*/
@Data
@Node("Service")
public class ServiceNode {
@Id
@GeneratedValue
private Long id;
@Property("serviceCode")
private String serviceCode;
@Property("name")
private String name;
@Property("type")
private String type; // network/value-added/infrastructure
@Property("description")
private String description;
@Property("status")
private String status;
}package com.laozhang.ai.kg.model;
import lombok.Data;
import org.springframework.data.neo4j.core.schema.*;
/**
* REQUIRES关系实体
* 带属性的关系对象
*/
@Data
@RelationshipProperties
public class RequiresRelation {
@Id
@GeneratedValue
private Long id;
/** 依赖优先级:MANDATORY/RECOMMENDED/OPTIONAL */
private String priority;
/** 是否自动开通 */
private Boolean autoProvision;
/** 依赖说明 */
private String description;
@TargetNode
private ServiceNode targetService;
}五、实体抽取:用LLM从文本中自动抽取实体
package com.laozhang.ai.kg.extractor;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import java.util.List;
/**
* 实体抽取服务
* 使用LLM从非结构化文本中自动识别实体和关系
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class EntityExtractionService {
private final ChatClient chatClient;
private final ObjectMapper objectMapper;
private static final String EXTRACTION_PROMPT = """
你是一个电信行业知识图谱构建专家。
请从以下文本中抽取实体和它们之间的关系,以JSON格式输出。
实体类型:
- PRODUCT:产品套餐(如"5G至尊版"、"VoLTE通话")
- SERVICE:网络服务或增值业务(如"VoLTE"、"5G SA")
- DEVICE:终端设备(如"华为P60"、"iPhone 15")
- STANDARD:技术标准(如"Cat.19"、"5G NR")
- POLICY:政策规定(如"实名制要求")
关系类型:
- REQUIRES:A需要B才能正常使用
- SUPPORTS:A支持/兼容B
- CONFLICTS:A与B不能同时使用
- INCLUDES:A包含B
- UPGRADES_TO:A可以升级为B
输出格式(严格JSON):
{
"entities": [
{"id": "e1", "type": "PRODUCT", "name": "5G至尊版", "properties": {"price": 299}}
],
"relations": [
{"from": "e1", "to": "e2", "type": "REQUIRES", "properties": {"priority": "MANDATORY"}}
]
}
待分析文本:
{text}
注意:
1. 只输出JSON,不要有额外说明
2. 不确定的关系不要强行推断
3. 实体名称保持原文表述
""";
/**
* 从文本中抽取实体和关系
*/
public ExtractionResult extract(String text) {
log.debug("[EntityExtractor] 开始抽取,文本长度={}", text.length());
try {
String prompt = EXTRACTION_PROMPT.replace("{text}", text);
String response = chatClient.prompt()
.user(prompt)
.call()
.content();
// 解析JSON响应
ExtractionResult result = objectMapper.readValue(
cleanJsonResponse(response), ExtractionResult.class);
log.info("[EntityExtractor] 抽取完成:实体={}个,关系={}个",
result.entities().size(), result.relations().size());
return result;
} catch (Exception e) {
log.error("[EntityExtractor] 抽取失败", e);
return new ExtractionResult(List.of(), List.of());
}
}
/**
* 批量处理文本段落
* 每段独立抽取后合并(避免跨段幻觉)
*/
public ExtractionResult batchExtract(List<String> textSegments) {
List<EntityDto> allEntities = new java.util.ArrayList<>();
List<RelationDto> allRelations = new java.util.ArrayList<>();
for (int i = 0; i < textSegments.size(); i++) {
String segment = textSegments.get(i);
try {
ExtractionResult result = extract(segment);
// 给ID添加段落前缀避免冲突
String prefix = "seg" + i + "_";
result.entities().forEach(e -> allEntities.add(
new EntityDto(prefix + e.id(), e.type(), e.name(), e.properties())));
result.relations().forEach(r -> allRelations.add(
new RelationDto(prefix + r.from(), prefix + r.to(),
r.type(), r.properties())));
} catch (Exception e) {
log.error("[EntityExtractor] 段落{}抽取失败", i, e);
}
}
return new ExtractionResult(allEntities, allRelations);
}
private String cleanJsonResponse(String response) {
// 去除可能的markdown代码块标记
return response
.replaceAll("```json\\s*", "")
.replaceAll("```\\s*", "")
.trim();
}
// DTO定义
public record ExtractionResult(
List<EntityDto> entities,
List<RelationDto> relations) {}
public record EntityDto(
String id, String type, String name,
java.util.Map<String, Object> properties) {}
public record RelationDto(
String from, String to, String type,
java.util.Map<String, Object> properties) {}
}六、关系抽取:识别实体间的语义关系
package com.laozhang.ai.kg.extractor;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import java.util.List;
/**
* 关系抽取服务
* 对已知实体对,推断其之间的关系
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class RelationExtractionService {
private final ChatClient chatClient;
private static final String RELATION_PROMPT = """
给定以下两个电信产品实体,请判断它们之间是否存在关系,以及关系类型。
实体A:{entityA}(类型:{typeA})
实体B:{entityB}(类型:{typeB})
背景文本:
{context}
可选的关系类型:
- REQUIRES:A必须依赖B
- SUPPORTS:A支持或兼容B
- CONFLICTS:A与B互斥
- INCLUDES:A包含B作为子项
- UPGRADES_TO:A可升级为B
- NO_RELATION:无明显关系
请返回JSON格式:
{
"relation": "REQUIRES",
"direction": "A_TO_B",
"confidence": 0.95,
"reason": "5G套餐需要VoLTE业务才能使用高清语音通话",
"properties": {"priority": "RECOMMENDED"}
}
只返回JSON,不要额外说明。
""";
/**
* 推断两个已知实体之间的关系
*/
public RelationInference inferRelation(String entityA, String typeA,
String entityB, String typeB, String contextText) {
String prompt = RELATION_PROMPT
.replace("{entityA}", entityA)
.replace("{typeA}", typeA)
.replace("{entityB}", entityB)
.replace("{typeB}", typeB)
.replace("{context}", contextText.substring(0, Math.min(1000, contextText.length())));
try {
String response = chatClient.prompt()
.user(prompt)
.call()
.content();
// 简单解析
if (response.contains("\"NO_RELATION\"")) {
return new RelationInference("NO_RELATION", "A_TO_B", 0.0, "无关系", null);
}
// 使用Jackson解析
com.fasterxml.jackson.databind.ObjectMapper mapper =
new com.fasterxml.jackson.databind.ObjectMapper();
return mapper.readValue(
response.replaceAll("```json\\s*", "").replaceAll("```\\s*", "").trim(),
RelationInference.class);
} catch (Exception e) {
log.error("[RelationExtractor] 关系推断失败:{}→{}", entityA, entityB, e);
return new RelationInference("NO_RELATION", "A_TO_B", 0.0, "解析失败", null);
}
}
public record RelationInference(
String relation,
String direction,
Double confidence,
String reason,
java.util.Map<String, Object> properties) {}
}七、图谱构建服务:将抽取结果写入Neo4j
package com.laozhang.ai.kg.service;
import com.laozhang.ai.kg.extractor.EntityExtractionService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.neo4j.driver.Driver;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* 知识图谱构建服务
* 将实体和关系持久化到Neo4j
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class KnowledgeGraphBuilder {
private final Driver neo4jDriver;
private final EntityExtractionService extractionService;
/**
* 从文本构建知识图谱
* 支持增量添加(MERGE语义保证幂等)
*/
@Transactional
public BuildResult buildFromText(String text, String sourceDocId) {
// 抽取实体和关系
EntityExtractionService.ExtractionResult extraction =
extractionService.extract(text);
if (extraction.entities().isEmpty()) {
log.info("[GraphBuilder] 文本中未抽取到实体:sourceDocId={}", sourceDocId);
return new BuildResult(0, 0, 0);
}
int nodeCount = 0;
int relationCount = 0;
int errorCount = 0;
try (Session session = neo4jDriver.session()) {
// 1. 使用MERGE创建/更新节点(MERGE = 存在则不重复创建)
Map<String, String> entityIdMap = new HashMap<>();
for (EntityExtractionService.EntityDto entity : extraction.entities()) {
try {
String neo4jId = mergeEntity(session, entity, sourceDocId);
entityIdMap.put(entity.id(), neo4jId);
nodeCount++;
} catch (Exception e) {
log.error("[GraphBuilder] 节点创建失败:{}", entity.name(), e);
errorCount++;
}
}
// 2. 创建关系
for (EntityExtractionService.RelationDto relation : extraction.relations()) {
String fromNeo4jId = entityIdMap.get(relation.from());
String toNeo4jId = entityIdMap.get(relation.to());
if (fromNeo4jId == null || toNeo4jId == null) {
log.warn("[GraphBuilder] 关系端点节点不存在:{} -> {}",
relation.from(), relation.to());
continue;
}
try {
mergeRelation(session, fromNeo4jId, toNeo4jId,
relation.type(), relation.properties(), sourceDocId);
relationCount++;
} catch (Exception e) {
log.error("[GraphBuilder] 关系创建失败:{} -> {}",
relation.from(), relation.to(), e);
errorCount++;
}
}
}
log.info("[GraphBuilder] 图谱构建完成:节点={}, 关系={}, 错误={}",
nodeCount, relationCount, errorCount);
return new BuildResult(nodeCount, relationCount, errorCount);
}
/**
* 使用MERGE语义创建/更新节点
* 返回节点的内部ID
*/
private String mergeEntity(Session session,
EntityExtractionService.EntityDto entity, String sourceDocId) {
// 根据实体类型构建不同的MERGE查询
String cypher = """
MERGE (n:%s {name: $name})
ON CREATE SET
n.createdAt = datetime(),
n.sourceDocId = $sourceDocId,
n.extractedType = $type
ON MATCH SET
n.updatedAt = datetime(),
n.lastSourceDocId = $sourceDocId
SET n += $props
RETURN elementId(n) as nodeId
""".formatted(entity.type());
Map<String, Object> props = entity.properties() != null ?
entity.properties() : new HashMap<>();
var result = session.run(cypher, Map.of(
"name", entity.name(),
"sourceDocId", sourceDocId,
"type", entity.type(),
"props", props
));
return result.single().get("nodeId").asString();
}
/**
* 使用MERGE语义创建/更新关系
*/
private void mergeRelation(Session session,
String fromId, String toId, String relType,
Map<String, Object> props, String sourceDocId) {
String cypher = """
MATCH (from) WHERE elementId(from) = $fromId
MATCH (to) WHERE elementId(to) = $toId
MERGE (from)-[r:%s]->(to)
ON CREATE SET
r.createdAt = datetime(),
r.sourceDocId = $sourceDocId
ON MATCH SET
r.updatedAt = datetime()
SET r += $props
""".formatted(relType);
session.run(cypher, Map.of(
"fromId", fromId,
"toId", toId,
"sourceDocId", sourceDocId,
"props", props != null ? props : new HashMap<>()
));
}
public record BuildResult(int nodeCount, int relationCount, int errorCount) {}
}八、Cypher查询实战:图谱的真正价值
package com.laozhang.ai.kg.service;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.neo4j.driver.Driver;
import org.neo4j.driver.Record;
import org.neo4j.driver.Session;
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* 图谱查询服务
* 提供各类业务场景的Cypher查询封装
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class GraphQueryService {
private final Driver neo4jDriver;
/**
* 场景1:查询产品的所有依赖链(多跳查询)
* 回答:"要使用5G套餐,需要什么条件?"
*/
public List<DependencyChain> findDependencyChain(String productName) {
String cypher = """
MATCH path = (p:Product {name: $name})-[:REQUIRES*1..5]->(dep)
RETURN
[node in nodes(path) | node.name] AS chainNames,
[rel in relationships(path) | type(rel)] AS relTypes,
length(path) AS depth
ORDER BY depth
LIMIT 50
""";
try (Session session = neo4jDriver.session()) {
return session.run(cypher, Map.of("name", productName))
.list(record -> new DependencyChain(
record.get("chainNames").asList(v -> v.asString()),
record.get("relTypes").asList(v -> v.asString()),
record.get("depth").asInt()
));
}
}
/**
* 场景2:查找产品冲突(避免销售错误)
* 回答:"5G套餐和哪些老套餐不能同时使用?"
*/
public List<String> findConflicts(String productName) {
String cypher = """
MATCH (p:Product {name: $name})-[:CONFLICTS]-(conflict)
RETURN conflict.name AS conflictName, labels(conflict) AS types
ORDER BY conflictName
""";
try (Session session = neo4jDriver.session()) {
return session.run(cypher, Map.of("name", productName))
.list(record -> record.get("conflictName").asString());
}
}
/**
* 场景3:影响分析(变更影响范围评估)
* 回答:"如果VoLTE服务下线,哪些产品会受影响?"
*/
public List<ImpactedItem> findImpactedProducts(String serviceName) {
String cypher = """
MATCH (s:Service {name: $name})<-[:REQUIRES*1..3]-(p)
RETURN
p.name AS itemName,
labels(p) AS types,
COUNT(path) AS pathCount
ORDER BY pathCount DESC
LIMIT 100
""";
try (Session session = neo4jDriver.session()) {
return session.run(cypher, Map.of("name", serviceName))
.list(record -> new ImpactedItem(
record.get("itemName").asString(),
record.get("types").asList(v -> v.asString()),
record.get("pathCount").asInt()
));
}
}
/**
* 场景4:推荐关联产品(交叉销售)
* 回答:"买了5G套餐的用户还经常买什么?"
*/
public List<RecommendedProduct> findFrequentlyBoughtTogether(String productName) {
String cypher = """
MATCH (p:Product {name: $name})-[:INCLUDES|REQUIRES]->(shared)
<-[:INCLUDES|REQUIRES]-(other:Product)
WHERE other.name <> $name
AND other.status = 'active'
RETURN
other.name AS productName,
other.price AS price,
COUNT(shared) AS sharedDeps
ORDER BY sharedDeps DESC
LIMIT 10
""";
try (Session session = neo4jDriver.session()) {
return session.run(cypher, Map.of("name", productName))
.list(record -> new RecommendedProduct(
record.get("productName").asString(),
record.get("price").isNull() ? null : record.get("price").asDouble(),
record.get("sharedDeps").asInt()
));
}
}
/**
* 场景5:最短依赖路径(诊断问题)
* 回答:"华为P60和5G套餐之间是什么关系?"
*/
public List<Map<String, Object>> findShortestPath(String nodeA, String nodeB) {
String cypher = """
MATCH (a {name: $nodeA}), (b {name: $nodeB})
MATCH path = shortestPath((a)-[*..6]-(b))
RETURN
[n in nodes(path) | n.name] AS pathNodes,
[r in relationships(path) | type(r)] AS pathRels,
length(path) AS pathLength
LIMIT 3
""";
try (Session session = neo4jDriver.session()) {
return session.run(cypher, Map.of("nodeA", nodeA, "nodeB", nodeB))
.list(record -> {
Map<String, Object> result = new HashMap<>();
result.put("pathNodes",
record.get("pathNodes").asList(v -> v.asString()));
result.put("pathRels",
record.get("pathRels").asList(v -> v.asString()));
result.put("pathLength", record.get("pathLength").asInt());
return result;
});
}
}
/**
* 场景6:图谱统计概览
*/
public GraphStats getStats() {
String cypher = """
CALL apoc.meta.stats() YIELD labels, relTypesCount
RETURN labels, relTypesCount
""";
// 简化版统计(不依赖APOC插件)
String simpleCypher = """
MATCH (n) RETURN labels(n)[0] AS label, COUNT(*) AS count
UNION ALL
MATCH ()-[r]->() RETURN type(r) AS label, COUNT(*) AS count
""";
Map<String, Long> labelCounts = new HashMap<>();
try (Session session = neo4jDriver.session()) {
session.run(simpleCypher).list().forEach(record -> {
String label = record.get("label").asString();
Long count = record.get("count").asLong();
labelCounts.put(label, count);
});
}
return new GraphStats(labelCounts);
}
// DTO定义
public record DependencyChain(List<String> chain, List<String> relTypes, int depth) {}
public record ImpactedItem(String name, List<String> types, int impactScore) {}
public record RecommendedProduct(String name, Double price, int sharedDeps) {}
public record GraphStats(Map<String, Long> labelCounts) {}
}九、增量更新:图谱的持续学习
package com.laozhang.ai.kg.service;
import com.laozhang.ai.kg.extractor.EntityExtractionService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.neo4j.driver.Driver;
import org.neo4j.driver.Session;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.Map;
/**
* 图谱增量更新服务
* 支持实体属性更新、关系变更、过时数据清理
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class GraphUpdateService {
private final Driver neo4jDriver;
private final KnowledgeGraphBuilder graphBuilder;
/**
* 增量添加新知识
* 使用MERGE保证幂等性,不会重复创建
*/
public void addKnowledge(String text, String sourceDocId) {
log.info("[GraphUpdate] 增量添加知识:sourceDocId={}", sourceDocId);
graphBuilder.buildFromText(text, sourceDocId);
}
/**
* 更新节点属性
* 例如:产品价格调整、状态变更
*/
public void updateNodeProperty(String nodeName, String labelType,
String propertyKey, Object newValue) {
String cypher = """
MATCH (n:%s {name: $name})
SET n.%s = $value, n.updatedAt = datetime()
RETURN COUNT(n) AS updated
""".formatted(labelType, propertyKey);
try (Session session = neo4jDriver.session()) {
var result = session.run(cypher, Map.of("name", nodeName, "value", newValue));
int updated = result.single().get("updated").asInt();
log.info("[GraphUpdate] 属性更新:节点={}, 属性={}, 新值={}, 影响{}个节点",
nodeName, propertyKey, newValue, updated);
}
}
/**
* 删除关系(例如:产品下线后删除相关关系)
*/
public void removeRelation(String fromName, String toName, String relType) {
String cypher = """
MATCH (from {name: $fromName})-[r:%s]->(to {name: $toName})
DELETE r
RETURN COUNT(r) AS deleted
""".formatted(relType);
try (Session session = neo4jDriver.session()) {
var result = session.run(cypher,
Map.of("fromName", fromName, "toName", toName));
int deleted = result.single().get("deleted").asInt();
log.info("[GraphUpdate] 关系删除:{} -[{}]-> {}, 删除{}条",
fromName, relType, toName, deleted);
}
}
/**
* 软删除节点(标记为已停用,不物理删除以保留历史)
*/
public void deactivateNode(String nodeName, String reason) {
String cypher = """
MATCH (n {name: $name})
SET n.status = 'discontinued',
n.deactivatedAt = datetime(),
n.deactivationReason = $reason
RETURN COUNT(n) AS updated
""";
try (Session session = neo4jDriver.session()) {
session.run(cypher, Map.of("name", nodeName, "reason", reason));
log.info("[GraphUpdate] 节点已停用:name={}, reason={}", nodeName, reason);
}
}
/**
* 清理孤立节点(没有任何关系的节点)
* 定期运行清理图谱质量
*/
public int cleanOrphanNodes() {
String cypher = """
MATCH (n)
WHERE NOT (n)--()
AND n.createdAt < datetime() - duration('P7D')
WITH n LIMIT 1000
DELETE n
RETURN COUNT(n) AS deleted
""";
try (Session session = neo4jDriver.session()) {
var result = session.run(cypher);
int deleted = result.single().get("deleted").asInt();
log.info("[GraphUpdate] 孤立节点清理:{}个", deleted);
return deleted;
}
}
}十、REST接口:图谱能力对外暴露
package com.laozhang.ai.kg.controller;
import com.laozhang.ai.kg.service.GraphQueryService;
import com.laozhang.ai.kg.service.GraphUpdateService;
import com.laozhang.ai.kg.service.KnowledgeGraphBuilder;
import lombok.RequiredArgsConstructor;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.List;
import java.util.Map;
/**
* 知识图谱REST接口
* 提供图谱查询、构建、更新能力
*/
@RestController
@RequestMapping("/api/kg")
@RequiredArgsConstructor
public class KnowledgeGraphController {
private final GraphQueryService queryService;
private final GraphUpdateService updateService;
private final KnowledgeGraphBuilder graphBuilder;
/** 从文本构建图谱 */
@PostMapping("/build")
public ResponseEntity<?> buildFromText(@RequestBody BuildRequest req) {
var result = graphBuilder.buildFromText(req.text(), req.sourceDocId());
return ResponseEntity.ok(Map.of(
"nodeCount", result.nodeCount(),
"relationCount", result.relationCount(),
"errorCount", result.errorCount()
));
}
/** 查询产品依赖链 */
@GetMapping("/dependency/{productName}")
public ResponseEntity<?> getDependencyChain(@PathVariable String productName) {
List<GraphQueryService.DependencyChain> chains =
queryService.findDependencyChain(productName);
return ResponseEntity.ok(chains);
}
/** 查询产品冲突 */
@GetMapping("/conflicts/{productName}")
public ResponseEntity<?> getConflicts(@PathVariable String productName) {
return ResponseEntity.ok(Map.of(
"product", productName,
"conflicts", queryService.findConflicts(productName)
));
}
/** 影响分析 */
@GetMapping("/impact/{serviceName}")
public ResponseEntity<?> getImpact(@PathVariable String serviceName) {
return ResponseEntity.ok(queryService.findImpactedProducts(serviceName));
}
/** 最短路径查询 */
@GetMapping("/path")
public ResponseEntity<?> findPath(
@RequestParam String nodeA, @RequestParam String nodeB) {
return ResponseEntity.ok(queryService.findShortestPath(nodeA, nodeB));
}
/** 推荐关联产品 */
@GetMapping("/recommend/{productName}")
public ResponseEntity<?> getRecommendations(@PathVariable String productName) {
return ResponseEntity.ok(queryService.findFrequentlyBoughtTogether(productName));
}
/** 图谱统计 */
@GetMapping("/stats")
public ResponseEntity<?> getStats() {
return ResponseEntity.ok(queryService.getStats());
}
record BuildRequest(String text, String sourceDocId) {}
}十一、架构图:图谱在AI系统中的位置
十二、性能数据与实测结果
在该电信公司生产环境的测试数据(4023个产品节点,15847条关系):
| 查询类型 | 平均响应时间 | P99响应时间 | 并发50时TPS |
|---|---|---|---|
| 单跳查询(直接相邻节点) | 2ms | 8ms | 1200/s |
| 2-3跳路径查询 | 15ms | 45ms | 800/s |
| 最短路径(6跳以内) | 48ms | 120ms | 300/s |
| 全图影响分析 | 180ms | 350ms | 80/s |
| 复杂模式匹配 | 320ms | 650ms | 45/s |
业务效果数据:
| 指标 | 上线前 | 上线后 | 提升幅度 |
|---|---|---|---|
| 客服平均处理时长 | 8.7分钟 | 2.1分钟 | -75.9% |
| 客户等待时间 | 12.4分钟 | 3.2分钟 | -74.2% |
| 月度投诉率 | 4.2% | 2.9% | -31% |
| 客服首次解决率 | 68% | 89% | +31% |
十三、FAQ
Q1:Neo4j的内存怎么配置?
A:Neo4j官方推荐:堆内存(heap)= 系统内存的25%,页缓存(page cache)= 系统内存的50%。生产环境至少给16GB内存,其中heap=4G,page cache=8G。在neo4j.conf中配置:server.memory.heap.initial_size=4g,server.memory.pagecache.size=8g。
Q2:节点数量增长后查询会变慢吗?
A:Neo4j的图遍历性能与图的总节点数无关,只与被遍历的路径长度有关。这是图数据库天然的"免索引邻接"特性。1亿节点的图做2跳查询,和10万节点做2跳查询,时间复杂度相同。需要担心的是:关系密度过高(SuperNode超级节点问题)。
Q3:如何防止LLM抽取出错误的实体关系污染图谱?
A:三层保障:①低温度(temperature=0.1)减少幻觉;②置信度过滤,低于0.7的关系不写入;③人工审核队列,标记为"待确认"状态,通过后才变为"已确认"正式节点。小刘的团队初期设置了30%的抽样人工审核,准确率稳定后降至5%。
Q4:Spring Data Neo4j和直接用Driver有什么区别?
A:Spring Data Neo4j提供OGM(对象图映射),让你用Java对象操作图谱,适合CRUD场景。但复杂的Cypher查询(多跳、聚合、路径算法)建议直接用Driver执行原生Cypher,性能更好,可控性更强。本文两种方式都有展示,按场景选择。
Q5:知识图谱和RAG如何结合使用?
A:Graph-RAG(图增强RAG)是当前最前沿的方向。基本思路:①用户提问→实体识别→图谱查询获取实体及其关系;②将图谱子图转换为自然语言描述作为上下文;③与向量检索结果融合后送给LLM生成答案。图谱提供精确的结构化知识,向量提供模糊的语义相关文档,两者互补效果最佳。
总结
知识图谱不是数据库的替代品,而是数据关系的放大镜。
对于Java工程师来说,Spring Data Neo4j让接入成本大幅降低,5行注解就能定义节点,Cypher查询比SQL更直观地表达"关系"语义。
在AI时代,知识图谱的价值被重新发现:它是LLM的"结构化记忆",补充了向量数据库无法精确推理的短板。下一步,尝试把你们公司的产品目录或FAQ文档构建成知识图谱,你会发现很多意想不到的关联。
