Spring AI in Practice: Building an Enterprise-Grade RAG System from Scratch (Complete Code Included)
Prologue: From ¥12,000 to ¥800
In September 2025, Xiao Chen, a backend engineer at a Beijing startup, came to me sounding aggrieved.
"Lao Zhang, we built an internal knowledge-base Q&A system on top of GPT-4o. The API bill alone is now ¥12,000 a month, and the boss has decreed it must come down to ¥2,000 or we switch solutions."
One look at Xiao Chen's architecture and the problem was obvious: on every user question, he stuffed every document in the knowledge base into the prompt and asked GPT-4o to answer. The knowledge base held only 200 documents, averaging 3,000 characters each, so every call carried roughly 600,000 tokens of context.
This is exactly the core problem RAG solves: not making the model know everything, but feeding it precisely the information it needs, at the moment it needs it.
I spent three days helping Xiao Chen rebuild the whole system around a RAG architecture. The results:
- Average context per query dropped from ~600,000 tokens to ~1,500 tokens (a 99.75% reduction)
- Monthly API cost fell from ¥12,000 to ¥800 (down 93.3%)
- Answer accuracy actually rose from 67% to 82% (the relevant information is now more concentrated)
This article is the complete technical retrospective of that rebuild, walking you from the simplest five-minute demo all the way to a production-grade enterprise RAG system.
I. The RAG Architecture at a Glance
Before writing a single line of code, understand the complete architecture of a RAG system.
A RAG system has two phases:
Offline ingestion: split documents into small chunks, turn each chunk into a vector, and store it in a vector database. This runs once up front, and again only when documents change.
Online query: when a user asks a question, embed the question as a vector too, find the most similar document chunks in the vector database, and send those chunks together with the original question to the LLM. The LLM only has to "read" the relevant snippets to answer, instead of the entire knowledge base.
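Stripped of all framework code, the two phases reduce to a few lines. The sketch below is purely illustrative, not Spring AI API: the "embedding" is a toy letter-frequency histogram, chosen only so the flow is runnable end to end. A real system would call an embedding model at the two `embed` points.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RagFlowSketch {

    record Chunk(String text, double[] vec) {}

    // Toy "embedding": a letter-frequency histogram. A real system would call
    // an embedding model (e.g. text-embedding-3-small) here instead.
    static double[] embed(String s) {
        double[] v = new double[26];
        for (char c : s.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') v[c - 'a']++;
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }

    public static void main(String[] args) {
        // Offline ingestion: chunk -> embed -> store
        List<Chunk> store = new ArrayList<>();
        for (String text : new String[]{
                "refund window is 30 days",
                "servers run java 21",
                "office wifi password policy"}) {
            store.add(new Chunk(text, embed(text)));
        }

        // Online query: embed the question, take the nearest chunk, build the prompt
        String question = "how many days for a refund?";
        double[] q = embed(question);
        Chunk best = store.stream()
                .max(Comparator.comparingDouble((Chunk c) -> cosine(q, c.vec())))
                .orElseThrow();
        System.out.println("Context: " + best.text() + "\nQuestion: " + question);
    }
}
```

Even this toy version retrieves the refund chunk for the refund question; everything that follows in this article is the same shape with a real embedding model, a real vector index, and an LLM on the end.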
II. Step 1: A Minimal RAG (Running in 5 Minutes)
First get the simplest version working, to understand the core flow.
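One prerequisite: a PostgreSQL instance with the pgvector extension. A minimal local setup might look like the following, assuming Docker and the community `pgvector/pgvector` image (adjust image tag, database name, and password for your environment):

```
# Start PostgreSQL 16 with the pgvector extension pre-installed
docker run -d --name ragdb -p 5432:5432 \
  -e POSTGRES_DB=ragdb -e POSTGRES_PASSWORD=postgres \
  pgvector/pgvector:pg16

# Enable the extension in the target database
# (Spring AI can also do this itself when initialize-schema is true)
docker exec ragdb psql -U postgres -d ragdb \
  -c "CREATE EXTENSION IF NOT EXISTS vector;"
```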
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>enterprise-rag</artifactId>
    <version>1.0.0</version>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.4</version>
    </parent>

    <properties>
        <java.version>21</java.version>
        <spring-ai.version>1.0.0</spring-ai.version>
    </properties>

    <dependencies>
        <!-- Spring AI core (OpenAI starter; since 1.0.0 GA the artifact is spring-ai-starter-model-openai) -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
            <version>${spring-ai.version}</version>
        </dependency>
        <!-- PgVector vector store -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
            <version>${spring-ai.version}</version>
        </dependency>
        <!-- PDF document reader -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-pdf-document-reader</artifactId>
            <version>${spring-ai.version}</version>
        </dependency>
        <!-- Tika (Word/HTML and other formats) -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-tika-document-reader</artifactId>
            <version>${spring-ai.version}</version>
        </dependency>
        <!-- Spring Web -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Spring Data JPA (document metadata management) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <!-- PostgreSQL driver -->
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
        </dependency>
        <!-- Monitoring -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
        </dependency>
        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <!-- Test -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
application.yml (complete configuration)
spring:
  application:
    name: enterprise-rag

  # Database (PostgreSQL with the pgvector extension)
  datasource:
    url: jdbc:postgresql://${DB_HOST:localhost}:5432/${DB_NAME:ragdb}
    username: ${DB_USER:postgres}
    password: ${DB_PASSWORD:postgres}
    driver-class-name: org.postgresql.Driver
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000

  jpa:
    hibernate:
      ddl-auto: update
    show-sql: false
    properties:
      hibernate:
        dialect: org.hibernate.dialect.PostgreSQLDialect
        format_sql: true

  # Spring AI
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.1    # lower temperature for RAG improves factual accuracy
          max-tokens: 1024
      embedding:
        options:
          model: text-embedding-3-small   # 1536 dimensions, $0.02 / 1M tokens
    # Vector store
    vectorstore:
      pgvector:
        index-type: HNSW                  # approximate nearest neighbor, fast
        distance-type: COSINE_DISTANCE    # cosine distance, suited to text
        dimensions: 1536                  # must match the embedding model's dimensions
        initialize-schema: true           # create the table schema automatically on first start

  # File upload limits
  servlet:
    multipart:
      max-file-size: 50MB
      max-request-size: 100MB

# RAG business configuration
rag:
  # Document splitting
  chunk:
    size: 512        # max tokens per chunk (key parameter; see the tuning section)
    overlap: 50      # overlapping tokens between adjacent chunks (keeps semantics continuous)
    min-size: 100    # chunks shorter than this are discarded
  # Retrieval
  retrieval:
    top-k: 5                     # retrieve the K most similar chunks
    similarity-threshold: 0.65   # results below this similarity are discarded
  # File storage path
  storage:
    base-path: ${STORAGE_PATH:/data/rag/documents}

management:
  endpoints:
    web:
      exposure:
        include: prometheus,health,info
  prometheus:
    metrics:
      export:
        enabled: true   # Spring Boot 3.x property path (formerly management.metrics.export.prometheus.enabled)

Minimal RAG implementation (~100 lines of core code)
package com.example.rag.simple;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Minimal RAG service - for understanding the core mechanics.
 * Use EnterpriseRagService in production.
 */
@Service
public class SimpleRagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    private static final String RAG_PROMPT_TEMPLATE = """
            Answer the user's question based on the reference material below.
            If the material contains no relevant information, state explicitly that
            "the available material cannot answer this question"; do not make up an answer.
            Reference material:
            {context}
            User question: {question}
            Give an accurate, concise answer:
            """;

    public SimpleRagService(VectorStore vectorStore, ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    /**
     * Step 1: ingest a document.
     */
    public void ingestDocument(Resource resource, String documentName) {
        // Read the document
        TextReader reader = new TextReader(resource);
        List<Document> documents = reader.get();
        // Split into chunks. Note: TokenTextSplitter's constructor is
        // (chunkSize, minChunkSizeChars, minChunkLengthToEmbed, maxNumChunks, keepSeparator);
        // it has no built-in chunk-overlap parameter.
        TokenTextSplitter splitter = new TokenTextSplitter(
                512,   // chunkSize: target tokens per chunk
                50,    // minChunkSizeChars
                100,   // minChunkLengthToEmbed: shorter chunks are discarded
                5000,  // maxNumChunks
                true   // keepSeparator
        );
        List<Document> chunks = splitter.apply(documents);
        // Tag each chunk with its source
        chunks.forEach(chunk -> chunk.getMetadata().put("source", documentName));
        // Embed and store in the vector database
        vectorStore.add(chunks);
        System.out.printf("Document [%s] ingested, %d chunks%n", documentName, chunks.size());
    }

    /**
     * Step 2: answer a query.
     */
    public String query(String question) {
        // Retrieve relevant chunks (builder-style SearchRequest as of Spring AI 1.0.0)
        List<Document> relevantDocs = vectorStore.similaritySearch(
                SearchRequest.builder().query(question).topK(5).build()
        );
        // Assemble the context
        String context = relevantDocs.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n\n---\n\n"));
        if (context.isBlank()) {
            return "The available material cannot answer your question. Try rephrasing it, or contact the responsible team.";
        }
        // Ask the LLM
        String prompt = RAG_PROMPT_TEMPLATE
                .replace("{context}", context)
                .replace("{question}", question);
        return chatClient.prompt()
                .user(prompt)
                .call()
                .content();
    }
}
That is the minimal RAG implementation. In under 100 lines of code, a working Q&A system is up and running.
Next, we make it production-grade.
III. Step 2: Productionizing
A production deployment has to solve three problems: document management, error handling, and metadata filtering.
The document metadata entity
package com.example.rag.entity;
import jakarta.persistence.*;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.UpdateTimestamp;
import java.time.LocalDateTime;

/**
 * Document metadata entity.
 * Records basic information about each uploaded document, for management and filtering.
 */
@Entity
@Table(name = "rag_documents")
@Data
@NoArgsConstructor
public class RagDocument {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(nullable = false)
    private String fileName;

    @Column(nullable = false)
    private String originalFileName;

    @Column(nullable = false)
    @Enumerated(EnumType.STRING)
    private DocumentStatus status;

    @Column
    private String department;   // owning department (used for access filtering)

    @Column
    private String category;     // document category (e.g. product manual / technical doc / policy)

    @Column
    private Integer chunkCount;  // number of chunks after splitting

    @Column
    private Long fileSizeBytes;

    @Column(nullable = false)
    private String uploadedBy;

    @CreationTimestamp
    private LocalDateTime createdAt;

    @UpdateTimestamp
    private LocalDateTime updatedAt;

    @Column
    private LocalDateTime indexedAt;   // when embedding/indexing completed

    @Column(length = 1000)
    private String errorMessage;       // error details when processing fails

    public enum DocumentStatus {
        PENDING,      // waiting to be processed
        PROCESSING,   // being processed
        INDEXED,      // indexed
        FAILED        // processing failed
    }
}
The document repository
package com.example.rag.repository;
import com.example.rag.entity.RagDocument;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.stereotype.Repository;
import java.util.List;
import java.util.Optional;

@Repository
public interface RagDocumentRepository extends JpaRepository<RagDocument, Long> {

    List<RagDocument> findByDepartmentAndStatus(String department,
                                                RagDocument.DocumentStatus status);

    List<RagDocument> findByStatus(RagDocument.DocumentStatus status);

    Optional<RagDocument> findByOriginalFileNameAndDepartment(String fileName,
                                                              String department);

    @Query("SELECT d FROM RagDocument d WHERE d.status = 'INDEXED' ORDER BY d.indexedAt DESC")
    List<RagDocument> findAllIndexed();

    @Query("SELECT COUNT(d) FROM RagDocument d WHERE d.status = 'INDEXED' AND d.department = :dept")
    long countIndexedByDepartment(String dept);
}
The production-grade document ingestion service
package com.example.rag.service;
import com.example.rag.entity.RagDocument;
import com.example.rag.repository.RagDocumentRepository;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.multipart.MultipartFile;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Slf4j
@Service
public class DocumentIngestionService {

    private final VectorStore vectorStore;
    private final RagDocumentRepository documentRepository;

    @Value("${rag.chunk.size:512}")
    private int chunkSize;

    @Value("${rag.chunk.overlap:50}")
    private int chunkOverlap;

    @Value("${rag.chunk.min-size:100}")
    private int minChunkSize;

    @Value("${rag.storage.base-path:/data/rag/documents}")
    private String storagePath;

    public DocumentIngestionService(VectorStore vectorStore,
                                    RagDocumentRepository documentRepository) {
        this.vectorStore = vectorStore;
        this.documentRepository = documentRepository;
    }

    /**
     * Accept a file upload, persist the file, create the metadata record,
     * and kick off asynchronous processing.
     */
    @Transactional
    public RagDocument submitDocument(MultipartFile file,
                                      String department,
                                      String category,
                                      String uploadedBy) throws IOException {
        // Save the file to disk
        Path dir = Paths.get(storagePath, department);
        Files.createDirectories(dir);
        String savedFileName = System.currentTimeMillis() + "_" + file.getOriginalFilename();
        Path savedPath = dir.resolve(savedFileName);
        Files.copy(file.getInputStream(), savedPath);
        // Create the metadata record
        RagDocument doc = new RagDocument();
        doc.setFileName(savedFileName);
        doc.setOriginalFileName(file.getOriginalFilename());
        doc.setStatus(RagDocument.DocumentStatus.PENDING);
        doc.setDepartment(department);
        doc.setCategory(category);
        doc.setFileSizeBytes(file.getSize());
        doc.setUploadedBy(uploadedBy);
        RagDocument savedDoc = documentRepository.save(doc);
        // Start processing (embedding can take seconds to minutes).
        // Caveat: this is a self-invocation, which bypasses the Spring proxy,
        // so @Async does NOT take effect here. Move processDocumentAsync into a
        // separate bean (or publish an application event) to make it truly async.
        processDocumentAsync(savedDoc.getId(), savedPath);
        return savedDoc;
    }

    /**
     * Asynchronous document processing: read -> split -> embed -> store.
     * Requires an executor bean named "documentProcessingExecutor".
     */
    @Async("documentProcessingExecutor")
    public void processDocumentAsync(Long docId, Path filePath) {
        RagDocument doc = documentRepository.findById(docId)
                .orElseThrow(() -> new RuntimeException("Document not found: " + docId));
        try {
            // Mark as processing
            doc.setStatus(RagDocument.DocumentStatus.PROCESSING);
            documentRepository.save(doc);
            // Pick a reader based on the file type
            List<Document> rawDocuments = readDocument(filePath);
            // Split the document. TokenTextSplitter's parameters are
            // (chunkSize, minChunkSizeChars, minChunkLengthToEmbed, maxNumChunks, keepSeparator).
            TokenTextSplitter splitter = new TokenTextSplitter(
                    chunkSize, chunkOverlap, minChunkSize, 10000, true);
            List<Document> chunks = splitter.apply(rawDocuments);
            // Attach full metadata to every chunk (used for filtering later)
            enrichMetadata(chunks, doc);
            // Write to the vector store in batches (avoids out-of-memory spikes)
            batchAddToVectorStore(chunks, 50);
            // Mark as indexed
            doc.setStatus(RagDocument.DocumentStatus.INDEXED);
            doc.setChunkCount(chunks.size());
            doc.setIndexedAt(LocalDateTime.now());
            documentRepository.save(doc);
            log.info("Document [{}] processed, {} chunks", doc.getOriginalFileName(), chunks.size());
        } catch (Exception e) {
            log.error("Processing of document [{}] failed", doc.getOriginalFileName(), e);
            doc.setStatus(RagDocument.DocumentStatus.FAILED);
            doc.setErrorMessage(e.getMessage());
            documentRepository.save(doc);
        }
    }

    private List<Document> readDocument(Path filePath) throws Exception {
        String fileName = filePath.getFileName().toString().toLowerCase();
        if (fileName.endsWith(".pdf")) {
            PagePdfDocumentReader reader = new PagePdfDocumentReader(
                    filePath.toUri().toString());
            return reader.get();
        } else {
            // Word/HTML/TXT and other formats go through Tika
            TikaDocumentReader reader = new TikaDocumentReader(
                    filePath.toUri().toString());
            return reader.get();
        }
    }

    private void enrichMetadata(List<Document> chunks, RagDocument doc) {
        for (int i = 0; i < chunks.size(); i++) {
            Map<String, Object> metadata = new HashMap<>(chunks.get(i).getMetadata());
            metadata.put("docId", doc.getId().toString());
            metadata.put("docName", doc.getOriginalFileName());
            metadata.put("department", doc.getDepartment());
            metadata.put("category", doc.getCategory());
            metadata.put("chunkIndex", i);
            metadata.put("totalChunks", chunks.size());
            // Rebuild the Document with the enriched metadata
            chunks.set(i, new Document(chunks.get(i).getText(), metadata));
        }
    }

    private void batchAddToVectorStore(List<Document> chunks, int batchSize) {
        List<List<Document>> batches = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i += batchSize) {
            batches.add(chunks.subList(i, Math.min(i + batchSize, chunks.size())));
        }
        for (List<Document> batch : batches) {
            vectorStore.add(batch);
            log.debug("Wrote {} chunks to the vector store", batch.size());
        }
    }

    /**
     * Delete a document (must also remove all of its chunks from the vector store).
     */
    @Transactional
    public void deleteDocument(Long docId) {
        RagDocument doc = documentRepository.findById(docId)
                .orElseThrow(() -> new RuntimeException("Document not found: " + docId));
        // Delete from the vector store via a metadata filter. Deleting by ID would
        // not work here: chunk IDs in the store are generated UUIDs, not the JPA
        // entity ID. Spring AI 1.0.0 supports delete-by-filter-expression.
        vectorStore.delete(new FilterExpressionBuilder()
                .eq("docId", doc.getId().toString()).build());
        // Delete the database record
        documentRepository.delete(doc);
        log.info("Document [{}] deleted", doc.getOriginalFileName());
    }
}
IV. Step 3: Enterprise Features - Access Control and Incremental Updates
The query service with metadata filtering
package com.example.rag.service;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Enterprise RAG query service.
 * Supports department-level access filtering, similarity-threshold filtering,
 * source citations, and metrics instrumentation.
 */
@Slf4j
@Service
public class EnterpriseRagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;
    private final Timer queryTimer;
    private final Counter noContextCounter;

    @Value("${rag.retrieval.top-k:5}")
    private int topK;

    @Value("${rag.retrieval.similarity-threshold:0.65}")
    private double similarityThreshold;

    private static final String RAG_SYSTEM_PROMPT =
            "You are the company's internal knowledge-base assistant. Answer questions accurately " +
            "based on the provided reference material. If the material is insufficient, say so " +
            "explicitly. Do not fabricate information. Keep answers concise and cite sources where useful.";

    private static final String RAG_USER_TEMPLATE = """
            Reference material (ordered by relevance):
            {context}
            ---
            Question: {question}
            """;

    public EnterpriseRagService(VectorStore vectorStore,
                                ChatClient.Builder builder,
                                MeterRegistry meterRegistry) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
        this.queryTimer = Timer.builder("rag.query.duration")
                .description("RAG query latency")
                .register(meterRegistry);
        this.noContextCounter = Counter.builder("rag.query.no_context")
                .description("Number of queries with no relevant documents")
                .register(meterRegistry);
    }

    /**
     * RAG query with access filtering.
     *
     * @param question   the user's question
     * @param department the user's department (for access isolation)
     * @param category   optional category filter (null means no filter)
     * @return the RAG answer
     */
    public RagAnswer query(String question, String department, String category) {
        return queryTimer.record(() -> {
            // Build the metadata-filtered search request
            SearchRequest searchRequest = buildSearchRequest(question, department, category);
            // Similarity search
            List<Document> relevantDocs = vectorStore.similaritySearch(searchRequest);
            // Drop low-similarity results
            List<Document> filteredDocs = relevantDocs.stream()
                    .filter(doc -> {
                        // PgVectorStore exposes the raw cosine distance in metadata
                        Object score = doc.getMetadata().get("distance");
                        if (score instanceof Number n) {
                            return (1.0 - n.doubleValue()) >= similarityThreshold; // distance -> similarity
                        }
                        return true; // keep results that carry no distance metadata
                    })
                    .collect(Collectors.toList());
            log.debug("Question [{}]: retrieved {} documents, {} after filtering", question,
                    relevantDocs.size(), filteredDocs.size());
            if (filteredDocs.isEmpty()) {
                noContextCounter.increment();
                return RagAnswer.noContext(question);
            }
            // Build the context, including source information
            String context = buildContextWithSources(filteredDocs);
            // Call the LLM
            String answer = chatClient.prompt()
                    .system(RAG_SYSTEM_PROMPT)
                    .user(RAG_USER_TEMPLATE
                            .replace("{context}", context)
                            .replace("{question}", question))
                    .call()
                    .content();
            // Extract the list of sources
            List<String> sources = filteredDocs.stream()
                    .map(doc -> (String) doc.getMetadata().getOrDefault("docName", "unknown source"))
                    .distinct()
                    .collect(Collectors.toList());
            return RagAnswer.success(answer, sources, filteredDocs.size());
        });
    }

    private SearchRequest buildSearchRequest(String question, String department,
                                             String category) {
        FilterExpressionBuilder fb = new FilterExpressionBuilder();
        // Department filter always applies; category filter is optional
        var filter = (category != null && !category.isBlank())
                ? fb.and(fb.eq("department", department), fb.eq("category", category)).build()
                : fb.eq("department", department).build();
        return SearchRequest.builder()
                .query(question)
                .topK(topK)
                .filterExpression(filter)
                .build();
    }

    private String buildContextWithSources(List<Document> docs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < docs.size(); i++) {
            Document doc = docs.get(i);
            String docName = (String) doc.getMetadata().getOrDefault("docName", "unknown");
            sb.append("[Source ").append(i + 1).append(": ").append(docName).append("]\n");
            sb.append(doc.getText());
            sb.append("\n\n");
        }
        return sb.toString();
    }

    public record RagAnswer(String answer, List<String> sources, int contextCount,
                            boolean hasContext) {
        public static RagAnswer success(String answer, List<String> sources, int count) {
            return new RagAnswer(answer, sources, count, true);
        }
        public static RagAnswer noContext(String question) {
            return new RagAnswer(
                    "Sorry, no material related to your question was found in the current knowledge base. Try rephrasing, or contact the relevant department for help.",
                    List.of(), 0, false
            );
        }
    }
}
The Controller layer (all three tiers complete)
package com.example.rag.controller;
import com.example.rag.entity.RagDocument;
import com.example.rag.service.DocumentIngestionService;
import com.example.rag.service.EnterpriseRagService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;
import java.io.IOException;
import java.util.Map;

@Slf4j
@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final DocumentIngestionService ingestionService;
    private final EnterpriseRagService ragService;

    public RagController(DocumentIngestionService ingestionService,
                         EnterpriseRagService ragService) {
        this.ingestionService = ingestionService;
        this.ragService = ragService;
    }

    /**
     * Upload and index a document.
     */
    @PostMapping(value = "/documents", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public ResponseEntity<Map<String, Object>> uploadDocument(
            @RequestParam("file") MultipartFile file,
            @RequestParam("department") String department,
            @RequestParam(value = "category", defaultValue = "general") String category,
            @RequestHeader(value = "X-User-Id", defaultValue = "system") String userId) {
        if (file.isEmpty()) {
            return ResponseEntity.badRequest()
                    .body(Map.of("error", "file must not be empty"));
        }
        String originalFileName = file.getOriginalFilename();
        if (originalFileName == null || (!originalFileName.toLowerCase().endsWith(".pdf")
                && !originalFileName.toLowerCase().endsWith(".docx")
                && !originalFileName.toLowerCase().endsWith(".txt"))) {
            return ResponseEntity.badRequest()
                    .body(Map.of("error", "only PDF/DOCX/TXT formats are supported"));
        }
        try {
            RagDocument doc = ingestionService.submitDocument(file, department, category, userId);
            return ResponseEntity.ok(Map.of(
                    "docId", doc.getId(),
                    "status", doc.getStatus(),
                    "message", "Document submitted for processing; check its status shortly"
            ));
        } catch (IOException e) {
            log.error("Document upload failed", e);
            return ResponseEntity.internalServerError()
                    .body(Map.of("error", "failed to save file: " + e.getMessage()));
        }
    }

    /**
     * Query a document's processing status.
     */
    @GetMapping("/documents/{docId}/status")
    public ResponseEntity<Map<String, Object>> getDocumentStatus(@PathVariable Long docId) {
        // Simplified; a real implementation would inject the repository
        return ResponseEntity.ok(Map.of("docId", docId, "message", "see the document management list"));
    }

    /**
     * RAG Q&A query.
     */
    @PostMapping("/query")
    public ResponseEntity<Map<String, Object>> query(
            @RequestBody QueryRequest request,
            @RequestHeader(value = "X-Department", defaultValue = "public") String department) {
        if (request.question() == null || request.question().isBlank()) {
            return ResponseEntity.badRequest()
                    .body(Map.of("error", "question must not be empty"));
        }
        EnterpriseRagService.RagAnswer answer = ragService.query(
                request.question(),
                department,
                request.category()
        );
        return ResponseEntity.ok(Map.of(
                "answer", answer.answer(),
                "sources", answer.sources(),
                "contextCount", answer.contextCount(),
                "hasContext", answer.hasContext()
        ));
    }

    /**
     * Delete a document.
     */
    @DeleteMapping("/documents/{docId}")
    public ResponseEntity<Map<String, String>> deleteDocument(@PathVariable Long docId) {
        ingestionService.deleteDocument(docId);
        return ResponseEntity.ok(Map.of("message", "document deleted"));
    }

    public record QueryRequest(String question, String category) {}
}
V. Key Parameter Tuning Guide
Choosing chunk_size
chunk_size is the single most important RAG tuning knob; it directly drives retrieval quality and cost.
| chunk_size | Pros | Cons | Best for |
|---|---|---|---|
| 128-256 | Semantically focused, precise | May truncate complete facts | Factual Q&A (product specs, policy clauses) |
| 512 | Best balance | - | Recommended general-purpose value |
| 1024-2048 | Preserves whole paragraphs | Noisier, more expensive | Complex questions needing surrounding context |
Practical advice: start at 512 and measure accuracy on 20 real questions. If answers come back truncated, grow it to 768; if retrieval is not precise enough, shrink it to 256.
Choosing overlap
overlap addresses the case where the answer straddles the boundary between two chunks.
- Too small (< 20): boundary information is lost
- Recommended: 10% of chunk_size (512 → 50, 256 → 25)
- Too large (> 20%): storage cost grows and information is duplicated
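The boundary effect is easy to demonstrate with a toy character-based splitter (a simplification for illustration; a token-based splitter behaves analogously):

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapDemo {

    // Naive character-based splitter: each chunk shares `overlap` chars
    // with its predecessor.
    static List<String> split(String text, int size, int overlap) {
        if (overlap >= size) throw new IllegalArgumentException("overlap must be < size");
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(start + size, text.length())));
            if (start + size >= text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "0123456789ABCDEFGHIJ"; // 20 chars
        List<String> chunks = split(text, 8, 2);
        // Adjacent chunks share 2 characters, so a fact spanning a boundary
        // survives intact in at least one chunk.
        System.out.println(chunks); // [01234567, 6789ABCD, CDEFGHIJ]
    }
}
```

With overlap 0, the substring "78" would be torn across two chunks and lost to any single retrieval hit; with overlap 2, chunk two carries it whole.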
Choosing top_k
top_k is the number of chunks returned per retrieval, and it trades off directly:
- Larger top_k: better recall (less likely to miss), but longer context and higher cost
- Smaller top_k: cheaper, but key information may be dropped
Recommended strategy:
- Simple factual questions: top_k = 3
- Complex, multi-part questions: top_k = 7
- Or use a dynamic top_k, adjusted automatically by query complexity
A dynamic top_k implementation:
private int calculateDynamicTopK(String question) {
    // Longer, more complex questions need more context
    int baseK = 3;
    if (question.length() > 50) baseK += 1;
    if (question.contains("compare") || question.contains("difference")) baseK += 2;
    if (question.contains("process") || question.contains("steps")) baseK += 1;
    return Math.min(baseK, 10); // never more than 10
}
VI. Choosing an Embedding Model
text-embedding-3-small vs. BGE
| Metric | text-embedding-3-small | BGE-large-zh |
|---|---|---|
| Dimensions | 1536 | 1024 |
| Price | $0.02 / 1M tokens | ~$0 self-hosted (inference cost only) |
| Chinese quality | Good | Excellent |
| Latency | ~50ms | ~20ms (local) |
| Deployment cost | None (API) | GPU server, ~¥800/month |
| Best for | Small/medium scale, mixed languages | Pure Chinese, large scale, cost-sensitive |
Decision guide:
- Under ~500k documents, or any English content: use text-embedding-3-small; the maintenance burden is lowest
- Over ~1M documents, pure Chinese: consider self-hosting BGE; cheaper in the long run
- In between: start with the API, watch your usage, and migrate once you reach the break-even point
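The break-even point is worth computing before committing to a GPU server. A back-of-envelope calculation, taking the figures from the table above and assuming an exchange rate of about 7.2 CNY/USD:

```java
public class EmbeddingBreakEven {
    public static void main(String[] args) {
        // Assumptions (from the comparison table above):
        //   text-embedding-3-small: $0.02 per 1M tokens, ~7.2 CNY per USD
        //   self-hosted BGE: ~800 CNY/month for a GPU server
        double cnyPerUsd = 7.2;
        double apiCostCnyPerMTokens = 0.02 * cnyPerUsd;   // ~0.144 CNY per 1M tokens
        double selfHostedCnyPerMonth = 800;

        double breakEvenMTokensPerMonth = selfHostedCnyPerMonth / apiCostCnyPerMTokens;
        System.out.printf("Break-even: ~%.0f million tokens/month%n", breakEvenMTokensPerMonth);
        // ~5,556 million tokens per month. Pure embedding volume rarely gets there,
        // so the usual case for self-hosting is latency or data privacy, not cost.
    }
}
```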
Connecting Spring AI to BGE (self-hosted via Ollama):
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      embedding:
        options:
          model: bge-large-zh   # ollama pull bge-large-zh
VII. Incremental Update Strategy
Document updates are the most common operational pain point of an enterprise RAG system. Full rebuilds take too long, and incremental updates must handle consistency.
package com.example.rag.service;
import com.example.rag.entity.RagDocument;
import com.example.rag.repository.RagDocumentRepository;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Incremental update service.
 * Supports document updates (delete the old version, re-index the new one).
 */
@Slf4j
@Service
public class IncrementalUpdateService {

    private static final String SUPERSEDED_MESSAGE = "superseded by a newer version";

    private final DocumentIngestionService ingestionService;
    private final RagDocumentRepository documentRepository;
    private final VectorStore vectorStore;

    @Value("${rag.storage.base-path:/data/rag/documents}")
    private String storagePath;

    public IncrementalUpdateService(DocumentIngestionService ingestionService,
                                    RagDocumentRepository documentRepository,
                                    VectorStore vectorStore) {
        this.ingestionService = ingestionService;
        this.documentRepository = documentRepository;
        this.vectorStore = vectorStore;
    }

    /**
     * Update a document (delete the old version, then index the new one).
     */
    public RagDocument updateDocument(Long oldDocId, MultipartFile newFile,
                                      String uploadedBy) throws IOException {
        RagDocument oldDoc = documentRepository.findById(oldDocId)
                .orElseThrow(() -> new RuntimeException("Original document not found: " + oldDocId));
        // 1. Delete all vectors of the old document, filtering on the docId metadata.
        //    (Chunk IDs in the vector store are generated UUIDs, so deleting by the
        //    JPA entity ID would not match anything.)
        vectorStore.delete(new FilterExpressionBuilder()
                .eq("docId", oldDoc.getId().toString()).build());
        log.info("Deleted old vectors of document [{}]", oldDoc.getOriginalFileName());
        // 2. Mark the old record as superseded
        oldDoc.setStatus(RagDocument.DocumentStatus.FAILED);
        oldDoc.setErrorMessage(SUPERSEDED_MESSAGE);
        documentRepository.save(oldDoc);
        // 3. Index the new document
        RagDocument newDoc = ingestionService.submitDocument(
                newFile,
                oldDoc.getDepartment(),
                oldDoc.getCategory(),
                uploadedBy
        );
        log.info("Document updated: old ID={}, new ID={}", oldDocId, newDoc.getId());
        return newDoc;
    }

    /**
     * Periodically re-process failed documents.
     * Runs once an hour.
     */
    @Scheduled(fixedDelay = 3600000)
    public void retryFailedDocuments() {
        List<RagDocument> failedDocs = documentRepository
                .findByStatus(RagDocument.DocumentStatus.FAILED);
        if (failedDocs.isEmpty()) return;
        log.info("Found {} failed documents, retrying", failedDocs.size());
        for (RagDocument doc : failedDocs) {
            if (doc.getErrorMessage() != null &&
                    doc.getErrorMessage().contains(SUPERSEDED_MESSAGE)) {
                continue; // skip documents that were deliberately invalidated
            }
            try {
                Path filePath = Paths.get(
                        storagePath,
                        doc.getDepartment(),
                        doc.getFileName()
                );
                // Cross-bean call, so @Async on processDocumentAsync takes effect here
                ingestionService.processDocumentAsync(doc.getId(), filePath);
            } catch (Exception e) {
                log.error("Retry of document [{}] failed", doc.getOriginalFileName(), e);
            }
        }
    }
}
VIII. Performance Tests: Retrieval Quality Across Configurations
We tested different configurations against 50 questions with labeled answers (a golden dataset):
| Configuration | Accuracy | Avg latency | Tokens/query | Monthly cost (10k queries) |
|---|---|---|---|---|
| chunk=256, top_k=3 | 71% | 380ms | 620 | ¥182 |
| chunk=512, top_k=5 | 82% | 520ms | 1,450 | ¥427 |
| chunk=1024, top_k=7 | 79% | 780ms | 3,820 | ¥1,124 |
| chunk=512, top_k=5, similarity > 0.7 | 83% | 510ms | 1,380 | ¥406 |
| chunk=512, top_k=5, + reranker | 88% | 920ms | 1,450 | ¥520 |
Conclusion: chunk=512, top_k=5 with a similarity threshold of 0.65-0.70 is the best value for money. If you need higher accuracy, adding a reranker buys roughly 6 more percentage points, at the cost of about 400ms of extra latency.
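The retrieve-then-rerank pattern from the last table row has a simple shape: over-fetch with the cheap vector search, then let a cross-encoder reorder the candidates and keep the best few. The sketch below shows that shape only; `RerankClient` is a hypothetical interface (in practice it might wrap a BGE-reranker served over HTTP), and the dummy scorer exists just to make the example runnable.

```java
import java.util.Comparator;
import java.util.List;

public class RerankSketch {

    // Hypothetical scorer: a real deployment would call a cross-encoder model here.
    interface RerankClient {
        double score(String query, String passage);
    }

    static List<String> retrieveAndRerank(List<String> candidates, String query,
                                          RerankClient reranker, int finalK) {
        // Stage 1 (not shown): vector search already over-fetched `candidates`
        // (e.g. topK = 20 instead of 5).
        // Stage 2: the cross-encoder scores each (query, passage) pair,
        // and only the highest-scoring passages reach the LLM.
        return candidates.stream()
                .sorted(Comparator.comparingDouble(
                        (String p) -> reranker.score(query, p)).reversed())
                .limit(finalK)
                .toList();
    }

    public static void main(String[] args) {
        // Dummy scorer for demonstration only
        RerankClient dummy = (q, p) -> p.startsWith(q) ? 1.0 : 0.0;
        List<String> top = retrieveAndRerank(
                List.of("beta doc", "alpha match", "alpha other"), "alpha", dummy, 2);
        System.out.println(top); // the two "alpha..." passages come first
    }
}
```

The extra latency in the table comes entirely from stage 2: one scoring pass per candidate, which is why reranking is usually applied to 20-50 candidates, not thousands.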
IX. FAQ
Q1: How do RAG and fine-tuning differ, and when do you pick which?
RAG is "letting the model look things up"; fine-tuning is "teaching the model a new skill".
- Use RAG when: the knowledge changes often, answers must cite sources, or the corpus is large (> 100 documents)
- Use fine-tuning when: you need to change the model's output style, teach domain-specific syntax/formats, or the knowledge is relatively stable
90% of enterprise knowledge-base scenarios should choose RAG.
Q2: What about information that spans chunk boundaries after splitting?
Three remedies:
- Increase overlap (50 → 100) to reduce boundary information loss
- Parent-child chunking: small chunks for precise retrieval; on a hit, return the larger "parent chunk" to the LLM
- Increase top_k: retrieve more chunks to cover boundary cases
Q3: Which vector database should we use? We already run MySQL; can we use that?
MySQL does not natively support vector retrieval (no HNSW index). Options:
- Already on PostgreSQL: install the pgvector extension, at zero extra cost
- Already on MySQL: upgrade to MySQL 9.0 (built-in vector support) or add a dedicated vector database
- Greenfield project: go straight to pgvector or Qdrant; see article-104 for a detailed comparison
Q4: What about images inside uploaded documents?
The current implementation handles text only. Images need a multimodal approach:
- Use GPT-4V / Claude Vision to generate a text description of each image
- Embed those descriptions alongside the document text
- For table-heavy PDFs, use a dedicated parsing tool such as Unstructured
Q5: How do we stop the AI from inventing answers that are not in the knowledge base?
Three lines of defense:
- An explicit system prompt: "If no relevant material exists, say so clearly; do not fabricate"
- Similarity-threshold filtering: below the threshold, return "no relevant material" without calling the LLM at all
- Post-processing validation: rule-based checks for key entities in the answer that do not appear in the reference material
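The third line of defense can start very simply. A minimal sketch of a rule-based grounding check, here limited to numbers (product specs and policy clauses are where fabricated figures hurt most); real entity checks would extend the same idea to dates, names, and IDs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroundingCheck {

    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    // Every number in the answer must literally appear somewhere in the
    // retrieved context; anything else is flagged as potentially fabricated.
    static List<String> ungroundedNumbers(String answer, String context) {
        List<String> missing = new ArrayList<>();
        Matcher m = NUMBER.matcher(answer);
        while (m.find()) {
            if (!context.contains(m.group())) missing.add(m.group());
        }
        return missing;
    }

    public static void main(String[] args) {
        String context = "Annual leave is 15 days for employees with 3+ years of service.";
        System.out.println(ungroundedNumbers("You get 15 days of leave.", context)); // []
        System.out.println(ungroundedNumbers("You get 20 days of leave.", context)); // [20]
    }
}
```

When the check fails, a reasonable policy is to regenerate once with a stricter prompt, and fall back to "no relevant material" if it fails again.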
Q6: When knowledge-base documents are updated, what happens to cached answers?
Two strategies:
- Time-based TTL: cache entries expire after 4-24 hours; fine when updates are infrequent
- Active invalidation: a document update triggers eviction of the related cache entries (linked via docId)
Strategy 2 is recommended; it is the more precise of the two.
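Active invalidation only needs a reverse index from docId to the cached questions that cited it. A minimal in-memory sketch (illustrative only; a real deployment would likely sit on Redis or Caffeine, with the same two-map shape):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RagAnswerCache {

    private final Map<String, String> answersByQuestion = new ConcurrentHashMap<>();
    // Reverse index: which cached questions were answered using which document
    private final Map<String, Set<String>> questionsByDocId = new ConcurrentHashMap<>();

    public void put(String question, String answer, Set<String> sourceDocIds) {
        answersByQuestion.put(question, answer);
        for (String docId : sourceDocIds) {
            questionsByDocId.computeIfAbsent(docId, k -> ConcurrentHashMap.newKeySet())
                            .add(question);
        }
    }

    public String get(String question) {
        return answersByQuestion.get(question);
    }

    // Called when a document is re-indexed: drop every answer that cited it
    public int invalidateByDocId(String docId) {
        Set<String> stale = questionsByDocId.remove(docId);
        if (stale == null) return 0;
        stale.forEach(answersByQuestion::remove);
        return stale.size();
    }

    public static void main(String[] args) {
        RagAnswerCache cache = new RagAnswerCache();
        cache.put("vacation policy?", "15 days", Set.of("doc-7"));
        cache.put("remote work?", "2 days/week", Set.of("doc-9"));
        int evicted = cache.invalidateByDocId("doc-7");
        System.out.println(evicted + " entries evicted; "
                + (cache.get("vacation policy?") == null) + " "
                + (cache.get("remote work?") != null));
        // prints: 1 entries evicted; true true
    }
}
```

The source list returned by EnterpriseRagService.RagAnswer is exactly what feeds `sourceDocIds` here, and IncrementalUpdateService's updateDocument is the natural place to call `invalidateByDocId`.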
