Part 1666: Corrective RAG (CRAG), Detecting Retrieval Quality and Correcting It Dynamically

Lao Zhang, about a 12-minute read

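During a project retrospective I ran into an interesting error pattern.

A user asked: "What was our NPS for Q3?"

The RAG system retrieved a document about the "NPS improvement plan". It mentioned only the NPS target and the direction of improvement, not the actual figure. The LLM nevertheless "dutifully" produced an answer from that document: it reported the target value as if it were the actual value.

This example illustrates a general problem: traditional RAG does not check retrieval quality. Whatever it finds is handed straight to the LLM for generation. Poor retrieval produces a poor answer, yet the whole pipeline looks like it is "running normally".

The core idea of Corrective RAG (CRAG) is to evaluate retrieval quality before generation and to trigger a corrective action when the quality is not good enough.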

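1. The core idea of CRAG

CRAG comes from a 2024 paper. At its core it defines three actions:

CORRECT: the retrieved content is good enough and is used directly for generation
AMBIGUOUS: some relevant content was retrieved, but it may be incomplete or uncertain and needs to be supplemented
INCORRECT: the retrieved content is irrelevant or of very poor quality, so a different retrieval method is needed (for example, falling back to web search)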
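The three grades map naturally onto an enum. The listings in this article use a `RetrievalGrade` type without showing its definition, so the sketch below is only a guess at what it might look like; the `nextStep` descriptions are illustrative and not part of the original code.

```java
/** Hypothetical definition of the RetrievalGrade type used in this article's listings. */
enum RetrievalGrade {
    CORRECT("refine the retrieved documents, then generate"),
    AMBIGUOUS("refine what was retrieved and supplement it with extra retrieval"),
    INCORRECT("discard the results, rewrite the query, fall back to web search");

    private final String nextStep;

    RetrievalGrade(String nextStep) {
        this.nextStep = nextStep;
    }

    /** Human-readable description of the action this grade triggers. */
    String nextStep() {
        return nextStep;
    }
}

class RetrievalGradeDemo {
    public static void main(String[] args) {
        // Print each grade with the action it triggers
        for (RetrievalGrade grade : RetrievalGrade.values()) {
            System.out.println(grade + " -> " + grade.nextStep());
        }
    }
}
```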

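2. Implementing the retrieval quality evaluator

This is the core component of CRAG. It decides which grade a set of retrieval results falls into.

2.1 Relevance scorer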

@Slf4j  // Assumption: Lombok supplies the `log` field used below
@Service
public class RetrievalQualityEvaluator {
    
    @Autowired
    private LLMClient llmClient;
    
    @Autowired
    private EmbeddingService embeddingService;
    
    /**
     * Evaluates how relevant the retrieved documents are to the query.
     * Returns CORRECT, AMBIGUOUS, or INCORRECT.
     */
    public RetrievalGrade evaluate(String query, List<Document> retrievedDocs) {
        if (retrievedDocs.isEmpty()) {
            return RetrievalGrade.INCORRECT;
        }
        
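        // Fuse two scoring methods: semantic similarity + LLM judgment
        double semanticScore = evaluateSemanticRelevance(query, retrievedDocs);
        double llmScore = evaluateLLMRelevance(query, retrievedDocs);
        
        // Weighted fusion
        double combinedScore = 0.4 * semanticScore + 0.6 * llmScore;
        
        // SLF4J placeholders are plain {} and accept no format spec, so format the numbers first
        log.debug("Retrieval quality: semantic={}, llm={}, combined={}",
            String.format("%.2f", semanticScore),
            String.format("%.2f", llmScore),
            String.format("%.2f", combinedScore));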
        
        if (combinedScore >= 0.75) {
            return RetrievalGrade.CORRECT;
        } else if (combinedScore >= 0.45) {
            return RetrievalGrade.AMBIGUOUS;
        } else {
            return RetrievalGrade.INCORRECT;
        }
    }
    
    /**
     * Semantic similarity: based on vector distance
     */
    private double evaluateSemanticRelevance(String query, List<Document> docs) {
        Float[] queryEmbedding = embeddingService.embed(query);
        
        return docs.stream()
            .mapToDouble(doc -> {
                // Reuse the precomputed retrieval score when the document already has one
                if (doc.getScore() > 0) {
                    return doc.getScore();
                }
                Float[] docEmbedding = embeddingService.embed(
                    doc.getContent().substring(0, Math.min(500, doc.getContent().length()))
                );
                return cosineSimilarity(queryEmbedding, docEmbedding);
            })
            .average()
            .orElse(0.0);
    }
    
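    /**
     * LLM relevance: ask the LLM whether the document can help answer the question.
     * More accurate than pure semantic similarity, but adds latency.
     */
    private double evaluateLLMRelevance(String query, List<Document> docs) {
        // Score each document separately and take the maximum (one relevant document is already useful)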
        return docs.stream()
            .mapToDouble(doc -> evaluateSingleDocRelevance(query, doc))
            .max()
            .orElse(0.0);
    }
    
    private double evaluateSingleDocRelevance(String query, Document doc) {
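        String evalPrompt = String.format("""
            Judge how relevant the following document is to the given question.
            
            Question: %s
            
            Document (excerpt):
            %s
            
            Scoring rubric (output only a number between 0 and 1, with two decimal places):
            - 1.0: the document directly contains the core information needed to answer the question
            - 0.7-0.9: the document contains partially relevant information that can support an answer
            - 0.4-0.6: the document is on the same topic but does not answer the question directly
            - 0.0-0.3: the document is essentially unrelated to the question
            
            Relevance score:
            """, 
            query,
            doc.getContent().substring(0, Math.min(400, doc.getContent().length()))
        );
        
        String response = llmClient.chat(evalPrompt).trim();
        
        // Take the first number in the reply. Stripping every non-digit character instead
        // would glue unrelated digits together (e.g. "0.8 (doc 2)" would become "0.82").
        java.util.regex.Matcher m =
            java.util.regex.Pattern.compile("\\d+(?:\\.\\d+)?").matcher(response);
        if (!m.find()) {
            return 0.5;  // No number found: default to a middling score
        }
        // Clamp to [0, 1] in case the model answers on a different scale
        return Math.max(0.0, Math.min(1.0, Double.parseDouble(m.group())));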
    }
    
    private double cosineSimilarity(Float[] a, Float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
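To make the weighted fusion concrete, here is a small worked example using the same weights (0.4 / 0.6) and thresholds (0.75 / 0.45) as `evaluate` above. The input scores are made up for illustration: they mimic the NPS case from the introduction, where a document is clearly on topic (high vector similarity) but the LLM judges that it does not contain the answer. `ScoreFusionDemo` and its nested `Grade` enum are stand-ins so the snippet runs on its own.

```java
class ScoreFusionDemo {
    enum Grade { CORRECT, AMBIGUOUS, INCORRECT }

    /** Same weights as RetrievalQualityEvaluator.evaluate. */
    static double combine(double semanticScore, double llmScore) {
        return 0.4 * semanticScore + 0.6 * llmScore;
    }

    /** Same thresholds as RetrievalQualityEvaluator.evaluate. */
    static Grade grade(double combinedScore) {
        if (combinedScore >= 0.75) return Grade.CORRECT;
        if (combinedScore >= 0.45) return Grade.AMBIGUOUS;
        return Grade.INCORRECT;
    }

    public static void main(String[] args) {
        // Illustrative numbers: on-topic document (0.85) that lacks the actual figure (0.40)
        double combined = combine(0.85, 0.40);   // 0.34 + 0.24 = 0.58
        System.out.printf("combined=%.2f grade=%s%n", combined, grade(combined));
    }
}
```

With these inputs the combined score is 0.58, which lands in AMBIGUOUS rather than CORRECT. Semantic similarity alone (0.85) would have passed the 0.75 bar, which is why the LLM judgment carries the larger weight.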

2.2 Finer-grained document quality check

Besides relevance, the documents also need to be checked for informational completeness:

@Service  
public class DocumentQualityChecker {
    
    @Autowired
    private LLMClient llmClient;
    
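    /**
     * Checks whether the documents contain the specific information needed to answer the question.
     * High relevance does not imply sufficiency: a document can be "about" the topic without containing the answer.
     */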
    public InformationSufficiency checkInformationSufficiency(
            String query, List<Document> docs) {
        
        String allContent = docs.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n---\n"));
        
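        String checkPrompt = String.format("""
            Analyze whether the following set of documents contains enough information to answer the question.
            
            Question: %s
            
            Documents:
            %s
            
            Reply in the following JSON format:
            {
              "has_direct_answer": true/false,  // whether there is a direct answer
              "information_gaps": ["missing information 1", "missing information 2"],  // information gaps
              "can_partially_answer": true/false,  // whether a partial answer is possible
              "confidence": 0.8  // answer confidence (0-1)
            }
            """, query, allContent.substring(0, Math.min(2000, allContent.length())));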
        
        String response = llmClient.chat(checkPrompt);
        
        try {
            JsonNode node = objectMapper.readTree(response);
            
            List<String> gaps = new ArrayList<>();
            node.get("information_gaps").forEach(g -> gaps.add(g.asText()));
            
            return InformationSufficiency.builder()
                .hasDirectAnswer(node.get("has_direct_answer").asBoolean())
                .informationGaps(gaps)
                .canPartiallyAnswer(node.get("can_partially_answer").asBoolean())
                .confidence(node.get("confidence").asDouble())
                .build();
                
        } catch (Exception e) {
            // Invalid JSON or a missing field: fall back to a conservative default
            return InformationSufficiency.builder()
                .hasDirectAnswer(false)
                .informationGaps(Collections.emptyList())
                .canPartiallyAnswer(true)
                .confidence(0.5)
                .build();
        }
    }
    
    @Autowired
    private ObjectMapper objectMapper;
}
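One practical wrinkle with parsing LLM output as JSON: models often wrap the object in a Markdown fence or add a sentence before it, and a strict parser then rejects the whole reply. A minimal sketch of a pre-processing step, using only the standard library, is shown below. `LlmJsonExtractor` and `extractJsonObject` are hypothetical names, and the approach (first `{` to last `}`) is a simple heuristic, not a JSON validator.

```java
class LlmJsonExtractor {
    /**
     * Returns the substring from the first '{' to the last '}' of an LLM reply,
     * or null when no such span exists. This drops Markdown fences and
     * surrounding chatter; it does not check that the result is valid JSON.
     */
    static String extractJsonObject(String reply) {
        if (reply == null) return null;
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start < 0 || end <= start) return null;
        return reply.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String reply = "Here is the result:\n```json\n{\"has_direct_answer\": false}\n```";
        System.out.println(extractJsonObject(reply));
    }
}
```

The extracted string would then go to `objectMapper.readTree(...)`. If the model echoes the `//` comments from the prompt template, strict parsing will still fail; Jackson has a `JsonParser.Feature.ALLOW_COMMENTS` setting for that, or the comments can simply be dropped from the prompt.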

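3. Knowledge refinement

When the retrieval grade is CORRECT, there is one more step: knowledge refinement, which strips the noise from the documents and keeps only the core information relevant to the question.

Many people skip this step, but the effect is noticeable in practice: if only 50 characters of a 500-character document are relevant, extracting just that part substantially improves the quality of the context given to the LLM.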

@Slf4j  // Assumption: Lombok supplies the `log` field used below
@Service
public class KnowledgeRefinementService {
    
    @Autowired
    private LLMClient llmClient;
    
    /**
     * Knowledge refinement: extract the passages most relevant to the query from the retrieved documents
     */
    public RefinedKnowledge refine(String query, List<Document> docs) {
        List<RefinedSegment> segments = new ArrayList<>();
        
        for (Document doc : docs) {
            // Split the document into sentences
            List<String> sentences = splitIntoSentences(doc.getContent());
            
            // Keep only the sentences relevant to the query (processed in bulk)
            List<String> relevantSentences = filterRelevantSentences(query, sentences);
            
            if (!relevantSentences.isEmpty()) {
                segments.add(RefinedSegment.builder()
                    .sourceDocId(doc.getId())
                    .relevantSentences(relevantSentences)
                    .originalLength(doc.getContent().length())
                    .refinedLength(relevantSentences.stream()
                        .mapToInt(String::length).sum())
                    .build());
            }
        }
        
        // Compute the compression ratio
        int totalOriginal = docs.stream().mapToInt(d -> d.getContent().length()).sum();
        int totalRefined = segments.stream()
            .mapToInt(RefinedSegment::getRefinedLength).sum();
        
        double compressionRatio = totalOriginal > 0 ? 
            (double) totalRefined / totalOriginal : 1.0;
        
        // SLF4J placeholders are plain {}, so the ratio is formatted before logging
        log.info("Knowledge refinement done: compression ratio={}, {} chars original -> {} chars refined",
            String.format("%.2f", compressionRatio), totalOriginal, totalRefined);
        
        return RefinedKnowledge.builder()
            .segments(segments)
            .compressionRatio(compressionRatio)
            .formattedContent(formatRefinedContent(segments))
            .build();
    }
    
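    /**
     * Filters relevant sentences in bulk, without calling the LLM for every sentence.
     * This version uses keyword overlap only; embedding similarity could be added as a second signal.
     */
    private List<String> filterRelevantSentences(String query, List<String> sentences) {
        // Extract the query keywords
        Set<String> queryKeywords = extractKeywords(query);
        
        return sentences.stream()
            .filter(sentence -> {
                if (sentence.length() < 15) return false;  // Drop very short sentences
                
                // Keyword overlap
                long keywordMatches = queryKeywords.stream()
                    .filter(kw -> sentence.contains(kw))
                    .count();
                
                // Long sentences are kept even without a keyword hit, trading precision for recall
                return keywordMatches > 0 || sentence.length() > 100;
            })
            .collect(Collectors.toList());
    }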
    
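    private List<String> splitIntoSentences(String text) {
        // Split on CJK and ASCII sentence terminators. An ASCII period only counts when it is
        // followed by whitespace or the end of the text, so decimals such as "42.5" stay intact.
        return Arrays.stream(text.split("[。！？!?]|\\.(?=\\s|$)"))
            .map(String::trim)
            .filter(s -> s.length() > 10)
            .collect(Collectors.toList());
    }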
    
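    private Set<String> extractKeywords(String query) {
        // Splits on whitespace and punctuation only. Unsegmented Chinese text comes back as
        // one long token, so CJK queries need a real tokenizer or an n-gram fallback.
        return Arrays.stream(query.split("[\\s，。、！？,.!?]+"))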
            .filter(w -> w.length() >= 2)
            .collect(Collectors.toSet());
    }
    
    private String formatRefinedContent(List<RefinedSegment> segments) {
        return segments.stream()
            .map(seg -> String.join("。", seg.getRelevantSentences()))
            .collect(Collectors.joining("\n\n"));
    }
}
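The keyword filter above depends on `extractKeywords` producing tokens that can actually appear inside a sentence. For space-delimited languages that works; for Chinese, splitting on whitespace returns the whole query as a single token, which almost never matches. One cheap workaround, sketched below as an assumption rather than the author's method, is to emit overlapping two-character windows (bigrams) for chunks that contain Han characters. `BigramKeywordSketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class BigramKeywordSketch {
    /** True when the chunk contains at least one Han (Chinese) character. */
    static boolean containsHan(String s) {
        return s.codePoints()
            .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    /**
     * Splits on whitespace and punctuation, keeps each chunk of length >= 2,
     * and additionally emits 2-character windows for chunks containing Han text.
     * Windows are taken by char index, so characters outside the BMP are not handled.
     */
    static Set<String> extract(String query) {
        Set<String> keywords = new LinkedHashSet<>();
        for (String chunk : query.split("[\\s，。、！？,.!?]+")) {
            if (chunk.length() < 2) continue;
            keywords.add(chunk);
            if (containsHan(chunk)) {
                for (int i = 0; i + 2 <= chunk.length(); i++) {
                    keywords.add(chunk.substring(i, i + 2));
                }
            }
        }
        return keywords;
    }

    /** Number of distinct keywords that occur in the sentence. */
    static long overlap(Set<String> keywords, String sentence) {
        return keywords.stream().filter(sentence::contains).count();
    }

    public static void main(String[] args) {
        Set<String> keywords = extract("我们Q3的NPS指标是多少？");
        System.out.println(keywords);
        System.out.println(overlap(keywords, "Q3的NPS实际值为42"));
        System.out.println(overlap(keywords, "公司年会将于十二月举行"));
    }
}
```

Bigrams are noisy (common pairs will match unrelated sentences), so a proper Chinese tokenizer is the better long-term choice. As a stopgap it at least gives the overlap count something to match on.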

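4. Query rewriting and web search

When the retrieval grade is INCORRECT, the strategy has to change: usually the query is rewritten and then sent to web search or to a broader retrieval.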

@Slf4j  // Assumption: Lombok supplies the `log` field used below
@Service
public class QueryRewriteAndSearchService {
    
    @Autowired
    private LLMClient llmClient;
    
    @Autowired
    private WebSearchClient webSearchClient;
    
    @Autowired
    private VectorSearchService vectorSearch;
    
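    /**
     * Query rewriting: for a query whose retrieval failed, generate better retrieval queries
     */
    public List<String> rewriteQuery(String originalQuery, List<Document> failedDocs) {
        String failedContext = "";
        if (!failedDocs.isEmpty()) {
            failedContext = "(Retrieved but irrelevant content covered: " + 
                failedDocs.stream().limit(2)
                    .map(d -> d.getContent().substring(0, Math.min(100, d.getContent().length())))
                    .collect(Collectors.joining("; ")) + ")";
        }
        
        String rewritePrompt = String.format("""
            The following query found no useful results in the knowledge base.
            Rewrite it as 2-3 versions from different angles so that more relevant information can be found.
            
            Original query: %s
            %s
            
            Rewriting principles:
            1. Replace terms with synonyms or related terms
            2. Break a complex query into simpler sub-queries
            3. Try expressing the same need from a different angle
            
            Rewritten queries (one per line):
            """, originalQuery, failedContext);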
        
        String response = llmClient.chat(rewritePrompt);
        
        return Arrays.stream(response.split("\n"))
            .map(line -> line.replaceFirst("^\\d+[.、]\\s*", "").trim())
            .filter(q -> !q.isBlank() && q.length() >= 5)
            .collect(Collectors.toList());
    }
    
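    /**
     * Multi-round rewrite retrieval: keep trying rewrites until a relevant result is found
     */
    public SearchResult searchWithRewrite(String originalQuery, int maxAttempts) {
        List<Document> bestDocs = Collections.emptyList();
        double bestScore = 0.0;
        String bestQuery = originalQuery;
        int attemptsUsed = 0;
        
        // Rewrites are requested after the first miss and consumed one per attempt,
        // so later attempts do not keep retrying the same first rewrite
        Iterator<String> rewrites = Collections.emptyIterator();
        
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String currentQuery = originalQuery;
            if (attempt > 0) {
                if (!rewrites.hasNext()) {
                    rewrites = rewriteQuery(originalQuery, bestDocs).iterator();
                }
                if (!rewrites.hasNext()) break;  // The LLM produced no usable rewrite
                currentQuery = rewrites.next();
            }
            attemptsUsed++;
            
            List<Document> docs = vectorSearch.search(currentQuery, 5);
            
            if (!docs.isEmpty() && docs.get(0).getScore() > bestScore) {
                bestScore = docs.get(0).getScore();
                bestDocs = docs;
                bestQuery = currentQuery;
            }
            
            if (bestScore >= 0.75) break;  // Good enough: stop rewriting
        }
        
        return SearchResult.builder()
            .documents(bestDocs)
            .usedQuery(bestQuery)
            .attempts(attemptsUsed)  // Attempts actually made, not the cap
            .bestScore(bestScore)
            .build();
    }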
    
    /**
     * Web search fallback (used when the local knowledge base has nothing relevant at all)
     */
    public List<Document> searchWeb(String query) {
        log.info("Local retrieval failed, falling back to web search: {}", query);
        
        try {
            List<WebSearchResult> webResults = webSearchClient.search(query, 5);
            
            return webResults.stream()
                .map(wr -> Document.builder()
                    .id("web_" + wr.getUrl().hashCode())
                    .content(wr.getSnippet())
                    .metadata(Map.of(
                        "source", wr.getUrl(),
                        "title", wr.getTitle(),
                        "type", "web_search"
                    ))
                    .build())
                .collect(Collectors.toList());
                
        } catch (Exception e) {
            log.error("Web search failed", e);
            return Collections.emptyList();
        }
    }
}
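To see the control flow of rewrite-and-retry in isolation, here is a minimal sketch that replaces the vector store with a plain scoring function. Everything in it is hypothetical (`RewriteRetrySketch`, `Outcome`, the fake index and its scores); it only illustrates the loop: try the original query, then each rewrite in order, stop at the first result that clears the threshold, and report how many searches actually ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

class RewriteRetrySketch {
    /** Result holder; the field names are illustrative. */
    static final class Outcome {
        final String usedQuery;
        final double bestScore;
        final int attempts;

        Outcome(String usedQuery, double bestScore, int attempts) {
            this.usedQuery = usedQuery;
            this.bestScore = bestScore;
            this.attempts = attempts;
        }
    }

    /**
     * topScore stands in for "run a search and return the top hit's score".
     * Stops at the first candidate reaching the threshold or after maxAttempts searches.
     */
    static Outcome search(String original, List<String> rewrites,
                          ToDoubleFunction<String> topScore,
                          double threshold, int maxAttempts) {
        List<String> candidates = new ArrayList<>();
        candidates.add(original);
        candidates.addAll(rewrites);

        String bestQuery = original;
        double bestScore = 0.0;
        int attempts = 0;
        for (String candidate : candidates) {
            if (attempts >= maxAttempts) break;
            attempts++;
            double score = topScore.applyAsDouble(candidate);
            if (score > bestScore) {
                bestScore = score;
                bestQuery = candidate;
            }
            if (bestScore >= threshold) break;
        }
        return new Outcome(bestQuery, bestScore, attempts);
    }

    public static void main(String[] args) {
        // Made-up scores standing in for a real index
        Map<String, Double> fakeIndex = Map.of(
            "Q3 NPS", 0.30,
            "Q3 net promoter score result", 0.80,
            "customer satisfaction Q3", 0.90);
        Outcome o = search("Q3 NPS",
            List.of("Q3 net promoter score result", "customer satisfaction Q3"),
            q -> fakeIndex.getOrDefault(q, 0.0), 0.75, 5);
        System.out.println(o.usedQuery + " score=" + o.bestScore + " attempts=" + o.attempts);
    }
}
```

In the example the second candidate already clears 0.75, so the third is never searched and `attempts` is 2 even though the cap is 5.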

5. Putting the full CRAG pipeline together

@Slf4j  // Assumption: Lombok supplies the `log` field used below
@Service
public class CorrectiveRAGService {
    
    @Autowired
    private VectorSearchService vectorSearch;
    
    @Autowired
    private RetrievalQualityEvaluator qualityEvaluator;
    
    @Autowired
    private DocumentQualityChecker docChecker;
    
    @Autowired
    private KnowledgeRefinementService knowledgeRefiner;
    
    @Autowired
    private QueryRewriteAndSearchService queryRewriteService;
    
    @Autowired
    private LLMClient llmClient;
    
    @Autowired
    private MetricsService metrics;
    
    /**
     * Main CRAG flow
     */
    public CRAGResult query(String question) {
        long startTime = System.currentTimeMillis();
        
        // Step 1: initial retrieval
        List<Document> initialDocs = vectorSearch.search(question, 5);
        
        // Step 2: evaluate retrieval quality
        RetrievalGrade grade = qualityEvaluator.evaluate(question, initialDocs);
        
        log.info("CRAG retrieval quality: grade={}", grade);
        metrics.recordRetrievalGrade(grade.name());
        
        List<Document> finalDocs;
        String retrievalStrategy;
        
        // Step 3: run a different strategy depending on the grade
        switch (grade) {
            case CORRECT:
                // Refine the knowledge directly
                RefinedKnowledge refined = knowledgeRefiner.refine(question, initialDocs);
                finalDocs = initialDocs;
                retrievalStrategy = "correct_with_refinement";
                return generateAndReturn(question, refined.getFormattedContent(), 
                    finalDocs, grade, retrievalStrategy, startTime);
                
            case AMBIGUOUS:
                // Hybrid strategy: refine the existing documents + supplementary retrieval
                RefinedKnowledge partialRefined = knowledgeRefiner.refine(question, initialDocs);
                
                // Check for information gaps
                InformationSufficiency sufficiency = docChecker.checkInformationSufficiency(
                    question, initialDocs);
                
                String supplementalContext = "";
                if (!sufficiency.getInformationGaps().isEmpty()) {
                    // Run a supplementary retrieval for each information gap
                    for (String gap : sufficiency.getInformationGaps()) {
                        List<Document> supplementDocs = vectorSearch.search(gap, 2);
                        if (!supplementDocs.isEmpty()) {
                            supplementalContext += knowledgeRefiner
                                .refine(gap, supplementDocs)
                                .getFormattedContent() + "\n";
                        }
                    }
                }
                
                String combinedContext = partialRefined.getFormattedContent() + 
                    (supplementalContext.isBlank() ? "" : "\n\nSupplementary information:\n" + supplementalContext);
                finalDocs = initialDocs;
                retrievalStrategy = "ambiguous_with_supplement";
                return generateAndReturn(question, combinedContext, 
                    finalDocs, grade, retrievalStrategy, startTime);
                
            case INCORRECT:
                // Query rewriting + re-retrieval (local or web)
                List<String> rewrittenQueries = queryRewriteService.rewriteQuery(
                    question, initialDocs);
                
                finalDocs = Collections.emptyList();
                
                // First retry the local knowledge base with the rewritten queries
                for (String rewrittenQuery : rewrittenQueries) {
                    List<Document> retryDocs = vectorSearch.search(rewrittenQuery, 4);
                    RetrievalGrade retryGrade = qualityEvaluator.evaluate(
                        question, retryDocs);
                    
                    if (retryGrade != RetrievalGrade.INCORRECT) {
                        finalDocs = retryDocs;
                        break;
                    }
                }
                
                // Nothing usable locally: fall back to web search
                if (finalDocs.isEmpty()) {
                    finalDocs = queryRewriteService.searchWeb(question);
                    retrievalStrategy = "incorrect_web_search";
                } else {
                    retrievalStrategy = "incorrect_rewrite_retry";
                }
                
                String fallbackContext = finalDocs.stream()
                    .map(Document::getContent)
                    .collect(Collectors.joining("\n\n"));
                
                return generateAndReturn(question, fallbackContext, 
                    finalDocs, grade, retrievalStrategy, startTime);
                
            default:
                throw new IllegalStateException("Unknown retrieval grade: " + grade);
        }
    }
    
    private CRAGResult generateAndReturn(String question, String context,
                                          List<Document> docs, RetrievalGrade grade,
                                          String strategy, long startTime) {
        String systemPrompt = buildSystemPrompt(grade);
        
        String userMessage = String.format("""
            Reference material:
            %s
            
            Question: %s
            """, context.isBlank() ? "(no relevant reference material)" : context, question);
        
        String answer = llmClient.chat(systemPrompt, userMessage);
        
        long latency = System.currentTimeMillis() - startTime;
        metrics.recordCRAGLatency(strategy, latency);
        
        return CRAGResult.builder()
            .answer(answer)
            .retrievalGrade(grade)
            .strategy(strategy)
            .sourceDocs(docs)
            .latencyMs(latency)
            .build();
    }
    
    private String buildSystemPrompt(RetrievalGrade grade) {
        switch (grade) {
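            case CORRECT:
                return "You are a professional assistant. Answer the question accurately based on the reference material below, and ground your answer in it.";
            case AMBIGUOUS:
                return "You are a professional assistant. Answer the question based on the reference material below. " +
                       "The information may be incomplete, so state which parts are certain and which are uncertain.";
            case INCORRECT:
                return "You are a professional assistant. The reference material may not fully match the question. " +
                       "Combine it with your own knowledge, and clearly separate what comes from the reference material from what comes from your own knowledge.";
            default:
                return "You are a professional assistant. Answer the following question.";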
        }
    }
}

6. Evaluation metrics for CRAG

To know whether CRAG actually helps, you need an evaluation designed specifically for it:

@Service
public class CRAGEvaluationService {
    
    /**
     * CRAG-specific evaluation.
     * Key question: when retrieval quality is poor, did CRAG correct it successfully?
     */
    public CRAGEvaluationReport evaluate(List<CRAGTestCase> testCases) {
        int total = testCases.size();
        
        // Group the cases by initial retrieval grade
        Map<RetrievalGrade, List<CRAGTestCase>> grouped = testCases.stream()
            .collect(Collectors.groupingBy(CRAGTestCase::getInitialGrade));
        
        // Average answer quality per group
        Map<RetrievalGrade, Double> answerQualityByGrade = new HashMap<>();
        
        for (Map.Entry<RetrievalGrade, List<CRAGTestCase>> entry : grouped.entrySet()) {
            RetrievalGrade grade = entry.getKey();
            List<CRAGTestCase> cases = entry.getValue();
            
            double avgQuality = cases.stream()
                .mapToDouble(CRAGTestCase::getAnswerQualityScore)
                .average()
                .orElse(0.0);
            
            answerQualityByGrade.put(grade, avgQuality);
        }
        
        // Key metric: correction success rate for INCORRECT cases
        List<CRAGTestCase> incorrectCases = grouped.getOrDefault(
            RetrievalGrade.INCORRECT, Collections.emptyList());
        
        long correctedSuccessfully = incorrectCases.stream()
            .filter(c -> c.getAnswerQualityScore() >= 0.6)  // Answer quality after correction is passing
            .count();
        
        double correctionSuccessRate = incorrectCases.isEmpty() ? 0 :
            (double) correctedSuccessfully / incorrectCases.size();
        
        return CRAGEvaluationReport.builder()
            .totalCases(total)
            .answerQualityByGrade(answerQualityByGrade)
            .incorrectCorrectionSuccessRate(correctionSuccessRate)
            .overallAnswerQuality(testCases.stream()
                .mapToDouble(CRAGTestCase::getAnswerQualityScore)
                .average().orElse(0))
            .build();
    }
}

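7. Real-world results and where CRAG fits

In our project, the most visible change after adopting CRAG was fewer "confidently wrong" answers.

Before: irrelevant documents retrieved → the LLM generates a plausible-sounding answer from them → users rarely notice the error.

With CRAG: irrelevant documents retrieved → the system detects the low quality → rewrite-and-retry or web search is triggered → either the right documents are found, or the user is told explicitly that the answer could not be found in the knowledge base.

For questions graded INCORRECT, answer accuracy rose from 31% to 67%. The gain is significant: the original 31% was mostly the LLM guessing right, whereas the 67% comes from actually finding the correct information.

Latency impact: the CORRECT path adds the cost of quality evaluation (about 200-400ms), and the INCORRECT path becomes much slower (retries plus a possible web search). Overall P50 latency went from 600ms to 820ms, and P95 from 1800ms to 3200ms.

Is it worth the cost? For scenarios that demand high accuracy (finance, healthcare, legal), absolutely. For conversational products, the high-latency paths may need to be made asynchronous.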

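8. Pitfalls

Pitfall 1: the cost of quality evaluation

Evaluating every retrieval with an LLM adds 1-2 LLM calls to every query, roughly doubling cost and latency. The per-document scorer in section 2.1 makes one call per retrieved document, so without batching the overhead can be higher still.

Solution: call the LLM evaluator only for results whose semantic similarity is below a threshold; results with high enough similarity are graded CORRECT directly.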
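A minimal sketch of that gate is shown below. The LLM scorer is passed in as a `DoubleSupplier` so it is only invoked when needed. `GatedEvaluatorSketch` and the 0.85 cut-off are assumptions for illustration; the fusion weights and grade thresholds are the ones from section 2.1.

```java
import java.util.function.DoubleSupplier;

class GatedEvaluatorSketch {
    enum Grade { CORRECT, AMBIGUOUS, INCORRECT }

    // Assumed value: above this similarity the LLM check is skipped. Tune on your own data.
    static final double SKIP_LLM_THRESHOLD = 0.85;

    /**
     * Grades a retrieval. The llmScore supplier (the expensive call) is only
     * evaluated when the semantic score is below SKIP_LLM_THRESHOLD.
     */
    static Grade evaluate(double semanticScore, DoubleSupplier llmScore) {
        if (semanticScore >= SKIP_LLM_THRESHOLD) {
            return Grade.CORRECT;  // Fast path: no LLM call
        }
        double combined = 0.4 * semanticScore + 0.6 * llmScore.getAsDouble();
        if (combined >= 0.75) return Grade.CORRECT;
        if (combined >= 0.45) return Grade.AMBIGUOUS;
        return Grade.INCORRECT;
    }

    public static void main(String[] args) {
        int[] llmCalls = {0};
        DoubleSupplier fakeLlm = () -> { llmCalls[0]++; return 0.9; };

        System.out.println(evaluate(0.90, fakeLlm) + ", LLM calls so far: " + llmCalls[0]);
        System.out.println(evaluate(0.50, fakeLlm) + ", LLM calls so far: " + llmCalls[0]);
    }
}
```

The trade-off is worth stating: the fast path trusts vector similarity alone, which is exactly the signal that was fooled in the NPS example at the start of this article. A high cut-off limits the damage but does not remove it.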

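Pitfall 2: side effects of web search

When web search fires too often, it pulls in a lot of information from outside your own knowledge base, and answers start to drift away from the company's official position.

Solution: treat web search as a last resort, not a replacement for local retrieval. Try several rounds of query rewriting first, and trigger web search only when nothing can be found.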

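Pitfall 3: the AMBIGUOUS logic is too complex

The gap detection plus supplementary retrieval logic is prone to all kinds of edge cases.

Solution: treat AMBIGUOUS as CORRECT for now (refinement only, no supplementary retrieval), and add supplementary retrieval once the system is stable. Rolling features out in steps is far less risky than shipping the full logic at once.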
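One way to stage that rollout is a simple switch in front of the strategy selection, sketched here. The class name, the `supplementEnabled` flag, and the strategy labels `ambiguous_refine_only` and `incorrect_rewrite_or_web` are hypothetical; the other two labels match the ones used in section 5.

```java
class AmbiguousRolloutSketch {
    enum Grade { CORRECT, AMBIGUOUS, INCORRECT }

    /**
     * Picks the strategy label for a grade. While supplementEnabled is false,
     * AMBIGUOUS gets refine-only processing (the same work as CORRECT) under
     * its own label, so metrics can still tell the two apart.
     */
    static String strategyFor(Grade grade, boolean supplementEnabled) {
        switch (grade) {
            case CORRECT:
                return "correct_with_refinement";
            case AMBIGUOUS:
                return supplementEnabled ? "ambiguous_with_supplement" : "ambiguous_refine_only";
            case INCORRECT:
                return "incorrect_rewrite_or_web";
            default:
                throw new IllegalStateException("Unknown grade: " + grade);
        }
    }

    public static void main(String[] args) {
        System.out.println(strategyFor(Grade.AMBIGUOUS, false));
        System.out.println(strategyFor(Grade.AMBIGUOUS, true));
    }
}
```

Keeping a separate label for the downgraded path means the share of AMBIGUOUS queries stays visible in the metrics, which helps when deciding whether the supplementary retrieval is worth turning on.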

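The next article covers document preprocessing: chunking strategies, cleaning rules, and metadata design. This is the infrastructure of RAG, and the technical debt many teams take on here is later repaid in degraded system quality.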