No. 1664: Adaptive RAG, Choosing a Retrieval Strategy Dynamically Based on Query Complexity

老张 (Lao Zhang) | 2026/4/30 | about 13 min read


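Here is a story that actually happened on our team.

Last year we had a project whose knowledge base held tens of thousands of documents, and we built a heavy RAG pipeline for it: query rewriting + hybrid retrieval + reranking + generation. End to end, P95 latency was close to 2 seconds.

Then we analyzed the query logs and found that roughly 40% of queries looked like this: "What is a vector database", "What does RAG mean", "How does HashMap work in Java". An LLM can answer these well from its own parametric knowledge; retrieval adds nothing.

We routed that slice of traffic straight to the LLM with no retrieval. Latency dropped below 200ms, and because nothing was retrieved, there was no interference from irrelevant retrieved documents, so answer quality actually became more stable.

That is the core idea of adaptive RAG: not every question needs retrieval, and not every question needs the same heavyweight retrieval.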

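1. Dimensions of Query Complexity

Before getting to the implementation, let's pin down what "complexity" means. In a RAG context it has several distinct meanings:

Knowledge requirement: does the question need external knowledge? Is it recent knowledge or general knowledge?

Reasoning requirement: how many reasoning steps does the question take? Is it single-hop retrieval or multi-hop reasoning?

Precision requirement: does the question need an exact answer (a number, a date) or an overview?

Context dependence: is the question standalone, or does it depend on the conversation history?

The essence of adaptive RAG is to evaluate a query along these dimensions and decide dynamically which retrieval strategy to use.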
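To make the four dimensions concrete, here is a minimal sketch of how an already-scored query could be mapped to a category. The `QueryCategory` names mirror the ones the classifier code in this article uses; `QueryProfile`, `StrategySelector`, and the priority order of the checks are my own assumptions for illustration, not the production logic.

```java
/** Category names as used by the classifier code in this article. */
enum QueryCategory {
    GENERAL_KNOWLEDGE, SIMPLE_FACTUAL, COMPLEX_REASONING,
    REALTIME_INFO, CONTEXT_DEPENDENT, RETRIEVAL_REQUIRED
}

/** Hypothetical value object: one field per dimension described above. */
final class QueryProfile {
    final boolean needsExternalKnowledge; // knowledge requirement
    final boolean needsFreshData;         // knowledge requirement: recency
    final int reasoningHops;              // reasoning requirement
    final boolean needsExactAnswer;       // precision requirement
    final boolean dependsOnHistory;       // context dependence

    QueryProfile(boolean needsExternalKnowledge, boolean needsFreshData,
                 int reasoningHops, boolean needsExactAnswer, boolean dependsOnHistory) {
        this.needsExternalKnowledge = needsExternalKnowledge;
        this.needsFreshData = needsFreshData;
        this.reasoningHops = reasoningHops;
        this.needsExactAnswer = needsExactAnswer;
        this.dependsOnHistory = dependsOnHistory;
    }
}

final class StrategySelector {
    private StrategySelector() {}

    /** Assumed priority: context first, then freshness, then hops, then plain retrieval. */
    static QueryCategory select(QueryProfile p) {
        if (p.dependsOnHistory) return QueryCategory.CONTEXT_DEPENDENT;
        if (p.needsFreshData) return QueryCategory.REALTIME_INFO;
        if (p.reasoningHops > 1) return QueryCategory.COMPLEX_REASONING;
        // Exact answers (numbers, dates) are treated as needing retrieval to be safe
        if (p.needsExternalKnowledge || p.needsExactAnswer) return QueryCategory.SIMPLE_FACTUAL;
        return QueryCategory.GENERAL_KNOWLEDGE;
    }
}
```

Only a query that needs nothing external, nothing fresh, nothing exact, and no history ends up on the no-retrieval path; everything else is retrieved in some form.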

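2. Implementing the Query Classifier

The first step in adaptive RAG is accurate query classification. The quality of this classifier directly determines how well the whole system works.

2.1 Coarse Classification with Rules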

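import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.springframework.stereotype.Component;

// The keyword lists stay in Chinese because they are matched against Chinese-language user queries.
@Component
public class RuleBasedQueryClassifier {
    
    // Phrasings that suggest no retrieval is needed
    // ("what is", "definition", "concept", "principle", "how to understand",
    //  "please explain", "introduce", "talk about", "basics")
    private static final List<String> NO_RETRIEVAL_PATTERNS = Arrays.asList(
        "什么是", "定义", "概念", "原理", "如何理解",
        "请解释", "介绍一下", "说说", "基础知识"
    );
    
    // Phrasings that clearly require retrieval
    // ("our company", "our product", "latest", "recently", "yesterday", "last week",
    //  "in the project", "in our", "please look up", "look it up for me")
    private static final List<String> RETRIEVAL_REQUIRED_PATTERNS = Arrays.asList(
        "我们公司", "我们的产品", "最新", "最近", "昨天", "上周",
        "项目中", "在我们的", "请查一下", "帮我查"
    );
    
    // Phrasings that suggest multi-hop reasoning
    // ("as well as", "at the same time", "both...and", "not only...but also", "therefore",
    //  "so", "led to", "compared with", "by comparison").
    // Compiled as regexes: a literal "既...又" never appears in a real query,
    // so paired connectives are written as "既.*又" and so on.
    private static final List<Pattern> COMPLEX_REASONING_PATTERNS = Stream.of(
        "以及", "同时", "既.*又", "不仅.*而且", "因此",
        "所以", "导致了", "与.*相比", "相比之下"
    ).map(Pattern::compile).collect(Collectors.toList());
    
    public QueryCategory quickClassify(String question) {
        // Check first whether retrieval is clearly required
        for (String pattern : RETRIEVAL_REQUIRED_PATTERNS) {
            if (question.contains(pattern)) {
                return QueryCategory.RETRIEVAL_REQUIRED;
            }
        }
        
        // Does it look like a general-knowledge question?
        boolean isGeneralKnowledge = NO_RETRIEVAL_PATTERNS.stream()
            .anyMatch(question::contains);
        
        // Does it need complex reasoning? Require at least two matching connectives.
        boolean isComplex = COMPLEX_REASONING_PATTERNS.stream()
            .filter(p -> p.matcher(question).find()).count() >= 2;
        
        // Does it mention a specific entity (a year, a proper noun, a project code)?
        boolean hasSpecificEntity = hasSpecificEntity(question);
        
        if (isGeneralKnowledge && !hasSpecificEntity) {
            return QueryCategory.GENERAL_KNOWLEDGE;
        }
        if (isComplex) {
            return QueryCategory.COMPLEX_REASONING;
        }
        return QueryCategory.SIMPLE_FACTUAL;
    }
    
    private boolean hasSpecificEntity(String question) {
        // Simplified detection: a four-digit year, a capitalized Latin word, or a project code.
        // Note that the second pattern also fires on technology names such as "Java" or "RAG",
        // so those questions fall through to SIMPLE_FACTUAL (retrieval) at this stage.
        return question.matches(".*\\d{4}年.*") ||
               question.matches(".*[A-Z][a-zA-Z]{2,}.*") ||
               question.matches(".*项目[A-Z\\d].*");
    }
}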

2.2 Fine-Grained Classification with an LLM

Rule-based classification cannot handle borderline cases, so an LLM makes a second pass:

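import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class LLMQueryClassifier {
    
    @Autowired
    private LLMClient llmClient;
    
    @Autowired
    private RuleBasedQueryClassifier ruleClassifier;
    
    // A leading A-E that is not the start of a longer Latin word ("A", "A - ...", but not "ANSWER")
    private static final Pattern VERDICT_LETTER = Pattern.compile("^([A-E])(?![A-Z])");
    
    private static final String CLASSIFICATION_PROMPT = """
        Analyze the user query below and decide what kind of information is needed to answer it.
        
        Query: %s
        
        Choose the single best category from the options below and output only its code:
        
        A - General knowledge (answerable directly from the LLM's own knowledge, no external documents needed)
        B - Simple fact (needs retrieval, but a single retrieval is enough)
        C - Complex reasoning (needs multiple retrievals plus reasoning)
        D - Real-time information (needs the latest data; the knowledge base may be stale)
        E - Context dependent (relies on the conversation history to be understood)
        
        Output only the letter (A/B/C/D/E).
        """;
    
    /**
     * Hybrid classification: rules first, LLM when the rules are not confident enough
     */
    public ClassificationResult classify(String question, ChatHistory history) {
        // 1. Rule-based classification (fast)
        QueryCategory ruleCategory = ruleClassifier.quickClassify(question);
        
        // 2. Decide whether the rule result can be trusted
        if (isHighConfidence(question, ruleCategory)) {
            return ClassificationResult.builder()
                .category(ruleCategory)
                .confidence(0.9)
                .source("rule")
                .build();
        }
        
        // 3. The rules are unsure: ask the LLM for a finer judgment
        String prompt = String.format(CLASSIFICATION_PROMPT, question);
        String llmVerdict = llmClient.chat(prompt).trim().toUpperCase();
        
        QueryCategory llmCategory = mapLLMVerdict(llmVerdict);
        
        // 4. Context-dependence check overrides the LLM verdict
        if (history != null && isContextDependent(question, history)) {
            llmCategory = QueryCategory.CONTEXT_DEPENDENT;
        }
        
        return ClassificationResult.builder()
            .category(llmCategory)
            .confidence(0.8)
            .source("llm")
            .build();
    }
    
    private boolean isHighConfidence(String question, QueryCategory category) {
        // Only a few patterns make the rule result trustworthy on its own
        if (category == QueryCategory.GENERAL_KNOWLEDGE) {
            // Short question that does not mention "we/our" (我们)
            return question.length() < 20 && !question.contains("我们");
        }
        if (category == QueryCategory.RETRIEVAL_REQUIRED) {
            // Mentions "our company" (我们公司) or "in the project" (项目中)
            return question.contains("我们公司") || question.contains("项目中");
        }
        return false;
    }
    
    private boolean isContextDependent(String question, ChatHistory history) {
        // Look for pronouns and back-references that only make sense with the history
        // ("it", "he", "she", "this", "that", "mentioned above", "just now", "earlier")
        List<String> contextIndicators = Arrays.asList(
            "它", "他", "她", "这个", "那个", "上面说的", "刚才", "之前"
        );
        return contextIndicators.stream().anyMatch(question::contains);
    }
    
    private QueryCategory mapLLMVerdict(String verdict) {
        // Tolerate replies such as "A - general knowledge"; anything unparseable
        // falls back to SIMPLE_FACTUAL, i.e. the retrieval path
        Matcher m = VERDICT_LETTER.matcher(verdict);
        String letter = m.find() ? m.group(1) : "";
        switch (letter) {
            case "A": return QueryCategory.GENERAL_KNOWLEDGE;
            case "B": return QueryCategory.SIMPLE_FACTUAL;
            case "C": return QueryCategory.COMPLEX_REASONING;
            case "D": return QueryCategory.REALTIME_INFO;
            case "E": return QueryCategory.CONTEXT_DEPENDENT;
            default: return QueryCategory.SIMPLE_FACTUAL;
        }
    }
}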

3. The Strategy Router

Once a query is classified, it is routed to the matching execution strategy:

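import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j  // assumption: Lombok supplies the `log` field used below
@Service
public class AdaptiveRAGRouter {
    
    // Used by rewriteWithContext()
    @Autowired
    private LLMClient llmClient;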
    
    @Autowired
    private LLMQueryClassifier classifier;
    
    @Autowired
    private DirectLLMStrategy directLLMStrategy;
    
    @Autowired
    private SimpleRAGStrategy simpleRAGStrategy;
    
    @Autowired
    private ComplexRAGStrategy complexRAGStrategy;
    
    @Autowired
    private WebSearchStrategy webSearchStrategy;
    
    @Autowired
    private MetricsService metrics;
    
    /**
     * Main entry point for adaptive RAG
     */
    public AdaptiveRAGResult query(String question, ChatHistory history) {
        long startTime = System.currentTimeMillis();
        
        // 1. Classify the query
        ClassificationResult classification = classifier.classify(question, history);
        
        log.info("Query classification: category={}, confidence={}, source={}",
            classification.getCategory(), 
            classification.getConfidence(),
            classification.getSource());
        
        // 2. Run the matching strategy
        RAGResult result;
        try {
            result = executeStrategy(question, history, classification);
        } catch (Exception e) {
            log.error("Strategy execution failed, falling back to simple RAG", e);
            result = simpleRAGStrategy.execute(question, history);
        }
        
        long elapsed = System.currentTimeMillis() - startTime;
        
        // 3. Record metrics
        metrics.recordQueryLatency(classification.getCategory().name(), elapsed);
        metrics.recordStrategyUsage(classification.getCategory().name());
        
        return AdaptiveRAGResult.builder()
            .answer(result.getAnswer())
            .strategy(classification.getCategory())
            .latencyMs(elapsed)
            .retrievedDocs(result.getDocuments())
            .build();
    }
    
    private RAGResult executeStrategy(String question, ChatHistory history, 
                                       ClassificationResult classification) {
        switch (classification.getCategory()) {
            case GENERAL_KNOWLEDGE:
                return directLLMStrategy.execute(question, history);
                
            case SIMPLE_FACTUAL:
                return simpleRAGStrategy.execute(question, history);
                
            case COMPLEX_REASONING:
                return complexRAGStrategy.execute(question, history);
                
            case REALTIME_INFO:
                // Query the knowledge base first, then decide whether a web search is needed
                RAGResult localResult = simpleRAGStrategy.execute(question, history);
                if (localResult.getConfidence() < 0.6) {
                    return webSearchStrategy.execute(question, history);
                }
                return localResult;
                
            case CONTEXT_DEPENDENT:
                // Context dependent: rewrite the question first, then retrieve
                String rewrittenQuestion = rewriteWithContext(question, history);
                return simpleRAGStrategy.execute(rewrittenQuestion, history);
                
            default:
                return simpleRAGStrategy.execute(question, history);
        }
    }
    
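    /**
     * Use the conversation history to rewrite a context-dependent question
     */
    private String rewriteWithContext(String question, ChatHistory history) {
        if (history == null || history.isEmpty()) return question;
        
        String rewritePrompt = String.format("""
            Using the conversation history, rewrite the last question as a complete, standalone question (one that can be understood without the history).
            
            Conversation history:
            %s
            
            Last question: %s
            
            Rewritten standalone question (output only the question, nothing else):
            """, 
            history.getRecentTurns(3).toString(),
            question);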
        
        return llmClient.chat(rewritePrompt).trim();
    }
}

4. Implementing Each Strategy

4.1 Direct LLM Strategy (No Retrieval)

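import java.util.Collections;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class DirectLLMStrategy implements RAGStrategy {
    
    @Autowired
    private LLMClient llmClient;
    
    @Override
    public RAGResult execute(String question, ChatHistory history) {
        // No retrieval: answer directly, with an explicit statement of scope
        String systemPrompt = """
            You are a professional technical assistant. Answer the question directly from your own knowledge.
            If the question involves recent information or specific internal data, state that your knowledge may not be up to date.
            """;
        
        String answer = llmClient.chat(systemPrompt, question);
        
        return RAGResult.builder()
            .answer(answer)
            .documents(Collections.emptyList())
            .confidence(0.85)  // fixed value: general-knowledge answers are assumed fairly reliable
            .strategy("direct_llm")
            .build();
    }
}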

4.2 Simple RAG Strategy

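import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class SimpleRAGStrategy implements RAGStrategy {
    
    @Autowired
    private VectorSearchService vectorSearch;
    
    @Autowired
    private LLMClient llmClient;
    
    @Override
    public RAGResult execute(String question, ChatHistory history) {
        // Single retrieval pass: take the top 4, then drop low-similarity documents.
        // Filtering comes before the empty check so that "only irrelevant hits"
        // takes the same fallback path as "no hits at all".
        List<Document> docs = vectorSearch.search(question, 4).stream()
            .filter(d -> d.getScore() > 0.75)
            .collect(Collectors.toList());
        
        if (docs.isEmpty()) {
            // Nothing relevant was retrieved: fall back to answering directly with the LLM
            String answer = llmClient.chat(
                "Answer the following question from your own knowledge; say so if you are unsure:", 
                question
            );
            return RAGResult.builder()
                .answer(answer)
                .documents(Collections.emptyList())
                .confidence(0.5)
                .strategy("simple_rag_fallback")
                .build();
        }
        
        String context = buildContext(docs);
        String answer = generateAnswer(question, context);
        
        // Use the mean similarity score of the kept documents as the confidence
        double confidence = docs.stream()
            .mapToDouble(Document::getScore)
            .average()
            .orElse(0.5);
        
        return RAGResult.builder()
            .answer(answer)
            .documents(docs)
            .confidence(confidence)
            .strategy("simple_rag")
            .build();
    }
    
    private String buildContext(List<Document> docs) {
        // Prefix each document with its source so the answer can cite it
        return docs.stream()
            .map(doc -> "[Source: " + doc.getMetadata().get("source") + "]\n" + doc.getContent())
            .collect(Collectors.joining("\n\n"));
    }
    
    private String generateAnswer(String question, String context) {
        String prompt = String.format("""
            Answer the question based on the reference material below. If the material is not sufficient to answer it, say so.
            
            Reference material:
            %s
            
            Question: %s
            """, context, question);
        
        return llmClient.chat(prompt);
    }
}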

4.3 Complex RAG Strategy (Multi-Step)

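import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Slf4j  // assumption: Lombok supplies the `log` field used below
@Component
public class ComplexRAGStrategy implements RAGStrategy {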
    
    @Autowired
    private LLMClient llmClient;
    
    @Autowired
    private VectorSearchService vectorSearch;
    
    @Override
    public RAGResult execute(String question, ChatHistory history) {
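        // Step 1: decompose the question
        List<String> subQuestions = decomposeQuestion(question);
        if (subQuestions.isEmpty()) {
            // The LLM reply could not be parsed into sub-questions:
            // treat the original question as the only sub-question
            subQuestions = List.of(question);
        }
        
        log.info("Complex question decomposed into {} sub-questions: {}", subQuestions.size(), subQuestions);
        
        // Step 2: retrieve and answer each sub-question in turn
        List<SubQuestionResult> subResults = new ArrayList<>();
        for (String subQ : subQuestions) {
            List<Document> docs = vectorSearch.search(subQ, 3);
            String subAnswer = generateSubAnswer(subQ, docs);
            subResults.add(SubQuestionResult.of(subQ, subAnswer, docs));
        }
        
        // Step 3: synthesize the sub-answers into the final answer
        String finalAnswer = synthesizeAnswer(question, subResults);
        
        // Collect every document that was used (distinct() relies on Document.equals)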
        List<Document> allDocs = subResults.stream()
            .flatMap(r -> r.getDocs().stream())
            .distinct()
            .collect(Collectors.toList());
        
        return RAGResult.builder()
            .answer(finalAnswer)
            .documents(allDocs)
            .confidence(0.75)
            .strategy("complex_rag")
            .subResults(subResults)
            .build();
    }
    
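    /**
     * Decomposition: break a complex question into sub-questions that can be retrieved independently
     */
    private List<String> decomposeQuestion(String question) {
        String prompt = String.format("""
            Break the following complex question into 2 to 4 sub-questions that can each be retrieved and answered independently.
            Each sub-question should be complete and understandable on its own.
            
            Original question: %s
            
            Output in the following format, one sub-question per line:
            1. sub-question 1
            2. sub-question 2
            ...
            """, question);
        
        String response = llmClient.chat(prompt);
        
        // Keep only lines shaped like "1. ...", tolerating leading whitespace
        return Arrays.stream(response.split("\n"))
            .map(String::trim)
            .filter(line -> line.matches("\\d+\\..*"))
            .map(line -> line.replaceFirst("\\d+\\.\\s*", "").trim())
            .filter(q -> !q.isBlank())
            .collect(Collectors.toList());
    }
    
    private String generateSubAnswer(String subQuestion, List<Document> docs) {
        String context = docs.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n\n"));
        
        if (context.isBlank()) {
            // No retrieved content for this sub-question: answer from the model's own knowledge
            return llmClient.chat("Answer briefly (say you don't know if you are unsure): " + subQuestion);
        }
        
        return llmClient.chat(
            "Answer briefly based on the following content:\n" + context,
            subQuestion
        );
    }
    
    /**
     * Combine the sub-question answers into the final answer
     */
    private String synthesizeAnswer(String originalQuestion, 
                                     List<SubQuestionResult> subResults) {
        StringBuilder context = new StringBuilder();
        context.append("Analysis of each sub-question:\n\n");
        
        for (SubQuestionResult sub : subResults) {
            context.append("Question: ").append(sub.getQuestion()).append("\n");
            context.append("Analysis: ").append(sub.getAnswer()).append("\n\n");
        }
        
        String prompt = String.format("""
            Based on the analysis below, give a combined answer to the original question.
            
            %s
            
            Original question: %s
            
            Give a complete, synthesized answer:
            """, context.toString(), originalQuestion);
        
        return llmClient.chat(prompt);
    }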
}

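5. An Advanced Technique: the Confidence Feedback Loop

When the first retrieval does not yield enough confidence, a second retrieval is triggered automatically: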

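import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j  // assumption: Lombok supplies the `log` field used below
@Service
public class ConfidenceBasedAdaptiveRetrieval {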
    
    @Autowired
    private VectorSearchService vectorSearch;
    
    @Autowired
    private LLMClient llmClient;
    
    private static final double CONFIDENCE_THRESHOLD = 0.7;
    private static final int MAX_RETRY = 2;
    
    /**
     * Confidence-driven adaptive retrieval.
     * If the generated answer's confidence is too low, rewrite the query and retry.
     */
    public RAGResult queryWithConfidenceCheck(String originalQuestion) {
        String currentQuery = originalQuestion;
        List<Document> allDocs = new ArrayList<>();
        
        for (int attempt = 0; attempt <= MAX_RETRY; attempt++) {
            // Retrieve
            List<Document> docs = vectorSearch.search(currentQuery, 4);
            allDocs.addAll(docs);
            
            // Generate an answer and have the LLM rate its own confidence
            String context = docs.stream()
                .map(Document::getContent)
                .collect(Collectors.joining("\n\n"));
            
            AnswerWithConfidence result = generateWithConfidence(originalQuestion, context);
            
            // Accept the answer if it is confident enough, or if this was the last attempt
            if (result.getConfidence() >= CONFIDENCE_THRESHOLD || attempt == MAX_RETRY) {
                return RAGResult.builder()
                    .answer(result.getAnswer())
                    .documents(allDocs)
                    .confidence(result.getConfidence())
                    .retrievalAttempts(attempt + 1)
                    .build();
            }
            
            // Confidence too low: rewrite the query
            log.info("Retrieval attempt {} had insufficient confidence ({}), rewriting the query and retrying", 
                attempt + 1, result.getConfidence());
            currentQuery = rewriteQuery(originalQuestion, docs, result.getMissingInfo());
        }
        
        // Unreachable (the last attempt always returns above); kept to satisfy the compiler
        throw new IllegalStateException("Failed to generate a valid answer");
    }
    
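    /**
     * Have the LLM generate an answer and rate its own confidence in one call
     */
    private AnswerWithConfidence generateWithConfidence(String question, String context) {
        String prompt = String.format("""
            Answer the question based on the reference content below. Also rate your confidence in the answer (0 to 1) and state what key information is missing.
            
            Reference content:
            %s
            
            Question: %s
            
            Respond in the following JSON format:
            {
              "answer": "your answer",
              "confidence": 0.8,
              "missing_info": "key information that is missing, if any"
            }
            """, context, question);
        
        String response = llmClient.chat(prompt);
        
        try {
            // Parse the JSON reply
            JsonNode node = objectMapper.readTree(response);
            return AnswerWithConfidence.builder()
                .answer(node.get("answer").asText())
                .confidence(node.get("confidence").asDouble())
                .missingInfo(node.has("missing_info") ? 
                    node.get("missing_info").asText() : "")
                .build();
        } catch (Exception e) {
            // Parsing failed: return the raw reply with a medium confidence.
            // 0.6 is below CONFIDENCE_THRESHOLD, so this triggers a retry unless it was the last attempt.
            return AnswerWithConfidence.builder()
                .answer(response)
                .confidence(0.6)
                .missingInfo("")
                .build();
        }
    }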
    
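    /**
     * Rewrite the query based on the missing information
     */
    private String rewriteQuery(String originalQuestion, List<Document> previousDocs, 
                                  String missingInfo) {
        String rewritePrompt = String.format("""
            Original question: %s
            Missing key information: %s
            
            Rewrite the query so that it retrieves documents that can fill in the missing information.
            Output only the rewritten query, nothing else.
            """, originalQuestion, missingInfo);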
        
        return llmClient.chat(rewritePrompt).trim();
    }
    
    @Autowired
    private ObjectMapper objectMapper;
}
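One practical wrinkle with the JSON step: models often wrap the object in a Markdown code fence or add a sentence before it, which makes a strict parse fail and silently lands in the fallback branch. Below is a minimal sketch of a pre-parse cleanup that could run before `readTree`. `JsonExtractor` and both method names are hypothetical helpers of mine, and the brace-scanning approach assumes the reply contains a single top-level object.

```java
final class JsonExtractor {
    private JsonExtractor() {}

    /**
     * Returns the substring from the first '{' to the last '}' (inclusive),
     * or null when the reply contains no such pair.
     */
    static String extractObject(String llmResponse) {
        if (llmResponse == null) return null;
        int start = llmResponse.indexOf('{');
        int end = llmResponse.lastIndexOf('}');
        if (start < 0 || end <= start) return null;
        return llmResponse.substring(start, end + 1);
    }

    /** Clamps a self-reported confidence into [0, 1]; NaN is treated as 0. */
    static double clampConfidence(double raw) {
        if (Double.isNaN(raw)) return 0.0;
        return Math.max(0.0, Math.min(1.0, raw));
    }
}
```

Clamping matters because the confidence is self-reported: a reply of `85` instead of `0.85` would otherwise sail past the threshold check.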

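6. A/B Testing Adaptive RAG

After introducing adaptive routing, you need to verify that it really works, rather than that it "feels like it works".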

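import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class AdaptiveRAGExperiment {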
    
    @Autowired
    private AdaptiveRAGRouter adaptiveRouter;
    
    @Autowired
    private SimpleRAGStrategy baselineRouter;
    
    @Autowired
    private ExperimentMetrics experimentMetrics;
    
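    /**
     * A/B experiment: adaptive RAG vs. retrieval for every query.
     * Traffic is split by user ID so that each user gets a consistent experience.
     */
    public RAGResult queryWithExperiment(String question, String userId, ChatHistory history) {
        boolean useAdaptive = isInAdaptiveGroup(userId);
        long startTime = System.currentTimeMillis();
        
        RAGResult result;
        String group;
        
        if (useAdaptive) {
            // The router returns an AdaptiveRAGResult; copy it into a RAGResult so both
            // branches share one type (getter names assume Lombok-style accessors)
            AdaptiveRAGResult adaptive = adaptiveRouter.query(question, history);
            result = RAGResult.builder()
                .answer(adaptive.getAnswer())
                .documents(adaptive.getRetrievedDocs())
                .strategy(adaptive.getStrategy().name())
                .build();
            group = "adaptive";
        } else {
            // Baseline: every question takes the retrieval path, whatever its category
            RAGResult baseResult = baselineRouter.execute(question, history);
            result = RAGResult.builder()
                .answer(baseResult.getAnswer())
                .documents(baseResult.getDocuments())
                .confidence(baseResult.getConfidence())
                .strategy("full_rag_baseline")
                .build();
            group = "baseline";
        }
        
        // Measure latency here so both groups are timed the same way
        long latencyMs = System.currentTimeMillis() - startTime;
        
        // Record the experiment data
        experimentMetrics.record(ExperimentEvent.builder()
            .experimentId("adaptive_rag_v1")
            .userId(userId)
            .group(group)
            .question(question)
            .latencyMs(latencyMs)
            .strategy(result.getStrategy() != null ? result.getStrategy() : "unknown")
            .build());
        
        return result;
    }
    
    /**
     * Assign the experiment group by hashing the user ID (10% of traffic goes to the adaptive group)
     */
    private boolean isInAdaptiveGroup(String userId) {
        // Taking the remainder before abs() keeps the value in [0, 99] even for Integer.MIN_VALUE
        int hash = Math.abs(userId.hashCode() % 100);
        return hash < 10;  // 10% of traffic
    }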
}
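Recording events is only half of the experiment; the latency samples of each group still have to be reduced to comparable numbers such as P50 and P95. Here is a minimal sketch using the nearest-rank percentile definition. `LatencySummary` is a hypothetical helper, and in practice a metrics backend would usually compute this for you.

```java
import java.util.Arrays;

final class LatencySummary {
    private LatencySummary() {}

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone();  // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }
}
```

Compute the same percentiles for the "adaptive" and "baseline" groups over the same time window, and only then compare them.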

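7. Results from a Real Project

Here is what we measured after deploying adaptive RAG in our enterprise knowledge base project:

Metric	Full RAG (baseline)	Adaptive RAG	Change
P50 latency	680ms	320ms	-53%
P95 latency	1850ms	890ms	-52%
LLM token consumption	baseline	-38%	38% saved
User rating (1-5)	3.7	4.1	+11%
Wrong answers caused by irrelevant retrieval	12%	5%	-58%

The last metric is the interesting one: wrong answers caused by irrelevant retrieval fell by 58%. For general-knowledge questions, skipping retrieval gives better quality than retrieving. The retrieved documents are likely to be internal company documents, and for a general question like "What is a Transformer" those documents may not cover the topic at all, or may describe it imprecisely, and showing them to the LLM pulls the answer off course.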
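The "Change" column is the relative change against the baseline, (adaptive - baseline) / baseline, rounded to a whole percent. A small sketch to reproduce it from the two measured columns (`RelativeChange` is just an illustrative helper name):

```java
final class RelativeChange {
    private RelativeChange() {}

    /** Percentage change from baseline to treatment, rounded to the nearest whole percent. */
    static long percent(double baseline, double treatment) {
        return Math.round((treatment - baseline) / baseline * 100.0);
    }
}
```

For example, `RelativeChange.percent(680, 320)` gives -53 for P50 latency, and `RelativeChange.percent(12, 5)` gives -58 for the error rate, which is a relative drop; in absolute terms the rate fell by 7 percentage points.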

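8. Edge Cases to Watch For

Edge case 1: the cost of misclassification is asymmetric

Misjudging a "needs retrieval" query as "general knowledge" produces a stale or wrong answer, which hurts the user a lot. Misjudging a "general knowledge" query as "needs retrieval" only costs some extra time and tokens, which hurts the user very little.

So classification should lean conservative: better to retrieve too often than to miss a retrieval. Our policy is that when the model is unsure, the query goes to retrieval by default.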
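The "default to retrieval when unsure" policy can be expressed as a small guard that sits between the classifier and the router. This is a sketch under assumptions: `ConservativeFallback` and its threshold value are hypothetical, and the nested `Category` enum only lists the categories needed for the illustration.

```java
final class ConservativeFallback {
    private ConservativeFallback() {}

    enum Category { GENERAL_KNOWLEDGE, SIMPLE_FACTUAL, COMPLEX_REASONING }

    /** Hypothetical threshold: below this, a "no retrieval" verdict is not trusted. */
    static final double MIN_CONFIDENCE_TO_SKIP_RETRIEVAL = 0.7;

    /**
     * Downgrades an uncertain GENERAL_KNOWLEDGE verdict to SIMPLE_FACTUAL,
     * so that the cheap mistake (an unnecessary retrieval) is preferred
     * over the expensive one (a missing retrieval).
     */
    static Category resolve(Category predicted, double confidence) {
        if (predicted == null) {
            return Category.SIMPLE_FACTUAL;  // no verdict at all: retrieve
        }
        if (predicted == Category.GENERAL_KNOWLEDGE
                && confidence < MIN_CONFIDENCE_TO_SKIP_RETRIEVAL) {
            return Category.SIMPLE_FACTUAL;
        }
        return predicted;  // retrieval-based verdicts are kept whatever the confidence
    }
}
```

Only the no-retrieval verdict is gated by confidence, because that is the only direction in which a mistake is expensive.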

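Edge case 2: the latency of classification itself

Classifying with an LLM takes one extra LLM call, which adds latency. If that call eats up the time saved by skipping retrieval, the whole exercise is a net loss.

In practice we classify with a lightweight model (such as GPT-3.5-turbo or a small locally deployed model) and keep the classification call under 100ms.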
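One way to enforce that budget rather than hope for it is to put a hard timeout on the classification call and fall back to the retrieval path when it is exceeded. A minimal sketch using `CompletableFuture.completeOnTimeout` (Java 9+); `BudgetedClassifier` is a hypothetical wrapper and the `Supplier` stands in for the real LLM call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BudgetedClassifier {
    private BudgetedClassifier() {}

    /**
     * Runs the classification call asynchronously and returns its verdict,
     * or `fallback` if the call fails or does not finish within the budget.
     */
    static String classifyWithin(Supplier<String> llmCall, long budgetMillis, String fallback) {
        return CompletableFuture.supplyAsync(llmCall)
            .completeOnTimeout(fallback, budgetMillis, TimeUnit.MILLISECONDS)
            .exceptionally(ex -> fallback)
            .join();
    }
}
```

With the verdict letters from section 2.2, passing "B" as the fallback sends a slow or failed classification to the simple retrieval path. Note that the timeout only stops waiting; the underlying call keeps running in the background unless the client supports cancellation.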

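Edge case 3: caching classification results

Classifying the same question (for example "What is RAG") every single time is wasteful. Hash the question string and cache the classification result; the hit rate is typically 30% to 50%.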
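A minimal in-process sketch of such a cache, built on `LinkedHashMap` in access order so that the least recently used entry is evicted first. `ClassificationCache` is a hypothetical class; the normalization step (trim, collapse whitespace, lowercase) is my own assumption about what counts as "the same question", and a shared cache such as Redis would be the more likely choice in a multi-instance deployment.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class ClassificationCache {
    private final Map<String, String> lru;
    private int hits;
    private int misses;

    ClassificationCache(int capacity) {
        // accessOrder = true turns the map into an LRU structure
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Assumed notion of "same question": ignore case and whitespace differences. */
    static String normalize(String question) {
        return question.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns the cached category, or runs the classifier once and stores its result. */
    synchronized String classify(String question, Function<String, String> classifier) {
        String key = normalize(question);
        String cached = lru.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        String category = classifier.apply(question);
        lru.put(key, category);
        return category;
    }

    synchronized double hitRate() {
        int total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

For simplicity the lock is held while the classifier runs, which serializes concurrent misses; a production version would want finer-grained locking and probably a TTL so that category changes eventually take effect.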

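Adaptive RAG is an engineering optimization well worth investing in. The point is not to make the system more complex but to make it smarter: it knows when to spend effort and when it can travel light.

The next article covers multi-hop reasoning RAG, which goes deeper than today's "complex question decomposition" and deals specifically with scenarios that need reasoning chains across documents.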