Few-Shot Design for AI in Depth: How to Pick Examples That Actually Work

Lao Zhang · 2026/4/30 · about 11 min read

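When I first started building AI applications, I went through a phase of blind faith in Few-Shot: I assumed that stuffing a few examples into the prompt would always make the model perform better. So I piled a dozen or more examples into a single prompt. Sometimes the results actually got worse, and the token cost of every call went up too.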

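It took me a while to figure out: dropping in a few random examples isn't Few-Shot, it's just noise for the model. Effective Few-Shot design rests on a deliberate logic for choosing examples.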

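This article focuses on the engineering side. No theory; just how to choose examples, how to retrieve them dynamically, and how to ship this in a production system.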

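The problems with static Few-Shot

First, let me spell out why static Few-Shot (hard-coding a few examples directly in the prompt) is a problem.

Problem 1: the examples are unrelated to the current input

Say I'm classifying restaurant reviews but the prompt contains examples about movie reviews. The model will pick up the format and reasoning style of those examples, but their content does little to help it decide whether this particular restaurant review is positive or negative.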

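Problem 2: the examples cover too narrow a distribution

Suppose you're classifying contract clauses into 8 categories but include only 3 examples, one each for 3 of the categories. For inputs that belong to the other 5 categories, the model has no direct reference.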
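
A coverage gap like this is easy to detect mechanically. Here is a minimal sketch, assuming each example carries a label string; the class name `LabelCoverage` and the eight label names are made up for illustration and are not from my project.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports which labels have no example in a static set.
class LabelCoverage {

    // Returns the labels that appear in allLabels but in none of the examples.
    static Set<String> uncoveredLabels(Set<String> allLabels, List<String> exampleLabels) {
        Set<String> missing = new TreeSet<>(allLabels);
        missing.removeAll(new HashSet<>(exampleLabels));
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative label names only
        Set<String> all = Set.of("payment", "liability", "termination", "confidentiality",
                                 "ip", "warranty", "dispute", "force_majeure");
        List<String> examples = List.of("payment", "liability", "termination");

        // prints "5 labels have no example"
        System.out.println(uncoveredLabels(all, examples).size() + " labels have no example");
    }
}
```

Running a check like this whenever the example set changes at least tells you which categories the model is handling without any direct reference.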

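Problem 3: example count versus the context window

More examples mean better coverage, but also more tokens. For tasks where each example is long, 10 static examples can push the System Prompt to 8,000 tokens, leaving little room for the user's input.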
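
The trade-off is plain arithmetic. The sketch below is only an illustration: the context window size, per-example token count, and output reservation are assumed numbers, and `ContextBudget` is a hypothetical helper.

```java
// Hypothetical helper for back-of-the-envelope context budgeting.
class ContextBudget {

    // Tokens left for the user's input once the fixed prompt parts are counted.
    static int remainingForInput(int contextWindow, int instructionTokens,
                                 int exampleCount, int tokensPerExample,
                                 int reservedForOutput) {
        int used = instructionTokens + exampleCount * tokensPerExample + reservedForOutput;
        return Math.max(0, contextWindow - used);
    }

    // Largest number of equally sized examples that fits in a token budget.
    static int maxExamples(int budgetForExamples, int tokensPerExample) {
        if (tokensPerExample <= 0) {
            throw new IllegalArgumentException("tokensPerExample must be positive");
        }
        return Math.max(0, budgetForExamples / tokensPerExample);
    }

    public static void main(String[] args) {
        // Assumed figures: 16,000-token window, 500-token instructions,
        // 10 examples of 750 tokens each, 1,000 tokens reserved for the reply.
        System.out.println(remainingForInput(16000, 500, 10, 750, 1000)); // prints 7000
        System.out.println(maxExamples(3000, 800));                       // prints 3
    }
}
```

With those assumed numbers the instructions plus examples already take 8,000 tokens, which is the situation described above.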

The fix is dynamic Few-Shot: for each input, retrieve the most relevant examples from an example library at request time and put only those into the prompt.

Maven dependencies

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

Data model for the example library

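import java.time.LocalDateTime;

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import lombok.Data;
import org.hibernate.annotations.CreationTimestamp;

@Entity
@Table(name = "few_shot_examples")
@Data
public class FewShotExample {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    // Task type, used to separate examples for different application scenarios
    @Column(nullable = false)
    private String taskType;
    
    // Input text
    @Column(nullable = false, length = 4096)
    private String input;
    
    // Expected output (including the reasoning)
    @Column(nullable = false, length = 8192)
    private String output;
    
    // Label / category
    @Column
    private String label;
    
    // Whether this is a boundary case
    @Column
    private Boolean isBoundaryCase = false;
    
    // Annotation quality score (0-1)
    @Column
    private Double qualityScore = 1.0;
    
    // Vector ID (reference to the entry in the vector database)
    @Column
    private String vectorId;
    
    // Descriptive metadata about the example
    @Column(length = 1024)
    private String metadata;
    
    @CreationTimestamp
    private LocalDateTime createdAt;
}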

Vectorization and storage

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
@Slf4j
@RequiredArgsConstructor
public class FewShotExampleManager {
    
    private final FewShotExampleRepository exampleRepository;
    private final VectorStore vectorStore;
    
    // Store an example and index it in the vector database
    public FewShotExample addExample(FewShotExample example) {
        // 1. Save to the relational database first so the generated ID is available
        //    (otherwise the metadata would carry a placeholder instead of the real ID)
        FewShotExample saved = exampleRepository.save(example);
        
        // 2. Add to the vector database. VectorStore.add embeds the document text
        //    itself, so no separate embedding call is needed here
        Document doc = toDocument(saved);
        vectorStore.add(List.of(doc));
        
        // 3. Record the vector ID on the database row (handy for management and queries)
        saved.setVectorId(doc.getId());
        return exampleRepository.save(saved);
    }
    
    /**
     * Bulk-import examples (from CSV or JSON).
     * Assumes FewShotExampleRepository is a Spring Data JpaRepository.
     */
    public void bulkImport(List<FewShotExample> examples) {
        // Save first so every example has an ID before it is indexed
        List<FewShotExample> saved = exampleRepository.saveAll(examples);
        
        List<Document> docs = new ArrayList<>();
        for (FewShotExample example : saved) {
            Document doc = toDocument(example);
            example.setVectorId(doc.getId());
            docs.add(doc);
        }
        
        // One add call for the whole batch keeps embedding API calls down
        vectorStore.add(docs);
        exampleRepository.saveAll(saved);
        log.info("Bulk-imported {} examples into the vector database", saved.size());
    }
    
    // Build the vector-store document; metadata values are stored as strings
    private Document toDocument(FewShotExample example) {
        return new Document(
            example.getInput(),
            Map.of(
                "output", example.getOutput(),
                "label", example.getLabel() != null ? example.getLabel() : "",
                "taskType", example.getTaskType(),
                "isBoundaryCase", String.valueOf(example.getIsBoundaryCase()),
                "qualityScore", String.valueOf(example.getQualityScore()),
                "exampleId", String.valueOf(example.getId())
            )
        );
    }
}

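Core of dynamic retrieval: MMR with diversity

A plain Top-K similarity search tends to return a cluster of near-identical examples, so diversity suffers. I use MMR (Maximal Marginal Relevance) to balance similarity against diversity: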

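import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingClient;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
@Slf4j
@RequiredArgsConstructor
public class DynamicFewShotRetriever {
    
    private final VectorStore vectorStore;
    // Later Spring AI releases name this interface EmbeddingModel;
    // adjust the type and its return types to the version you use
    private final EmbeddingClient embeddingClient;
    private final FewShotExampleRepository exampleRepository;
    
    // MMR parameter: lambda = 1 is pure similarity, lambda = 0 is pure diversity
    private static final double LAMBDA = 0.7;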
    
    /**
     * Dynamically retrieve the most relevant, diverse Few-Shot examples
     * @param query the current user input
     * @param taskType the task type
     * @param targetCount the number of examples wanted
     */
    public List<SelectedExample> retrieveExamples(String query,
                                                   String taskType,
                                                   int targetCount) {
        
        // Step 1: embed the query
        List<Double> queryEmbedding = embeddingClient.embed(query);
        
        // Step 2: broad Top-K candidate search (3-5x the target count; 4x here)
        int candidateCount = targetCount * 4;
        SearchRequest searchRequest = SearchRequest.query(query)
            .withTopK(candidateCount)
            .withFilterExpression("taskType == '" + taskType + "'");
        
        List<Document> candidates = vectorStore.similaritySearch(searchRequest);
        
        if (candidates.isEmpty()) {
            log.warn("No candidate examples found, task type: {}", taskType);
            return Collections.emptyList();
        }
        
        // Step 3: pick a diverse subset with MMR
        List<Document> selectedDocs = mmrSelect(queryEmbedding, candidates, targetCount);
        
        // Step 4: top up with boundary cases if we are short of the target count
        List<SelectedExample> result = convertToSelectedExamples(selectedDocs, query);
        
        if (result.size() < targetCount) {
            List<SelectedExample> boundaryExamples = getBoundaryExamples(taskType, 
                targetCount - result.size(), result);
            result.addAll(boundaryExamples);
        }
        
        log.debug("Dynamic Few-Shot retrieval done, task: {}, retrieved {} examples", taskType, result.size());
        return result;
    }
    
    /**
     * MMR (Maximal Marginal Relevance)
     * Balances similarity to the query against diversity among the picks
     */
    private List<Document> mmrSelect(List<Double> queryEmbedding,
                                      List<Document> candidates,
                                      int targetCount) {
        
        if (candidates.size() <= targetCount) {
            return candidates;
        }
        
        List<Document> selected = new ArrayList<>();
        List<Document> remaining = new ArrayList<>(candidates);
        
        // Pre-compute every candidate's embedding in one batched call.
        // Simplified: in practice these could come from a cache instead of being recomputed
        List<String> texts = candidates.stream()
            .map(Document::getContent)
            .collect(Collectors.toList());
        List<List<Double>> vectors = embeddingClient.embed(texts);
        
        Map<String, List<Double>> embeddingCache = new HashMap<>();
        for (int i = 0; i < candidates.size(); i++) {
            embeddingCache.put(candidates.get(i).getId(), vectors.get(i));
        }
        
        // Greedy selection: add the best-scoring candidate each round
        while (selected.size() < targetCount && !remaining.isEmpty()) {
            Document bestDoc = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            
            for (Document candidate : remaining) {
                List<Double> candidateEmbedding = embeddingCache.get(candidate.getId());
                
                // Similarity to the query
                double relevanceScore = cosineSimilarity(queryEmbedding, candidateEmbedding);
                
                // Highest similarity to any example already selected
                double maxSelectedSimilarity = 0;
                for (Document sel : selected) {
                    List<Double> selEmbedding = embeddingCache.get(sel.getId());
                    double sim = cosineSimilarity(selEmbedding, candidateEmbedding);
                    maxSelectedSimilarity = Math.max(maxSelectedSimilarity, sim);
                }
                
                // MMR score
                double mmrScore = LAMBDA * relevanceScore - 
                                  (1 - LAMBDA) * maxSelectedSimilarity;
                
                if (mmrScore > bestScore) {
                    bestScore = mmrScore;
                    bestDoc = candidate;
                }
            }
            
            if (bestDoc != null) {
                selected.add(bestDoc);
                remaining.remove(bestDoc);
            }
        }
        
        return selected;
    }
    
    private double cosineSimilarity(List<Double> a, List<Double> b) {
        if (a.size() != b.size()) return 0;
        
        double dotProduct = 0;
        double normA = 0;
        double normB = 0;
        
        for (int i = 0; i < a.size(); i++) {
            dotProduct += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        
        if (normA == 0 || normB == 0) return 0;
        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }
    
    private List<SelectedExample> getBoundaryExamples(String taskType,
                                                        int count,
                                                        List<SelectedExample> alreadySelected) {
        // IDs of examples picked by vector search; these are vector-store document IDs
        Set<String> selectedIds = alreadySelected.stream()
            .map(SelectedExample::getExampleId)
            .collect(Collectors.toSet());
        
        // Query boundary cases from the relational database.
        // Compare on vectorId: comparing the database ID against vector-store IDs
        // would never match, so nothing would be filtered out
        return exampleRepository.findByTaskTypeAndIsBoundaryCase(taskType, true)
            .stream()
            .filter(e -> e.getVectorId() == null || !selectedIds.contains(e.getVectorId()))
            .limit(count)
            .map(e -> SelectedExample.builder()
                .exampleId(e.getId().toString())
                .input(e.getInput())
                .output(e.getOutput())
                .label(e.getLabel())
                .isBoundaryCase(true)
                .selectionReason("boundary case top-up")
                .build())
            .collect(Collectors.toList());
    }
    
    private List<SelectedExample> convertToSelectedExamples(List<Document> docs, String query) {
        return docs.stream()
            .map(doc -> SelectedExample.builder()
                .exampleId(doc.getId())
                .input(doc.getContent())
                .output((String) doc.getMetadata().get("output"))
                .label((String) doc.getMetadata().get("label"))
                .isBoundaryCase(Boolean.parseBoolean(
                    (String) doc.getMetadata().getOrDefault("isBoundaryCase", "false")))
                .selectionReason("similarity search")
                .build())
            .collect(Collectors.toList());
    }
}
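
To see what the lambda parameter does without any Spring wiring, here is a self-contained toy version of the same greedy MMR loop on 2-D vectors. The class name `MmrDemo` and the vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy MMR on plain arrays; a sketch for intuition, not production code.
class MmrDemo {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indices of the selected candidates, in selection order.
    static List<Integer> mmr(double[] query, double[][] candidates, int k, double lambda) {
        List<Integer> selected = new ArrayList<>();
        List<Integer> remaining = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) remaining.add(i);

        while (selected.size() < k && !remaining.isEmpty()) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int idx : remaining) {
                double relevance = cosine(query, candidates[idx]);
                double redundancy = 0;
                for (int s : selected) {
                    redundancy = Math.max(redundancy, cosine(candidates[s], candidates[idx]));
                }
                double score = lambda * relevance - (1 - lambda) * redundancy;
                if (score > bestScore) {
                    bestScore = score;
                    best = idx;
                }
            }
            selected.add(best);
            remaining.remove(Integer.valueOf(best));
        }
        return selected;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        double[][] candidates = {
            {1.0, 0.30},   // 0: close to the query
            {1.0, 0.31},   // 1: near-duplicate of candidate 0
            {1.0, -0.40},  // 2: slightly less relevant, but points elsewhere
        };
        System.out.println(mmr(query, candidates, 2, 1.0)); // prints [0, 1]
        System.out.println(mmr(query, candidates, 2, 0.7)); // prints [0, 2]
    }
}
```

With lambda = 1.0 the second pick is the near-duplicate; at 0.7 the redundancy penalty is large enough that the different-looking candidate wins instead.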

Building the dynamic Few-Shot prompt

@Service
@RequiredArgsConstructor // lombok.RequiredArgsConstructor: generates the constructor for the final field
public class DynamicFewShotPromptBuilder {
    
    private final DynamicFewShotRetriever retriever;
    
    /**
     * Build a prompt that contains dynamically retrieved Few-Shot examples
     */
    public String buildPrompt(String taskDescription,
                               String userInput,
                               String taskType,
                               int exampleCount) {
        
        // Retrieve relevant examples
        List<SelectedExample> examples = retriever.retrieveExamples(
            userInput, taskType, exampleCount);
        
        StringBuilder sb = new StringBuilder();
        sb.append(taskDescription).append("\n\n");
        
        if (!examples.isEmpty()) {
            sb.append("Here are some reference examples:\n\n");
            
            for (int i = 0; i < examples.size(); i++) {
                SelectedExample example = examples.get(i);
                sb.append("Example ").append(i + 1).append(":\n");
                
                if (example.isBoundaryCase()) {
                    sb.append("[Note: this is a boundary case that needs careful judgment]\n");
                }
                
                sb.append("Input: ").append(example.getInput()).append("\n");
                sb.append("Analysis and output:\n").append(example.getOutput()).append("\n\n");
            }
            
            sb.append("---\n\n");
        }
        
        sb.append("Now process the following input (follow the analysis style of the examples above):\n");
        sb.append("Input: ").append(userInput);
        
        return sb.toString();
    }
    
    /**
     * Estimate the token count and drop examples until the prompt fits the budget
     */
    public String buildPromptWithTokenBudget(String taskDescription,
                                              String userInput,
                                              String taskType,
                                              int maxTokens) {
        int exampleCount = 5;
        String prompt = buildPrompt(taskDescription, userInput, taskType, exampleCount);
        
        // Rule of thumb: 1 token ≈ 1.5 Chinese characters, 1 token ≈ 4 English characters.
        // length / 2 is a deliberately coarse stand-in that sits between the two
        int estimatedTokens = prompt.length() / 2;
        
        // Note: every pass re-runs retrieval; if that cost matters,
        // retrieve once and trim the list instead
        while (estimatedTokens > maxTokens && exampleCount > 1) {
            exampleCount--;
            prompt = buildPrompt(taskDescription, userInput, taskType, exampleCount);
            estimatedTokens = prompt.length() / 2;
        }
        
        return prompt;
    }
}
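
If prompts mix Chinese and English, `length() / 2` can be noticeably off in either direction. A slightly finer sketch applies the two rules of thumb from the code comment separately. `RoughTokenEstimator` is a hypothetical helper, and the ratios are still heuristics; actual counts depend on the model's tokenizer.

```java
// Hypothetical helper: a heuristic token estimate, not a real tokenizer.
class RoughTokenEstimator {

    // Counts Han characters at ~1.5 per token and everything else at ~4 per token.
    static int estimate(String text) {
        int han = 0;
        int other = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN) {
                han++;
            } else {
                other++;
            }
            i += Character.charCount(codePoint);
        }
        return (int) Math.ceil(han / 1.5 + other / 4.0);
    }

    public static void main(String[] args) {
        System.out.println(estimate("abcdefgh"));     // prints 2
        System.out.println(estimate("你好吗"));        // prints 2
        System.out.println(estimate("你好吗abcd"));    // prints 3
    }
}
```

For billing-accurate numbers you would still want the provider's own tokenizer; an estimate like this is only meant for deciding how many examples to keep.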

The complete task execution Service

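import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
@Slf4j
@RequiredArgsConstructor
public class FewShotTaskService {
    
    // Assumes a ChatClient bean is defined in your configuration,
    // e.g. built from the auto-configured ChatClient.Builder
    private final ChatClient chatClient;
    private final DynamicFewShotPromptBuilder promptBuilder;
    
    // Restaurant review sentiment analysis
    private static final String SENTIMENT_TASK_DESCRIPTION = """
            You are an expert in sentiment analysis of restaurant reviews.
            Analyze the given review and answer in this format:
            - Sentiment: positive/negative/neutral
            - Key sentiment words: [list the key words]
            - Explanation: [a short explanation]
            """;
    
    public SentimentResult analyzeSentiment(String review) {
        String prompt = promptBuilder.buildPromptWithTokenBudget(
            SENTIMENT_TASK_DESCRIPTION,
            review,
            "restaurant_sentiment",
            3000  // at most 3000 tokens
        );
        
        log.debug("Dynamic Few-Shot prompt built, length: {} characters", prompt.length());
        
        String response = chatClient.prompt()
            .user(prompt)
            .call()
            .content();
        
        return parseSentimentResult(response, review);
    }
    
    private SentimentResult parseSentimentResult(String response, String originalReview) {
        SentimentResult result = new SentimentResult();
        result.setOriginalReview(originalReview);
        result.setRawResponse(response);
        
        // Read only the "Sentiment:" line, so that a word such as "negative"
        // in the explanation cannot flip the label. Defaults to neutral
        String sentiment = "neutral";
        for (String line : response.split("\\R")) {
            String lower = line.toLowerCase();
            if (!lower.contains("sentiment:")) {
                continue;
            }
            if (lower.contains("positive")) {
                sentiment = "positive";
            } else if (lower.contains("negative")) {
                sentiment = "negative";
            }
            break;
        }
        result.setSentiment(sentiment);
        
        return result;
    }
}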

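Continuously growing the example library from production data

A static example library isn't enough. After the production system has been running for a while, patterns show up that the library never covered. I built a semi-automated process for adding examples: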

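@Service
@Slf4j
@RequiredArgsConstructor
public class ExampleHarvestingService {
    
    private static final String REASON_LOW_CONFIDENCE = "low confidence";
    private static final String REASON_USER_CORRECTION = "user correction";
    
    private final FewShotExampleManager exampleManager;
    private final HumanReviewQueue reviewQueue;
    // Both collaborators below are used by the methods in this class;
    // the ProductionLogRepository type name is an assumption
    private final ProductionLogRepository productionLogRepository;
    private final DynamicFewShotRetriever dynamicFewShotRetriever;
    
    /**
     * Identify production log entries worth adding to the example library.
     * Criteria: low model confidence, or corrected by a user
     */
    @Scheduled(fixedDelay = 3600000) // runs hourly
    public void harvestFromProductionLogs() {
        // 1. Low-confidence predictions (confidence < 0.7) from the last 24 hours
        List<ProductionLog> lowConfidenceLogs = 
            productionLogRepository.findLowConfidenceLogs(0.7, LocalDateTime.now().minusHours(24));
        
        // 2. Cases that users corrected
        List<ProductionLog> correctedLogs = 
            productionLogRepository.findCorrectedLogs(LocalDateTime.now().minusHours(24));
        
        List<ProductionLog> candidates = new ArrayList<>();
        candidates.addAll(lowConfidenceLogs);
        candidates.addAll(correctedLogs);
        
        // 3. De-duplicate and filter (distinct() relies on ProductionLog.equals)
        candidates = candidates.stream()
            .distinct()
            .filter(entry -> !isDuplicate(entry))
            .collect(Collectors.toList());
        
        // 4. Push into the human review queue
        for (ProductionLog entry : candidates) {
            reviewQueue.enqueue(ReviewTask.builder()
                .taskType(entry.getTaskType()) // needed later when the task is approved
                .input(entry.getInput())
                .modelOutput(entry.getModelOutput())
                .userCorrection(entry.getUserCorrection())
                .reason(entry.getConfidenceScore() < 0.7
                    ? REASON_LOW_CONFIDENCE : REASON_USER_CORRECTION)
                .build());
        }
        
        log.info("Found {} candidate examples, pushed to the human review queue", candidates.size());
    }
    
    /**
     * After a human reviewer approves a task, add it to the example library
     */
    public void approveAndAddToExampleLibrary(ReviewTask task, String verifiedOutput) {
        FewShotExample example = new FewShotExample();
        example.setTaskType(task.getTaskType());
        example.setInput(task.getInput());
        example.setOutput(verifiedOutput);
        example.setQualityScore(1.0); // human-reviewed examples get the top quality score
        // Low-confidence cases are treated as boundary cases
        example.setIsBoundaryCase(REASON_LOW_CONFIDENCE.equals(task.getReason()));
        
        exampleManager.addExample(example);
        log.info("New example added, task type: {}, boundary case: {}", 
                 task.getTaskType(), example.getIsBoundaryCase());
    }
    
    private boolean isDuplicate(ProductionLog entry) {
        // Look up the closest existing example in the library
        List<SelectedExample> similar = dynamicFewShotRetriever.retrieveExamples(
            entry.getInput(), entry.getTaskType(), 1);
        
        if (similar.isEmpty()) return false;
        
        // The intent is "similarity above 0.95 counts as a duplicate".
        // This quick filter only catches exact text matches
        return similar.get(0).getInput().equals(entry.getInput());
    }
}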
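
The `isDuplicate` method above only catches exact text matches. A similarity-based version of the 0.95 rule could look like the sketch below. It assumes you already have embedding vectors for the candidate and for the existing examples; `NearDuplicateCheck` is a hypothetical helper and the vectors in `main` are toy values.

```java
import java.util.List;

// Hypothetical helper: flags a candidate whose embedding is almost identical
// to an embedding that is already in the example library.
class NearDuplicateCheck {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // True if any existing embedding is more similar than the threshold.
    static boolean isNearDuplicate(double[] candidate, List<double[]> existing, double threshold) {
        for (double[] e : existing) {
            if (cosine(candidate, e) > threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[] candidate = {1.0, 0.0};
        List<double[]> library = List.of(new double[] {0.0, 1.0}, new double[] {1.0, 0.01});
        System.out.println(isNearDuplicate(candidate, library, 0.95)); // prints true
    }
}
```

In a real system the vector database can usually do this comparison for you via a top-1 search with a similarity threshold, which avoids pulling embeddings into the application.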

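Results compared with static Few-Shot

I ran an A/B test on the restaurant review classification task. The results:

Zero-Shot (no examples): 74% accuracy, about 350 prompt tokens per request on average
Static, 5 examples: 81% accuracy, about 1,200 prompt tokens per request on average
Dynamic Few-Shot (3-5 examples, MMR retrieval): 89% accuracy, about 900 prompt tokens per request on average

Dynamic Few-Shot beats static on accuracy while using fewer tokens, because some of the static examples have nothing to do with the current input and simply waste tokens.

The gap is even wider on boundary cases:

Static Few-Shot: 61% accuracy on boundary cases
Dynamic Few-Shot (with the boundary case library): 83% accuracy on boundary cases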

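A few engineering details

Isolate the example library by task type. Examples from different tasks must not be mixed, or retrieval will suffer cross-task contamination.

Example quality matters more than quantity. 100 high-quality examples with good coverage are worth more than 1,000 collected at random. Better to have too few than to let bad ones in.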

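Evaluate the library's coverage regularly. Each quarter, run a clustering analysis on the production input distribution, look for kinds of input with no corresponding examples in the library, and fill the gaps promptly.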
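
Full clustering is one way to do this. As a simpler stand-in (a nearest-example check, not clustering), the sketch below flags production inputs whose closest library example falls below a similarity floor. It assumes embeddings are already computed; `CoverageGapFinder` and the 0.8 floor are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds inputs with no sufficiently similar example.
class CoverageGapFinder {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indices of inputs whose best match in the library is below minSimilarity.
    static List<Integer> uncoveredInputs(double[][] inputs, double[][] examples, double minSimilarity) {
        List<Integer> gaps = new ArrayList<>();
        for (int i = 0; i < inputs.length; i++) {
            double best = -1.0; // lowest possible cosine value
            for (double[] example : examples) {
                best = Math.max(best, cosine(inputs[i], example));
            }
            if (best < minSimilarity) {
                gaps.add(i);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        double[][] inputs = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] examples = {{1.0, 0.1}};
        System.out.println(uncoveredInputs(inputs, examples, 0.8)); // prints [1]
    }
}
```

The flagged inputs are natural candidates for the human review queue described earlier.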

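Don't forget that examples age. In some business scenarios (sentiment analysis tied to current events, for instance), old examples may not fit new language patterns. Build in an expiry mechanism for examples.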
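
One simple form of expiry is an age cutoff applied before examples are used. The sketch below assumes each example has a creation timestamp, like the `createdAt` field in the entity; `ExampleExpiry`, `DatedExample`, and the 180-day limit are illustrative, not what my system necessarily uses.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: drops examples older than a maximum age.
class ExampleExpiry {

    // Minimal stand-in for an example with a creation timestamp.
    record DatedExample(String input, LocalDateTime createdAt) {}

    // Keeps examples created at or after (now - maxAge).
    static List<DatedExample> keepFresh(List<DatedExample> examples,
                                        Duration maxAge,
                                        LocalDateTime now) {
        LocalDateTime cutoff = now.minus(maxAge);
        return examples.stream()
            .filter(e -> !e.createdAt().isBefore(cutoff))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2026, 4, 30, 0, 0);
        List<DatedExample> examples = List.of(
            new DatedExample("recent review", LocalDateTime.of(2026, 3, 1, 0, 0)),
            new DatedExample("old review", LocalDateTime.of(2025, 1, 1, 0, 0)));
        // prints 1: only the recent example survives a 180-day limit
        System.out.println(keepFresh(examples, Duration.ofDays(180), now).size());
    }
}
```

Passing `now` in explicitly keeps the function testable; a different maximum age per task type would be a natural extension.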
