Fine-tuning or RAG: When to Fine-tune, When to Use RAG, and Engineering Criteria for Choosing

Lao Zhang · about 10 minutes

Audience: Java backend engineers, AI architects, technical decision-makers | Reading time: about 20 minutes | Dependencies: Spring AI 1.0, OpenAI Fine-tuning API

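Opening Story

This is the selection question I get asked most often: "Lao Zhang, should I use fine-tuning or RAG for my requirement?"

Every time, I answer with a few questions of my own: How much data do you have? Does it change frequently? Do you need the model to learn new knowledge, or to acquire a new capability? What is your budget?

The answers to these questions largely settle the technical direction.

One of my own projects deepened my understanding of this choice: an e-commerce platform wanted AI to handle product reviews better, automatically identifying the sentiment of each review (positive / neutral / negative) and the issue type (logistics / quality / customer service / price).

The first version used RAG: I put historical labeled samples into a knowledge base and retrieved similar reviews to assist classification. The results were passable, with accuracy around 78%.

The second version used fine-tuning: I fine-tuned gpt-3.5-turbo on 5,000 labeled reviews, and accuracy jumped to 94%. Inference no longer needed RAG, a single call completed the classification, and average latency dropped from 850ms to 180ms.

This case taught me that for fixed tasks with a well-defined format and enough labeled data, fine-tuning far outperforms RAG. But not every scenario can be fine-tuned, and not every scenario suits RAG. Today I am laying out this decision framework systematically.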

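1. Core Problem Analysis

Fine-tuning and RAG solve different problems:

Problems fine-tuning solves:

Teaching the model a specific output style and format
Giving the model specialized domain capabilities (classification, extraction, generating specific formats)
Reducing reliance on complex prompts and speeding up inference

Problems RAG solves:

Injecting fresh, dynamic knowledge into the model (information from after the model's training cutoff)
Reducing hallucinations (the model answers from retrieved real documents rather than relying on potentially outdated training knowledge)
Providing traceable answers (you know which document an answer came from)

The essential difference:

Fine-tuning changes the model weights, which is a permanent injection of knowledge or capability; RAG injects context dynamically at each inference, which is a temporary extension of knowledge.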

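2. How It Works in Depth

2.1 Decision Framework

2.2 Cost Comparison

RAG cost structure:
- Build cost: document processing + embedding + vector store deployment
- Run cost: each query = embedding call + LLM call (with a fairly long context)
- Maintenance cost: knowledge base updates, index maintenance

Fine-tuning cost structure:
- Build cost: data labeling + fine-tuning training (relatively high; GPT-3.5 fine-tuning costs about $0.008 per 1K training tokens)
- Run cost: each inference = a single short-prompt LLM call (lower, with no retrieval step)
- Maintenance cost: retraining whenever the data changes

Rules of thumb:
- High call volume (>100,000 calls/month): fine-tuning's run-cost advantage is significant
- High knowledge update frequency: RAG's maintenance-cost advantage is significant
- Both high: RAG + fine-tuning (fine-tuning for capability, RAG for knowledge)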
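The rules of thumb above can be written down as a tiny decision function. This is only a sketch: `ApproachSelector` and its method are hypothetical names, the 100,000 calls/month threshold is the one quoted in the list, and the fallback to plain prompting when neither pressure applies is my own assumption rather than something the rules state.

```java
/** Hypothetical helper, not part of any library: encodes the rules of thumb above. */
final class ApproachSelector {

    enum Approach { PROMPT_ENGINEERING, RAG, FINE_TUNE, HYBRID }

    /** "High call volume" threshold quoted in the text: more than 100,000 calls per month. */
    static final long HIGH_VOLUME_CALLS_PER_MONTH = 100_000L;

    static Approach recommend(long monthlyCalls, boolean knowledgeChangesOften) {
        boolean highVolume = monthlyCalls > HIGH_VOLUME_CALLS_PER_MONTH;
        if (highVolume && knowledgeChangesOften) {
            return Approach.HYBRID;      // fine-tuning for capability, RAG for knowledge
        }
        if (highVolume) {
            return Approach.FINE_TUNE;   // run-cost advantage dominates
        }
        if (knowledgeChangesOften) {
            return Approach.RAG;         // maintenance-cost advantage dominates
        }
        // Assumption (not stated in the rules): with neither pressure, start with plain prompting.
        return Approach.PROMPT_ENGINEERING;
    }

    public static void main(String[] args) {
        System.out.println(recommend(500_000, false));
        System.out.println(recommend(20_000, true));
        System.out.println(recommend(500_000, true));
    }
}
```

A real decision also weighs data volume, budget, and whether the need is knowledge or capability, so treat this as a starting checklist rather than a verdict.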

3. Complete Code Implementation

3.1 OpenAI Fine-tuning in Java

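import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.AllArgsConstructor;
import lombok.Data;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.FileSystemResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestTemplate;

import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Service
public class FineTuningService {

    private static final Logger log = LoggerFactory.getLogger(FineTuningService.class);

    // Assumes a RestTemplate bean is defined elsewhere in the application.
    private final RestTemplate restTemplate;
    private final ObjectMapper objectMapper;

    @Value("${spring.ai.openai.api-key}")
    private String apiKey;

    public FineTuningService(RestTemplate restTemplate, ObjectMapper objectMapper) {
        this.restTemplate = restTemplate;
        this.objectMapper = objectMapper;
    }

    /**
     * Prepares the training data file in JSONL format.
     * OpenAI's fine-tuning format is one chat example per line.
     * TrainingExample is assumed to be a simple POJO with the three getters used below.
     */
    public File prepareTrainingData(List<TrainingExample> examples)
            throws IOException {
        File tmpFile = File.createTempFile("finetune_data_", ".jsonl");

        // Write UTF-8 explicitly; FileWriter would use the platform default charset
        // and can corrupt non-ASCII text.
        try (BufferedWriter writer = Files.newBufferedWriter(
                tmpFile.toPath(), StandardCharsets.UTF_8)) {
            for (TrainingExample example : examples) {
                // Build one entry in OpenAI's chat fine-tuning format
                Map<String, Object> entry = new HashMap<>();
                entry.put("messages", List.of(
                        Map.of("role", "system",
                               "content", example.getSystemPrompt()),
                        Map.of("role", "user",
                               "content", example.getUserInput()),
                        Map.of("role", "assistant",
                               "content", example.getExpectedOutput())
                ));
                writer.write(objectMapper.writeValueAsString(entry));
                writer.newLine();
            }
        }

        log.info("Training data ready: {} samples, file: {}",
                examples.size(), tmpFile.getAbsolutePath());
        return tmpFile;
    }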

    /**
     * Uploads the training file to OpenAI.
     */
    public String uploadTrainingFile(File trainingFile) throws IOException {
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("purpose", "fine-tune");
        body.add("file", new FileSystemResource(trainingFile));

        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + apiKey);
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);

        ResponseEntity<Map> response = restTemplate.exchange(
                "https://api.openai.com/v1/files",
                HttpMethod.POST,
                new HttpEntity<>(body, headers),
                Map.class);

        String fileId = (String) response.getBody().get("id");
        log.info("Training file uploaded, fileId: {}", fileId);
        return fileId;
    }

    /**
     * Creates a fine-tuning job.
     */
    public String createFineTuningJob(String trainingFileId,
                                       String model,
                                       FineTuningConfig config) {
        Map<String, Object> request = new HashMap<>();
        request.put("training_file", trainingFileId);
        request.put("model", model); // e.g. "gpt-3.5-turbo" or "gpt-4o-mini"
        request.put("hyperparameters", Map.of(
                "n_epochs", config.getEpochs(),           // number of training epochs
                "batch_size", config.getBatchSize(),        // batch size
                "learning_rate_multiplier", config.getLearningRateMultiplier()
        ));
        if (config.getSuffix() != null) {
            request.put("suffix", config.getSuffix()); // suffix for the fine-tuned model name
        }

        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + apiKey);
        headers.setContentType(MediaType.APPLICATION_JSON);

        ResponseEntity<Map> response = restTemplate.exchange(
                "https://api.openai.com/v1/fine_tuning/jobs",
                HttpMethod.POST,
                new HttpEntity<>(request, headers),
                Map.class);

        String jobId = (String) response.getBody().get("id");
        log.info("Fine-tuning job created, jobId: {}", jobId);
        return jobId;
    }

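    /**
     * Queries the status of a fine-tuning job.
     */
    public FineTuningStatus getJobStatus(String jobId) {
        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + apiKey);

        ResponseEntity<Map> response = restTemplate.exchange(
                "https://api.openai.com/v1/fine_tuning/jobs/" + jobId,
                HttpMethod.GET,
                new HttpEntity<>(headers),
                Map.class);

        Map body = response.getBody();
        if (body == null) {
            throw new IllegalStateException("Empty response for fine-tuning job " + jobId);
        }
        // "error" is a JSON object (or null) in the job payload, not a string,
        // so a direct (String) cast would fail once a job actually reports an error.
        Object error = body.get("error");
        return new FineTuningStatus(
                (String) body.get("status"),
                (String) body.get("fine_tuned_model"),
                error == null ? null : String.valueOf(error)
        );
    }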

    @Data
    public static class FineTuningConfig {
        private int epochs = 3;
        private int batchSize = 4;
        private double learningRateMultiplier = 2.0;
        private String suffix; // custom suffix for the model name
    }

    @Data
    @AllArgsConstructor
    public static class FineTuningStatus {
        private String status;
        private String fineTunedModelId; // model ID, available once training has finished
        private String error;
    }
}
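Since `getJobStatus` only returns a snapshot, callers usually poll until the job reaches a terminal state. Below is a minimal sketch of such a loop. `FineTuneJobPoller` is a hypothetical helper, and the status source is passed in as a `Supplier` so the example runs without network access; in the service above it would be something like `() -> fineTuningService.getJobStatus(jobId).getStatus()`. The terminal values `succeeded`, `failed`, and `cancelled` are the ones the OpenAI job object uses; the interval and attempt limit are arbitrary assumptions.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical polling helper; not part of Spring AI or any OpenAI SDK. */
final class FineTuneJobPoller {

    // Terminal states of a fine-tuning job; any other value means "still in progress".
    static final Set<String> TERMINAL = Set.of("succeeded", "failed", "cancelled");

    static String waitForTerminalStatus(Supplier<String> statusSource,
                                        Duration interval,
                                        int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = statusSource.get();
            if (TERMINAL.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted while waiting for job", e);
                }
            }
        }
        throw new IllegalStateException("Job not finished after " + maxAttempts + " checks");
    }

    public static void main(String[] args) {
        // Stub that replays a typical status sequence instead of calling the API.
        Iterator<String> fake = List.of("validating_files", "running", "succeeded").iterator();
        System.out.println(waitForTerminalStatus(fake::next, Duration.ofMillis(10), 5));
    }
}
```

Training jobs can run for a long time, so in production a scheduled task with a generous interval is probably a better fit than a blocking loop.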

3.2 Fine-tuning Evaluation Framework

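import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.ChatOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.List;

@Service
public class FineTuneEvaluationService {

    private static final Logger log = LoggerFactory.getLogger(FineTuneEvaluationService.class);

    private final ChatClient baseModelClient;  // base model
    private final ChatClient fineTunedClient;  // fine-tuned model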

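    public FineTuneEvaluationService(ChatClient.Builder builder,
                                      @Value("${finetune.model-id}") String fineTunedModelId) {
        this.baseModelClient = builder.build();
        // Work on a copy of the builder so the fine-tuned model option
        // cannot leak into the base client through shared builder state.
        this.fineTunedClient = builder.clone()
                .defaultOptions(ChatOptions.builder()
                        .model(fineTunedModelId)
                        .build())
                .build();
    }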

    /**
     * Compares the base model and the fine-tuned model on a test set.
     */
    public ComparisonReport compare(List<EvalTestCase> testCases) {
        int baseCorrect = 0, finetuneCorrect = 0;
        long baseLatencySum = 0, finetuneLatencySum = 0;

        List<CaseResult> results = new ArrayList<>();

        for (EvalTestCase tc : testCases) {
            // Evaluate the base model
            long start = System.currentTimeMillis();
            String baseOutput = baseModelClient.prompt(tc.getPrompt()).call().content();
            long baseLatency = System.currentTimeMillis() - start;
            boolean baseCorrectFlag = tc.evaluate(baseOutput);
            if (baseCorrectFlag) baseCorrect++;
            baseLatencySum += baseLatency;

            // Evaluate the fine-tuned model
            start = System.currentTimeMillis();
            String finetuneOutput = fineTunedClient.prompt(tc.getPrompt()).call().content();
            long finetuneLatency = System.currentTimeMillis() - start;
            boolean finetuneCorrectFlag = tc.evaluate(finetuneOutput);
            if (finetuneCorrectFlag) finetuneCorrect++;
            finetuneLatencySum += finetuneLatency;

            results.add(new CaseResult(tc.getId(), tc.getInput(),
                    baseOutput, baseCorrectFlag, baseLatency,
                    finetuneOutput, finetuneCorrectFlag, finetuneLatency));
        }

        double baseAccuracy = (double) baseCorrect / testCases.size();
        double finetuneAccuracy = (double) finetuneCorrect / testCases.size();
        double baseAvgLatency = (double) baseLatencySum / testCases.size();
        double finetuneAvgLatency = (double) finetuneLatencySum / testCases.size();

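        // SLF4J only understands plain {} placeholders, so format the numbers first.
        log.info("Evaluation - base model: accuracy {}%, avg latency {}ms",
                String.format("%.1f", baseAccuracy * 100),
                String.format("%.0f", baseAvgLatency));
        log.info("Evaluation - fine-tuned model: accuracy {}%, avg latency {}ms",
                String.format("%.1f", finetuneAccuracy * 100),
                String.format("%.0f", finetuneAvgLatency));
        log.info("Improvement: accuracy +{} points, latency -{}ms",
                String.format("%.1f", (finetuneAccuracy - baseAccuracy) * 100),
                String.format("%.0f", baseAvgLatency - finetuneAvgLatency));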

        return new ComparisonReport(baseAccuracy, finetuneAccuracy,
                baseAvgLatency, finetuneAvgLatency, results);
    }
}
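The `compare` method above reports average latency, while the comparison table in this article quotes P50 and P99. If you want the same percentiles from the per-case latencies, a nearest-rank percentile is one simple option. The sketch below is illustrative only: `LatencyPercentiles` is a hypothetical helper and the sample values are made up.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentiles over recorded latencies. */
final class LatencyPercentiles {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = {120, 180, 150, 900, 200, 170, 160, 190, 175, 165}; // made-up data
        System.out.println("P50 = " + percentile(samples, 50) + "ms");
        System.out.println("P99 = " + percentile(samples, 99) + "ms");
    }
}
```

With only a handful of samples P99 is simply the slowest call, so tail percentiles are only meaningful on a reasonably large test set.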

3.3 Combining RAG and Fine-tuning

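import lombok.Data;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.ChatOptions;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class HybridRagFineTuneService {

    /**
     * Combined approach: fine-tuning provides the formatted-output capability,
     * RAG injects the knowledge.
     *
     * Scenario: product Q&A assistant
     * - Fine-tuning trained the model to answer in a standard format
     *   (structured JSON answer + confidence)
     * - RAG injects the latest product documentation
     */

    private final VectorStore productKnowledgeStore;
    private final ChatClient fineTunedClient; // fine-tuned for formatted output

    private static final String HYBRID_PROMPT = """
            Answer the user's question based on the product knowledge base content below.
            Output strictly in JSON format with the fields answer, confidence, and sources.

            Knowledge base content:
            {context}

            User question: {question}
            """;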

    public HybridRagFineTuneService(VectorStore productKnowledgeStore,
                                      ChatClient.Builder builder,
                                      @Value("${finetune.model-id}") String modelId) {
        this.productKnowledgeStore = productKnowledgeStore;
        this.fineTunedClient = builder
                .defaultOptions(ChatOptions.builder().model(modelId).build())
                .build();
    }

    public ProductQaResponse answer(String question) {
        // 1. Retrieve relevant knowledge via RAG
        List<Document> relevantDocs = productKnowledgeStore.similaritySearch(
                SearchRequest.builder().query(question).topK(3).build());

        String context = relevantDocs.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n\n---\n\n"));

        // 2. Generate the formatted answer with the fine-tuned model
        String prompt = HYBRID_PROMPT
                .replace("{context}", context)
                .replace("{question}", question);

        return fineTunedClient.prompt(prompt)
                .call()
                .entity(ProductQaResponse.class);
    }

    @Data
    public static class ProductQaResponse {
        private String answer;
        private double confidence;
        private List<String> sources;
    }
}

3.4 ROI Calculator

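import org.springframework.stereotype.Service;

@Service
public class FineTuneRoiCalculator {

    /**
     * Computes the ROI of fine-tuning so that decision-makers can quantify
     * the economics of the choice. Prices are USD per 1K tokens as quoted
     * in this article and change over time.
     */
    public RoiReport calculateRoi(FineTuneRoiInput input) {
        // Fine-tuning setup cost (gpt-3.5-turbo training price).
        // OpenAI bills trained tokens per epoch, so getAvgTrainingTokens()
        // should cover all epochs for one sample.
        double trainingCost = input.getTrainingSamples() * 0.008 / 1000.0
                * input.getAvgTrainingTokens();
        double labelingCost = input.getTrainingSamples() * input.getLabelingCostPerSample();
        double totalSetupCost = trainingCost + labelingCost;

        // Monthly run cost with RAG
        double ragMonthlyQueryCost = input.getMonthlyCallCount() *
                (input.getRagContextTokens() * 0.005 / 1000.0  // GPT-4o input
                + input.getAvgOutputTokens() * 0.015 / 1000.0  // GPT-4o output
                + input.getRagEmbeddingCost()); // embedding cost per query

        // Monthly run cost with fine-tuning (short prompt, no retrieval)
        double ftMonthlyQueryCost = input.getMonthlyCallCount() *
                (input.getFtContextTokens() * 0.003 / 1000.0   // fine-tuned gpt-3.5 input
                + input.getAvgOutputTokens() * 0.006 / 1000.0); // fine-tuned gpt-3.5 output

        double monthlySaving = ragMonthlyQueryCost - ftMonthlyQueryCost;
        // -1 means fine-tuning never pays back because it is not cheaper to run.
        int paybackMonths = monthlySaving > 0 ?
                (int) Math.ceil(totalSetupCost / monthlySaving) : -1;

        // Decide on monthlySaving rather than paybackMonths <= 0, so that a
        // zero setup cost (payback of 0 months) is not misreported as a loss.
        String recommendation;
        if (monthlySaving <= 0) {
            recommendation = "Fine-tuning costs more to run; RAG recommended";
        } else if (paybackMonths <= 12) {
            recommendation = "Fine-tuning recommended, pays back in about "
                    + paybackMonths + " month(s)";
        } else {
            recommendation = "Payback takes more than 12 months; prefer RAG first";
        }

        return new RoiReport(
                totalSetupCost,
                ragMonthlyQueryCost,
                ftMonthlyQueryCost,
                monthlySaving,
                paybackMonths,
                recommendation
        );
    }
}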
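To make the calculator's arithmetic concrete, here is a self-contained worked example that repeats the same formulas on plain numbers. The per-token prices are the ones used in the calculator above; everything else is an assumption chosen for illustration: 5,000 samples (as in the opening story), 200 tokens per sample, 3 epochs, $0.10 labeling cost per sample, 100,000 calls per month, a 2,000-token RAG context, a 200-token fine-tuned prompt, 50 output tokens, and $0.00001 embedding cost per query. `RoiWorkedExample` is a hypothetical class name.

```java
/** Worked example with made-up inputs; prices are the ones used in the calculator above. */
final class RoiWorkedExample {

    // USD per 1K tokens, as quoted in this article. Real prices change over time.
    static final double TRAINING_PER_1K = 0.008;
    static final double RAG_INPUT_PER_1K = 0.005;
    static final double RAG_OUTPUT_PER_1K = 0.015;
    static final double FT_INPUT_PER_1K = 0.003;
    static final double FT_OUTPUT_PER_1K = 0.006;

    static double setupCost(int samples, int tokensPerSample, int epochs,
                            double labelingCostPerSample) {
        // Trained tokens are billed once per epoch.
        double trainedTokens = (double) samples * tokensPerSample * epochs;
        return trainedTokens / 1000.0 * TRAINING_PER_1K + samples * labelingCostPerSample;
    }

    static double ragMonthlyCost(long calls, int contextTokens, int outputTokens,
                                 double embeddingCostPerCall) {
        double perCall = contextTokens / 1000.0 * RAG_INPUT_PER_1K
                + outputTokens / 1000.0 * RAG_OUTPUT_PER_1K
                + embeddingCostPerCall;
        return calls * perCall;
    }

    static double fineTuneMonthlyCost(long calls, int promptTokens, int outputTokens) {
        double perCall = promptTokens / 1000.0 * FT_INPUT_PER_1K
                + outputTokens / 1000.0 * FT_OUTPUT_PER_1K;
        return calls * perCall;
    }

    /** Returns -1 when fine-tuning is not cheaper to run and therefore never pays back. */
    static int paybackMonths(double setupCost, double monthlySaving) {
        return monthlySaving > 0 ? (int) Math.ceil(setupCost / monthlySaving) : -1;
    }

    public static void main(String[] args) {
        double setup = setupCost(5_000, 200, 3, 0.10);
        double rag = ragMonthlyCost(100_000, 2_000, 50, 0.00001);
        double ft = fineTuneMonthlyCost(100_000, 200, 50);
        System.out.printf("setup=%.2f rag/month=%.2f ft/month=%.2f payback=%d month(s)%n",
                setup, rag, ft, paybackMonths(setup, rag - ft));
    }
}
```

With these assumed inputs the setup cost comes to about $524 ($24 training plus $500 labeling), RAG to about $1,076 per month, and the fine-tuned model to about $90 per month, so the investment pays back within the first month. Labeling, not training, dominates the setup cost here, which is worth checking against your own labeling rates.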

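4. Evaluation and Optimization

Using e-commerce review classification as the example, here is a full comparison of the three approaches: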

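Dimension	Prompt Engineering	RAG + Few-shot	Fine-tune
Development time	1 day	3 days	2 weeks (including data labeling)
Accuracy	72%	78%	94%
Inference latency (P50)	650ms	850ms	180ms
Inference latency (P99)	2100ms	2800ms	520ms
Monthly run cost (100,000 calls)	¥2100	¥2800	¥380
Knowledge update effort	Edit the prompt	Update the knowledge base	Retrain
Explainability	Low	Medium (with sources)	Low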

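On this fixed-format task fine-tuning leads on accuracy, latency, and run cost, at the price of a longer development cycle and retraining whenever the knowledge changes. The main contributors: no long context is needed (saving token cost), the model has already learned the output format (no detailed instructions in the prompt), and the inference path is shorter and more focused (much lower latency).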

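5. Pitfalls from the Field

Pitfall 1: For fine-tuning, data quality matters more than quantity

The first round collected 2,000 training samples, but 400 of them had been labeled casually by colleagues with inconsistent standards. After fine-tuning, the model's accuracy was actually lower than with prompt engineering, because the model had learned the wrong labeling patterns. After removing the low-quality data and retraining on 1,500 high-quality samples, accuracy finally beat the baseline. Checking training data quality (inter-rater agreement: have at least two annotators label the same items and check their consistency) is a mandatory engineering step.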
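One common way to quantify inter-rater agreement between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The source does not say which metric was used, so this is a swapped-in illustration rather than the project's actual check; `InterRaterAgreement` is a hypothetical helper and the labels are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: Cohen's kappa for two annotators labeling the same items. */
final class InterRaterAgreement {

    static double cohensKappa(List<String> annotatorA, List<String> annotatorB) {
        if (annotatorA.isEmpty() || annotatorA.size() != annotatorB.size()) {
            throw new IllegalArgumentException("need two non-empty label lists of equal length");
        }
        int n = annotatorA.size();
        int agreements = 0;
        Map<String, Integer> countsA = new HashMap<>();
        Map<String, Integer> countsB = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String a = annotatorA.get(i);
            String b = annotatorB.get(i);
            if (a.equals(b)) {
                agreements++;
            }
            countsA.merge(a, 1, Integer::sum);
            countsB.merge(b, 1, Integer::sum);
        }
        double observed = (double) agreements / n;
        // Chance agreement: sum over labels of P(A picks label) * P(B picks label).
        double expected = 0.0;
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            int inB = countsB.getOrDefault(e.getKey(), 0);
            expected += (e.getValue() / (double) n) * (inB / (double) n);
        }
        if (expected == 1.0) {
            return 1.0; // both annotators used the same single label for every item
        }
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        List<String> a = List.of("positive", "positive", "negative", "negative");
        List<String> b = List.of("positive", "negative", "positive", "negative");
        System.out.println("kappa = " + cohensKappa(a, b));
    }
}
```

A kappa near 1 means the annotators agree far more than chance would predict, while a value near 0 means their agreement is no better than chance; where to draw the acceptance line is a project decision.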

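Pitfall 2: The fine-tuned model "forgot" its general abilities

The fine-tuned model performed very well on classification, but when asked to do other tasks (such as writing a descriptive paragraph), its quality was clearly below the base model. This is the "catastrophic forgetting" problem: fine-tuning over-optimizes for one task and general ability declines. The fix: use each fine-tuned model for one class of task only and do not try to make it do everything; for scenarios that need general ability, keep the base model.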

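Pitfall 3: The OpenAI Fine-tuning API chooses the epoch count automatically by default

I specified n_epochs: "auto", assuming it would pick the optimum, but it trained for many epochs and overfit: 95% accuracy on the training set and only 81% on the test set. After fixing n_epochs: 3, test accuracy rose to 94%. Hyperparameter tuning matters as much for fine-tuning as for traditional machine learning; you cannot leave it all to auto.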
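Spotting this kind of overfitting requires data the model never trained on. A minimal sketch of a reproducible hold-out split is shown below; `HoldoutSplit` is a hypothetical helper, and the 20% fraction and the seed are arbitrary assumptions. The held-out part can serve as your own test set, or be uploaded separately and passed as the job's `validation_file`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: seeded shuffle-and-split into training and validation sets. */
final class HoldoutSplit {

    /** Returns a two-element list: index 0 is the training set, index 1 the validation set. */
    static <T> List<List<T>> split(List<T> examples, double validationFraction, long seed) {
        if (validationFraction <= 0 || validationFraction >= 1) {
            throw new IllegalArgumentException("validationFraction must be in (0, 1)");
        }
        List<T> shuffled = new ArrayList<>(examples); // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed makes the split repeatable
        int validationSize = (int) Math.round(shuffled.size() * validationFraction);
        List<T> validation = new ArrayList<>(shuffled.subList(0, validationSize));
        List<T> training = new ArrayList<>(shuffled.subList(validationSize, shuffled.size()));
        return List.of(training, validation);
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> parts = split(ids, 0.2, 42L);
        System.out.println("training size   = " + parts.get(0).size());
        System.out.println("validation size = " + parts.get(1).size());
    }
}
```

A large gap between training and validation accuracy, like the 95% versus 81% above, is the signal to reduce the epoch count.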

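6. Summary

In choosing between fine-tuning and RAG there is no absolute better or worse, only what fits the scenario at hand.

The selection principle in one sentence: if you need the model to learn "how to do it" (format, style, specialized capability), use fine-tuning; if you need the model to know "what it is" (new knowledge, dynamic information), use RAG; the two are not mutually exclusive, and advanced scenarios can combine them.

In terms of return on investment: fine-tuning has a high upfront cost (data labeling, training fees) but a very low marginal cost at scale, which suits high-frequency standardized tasks; RAG has a low upfront cost (you can start as soon as you have documents) but a higher cost per call, which suits knowledge-intensive scenarios where information goes stale quickly.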