Post #2182: A Multi-Model Routing Decision Engine That Picks the Best-Fit Model from Task Characteristics

Lao Zhang | 2026/4/30 | about 7 min read

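Intended audience: AI system architects who need to balance cost against quality | Reading time: about 17 minutes | Key takeaway: build an intelligent model routing engine so that tasks of different complexity go to the model that fits them best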

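While working on cost optimization last year, I went through three months of model invocation bills.

The numbers were a bit of a shock: 60% of requests were low-complexity tasks such as "is the format of this email reply correct?" or "is this word used correctly?", and every one of them was calling GPT-4. GPT-4o-mini handles this kind of task perfectly well, at 1/40 of GPT-4's cost.

At the same time, some high-complexity tasks (those needing deep reasoning and multi-step analysis) were going to a cheap small model. Output quality dropped noticeably and user complaints went up.

The question is not "expensive model or cheap model". The problem is that the model was not being chosen according to task difficulty. That is exactly what model routing solves.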

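The Core Logic of Model Routing

Task features → routing decision → model selection

Task feature dimensions:
  ├── Complexity: how many reasoning steps are needed?
  ├── Domain: is specialist knowledge required?
  ├── Risk: how costly is a mistake?
  ├── Recency: is up-to-date information needed?
  ├── Length: how long are the input plus the expected output?
  └── Format: is a specific output format required?

Model capability tiers:
  Strong models (GPT-4 / Claude-3.5):
    → Complex reasoning, code generation, professional analysis
    → High-risk decision support
    → Cost: high
    
  Mid-tier models (GPT-4o-mini / Claude-3-haiku):
    → Standard Q&A, copywriting, data extraction
    → Cost: low
    
  Specialized models (embedding models / classifiers):
    → Vectorization, classification, simple filtering
    → Cost: very low
    
  Local models (LLaMA and similar):
    → Simple tasks, privacy-sensitive scenarios
    → Cost: compute only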
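
The feature dimensions and capability tiers above map fairly naturally onto plain data types. The following is a minimal sketch, assuming Java 17 records. Every name in it (`TaskProfile`, `RiskLevel`, `ModelTier`, `TierMapping.defaultTier`) and every threshold is hypothetical and chosen only for illustration; it is not the `TaskFeatures` class used later in the article.

```java
/** Hypothetical capability tiers mirroring the list above. */
enum ModelTier { SPECIALIZED, LOCAL, MEDIUM, STRONG }

/** Hypothetical answer to "how costly is a mistake?". */
enum RiskLevel { LOW, MEDIUM, HIGH }

/** One field per feature dimension listed above. */
record TaskProfile(
        int reasoningSteps,        // complexity
        boolean needsExpertise,    // domain
        RiskLevel risk,            // risk
        boolean needsFreshInfo,    // recency (carried, not used by this sketch)
        int totalTokens,           // length: input plus expected output
        boolean needsStrictFormat  // format (carried, not used by this sketch)
) {}

final class TierMapping {
    /** Illustrative default mapping; the thresholds are assumptions. */
    static ModelTier defaultTier(TaskProfile p) {
        // High-risk, expert, or clearly multi-step work goes to the strong tier.
        if (p.risk() == RiskLevel.HIGH || p.needsExpertise() || p.reasoningSteps() >= 3) {
            return ModelTier.STRONG;
        }
        // Two-step reasoning or longer inputs/outputs go to the mid tier.
        if (p.reasoningSteps() >= 2 || p.totalTokens() > 500) {
            return ModelTier.MEDIUM;
        }
        // Everything else is treated as a simple task.
        return ModelTier.LOCAL;
    }
}
```

For example, `TierMapping.defaultTier(new TaskProfile(1, false, RiskLevel.LOW, false, 50, false))` returns `ModelTier.LOCAL`, while the same profile with `RiskLevel.HIGH` returns `ModelTier.STRONG`.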

Task Complexity Classifier

import java.util.List;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

/**
 * Task complexity classifier
 * 
 * Quickly judges how complex a request is, to decide which class of model it should go to.
 * The helpers estimateOutputLength, requiresCurrentKnowledge, detectTaskType and
 * detectSensitivity are not shown in this listing.
 */
@Component
@RequiredArgsConstructor
@Slf4j
public class TaskComplexityClassifier {

    private final EmbeddingService embeddingService;
    private final LocalClassificationModel localModel;  // lightweight local classifier
    
    /**
     * Extract task features
     */
    public TaskFeatures extractFeatures(String userQuery, String systemContext) {
        return TaskFeatures.builder()
            .queryLength(userQuery.length())
            .contextLength(systemContext != null ? systemContext.length() : 0)
            .estimatedOutputLength(estimateOutputLength(userQuery))
            .requiresMultiStep(requiresMultiStep(userQuery))
            .requiresCurrentKnowledge(requiresCurrentKnowledge(userQuery))
            .domainType(detectDomain(userQuery))
            .taskType(detectTaskType(userQuery))
            .hasMathOrCode(hasMathOrCode(userQuery))
            .sensitivityLevel(detectSensitivity(userQuery, systemContext))
            .build();
    }

    /**
     * Determine the complexity level from the extracted features
     */
    public ComplexityLevel classify(TaskFeatures features) {
        int complexityScore = 0;
        
        // Multi-step reasoning → high complexity
        if (features.isRequiresMultiStep()) complexityScore += 30;
        
        // Involves code or math → fairly high complexity
        if (features.isHasMathOrCode()) complexityScore += 25;
        
        // Professional domain (medical/legal/financial) → fairly high complexity.
        // detectDomain reports these as MEDICAL, LEGAL or FINANCIAL.
        DomainType domain = features.getDomainType();
        if (domain == DomainType.MEDICAL || domain == DomainType.LEGAL
                || domain == DomainType.FINANCIAL) complexityScore += 20;
        
        // Long output expected → medium complexity
        if (features.getEstimatedOutputLength() > 500) complexityScore += 15;
        
        // Creative task → medium complexity
        if (features.getTaskType() == TaskType.CREATIVE) complexityScore += 10;
        
        // Simple QA → low complexity
        if (features.getTaskType() == TaskType.SIMPLE_QA) complexityScore -= 10;
        
        // Format conversion → low complexity
        if (features.getTaskType() == TaskType.FORMAT_CONVERSION) complexityScore -= 20;
        
        log.debug("Complexity score: {}, features: {}", complexityScore, features);
        
        if (complexityScore >= 50) return ComplexityLevel.HIGH;
        if (complexityScore >= 20) return ComplexityLevel.MEDIUM;
        return ComplexityLevel.LOW;
    }

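    /**
     * Decide whether the query needs multi-step reasoning.
     * Uses simple rules so that no LLM call is needed (saves cost).
     */
    private boolean requiresMultiStep(String query) {
        // Keyword signals (the literals stay in Chinese to match Chinese-language queries)
        List<String> multiStepIndicators = List.of(
            "分析", "比较", "如何实现", "步骤", "方案", "设计",  // analyze, compare, how to implement, steps, plan, design
            "评估", "推断", "解释为什么", "有什么原因"           // evaluate, infer, explain why, what is the reason
        );
        
        // Structural signal: more than two question marks suggests several sub-questions
        long questionCount = query.chars()
            .filter(c -> c == '？' || c == '?').count();
        
        boolean hasMultiStepKeyword = multiStepIndicators.stream()
            .anyMatch(query::contains);
        
        return hasMultiStepKeyword || questionCount > 2;
    }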

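    private boolean hasMathOrCode(String query) {
        // Require an arithmetic expression (digit, operator, digit) or code-like
        // punctuation. A bare digit or hyphen, as in "GPT-4", is too weak a signal
        // and would add +25 to almost every query. (?s) lets '.' span line breaks.
        return query.matches("(?s).*(\\d\\s*[+\\-*/=]\\s*\\d|[{}\\[\\];]).*") ||
               query.contains("代码") || query.contains("算法") ||   // code, algorithm
               query.contains("计算") || query.contains("公式");     // calculate, formula
    }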

    private DomainType detectDomain(String query) {
        // Keywords: medical (single character, so it matches broadly), diagnosis, symptom, medicine
        if (query.contains("医") || query.contains("诊断") || 
            query.contains("症状") || query.contains("药")) {
            return DomainType.MEDICAL;
        }
        // Keywords: law, contract, litigation
        if (query.contains("法律") || query.contains("合同") || 
            query.contains("诉讼")) {
            return DomainType.LEGAL;
        }
        // Keywords: investment, stocks, finance
        if (query.contains("投资") || query.contains("股票") || 
            query.contains("财务")) {
            return DomainType.FINANCIAL;
        }
        // Keywords: code, programming, API
        if (query.contains("代码") || query.contains("编程") || 
            query.contains("API")) {
            return DomainType.TECHNICAL;
        }
        return DomainType.GENERAL;
    }
}
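
To make the scoring concrete, the sketch below re-implements only the additive score and the two thresholds from `classify`, taking plain booleans instead of a `TaskFeatures` object. `ComplexityScoreDemo`, its parameters, and the string-typed task type are hypothetical stand-ins so that the example runs on its own; the weights and thresholds are the ones from the listing above.

```java
final class ComplexityScoreDemo {
    enum Level { LOW, MEDIUM, HIGH }

    /** Same weights as classify(), applied to plain inputs. */
    static int score(boolean multiStep, boolean mathOrCode, boolean professionalDomain,
                     int estimatedOutputLength, String taskType) {
        int s = 0;
        if (multiStep) s += 30;
        if (mathOrCode) s += 25;
        if (professionalDomain) s += 20;
        if (estimatedOutputLength > 500) s += 15;
        switch (taskType) {
            case "CREATIVE" -> s += 10;
            case "SIMPLE_QA" -> s -= 10;
            case "FORMAT_CONVERSION" -> s -= 20;
            default -> { }  // other task types do not change the score
        }
        return s;
    }

    /** Same thresholds as classify(): >= 50 is HIGH, >= 20 is MEDIUM. */
    static Level level(int score) {
        if (score >= 50) return Level.HIGH;
        if (score >= 20) return Level.MEDIUM;
        return Level.LOW;
    }

    public static void main(String[] args) {
        // Multi-step question involving code: 30 + 25 = 55
        System.out.println(level(score(true, true, false, 200, "OTHER")));      // HIGH
        // Long creative piece: 15 + 10 = 25
        System.out.println(level(score(false, false, false, 800, "CREATIVE"))); // MEDIUM
        // Simple QA in a professional domain: 20 - 10 = 10
        System.out.println(level(score(false, false, true, 100, "SIMPLE_QA"))); // LOW
    }
}
```

The last case is worth noticing: a short medical or legal question scores LOW on complexity alone, which is one reason a separate sensitivity-based override is useful.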

Routing Decision Engine

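import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

/**
 * Model routing decision engine
 * 
 * Takes into account: task complexity, cost budget, user tier, and current load
 */
@Service
@RequiredArgsConstructor
@Slf4j
public class ModelRoutingEngine {

    private final TaskComplexityClassifier complexityClassifier;
    private final ProviderHealthMonitor healthMonitor;
    private final CostBudgetManager budgetManager;
    private final RoutingConfigRepository configRepo;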

    /**
     * Routing decision: pick the best-fit model for this request
     */
    public RoutingDecision route(RoutingRequest request) {
        // 1. Extract task features
        TaskFeatures features = complexityClassifier.extractFeatures(
            request.getUserQuery(), request.getSystemContext());
        
        ComplexityLevel complexity = complexityClassifier.classify(features);
        
        // 2. Apply forced routing rules (some scenarios must use a specific model)
        Optional<RoutingDecision> forcedDecision = applyForcedRules(
            request, features);
        if (forcedDecision.isPresent()) {
            log.debug("Forced routing rule triggered: {}", forcedDecision.get().getReason());
            return forcedDecision.get();
        }
        
        // 3. Determine the candidate model list from the complexity level
        List<ModelCandidate> candidates = getCandidates(complexity, request.getUserTier());
        
        // 4. Filter out unhealthy providers
        candidates = candidates.stream()
            .filter(c -> !healthMonitor.isUnhealthy(c.getProviderId()))
            .collect(Collectors.toList());
        
        // 5. Check budget limits
        candidates = budgetManager.filterByBudget(candidates, request.getUserId());
        
        if (candidates.isEmpty()) {
            log.warn("No model candidates available, using the most basic fallback");
            return RoutingDecision.fallback();
        }
        
        // 6. Pick the optimal candidate (the cheapest one that meets the quality requirement)
        ModelCandidate selected = selectOptimal(candidates, features);
        double estimatedCost = estimateCost(features, selected);
        
        // SLF4J placeholders take no format spec, so the number is formatted up front
        log.debug("Routing decision: complexity={}, model={}, estimatedCostUSD={}",
            complexity, selected.getModelName(),
            String.format("%.6f", estimatedCost));
        
        return RoutingDecision.builder()
            .modelId(selected.getModelId())
            .modelName(selected.getModelName())
            .providerId(selected.getProviderId())
            .complexityLevel(complexity)
            .taskFeatures(features)
            .estimatedCostUSD(estimatedCost)
            .build();
    }

    /**
     * Forced routing rules
     * Some scenarios must use a specific model regardless of complexity.
     * Rules are checked in order, so the first match wins.
     */
    private Optional<RoutingDecision> applyForcedRules(
            RoutingRequest request, TaskFeatures features) {
        
        // High sensitivity (medical/legal advice) → must use the strongest model
        if (features.getSensitivityLevel() == SensitivityLevel.HIGH) {
            return Optional.of(RoutingDecision.forcedModel(
                "gpt-4-turbo", "HIGH_SENSITIVITY_MANDATORY"));
        }
        
        // Enterprise VIP users → always use the strongest model
        if (request.getUserTier() == UserTier.ENTERPRISE_VIP) {
            return Optional.of(RoutingDecision.forcedModel(
                "gpt-4-turbo", "VIP_USER_POLICY"));
        }
        
        // Trivial tasks (format checks, classification, etc.) → must use the small model
        if (features.getTaskType() == TaskType.FORMAT_CHECK ||
            features.getTaskType() == TaskType.SIMPLE_CLASSIFICATION) {
            return Optional.of(RoutingDecision.forcedModel(
                "gpt-4o-mini", "SIMPLE_TASK_COST_OPTIMIZATION"));
        }
        
        return Optional.empty();
    }

    /**
     * Get the candidate model list for a complexity level.
     * userTier is accepted but not used by this version.
     */
    private List<ModelCandidate> getCandidates(
            ComplexityLevel complexity, UserTier userTier) {
        
        RoutingConfig config = configRepo.loadConfig();
        
        return switch (complexity) {
            case HIGH -> config.getHighComplexityModels();
            case MEDIUM -> config.getMediumComplexityModels();
            case LOW -> config.getLowComplexityModels();
        };
    }

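    /**
     * Pick the optimal candidate (lowest cost among those that meet the quality requirement)
     */
    private ModelCandidate selectOptimal(
            List<ModelCandidate> candidates, 
            TaskFeatures features) {
        
        // Sort by cost ascending and take the first one that meets the quality requirement
        return candidates.stream()
            .sorted(Comparator.comparingDouble(ModelCandidate::getCostPerToken))
            .filter(c -> c.getQualityLevel().meets(features.getMinQualityRequired()))
            .findFirst()
            // None qualifies: fall back to the first entry of the original (unsorted) list
            .orElse(candidates.get(0));
    }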
}
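
The selection rule in `selectOptimal` can be tried in isolation. Below is a self-contained sketch of "cheapest candidate that meets the quality bar". The `Candidate` record, its integer `quality` field, the model names and the prices are all made up for illustration. One deliberate difference from the listing above: when nothing meets the bar, this sketch falls back to the highest-quality candidate rather than to the first element of the list, on the assumption that an under-qualified answer is the costlier mistake.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical candidate: a higher quality number means a more capable model. */
record Candidate(String name, double costPerToken, int quality) {}

final class CheapestAdequateSelector {
    /**
     * Returns the cheapest candidate whose quality is at least minQuality.
     * If none qualifies, returns the highest-quality candidate instead.
     * Returns Optional.empty() only when the list itself is empty.
     */
    static Optional<Candidate> select(List<Candidate> candidates, int minQuality) {
        Optional<Candidate> cheapestAdequate = candidates.stream()
            .filter(c -> c.quality() >= minQuality)
            .min(Comparator.comparingDouble(Candidate::costPerToken));
        if (cheapestAdequate.isPresent()) {
            return cheapestAdequate;
        }
        return candidates.stream()
            .max(Comparator.comparingInt(Candidate::quality));
    }

    public static void main(String[] args) {
        // Names and prices are placeholders, not real price-list values.
        List<Candidate> pool = List.of(
            new Candidate("strong", 0.040, 3),
            new Candidate("small", 0.001, 1),
            new Candidate("mid", 0.005, 2));
        System.out.println(select(pool, 2).map(Candidate::name).orElse("none")); // mid
        System.out.println(select(pool, 5).map(Candidate::name).orElse("none")); // strong
    }
}
```

Filtering before taking the minimum avoids sorting the whole list, which matters little for three candidates but keeps the intent readable.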

Routing Effectiveness Monitoring and Automatic Calibration

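import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

/**
 * Routing effectiveness monitor
 * 
 * Tracks how routing decisions actually turned out, as input for calibrating the routing rules
 */
@Service
@RequiredArgsConstructor
@Slf4j
public class RoutingEffectivenessMonitor {

    private final RoutingDecisionRepository decisionRepo;
    private final QualityScoreRepository qualityRepo;

    /**
     * Analyze the quality of routing decisions
     * 
     * Core questions:
     * 1. Were low-complexity tasks routed to expensive models? (over-allocation)
     * 2. Were high-complexity tasks routed to cheap models? (under-allocation)
     */
    @Scheduled(cron = "0 0 6 * * *")  // every day at 06:00
    public void analyzeRoutingEffectiveness() {
        LocalDate yesterday = LocalDate.now().minusDays(1);
        
        // Avoid dividing by zero on days without traffic
        long total = decisionRepo.countByDate(yesterday);
        if (total == 0) {
            log.info("No routing decisions recorded for {}, skipping analysis", yesterday);
            return;
        }
        
        // Cases of "small model, but the user was not satisfied" (under-allocation signal)
        List<RoutingAnalysisItem> underAllocated = decisionRepo
            .findByDateAndModelTier(yesterday, ModelTier.LOW)
            .stream()
            .filter(d -> qualityRepo.getQualityScore(d.getRequestId()) < 0.6)
            .collect(Collectors.toList());
        
        // Cases of "large model, but quality was no better" (over-allocation signal)
        List<RoutingAnalysisItem> overAllocated = decisionRepo
            .findByDateAndModelTier(yesterday, ModelTier.HIGH)
            .stream()
            .filter(d -> qualityRepo.getQualityScore(d.getRequestId()) < 0.75)
            .collect(Collectors.toList());
        
        double underAllocationRate = (double) underAllocated.size() / total;
        double overAllocationRate = (double) overAllocated.size() / total;
        
        // SLF4J placeholders take no format spec, so the numbers are formatted up front
        log.info("Routing effectiveness: under-allocation rate={}%, over-allocation rate={}%",
            String.format("%.1f", underAllocationRate * 100),
            String.format("%.1f", overAllocationRate * 100));
        
        // Over-allocation above 20%: consider loosening the conditions for routing to small models
        if (overAllocationRate > 0.20) {
            log.warn("Over-allocation rate is too high; review the complexity classification logic");
        }
        
        // Under-allocation above 10%: consider tightening the conditions for routing to small models
        if (underAllocationRate > 0.10) {
            log.warn("Under-allocation rate is too high; user satisfaction may suffer");
        }
    }
}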
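
The two rates are simple enough to check by hand. The sketch below computes them from an in-memory list instead of the repositories, using the same quality thresholds (0.6 for the low tier, 0.75 for the high tier) and the same denominator (all decisions of the day). `Outcome` and `AllocationRates` are hypothetical names, and the sample scores are invented.

```java
import java.util.List;

/** Hypothetical record of one routed request: which tier served it and how it scored. */
record Outcome(String tier, double qualityScore) {}

final class AllocationRates {
    /**
     * Share of ALL outcomes that were served by the given tier and scored below the threshold.
     * Returns 0.0 for an empty list so callers never see NaN.
     */
    static double rate(List<Outcome> outcomes, String tier, double threshold) {
        if (outcomes.isEmpty()) {
            return 0.0;
        }
        long flagged = outcomes.stream()
            .filter(o -> o.tier().equals(tier) && o.qualityScore() < threshold)
            .count();
        return (double) flagged / outcomes.size();
    }

    public static void main(String[] args) {
        List<Outcome> day = List.of(
            new Outcome("LOW", 0.50),   // small model, poor result: under-allocation signal
            new Outcome("LOW", 0.90),
            new Outcome("HIGH", 0.70),  // large model, mediocre result: over-allocation signal
            new Outcome("HIGH", 0.90));
        System.out.println(rate(day, "LOW", 0.6));    // 0.25
        System.out.println(rate(day, "HIGH", 0.75));  // 0.25
    }
}
```

With these four sample requests both rates come out at 25%, so both warnings in the monitor would fire.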

Quantifying the Cost Savings

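import java.time.LocalDate;
import java.util.List;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

/**
 * Routing cost savings calculation
 */
@Service
@RequiredArgsConstructor
@Slf4j
public class RoutingCostSavingsCalculator {

    // Assumed to be the same repository type the monitor uses
    private final RoutingDecisionRepository decisionRepo;

    public CostSavingsReport calculateSavings(LocalDate from, LocalDate to) {
        List<RoutingDecision> decisions = decisionRepo.findByDateRange(from, to);
        
        double actualCost = decisions.stream()
            .mapToDouble(RoutingDecision::getActualCostUSD)
            .sum();
        
        // What would it have cost if everything had used the most expensive model?
        // (estimateCostWithModel is not shown in this listing)
        double worstCaseCost = decisions.stream()
            .mapToDouble(d -> estimateCostWithModel(d, "gpt-4-turbo"))
            .sum();
        
        double savingsAmount = worstCaseCost - actualCost;
        // Guard against an empty date range, where worstCaseCost is 0
        double savingsRate = worstCaseCost > 0 ? savingsAmount / worstCaseCost : 0.0;
        
        // SLF4J placeholders take no format spec, so the message is formatted up front
        log.info("{}", String.format(
            "Routing cost savings: actual=$%.2f, worst case=$%.2f, saved=$%.2f (%.1f%%)",
            actualCost, worstCaseCost, savingsAmount, savingsRate * 100));
        
        return new CostSavingsReport(
            from, to, actualCost, worstCaseCost, savingsAmount, savingsRate);
    }
}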
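
As a back-of-the-envelope check on what such a report might show, the sketch below estimates the savings rate from just two numbers: the share of traffic moved to the cheap model and the cheap-to-expensive price ratio. It assumes every request consumes the same number of tokens, which real traffic does not, so treat the result as a rough bound. `SavingsEstimate` and `savingsRate` are hypothetical names; the inputs (60% of requests, 1/40 of the cost) are the figures from the opening of the article.

```java
import java.util.Locale;

final class SavingsEstimate {
    /**
     * Savings relative to sending everything to the expensive model.
     *
     * @param shareMoved            fraction of requests routed to the cheap model (0..1)
     * @param cheapToExpensiveRatio cheap model cost divided by expensive model cost
     */
    static double savingsRate(double shareMoved, double cheapToExpensiveRatio) {
        // Blended cost, with the all-expensive baseline normalized to 1.0
        double blendedCost = (1.0 - shareMoved) + shareMoved * cheapToExpensiveRatio;
        return 1.0 - blendedCost;
    }

    public static void main(String[] args) {
        // 60% of requests at 1/40 of the price: 0.40 + 0.60 / 40 = 0.415 of the baseline
        double rate = savingsRate(0.60, 1.0 / 40);
        System.out.println(String.format(Locale.ROOT, "%.1f%%", rate * 100)); // 58.5%
    }
}
```

The estimate of roughly 58.5% is in the same range as the 58% reduction reported in the results below, though that agreement depends on the uniform-token assumption.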

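Key Insight: Model Routing Is the Resource Scheduler of an AI System

Model routing is more than cost optimization. At its core it is the resource scheduler of an AI system: it sends compute to where it is needed most.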

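After three months of running model routing, our actual numbers were:

API cost down 58%
User satisfaction went up rather than down (high-complexity tasks were finally being handled by a strong model)
P99 latency down 35% (small models respond faster)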

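A few engineering lessons:

Start with simple rules; do not jump straight to an ML router. A simple router built from keyword rules plus length checks often covers 80% of cases, and it is easy to explain and easy to debug.
Keep monitoring under-allocation. Cost pressure pushes you to route more traffic to cheap models, but the price of under-allocation is unhappy users and long-term damage to the product. Under-allocation is more dangerous than over-allocation.
Feature extraction has to be fast. If the classification step itself calls an LLM, classification can cost more than routing saves. Classifying with lightweight local rules is a core design principle of a routing system.
Give users some control. Letting advanced users choose a "precise mode" (strong model) or a "fast mode" (lightweight model) adds transparency and also collects real preference data on the quality-versus-speed trade-off.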
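
The last lesson can be sketched as a user-selected mode layered on top of the automatic decision. This is only one possible shape: `RoutingMode`, `ServingTier` and `UserModeOverride.resolveTier` are hypothetical names, and the choice to let high-sensitivity requests override the user's "fast mode" is an assumption made here, in line with the forced rules described earlier.

```java
/** Hypothetical user-facing modes: AUTO defers to the routing engine. */
enum RoutingMode { AUTO, PRECISE, FAST }

/** Hypothetical serving tiers, from lightest to strongest. */
enum ServingTier { LIGHT, MEDIUM, STRONG }

final class UserModeOverride {
    /**
     * Combines the user's chosen mode with the tier the router picked.
     *
     * @param mode            mode selected by the user
     * @param routedTier      tier chosen by the automatic router
     * @param highSensitivity whether the request was flagged as high sensitivity
     */
    static ServingTier resolveTier(RoutingMode mode, ServingTier routedTier,
                                   boolean highSensitivity) {
        // Assumption: safety-related forcing wins over user preference.
        if (highSensitivity) {
            return ServingTier.STRONG;
        }
        return switch (mode) {
            case PRECISE -> ServingTier.STRONG;  // user asked for the strong model
            case FAST -> ServingTier.LIGHT;      // user asked for the lightweight model
            case AUTO -> routedTier;             // keep the router's decision
        };
    }
}
```

Logging the pair of (`routedTier`, chosen mode) whenever the two disagree would give the preference data the lesson mentions, since each disagreement is a user telling the router it guessed wrong.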