Part 2158: Shadow Mode Deployment, a Risk-Free Strategy for Validating New Models Before Launch
2026/4/30 · About 6 minutes
Audience: engineers who need to ship a new LLM version but can't afford the risk | Reading time: ~16 minutes | Core value: validate the new model on real production traffic via Shadow Mode, collecting ample comparison data with zero user impact
Every LLM upgrade is a gamble.
A model that looks great on the test set can hit edge cases the test set never covered once it goes live. Even a canary release that touches only 5% of users still puts real users at risk.
Shadow Mode is an elegant answer to this problem: the new model receives every request and processes it, but its results are never returned to users; they are only logged for analysis. You get thorough validation on real traffic while users remain completely unaffected.
How Shadow Mode Works
Key points:
- Shadow calls are fully asynchronous and never affect user-facing response time
- The shadow model's results are never returned to users
- Shadow calls may fail, and failures do not affect the primary path
- Both models receive exactly the same input (including RAG retrieval results)
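Before the full implementation, the pattern itself is tiny. Here is a minimal standalone sketch of the fire-and-forget idea; `primary` and `shadow` are stand-ins for the real model services, not names from the code that follows:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShadowSketch {
    // Stand-ins for the real primary/shadow model calls
    static String primary(String in) { return "primary:" + in; }
    static String shadow(String in)  { return "shadow:" + in; }

    public static void main(String[] args) {
        ExecutorService shadowPool = Executors.newFixedThreadPool(2);
        String input = "hello";

        // Fire-and-forget: the shadow result is persisted for offline
        // comparison and never returned to the user; any exception stays
        // inside the future and cannot break the primary path.
        CompletableFuture.supplyAsync(() -> shadow(input), shadowPool)
                .whenComplete((res, err) -> { /* save res or err for analysis */ });

        String userResponse = primary(input); // the user only ever sees this
        System.out.println(userResponse);
        shadowPool.shutdown();
    }
}
```

Everything after `supplyAsync` happens off the request thread, which is exactly why the four properties above hold.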
Implementing Shadow Mode
/**
* Shadow Mode router
*
* Core design: the primary model is called synchronously, the shadow model asynchronously
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class ShadowModeRouter {
private final PrimaryLlmService primaryService;
private final ShadowLlmService shadowService;
private final ShadowResultRepository shadowResultRepository;
private final ShadowModeConfig shadowConfig;
private final ThreadPoolTaskExecutor shadowExecutor;
private final Random random = new Random();
/**
* Handle a request: the primary path returns normally, the shadow path runs asynchronously
*
* @param request the LLM request
* @return the primary model's response (identical to running without shadow mode)
*/
public LlmResponse handle(LlmRequest request) {
// 1. Call the primary model (synchronous)
long primaryStart = System.currentTimeMillis();
LlmResponse primaryResponse = primaryService.call(request);
long primaryLatency = System.currentTimeMillis() - primaryStart;
// 2. Decide whether to trigger a shadow call (traffic sampling)
if (shouldTriggerShadow(request)) {
final String requestId = UUID.randomUUID().toString();
final LlmRequest capturedRequest = captureRequest(request);
final LlmResponse capturedPrimaryResponse = primaryResponse;
// 3. Fire the shadow call asynchronously
shadowExecutor.submit(() -> {
executeShadowCall(requestId, capturedRequest, capturedPrimaryResponse, primaryLatency);
});
}
// 4. Return the primary result immediately (without waiting for the shadow call)
return primaryResponse;
}
/**
* Execution logic for a shadow call
*/
private void executeShadowCall(String requestId, LlmRequest request,
LlmResponse primaryResponse, long primaryLatency) {
long shadowStart = System.currentTimeMillis();
LlmResponse shadowResponse = null;
String errorMessage = null;
boolean shadowSuccess = false;
try {
shadowResponse = shadowService.call(request);
shadowSuccess = true;
} catch (Exception e) {
log.warn("Shadow call failed, requestId={}, error={}", requestId, e.getMessage());
errorMessage = e.getMessage();
} finally {
long shadowLatency = System.currentTimeMillis() - shadowStart;
// Persist the comparison record
ShadowResult result = ShadowResult.builder()
.requestId(requestId)
.timestamp(Instant.now())
.userInput(request.getUserInput())
.primaryModel(shadowConfig.getPrimaryModelName())
.shadowModel(shadowConfig.getShadowModelName())
.primaryOutput(primaryResponse.getContent())
.shadowOutput(shadowResponse != null ? shadowResponse.getContent() : null)
.primaryLatencyMs(primaryLatency)
.shadowLatencyMs(shadowLatency)
.shadowSuccess(shadowSuccess)
.shadowError(errorMessage)
.context(request.getContext())
.build();
shadowResultRepository.save(result);
}
}
/**
* Traffic sampling: not every request needs to trigger a shadow call
*
* Decided by the configured sample rate
*/
private boolean shouldTriggerShadow(LlmRequest request) {
if (!shadowConfig.isEnabled()) return false;
// High-priority requests (e.g. from VIP users) can use a higher sample rate
double sampleRate = request.isHighPriority()
? shadowConfig.getHighPrioritySampleRate()
: shadowConfig.getDefaultSampleRate();
return random.nextDouble() < sampleRate;
}
/**
* Capture a snapshot of the request (to avoid the original object being mutated later)
*/
private LlmRequest captureRequest(LlmRequest original) {
return LlmRequest.builder()
.userInput(original.getUserInput())
.systemPrompt(original.getSystemPrompt())
.context(original.getContext())
.temperature(original.getTemperature())
.maxTokens(original.getMaxTokens())
.highPriority(original.isHighPriority())
.build();
}
}
/**
* Shadow Mode configuration
*/
@Component
@ConfigurationProperties(prefix = "shadow-mode")
@Data
public class ShadowModeConfig {
private boolean enabled = false;
private double defaultSampleRate = 0.10; // 10% of traffic by default
private double highPrioritySampleRate = 0.30;
private String primaryModelName;
private String shadowModelName;
private int executorPoolSize = 10;
private int executorQueueCapacity = 1000;
}
Analyzing and Comparing Shadow Results
Once the shadow data has been collected, it needs to be analyzed systematically:
/**
* Shadow Mode result analysis service
*
* Compares primary and shadow outputs to assess whether the shadow model can replace the primary
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class ShadowAnalysisService {
private final ShadowResultRepository shadowResultRepository;
private final LlmEvaluationService evaluationService;
private final PairwiseJudgeService pairwiseJudge;
/**
* Analyze the data collected in shadow mode
*
* @param since start of the analysis window
* @return the full comparison report
*/
public ShadowAnalysisReport analyze(Instant since) {
List<ShadowResult> results = shadowResultRepository.findSince(since);
if (results.isEmpty()) {
throw new IllegalStateException("No shadow results collected since " + since);
}
log.info("Starting shadow analysis, sampleCount={}", results.size());
// Basic statistics
long successCount = results.stream().filter(ShadowResult::isShadowSuccess).count();
double shadowSuccessRate = (double) successCount / results.size();
// Latency comparison
DoubleSummaryStatistics primaryLatencies = results.stream()
.mapToDouble(ShadowResult::getPrimaryLatencyMs).summaryStatistics();
DoubleSummaryStatistics shadowLatencies = results.stream()
.filter(ShadowResult::isShadowSuccess)
.mapToDouble(ShadowResult::getShadowLatencyMs).summaryStatistics();
// Output length comparison
DoubleSummaryStatistics primaryLengths = results.stream()
.mapToDouble(r -> r.getPrimaryOutput() != null ? r.getPrimaryOutput().length() : 0)
.summaryStatistics();
DoubleSummaryStatistics shadowLengths = results.stream()
.filter(r -> r.getShadowOutput() != null)
.mapToDouble(r -> r.getShadowOutput().length()).summaryStatistics();
// Quality comparison (evaluate a sample only, to keep evaluation cost down)
List<ShadowResult> sampledForQuality = sampleForQualityEval(results, 100);
QualityComparison qualityComparison = evaluateQualityComparison(sampledForQuality);
// Pairwise comparison (preference judgment)
List<ShadowResult> sampledForPairwise = sampleForQualityEval(results, 50);
PairwisePreference pairwisePreference = evaluatePairwisePreference(sampledForPairwise);
return ShadowAnalysisReport.builder()
.analyzedCount(results.size())
.shadowSuccessRate(shadowSuccessRate)
.primaryAvgLatencyMs(primaryLatencies.getAverage())
.shadowAvgLatencyMs(shadowLatencies.getAverage())
.latencyDeltaPercent((shadowLatencies.getAverage() - primaryLatencies.getAverage())
/ primaryLatencies.getAverage() * 100)
.primaryAvgOutputLength(primaryLengths.getAverage())
.shadowAvgOutputLength(shadowLengths.getAverage())
.qualityComparison(qualityComparison)
.pairwisePreference(pairwisePreference)
.recommendation(generateRecommendation(
shadowSuccessRate, qualityComparison, pairwisePreference,
primaryLatencies.getAverage(), shadowLatencies.getAverage()
))
.build();
}
private QualityComparison evaluateQualityComparison(List<ShadowResult> samples) {
List<Double> primaryScores = new ArrayList<>();
List<Double> shadowScores = new ArrayList<>();
for (ShadowResult result : samples) {
if (result.getShadowOutput() == null) continue;
EvaluationReport primaryEval = evaluationService.evaluate(
EvaluationRequest.builder()
.userInput(result.getUserInput())
.llmOutput(result.getPrimaryOutput())
.context(result.getContext())
.build()
);
EvaluationReport shadowEval = evaluationService.evaluate(
EvaluationRequest.builder()
.userInput(result.getUserInput())
.llmOutput(result.getShadowOutput())
.context(result.getContext())
.build()
);
primaryScores.add(primaryEval.getOverallScore());
shadowScores.add(shadowEval.getOverallScore());
}
double primaryAvg = primaryScores.stream().mapToDouble(Double::doubleValue).average().orElse(0);
double shadowAvg = shadowScores.stream().mapToDouble(Double::doubleValue).average().orElse(0);
return QualityComparison.builder()
.primaryAvgScore(primaryAvg)
.shadowAvgScore(shadowAvg)
.scoreDelta(shadowAvg - primaryAvg)
.shadowBetter(shadowAvg > primaryAvg * 1.02) // count as "better" only if the improvement exceeds 2%
.build();
}
private PairwisePreference evaluatePairwisePreference(List<ShadowResult> samples) {
int shadowWins = 0, primaryWins = 0, ties = 0;
for (ShadowResult result : samples) {
if (result.getShadowOutput() == null) continue;
PairwiseEvaluation eval = pairwiseJudge.compare(
result.getUserInput(),
result.getPrimaryOutput(),
result.getShadowOutput(),
"overall"
);
switch (eval.getResult()) {
case A_BETTER -> primaryWins++;
case B_BETTER -> shadowWins++;
case TIE -> ties++;
default -> {} // UNCERTAIN results are not counted
}
}
int total = shadowWins + primaryWins + ties;
if (total == 0) {
// Every comparison was UNCERTAIN or skipped; report neutral rates instead of dividing by zero
return PairwisePreference.builder()
.shadowWinRate(0).primaryWinRate(0).tieRate(0)
.shadowWins(0).primaryWins(0).ties(0)
.build();
}
return PairwisePreference.builder()
.shadowWinRate((double) shadowWins / total)
.primaryWinRate((double) primaryWins / total)
.tieRate((double) ties / total)
.shadowWins(shadowWins)
.primaryWins(primaryWins)
.ties(ties)
.build();
}
private String generateRecommendation(double shadowSuccessRate,
QualityComparison quality,
PairwisePreference pairwise,
double primaryLatency, double shadowLatency) {
if (shadowSuccessRate < 0.95) {
return "Do not launch: shadow model success rate is only " + String.format("%.1f%%", shadowSuccessRate * 100) +
", indicating stability problems";
}
if (shadowLatency > primaryLatency * 1.5) {
return "Launch with caution: shadow model latency is " +
String.format("%.0f%%", (shadowLatency / primaryLatency - 1) * 100) +
" higher than the primary model; evaluate the impact on SLAs";
}
if (quality.isShadowBetter() && pairwise.getShadowWinRate() > 0.45) {
return "Recommended for launch: the shadow model outperforms the primary on both quality evaluation and pairwise preference. " +
String.format("Quality score +%.1f%%, preference win rate %.1f%%",
quality.getScoreDelta() * 100, pairwise.getShadowWinRate() * 100);
}
if (quality.getScoreDelta() < -0.05) {
return "Do not launch: shadow model quality score dropped by " +
String.format("%.1f%%", Math.abs(quality.getScoreDelta()) * 100);
}
return "OK to launch: shadow and primary model quality are comparable; proceed with a small-scale canary rollout";
}
private List<ShadowResult> sampleForQualityEval(List<ShadowResult> results, int sampleSize) {
List<ShadowResult> eligible = results.stream()
.filter(ShadowResult::isShadowSuccess)
.filter(r -> r.getPrimaryOutput() != null && r.getShadowOutput() != null)
.collect(Collectors.toList());
if (eligible.size() <= sampleSize) return eligible;
Collections.shuffle(eligible, new Random(42));
return eligible.subList(0, sampleSize);
}
}
Engineering Considerations
Consideration 1: Resource isolation for shadow calls
Shadow calls must use a dedicated thread pool and connection pool. If the shadow model slows down, it must not affect the primary model's response time.
@Bean("shadowExecutor")
public ThreadPoolTaskExecutor shadowExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(5);
executor.setMaxPoolSize(10);
executor.setQueueCapacity(500);
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.DiscardPolicy()); // drop tasks outright when the queue is full
executor.setThreadNamePrefix("shadow-");
return executor;
}
Consideration 2: Cost control
Shadow Mode sends each sampled request twice, so at 100% sampling your LLM bill doubles. Keep the default sample rate below 20% unless you have ample budget.
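A back-of-the-envelope helper makes the trade-off concrete. The request volume and per-call price below are made-up numbers for illustration only:

```java
public class ShadowCostEstimate {
    /** Extra daily spend introduced by shadow traffic. */
    public static double dailyShadowCost(long dailyRequests, double sampleRate,
                                         double costPerShadowCall) {
        return dailyRequests * sampleRate * costPerShadowCall;
    }

    public static void main(String[] args) {
        // Assumed: 1M requests/day, 10% sampling, $0.002 per shadow call
        System.out.println(dailyShadowCost(1_000_000, 0.10, 0.002)); // 200.0
    }
}
```

At a 10% sample rate the overhead scales linearly, so doubling the rate doubles the extra spend.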
Consideration 3: Handling sensitive data
Real user inputs are sent to the shadow model. Confirm that the shadow model meets your privacy and compliance requirements, especially when the primary model is self-hosted and the shadow model runs in a public cloud.
Consideration 4: Shadow calls must not have side effects
If an LLM call triggers side effects such as database writes or sending emails, Shadow Mode needs special handling to guarantee that shadow calls never produce real side effects (for example, pass a "dry-run" flag to the shadow service).
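One way to sketch that dry-run guard; `EmailAction` and its fields are hypothetical, not part of the implementation above:

```java
public class EmailAction {
    private final boolean dryRun;
    private int sentCount = 0;

    public EmailAction(boolean dryRun) { this.dryRun = dryRun; }

    /** Performs the side effect only when not in dry-run mode. */
    public String send(String to, String body) {
        if (dryRun) {
            // Shadow path: record what *would* happen, touch nothing real
            return "DRY-RUN: would send to " + to;
        }
        sentCount++; // stand-in for the real SMTP call
        return "sent to " + to;
    }

    public int getSentCount() { return sentCount; }
}
```

The shadow service would construct every side-effecting action with `dryRun = true`, so comparison logs still capture the intended behavior without any real writes or emails.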
