Part 2313: Reliability Engineering for Multi-Step Reasoning Chains - Making the AI's Reasoning Process Verifiable
2026/4/30
Audience: AI systems engineers, architects who care about AI reliability | Reading time: ~18 minutes | Core value: learn how to design verifiable multi-step reasoning chains and systematically improve the reliability of complex reasoning tasks
This happened while we were building a contract risk analysis system. The AI had to analyze a contract, decide whether its breach clauses were unfavorable to our side, and then make a recommendation.
On one occasion, the AI produced an analysis that looked logically complete: step one identified the breach clauses; step two analyzed the applicable law; step three assessed the level of risk; step four concluded that the risk was "controllable."
When our legal colleagues checked it, they found that the applicable law identified in step two was wrong (the AI had confused PRC contract law with Hong Kong law), yet the subsequent reasoning "appeared" to flow naturally from step two to the conclusion. The whole chain was self-consistent, but it was built on an incorrect intermediate step.
This is the core risk of multi-step reasoning: an error is not caught at the step where it occurs. It is "inherited" and amplified by later steps, ultimately producing a conclusion that is self-consistent but wrong.
The Core Reliability Problems of Reasoning Chains
Multi-step reasoning chains (chain of thought) suffer from three classes of reliability problems:
Intermediate-step error propagation: an error in step N becomes the input to step N+1, so errors are passed along and amplified layer by layer.
Hidden hallucination: a step looks plausible in isolation, but where knowledge is lacking the model fabricates content out of thin air, and later steps keep reasoning on top of that fabricated content.
Logical gaps: each step makes sense on its own, but the logical connections between steps are leaps rather than rigorous inferences.
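The first problem can be sketched in a few lines. This is a toy model, not the system's code: the `Step` record and the min-based aggregation below are illustrative assumptions, but they capture why a single weak step caps the reliability of the entire chain:

```java
import java.util.List;

/** Toy model: a reasoning chain is only as reliable as its weakest step. */
public class ChainReliability {

    // Hypothetical simplified step: just a claim and a confidence score.
    record Step(String claim, double confidence) {}

    /** Chain confidence = the minimum step confidence; one bad step caps the whole chain. */
    static double chainConfidence(List<Step> steps) {
        return steps.stream()
                .mapToDouble(Step::confidence)
                .min()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        List<Step> chain = List.of(
                new Step("Identified the breach clauses", 0.95),
                new Step("Determined the applicable law", 0.40),   // the wrong step from the story
                new Step("Assessed the risk as controllable", 0.90));
        // The chain inherits the weak step's 0.40, no matter how strong the other steps are.
        System.out.println(chainConfidence(chain));
    }
}
```

Averaging the scores instead would hide the weak step behind the strong ones, which is exactly the failure mode described above; the min is the conservative choice.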
A Data Structure for Verifiable Reasoning Steps
Reliability starts from one requirement: every reasoning step must carry evidence that can be checked:
import java.util.List;

/**
 * A verifiable reasoning step.
 * Every step must include: a claim, supporting evidence, and a verification result.
 */
public record VerifiableReasoningStep(
        int stepNumber,
        String stepType,                  // OBSERVATION | INFERENCE | CONCLUSION
        String claim,                     // the core claim of this step
        String reasoning,                 // the reasoning behind the claim
        List<Evidence> evidence,          // evidence supporting the claim
        VerificationResult verification,  // verification result
        double confidenceScore,           // 0.0 - 1.0
        List<String> assumptions          // assumptions this step depends on (if any)
) {
    public boolean isReliable() {
        return verification.passed() && confidenceScore >= 0.7;
    }

    /**
     * Does this step rely on assumptions that have not been verified?
     */
    public boolean hasUnverifiedAssumptions() {
        return !assumptions.isEmpty() &&
                verification.verifiedAssumptions().size() < assumptions.size();
    }
}

/**
 * A piece of evidence supporting a reasoning step.
 */
public record Evidence(
        String sourceType,   // DOCUMENT | CALCULATION | PREVIOUS_STEP | RULE
        String sourceId,     // source identifier (document ID, step number, ...)
        String content,      // the evidence itself
        double credibility   // credibility of the evidence, 0.0 - 1.0
) {}

/**
 * The verification result for a reasoning step.
 */
public record VerificationResult(
        boolean passed,
        VerificationMethod method,
        String detail,
        List<String> verifiedAssumptions,
        List<String> failedChecks
) {
    public enum VerificationMethod {
        FACTUAL_CHECK,      // comparison against known facts
        LOGICAL_CHECK,      // logical consistency check
        CALCULATION_CHECK,  // numeric calculation check
        CONSISTENCY_CHECK   // consistency with the other steps
    }
}
The Reasoning Step Verifier
Every step is verified immediately after it is generated:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Component;

@Component
public class ReasoningStepVerifier {

    private final ChatClient verificationClient;
    private final CalculationVerifier calculationVerifier;
    private final FactCheckService factCheckService;

    /**
     * Verify a single reasoning step.
     */
    public VerificationResult verify(VerifiableReasoningStep step,
                                     List<VerifiableReasoningStep> previousSteps,
                                     String originalContext) {
        List<String> failedChecks = new ArrayList<>();

        // 1. Factual check: is the claim consistent with the original context?
        FactCheckResult factCheck = factCheckService.check(step.claim(), originalContext);
        if (!factCheck.isConsistent()) {
            failedChecks.add("Fact inconsistency: " + factCheck.inconsistency());
        }

        // 2. Logical consistency check: does the step contradict any previous step?
        if (!previousSteps.isEmpty()) {
            LogicalConsistencyResult logicCheck = checkLogicalConsistency(step, previousSteps);
            if (!logicCheck.isConsistent()) {
                failedChecks.add("Logical contradiction: " + logicCheck.contradiction());
            }
        }

        // 3. Calculation check: only when the step involves numeric computation
        if (step.stepType().equals("CALCULATION") || containsNumbers(step.claim())) {
            CalculationCheckResult calcCheck = calculationVerifier.verify(step.claim(), step.evidence());
            if (!calcCheck.isCorrect()) {
                failedChecks.add("Calculation error: expected=" + calcCheck.expected()
                        + ", actual=" + calcCheck.actual());
            }
        }

        // 4. Evidence sufficiency check: does the evidence actually support the claim?
        EvidenceSufficiencyResult evidenceCheck = checkEvidenceSufficiency(step);
        if (!evidenceCheck.isSufficient()) {
            failedChecks.add("Insufficient evidence: " + evidenceCheck.gaps());
        }

        boolean passed = failedChecks.isEmpty();
        // Simplified: always reports FACTUAL_CHECK as the method and an empty verified-assumptions list.
        return new VerificationResult(
                passed,
                VerificationResult.VerificationMethod.FACTUAL_CHECK,
                passed ? "Verification passed" : "Verification failed: " + String.join("; ", failedChecks),
                List.of(),
                failedChecks
        );
    }

    private LogicalConsistencyResult checkLogicalConsistency(
            VerifiableReasoningStep currentStep,
            List<VerifiableReasoningStep> previousSteps) {
        String previousClaims = previousSteps.stream()
                .map(s -> "Step %d: %s".formatted(s.stepNumber(), s.claim()))
                .collect(Collectors.joining("\n"));

        String response = verificationClient.prompt()
                .system("""
                        Check whether the current reasoning step logically contradicts the previous steps.
                        Flag only direct contradictions; do not over-interpret.
                        Output JSON: {"isConsistent": true/false, "contradiction": "description of the contradiction, if any"}
                        """)
                .user("Previous steps:\n%s\n\nCurrent step: %s".formatted(previousClaims, currentStep.claim()))
                .call()
                .content();

        return parseLogicalConsistencyResult(response);
    }

    // Helpers such as containsNumbers, checkEvidenceSufficiency and the JSON parsing
    // of the consistency result are omitted here.
}
A Chain-of-Thought Executor with Verification
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

import lombok.extern.slf4j.Slf4j;

@Slf4j // Lombok logger; the snippet uses `log` directly
@Service
public class VerifiedChainOfThoughtExecutor {

    private final ChatClient reasoningClient;
    private final ReasoningStepVerifier stepVerifier;
    private final StepCorrectionService correctionService;

    private static final String STRUCTURED_REASONING_PROMPT = """
            You must reason about the problem step by step. Output every reasoning step in the following JSON format:
            {
              "stepNumber": <step number>,
              "stepType": "OBSERVATION/INFERENCE/CONCLUSION",
              "claim": "the core claim of this step (one sentence)",
              "reasoning": "the detailed reasoning",
              "evidence": [
                {"sourceType": "DOCUMENT/PREVIOUS_STEP", "sourceId": "source", "content": "evidence content"}
              ],
              "assumptions": ["list any assumptions you rely on"],
              "isLastStep": false/true
            }
            Important rules:
            1. Do exactly one thing per step; never pack several inferences into one step.
            2. Every claim must be backed by evidence.
            3. If you need an assumption, state it explicitly.
            4. The conclusion step (isLastStep=true) must be based on the claims of the preceding steps.
            """;

    public VerifiedReasoningChain execute(String question, String context) {
        List<VerifiableReasoningStep> completedSteps = new ArrayList<>();
        String accumulatedContext = context;

        while (true) {
            // Generate the next reasoning step
            String stepJson = generateNextStep(question, accumulatedContext, completedSteps);
            ReasoningStepDraft draft = parseStepDraft(stepJson);

            // Verify the step immediately
            VerificationResult verification = stepVerifier.verify(
                    draft.toStep(), completedSteps, context
            );

            if (!verification.passed()) {
                log.warn("Reasoning step {} failed verification: {}", draft.stepNumber(), verification.detail());
                // Try to correct the step
                StepCorrectionResult correction = correctionService.correct(
                        draft, verification, completedSteps, context
                );
                if (correction.success()) {
                    // Use the corrected step
                    VerifiableReasoningStep correctedStep = correction.correctedStep();
                    completedSteps.add(correctedStep);
                    log.info("Step {} corrected", draft.stepNumber());
                } else {
                    // Correction failed: keep going, but mark the step as low-confidence
                    VerifiableReasoningStep uncertainStep = draft.toStepWithUncertainty(
                            verification, 0.3
                    );
                    completedSteps.add(uncertainStep);
                    log.warn("Step {} could not be corrected; continuing with low confidence", draft.stepNumber());
                }
            } else {
                completedSteps.add(draft.toStep(verification));
            }

            if (draft.isLastStep()) {
                break;
            }
            // Guard against infinite loops
            if (completedSteps.size() > 10) {
                log.warn("More than 10 reasoning steps; terminating the chain");
                break;
            }
        }

        return buildReasoningChain(question, completedSteps);
    }

    private String generateNextStep(String question, String context,
                                    List<VerifiableReasoningStep> previousSteps) {
        String previousStepsSummary = previousSteps.isEmpty() ? "none" :
                previousSteps.stream()
                        .map(s -> "Step %d [%s]: %s (confidence: %.0f%%)"
                                .formatted(s.stepNumber(), s.stepType(), s.claim(), s.confidenceScore() * 100))
                        .collect(Collectors.joining("\n"));

        return reasoningClient.prompt()
                .system(STRUCTURED_REASONING_PROMPT)
                .user("""
                        Background: %s
                        Question: %s
                        Completed reasoning steps:
                        %s
                        Produce the next reasoning step.
                        """.formatted(context, question, previousStepsSummary))
                .call()
                .content();
    }

    private VerifiedReasoningChain buildReasoningChain(String question,
                                                       List<VerifiableReasoningStep> steps) {
        // Chain confidence = the confidence of the weakest step
        double chainConfidence = steps.stream()
                .mapToDouble(VerifiableReasoningStep::confidenceScore)
                .min()
                .orElse(0.0);

        // Collect every step that failed verification
        List<VerifiableReasoningStep> failedSteps = steps.stream()
                .filter(s -> !s.isReliable())
                .toList();

        String finalConclusion = steps.isEmpty() ? "No conclusion could be reached" :
                steps.get(steps.size() - 1).claim();

        return new VerifiedReasoningChain(
                question, steps, finalConclusion,
                chainConfidence, failedSteps
        );
    }
}
The Reasoning-Chain Reliability Report
The final output is not just a conclusion; it also explains how reliable the reasoning process behind it is:
import java.util.ArrayList;
import java.util.List;

import org.springframework.stereotype.Service;

@Service
public class ReasoningReliabilityReporter {

    public ReasoningReport generateReport(VerifiedReasoningChain chain) {
        List<String> warnings = new ArrayList<>();
        List<String> caveats = new ArrayList<>();

        // Flag low-confidence steps
        chain.steps().stream()
                .filter(s -> s.confidenceScore() < 0.7)
                .forEach(s -> warnings.add(
                        "Step %d has low confidence (%.0f%%): %s"
                                .formatted(s.stepNumber(), s.confidenceScore() * 100, s.claim())
                ));

        // Flag unverified assumptions
        chain.steps().stream()
                .filter(VerifiableReasoningStep::hasUnverifiedAssumptions)
                .forEach(s -> caveats.addAll(
                        s.assumptions().stream()
                                .map(a -> "Unverified assumption: " + a)
                                .toList()
                ));

        // Overall reliability level
        ReliabilityLevel level;
        if (chain.chainConfidence() >= 0.85 && warnings.isEmpty()) {
            level = ReliabilityLevel.HIGH;
        } else if (chain.chainConfidence() >= 0.65) {
            level = ReliabilityLevel.MEDIUM;
        } else {
            level = ReliabilityLevel.LOW;
        }

        return new ReasoningReport(
                chain.finalConclusion(),
                level,
                chain.chainConfidence(),
                warnings,
                caveats,
                chain.steps()
        );
    }
}
After we introduced this verifiable reasoning framework into the contract risk analysis system, the legal team's feedback changed fundamentally. Previously they questioned the AI's analysis as a whole; now they look at the verification result of each step, accept high-confidence steps directly, and concentrate their review on the steps flagged as low confidence. This greatly improved the efficiency of human-AI collaboration, instead of keeping the AI's work and the humans' work completely separate.
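As a closing sketch, here is one way a caller might gate on such a report. The reliability levels mirror the thresholds above, but `ReasoningReport` and `route` here are simplified, hypothetical stand-ins, not the exact types from the system:

```java
import java.util.List;

/** Simplified stand-ins for the report types, for illustration only. */
public class ReportGating {

    enum ReliabilityLevel { HIGH, MEDIUM, LOW }

    record ReasoningReport(String conclusion, ReliabilityLevel level, List<String> warnings) {}

    /** Route the conclusion: auto-accept only HIGH; everything else involves a human. */
    static String route(ReasoningReport report) {
        return switch (report.level()) {
            case HIGH   -> "AUTO_ACCEPT";
            case MEDIUM -> "HUMAN_SPOT_CHECK";   // review only the flagged steps
            case LOW    -> "FULL_HUMAN_REVIEW";  // treat the conclusion as a draft
        };
    }

    public static void main(String[] args) {
        var report = new ReasoningReport("Risk is controllable", ReliabilityLevel.MEDIUM,
                List.of("Step 2 has low confidence (40%)"));
        System.out.println(route(report)); // HUMAN_SPOT_CHECK
    }
}
```

This is the workflow the legal team ended up with: the reliability level decides how much human attention a conclusion gets, rather than every conclusion getting the same scrutiny.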
