Spring AI全链路追踪:构建AI应用的完整可观测性体系
开篇故事:排查三天的上下文截断
2024年10月,上海某在线教育公司的后端负责人李伟接到了一个奇怪的用户投诉:
"我跟AI老师聊了半小时,前面讲得很好,突然就开始答非所问了,就好像它完全忘记了之前说的话。"
李伟拿到了用户的sessionId,去日志系统里查找。他们用的是ELK Stack,但日志是非结构化的,格式大概这样:
2024-10-15 14:23:11 INFO AI response: 好的,让我来解释一下递归的概念...
2024-10-15 14:23:45 INFO AI response: 您好!我是AI学习助手,有什么可以帮您的?

他看到第二条就懵了:一个已经对话了30分钟的用户,AI怎么突然自我介绍了?
他翻遍了代码,没找到原因。他去看OpenAI的调用日志,发现没有——他们没有记录完整的请求/响应。他去看会话存储,Redis里有数据,但不知道那次调用到底传了什么给模型。
他花了一天,把全部日志捞出来手动拼接,才发现:
对话历史超过了Token限制,服务的上下文截断逻辑有bug,截断后没有保留系统提示词,导致AI忘记了自己的角色。
修复这个bug只用了10分钟,但排查花了3天。
这3天,李伟反复想一个问题:如果我有完整的可观测性,这个问题30分钟就能定位。
本文就来系统地解决这个问题。
一、AI可观测性三要素
传统微服务的可观测性由三个维度构成:日志(Logging)、指标(Metrics)、链路追踪(Tracing)。
AI应用在此基础上有额外需求:
| 维度 | 传统微服务 | AI应用额外需求 |
|---|---|---|
| 日志 | 请求/响应日志 | Prompt内容、Token数、模型版本、上下文窗口使用率 |
| 指标 | QPS、延迟、错误率 | Token消耗速率、成本趋势、模型选择分布、幻觉率 |
| 链路追踪 | HTTP调用链 | 跨服务AI调用 + RAG检索链路 + 工具调用链路 |
| 特有 | - | AI行为审计(谁问了什么)、对话上下文追踪 |
二、项目依赖和配置
2.1 pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.2</version>
</parent>
<groupId>com.laozhang.ai</groupId>
<artifactId>spring-ai-observability</artifactId>
<version>1.0.0</version>
<properties>
<java.version>17</java.version>
<spring-ai.version>1.0.0</spring-ai.version>
<otel.version>1.38.0</otel.version>
</properties>
<dependencies>
<!-- Spring Boot Web -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Spring AI OpenAI -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<!-- Spring Boot Actuator(Micrometer集成) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer Prometheus Registry -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Micrometer Tracing with OpenTelemetry -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<!-- OpenTelemetry OTLP Exporter(发送到Jaeger/Tempo) -->
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
<version>${otel.version}</version>
</dependency>
<!-- Spring AI Micrometer Observation(内置AI指标) -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-core</artifactId>
</dependency>
<!-- Logback JSON Layout(结构化日志) -->
<dependency>
<groupId>net.logstash.logback</groupId>
<artifactId>logstash-logback-encoder</artifactId>
<version>7.4</version>
</dependency>
<!-- Redis(会话存储) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<!-- Lombok -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>

2.2 application.yml
spring:
application:
name: spring-ai-observability-demo
ai:
openai:
api-key: ${OPENAI_API_KEY}
base-url: ${OPENAI_BASE_URL:https://api.openai.com}
chat:
options:
model: gpt-4o
temperature: 0.7
max-tokens: 2048
# Spring AI 1.0 内置可观测性开关
chat:
observations:
include-prompt: false # 生产环境关闭:Prompt可能含敏感信息
include-completion: false # 生产环境关闭
log-prompt: true # 开发环境可开启DEBUG日志
redis:
host: ${REDIS_HOST:localhost}
port: 6379
# Micrometer Tracing
management:
tracing:
sampling:
probability: 0.1 # 生产环境采样10%,降低存储压力
otlp:
tracing:
endpoint: ${OTLP_ENDPOINT:http://localhost:4318/v1/traces}
endpoints:
web:
exposure:
include: health,prometheus,metrics,info
endpoint:
health:
show-details: always
metrics:
export:
prometheus:
enabled: true
tags:
application: ${spring.application.name}
environment: ${ENVIRONMENT:local}
# 结构化日志配置
logging:
config: classpath:logback-spring.xml
level:
org.springframework.ai: INFO
com.laozhang.ai: DEBUG
io.micrometer.tracing: DEBUG
server:
port: 8080

2.3 结构化日志配置(logback-spring.xml)
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<springProperty scope="context" name="appName"
source="spring.application.name"/>
<springProperty scope="context" name="environment"
source="ENVIRONMENT" defaultValue="local"/>
<!-- 开发环境:普通文本格式,便于阅读 -->
<appender name="CONSOLE_TEXT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level [%X{traceId}/%X{spanId}]
[userId=%X{userId}] [sessionId=%X{sessionId}] %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<!-- 生产环境:JSON格式,便于ELK/Loki采集 -->
<appender name="CONSOLE_JSON" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LogstashEncoder">
<includeMdcKeyName>userId</includeMdcKeyName>
<includeMdcKeyName>sessionId</includeMdcKeyName>
<includeMdcKeyName>requestId</includeMdcKeyName>
<includeMdcKeyName>traceId</includeMdcKeyName>
<includeMdcKeyName>spanId</includeMdcKeyName>
<!-- 不记录AI响应内容(敏感信息保护) -->
<excludeMdcKeyName>aiResponseContent</excludeMdcKeyName>
<customFields>{"app":"${appName}","env":"${environment}"}</customFields>
</encoder>
</appender>
<!-- AI请求专用日志文件(脱敏后的审计日志) -->
<appender name="AI_AUDIT" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/ai-audit.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>logs/ai-audit.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
<maxHistory>90</maxHistory>
<timeBasedFileNamingAndTriggeringPolicy
class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
<maxFileSize>500MB</maxFileSize>
</timeBasedFileNamingAndTriggeringPolicy>
</rollingPolicy>
<encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
<!-- 只记录audit标记的日志 -->
<filter class="ch.qos.logback.core.filter.EvaluatorFilter">
<evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
<marker>AI_AUDIT</marker>
</evaluator>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
</appender>
<springProfile name="local,dev">
<root level="INFO">
<appender-ref ref="CONSOLE_TEXT"/>
</root>
</springProfile>
<springProfile name="prod">
<root level="INFO">
<appender-ref ref="CONSOLE_JSON"/>
<appender-ref ref="AI_AUDIT"/>
</root>
</springProfile>
</configuration>

三、Spring AI Micrometer内置指标
Spring AI 1.0内置了Micrometer观测点,开箱即用:
package com.laozhang.ai.config;
import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
/**
* Micrometer观测配置
* 启用Spring AI内置的观测点
*/
@Configuration
public class ObservabilityConfig {
/**
* 启用@Observed注解支持
*/
@Bean
ObservedAspect observedAspect(ObservationRegistry observationRegistry) {
return new ObservedAspect(observationRegistry);
}
}

Spring AI 1.0内置的Micrometer指标(无需额外代码):
| 指标名 | 类型 | 描述 |
|---|---|---|
| spring.ai.chat.client.operation | Timer | 整体ChatClient调用耗时 |
| spring.ai.openai.chat | Timer | OpenAI模型调用耗时 |
| gen_ai.client.token.usage | Counter | Token使用量(prompt/completion分开统计) |
| spring.ai.vector.store.operation | Timer | 向量存储操作耗时 |
查看方式:
curl http://localhost:8080/actuator/metrics/spring.ai.chat.client.operation

四、MDC追踪:每次AI调用绑定userId/sessionId/requestId
MDC(Mapped Diagnostic Context)是将请求上下文绑定到日志的标准方式:
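其原理可以先用一段纯Java的最小示意说明(这不是slf4j的真实实现,只演示"ThreadLocal里挂一个Map"的机制,以及为什么线程池场景必须clear):

```java
import java.util.HashMap;
import java.util.Map;

/** 最小化的MDC示意:上下文存在ThreadLocal的Map里,日志框架输出时逐条读取 */
final class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.remove(); } // 线程池复用线程前必须清理,否则上下文串号

    public static void main(String[] args) {
        MiniMdc.put("userId", "u-1001");
        MiniMdc.put("requestId", "req-abc123");
        // 模拟日志框架输出:把MDC字段拼进每条日志
        System.out.println("[userId=" + MiniMdc.get("userId")
                + "][requestId=" + MiniMdc.get("requestId") + "] AI调用开始");
        MiniMdc.clear();
        System.out.println("cleared=" + (MiniMdc.get("userId") == null));
    }
}
```

真实项目中直接使用org.slf4j.MDC,如下: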
package com.laozhang.ai.trace;
import lombok.extern.slf4j.Slf4j;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import java.util.UUID;
/**
* AI请求追踪上下文管理器
* 负责在MDC中设置和清理追踪信息
*/
@Slf4j
@Component
public class AiTraceContext {
// MDC Key常量
public static final String USER_ID = "userId";
public static final String SESSION_ID = "sessionId";
public static final String REQUEST_ID = "requestId";
public static final String CONVERSATION_ID = "conversationId";
public static final String MODEL_NAME = "modelName";
public static final String OPERATION = "aiOperation";
/**
* 设置AI调用追踪上下文
*/
public void setupContext(TraceInfo traceInfo) {
MDC.put(USER_ID, traceInfo.getUserId());
MDC.put(SESSION_ID, traceInfo.getSessionId());
MDC.put(REQUEST_ID, traceInfo.getRequestId());
MDC.put(CONVERSATION_ID, traceInfo.getConversationId());
MDC.put(MODEL_NAME, traceInfo.getModelName());
MDC.put(OPERATION, traceInfo.getOperation());
}
/**
* 清理MDC(防止线程池中的上下文泄漏)
*/
public void clearContext() {
MDC.remove(USER_ID);
MDC.remove(SESSION_ID);
MDC.remove(REQUEST_ID);
MDC.remove(CONVERSATION_ID);
MDC.remove(MODEL_NAME);
MDC.remove(OPERATION);
}
/**
* 生成新的requestId
*/
public static String generateRequestId() {
return UUID.randomUUID().toString().replace("-", "").substring(0, 16);
}
}

package com.laozhang.ai.trace;
import lombok.Builder;
import lombok.Data;
@Data
@Builder
public class TraceInfo {
private String userId;
private String sessionId;
private String requestId;
private String conversationId;
private String modelName;
private String operation;
public static TraceInfo of(String userId, String sessionId) {
return TraceInfo.builder()
.userId(userId)
.sessionId(sessionId)
.requestId(AiTraceContext.generateRequestId())
.conversationId(sessionId) // 默认conversationId=sessionId
.modelName("gpt-4o")
.operation("chat")
.build();
}
}

4.1 AOP拦截器自动设置MDC
package com.laozhang.ai.trace;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
import jakarta.servlet.http.HttpServletRequest;
/**
* AI调用追踪切面
* 自动为带@AiTraceable注解的方法设置MDC
*/
@Slf4j
@Aspect
@Component
@RequiredArgsConstructor
public class AiTraceAspect {
private final AiTraceContext traceContext;
@Around("@annotation(aiTraceable)")
public Object aroundAiCall(ProceedingJoinPoint joinPoint,
AiTraceable aiTraceable) throws Throwable {
// 从HTTP请求头中提取追踪信息
String userId = extractUserId();
String sessionId = extractSessionId();
String requestId = AiTraceContext.generateRequestId();
TraceInfo traceInfo = TraceInfo.builder()
.userId(userId)
.sessionId(sessionId)
.requestId(requestId)
.conversationId(sessionId)
.modelName(aiTraceable.model())
.operation(aiTraceable.operation())
.build();
traceContext.setupContext(traceInfo);
log.debug("AI调用开始: operation={}, userId={}", aiTraceable.operation(), userId);
long startTime = System.currentTimeMillis();
try {
Object result = joinPoint.proceed();
long elapsed = System.currentTimeMillis() - startTime;
log.debug("AI调用完成: elapsed={}ms", elapsed);
return result;
} catch (Exception e) {
log.error("AI调用失败: {}", e.getMessage());
throw e;
} finally {
traceContext.clearContext();
}
}
private String extractUserId() {
try {
ServletRequestAttributes attrs =
(ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
if (attrs != null) {
HttpServletRequest request = attrs.getRequest();
String userId = request.getHeader("X-User-Id");
return userId != null ? userId : "anonymous";
}
} catch (Exception ignored) {}
return "unknown";
}
private String extractSessionId() {
try {
ServletRequestAttributes attrs =
(ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
if (attrs != null) {
HttpServletRequest request = attrs.getRequest();
String sessionId = request.getHeader("X-Session-Id");
return sessionId != null ? sessionId : AiTraceContext.generateRequestId();
}
} catch (Exception ignored) {}
return AiTraceContext.generateRequestId();
}
}

package com.laozhang.ai.trace;
import java.lang.annotation.*;
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface AiTraceable {
String operation() default "chat";
String model() default "gpt-4o";
}

五、关键指标实现:Token消耗、响应时间、成本
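动手写指标收集器之前,先用一段独立的算术示例核对成本估算逻辑(定价数字与本节代码中的MODEL_PRICING一致,按每100万token计价):

```java
/** 成本估算算术示意:gpt-4o 输入$5/1M tokens,输出$15/1M tokens(2024年定价,与正文一致) */
public class CostMath {
    static double estimateCost(int promptTokens, int completionTokens,
                               double inPerM, double outPerM) {
        return (promptTokens / 1_000_000.0) * inPerM
             + (completionTokens / 1_000_000.0) * outPerM;
    }

    public static void main(String[] args) {
        // 一次典型调用:1000个prompt token + 500个completion token
        double cost = estimateCost(1000, 500, 5.0, 15.0);
        System.out.printf("单次成本: $%.4f%n", cost); // 0.005 + 0.0075 = $0.0125
        // 日累计用long存"美元×10000",避免double反复累加的精度漂移,对应正文的dailyCostCentsTimes100
        long stored = Math.round(cost * 10000);
        System.out.println("Gauge读数: " + stored / 10000.0 + " USD");
    }
}
```

单价极低的单次调用累加成long是Gauge能稳定对外暴露日成本的关键,下面的收集器用的正是这个思路。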
package com.laozhang.ai.metrics;
import io.micrometer.core.instrument.*;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
/**
* AI服务核心指标收集器
* Token消耗 + 响应时间 + 估算成本
*/
@Slf4j
@Component
public class AiMetricsCollector {
// OpenAI定价(2024年,每1M tokens)
private static final Map<String, ModelPricing> MODEL_PRICING = Map.of(
"gpt-4o", new ModelPricing(5.0, 15.0), // input/output $/1M tokens
"gpt-4o-mini", new ModelPricing(0.15, 0.6),
"gpt-3.5-turbo", new ModelPricing(0.5, 1.5)
);
// 指标
private final Counter promptTokenCounter;
private final Counter completionTokenCounter;
private final Timer chatResponseTimer;
private final Counter costCounter;
private final DistributionSummary contextLengthSummary;
// 今日累计成本(用于告警)
private final AtomicLong dailyCostCentsTimes100 = new AtomicLong(0);
public AiMetricsCollector(MeterRegistry registry) {
promptTokenCounter = Counter.builder("ai.tokens.prompt")
.tag("type", "prompt")
.description("Prompt Token消耗总量")
.baseUnit("tokens")
.register(registry);
completionTokenCounter = Counter.builder("ai.tokens.completion")
.tag("type", "completion")
.description("Completion Token消耗总量")
.baseUnit("tokens")
.register(registry);
chatResponseTimer = Timer.builder("ai.chat.response.time")
.description("AI对话响应时间")
.publishPercentiles(0.5, 0.75, 0.95, 0.99)
.publishPercentileHistogram()
.serviceLevelObjectives(Duration.ofSeconds(2), Duration.ofSeconds(5), Duration.ofSeconds(10),
        Duration.ofSeconds(20), Duration.ofSeconds(30))
.register(registry);
costCounter = Counter.builder("ai.cost.usd")
.description("AI调用估算成本(美元)")
.baseUnit("USD")
.register(registry);
contextLengthSummary = DistributionSummary.builder("ai.context.length")
.description("对话上下文长度(Token数)")
.baseUnit("tokens")
.publishPercentiles(0.5, 0.95, 0.99)
.register(registry);
// 今日成本Gauge
registry.gauge("ai.daily.cost.usd", dailyCostCentsTimes100,
v -> v.get() / 10000.0); // 转换回美元
}
/**
* 记录一次AI调用的完整指标
*/
public void record(AiCallMetric metric) {
// Token计数
promptTokenCounter.increment(metric.getPromptTokens());
completionTokenCounter.increment(metric.getCompletionTokens());
// 响应时间
chatResponseTimer.record(metric.getDuration());
// 成本估算
double costUsd = estimateCost(metric.getModel(),
metric.getPromptTokens(), metric.getCompletionTokens());
costCounter.increment(costUsd);
dailyCostCentsTimes100.addAndGet((long) (costUsd * 10000));
// 上下文长度
contextLengthSummary.record(metric.getContextTokens());
log.debug("AI调用指标: model={}, promptTokens={}, completionTokens={}, " +
"duration={}ms, cost=${}",
metric.getModel(),
metric.getPromptTokens(),
metric.getCompletionTokens(),
metric.getDuration().toMillis(),
String.format("%.6f", costUsd));
}
private double estimateCost(String model, int promptTokens, int completionTokens) {
ModelPricing pricing = MODEL_PRICING.getOrDefault(model,
MODEL_PRICING.get("gpt-4o-mini"));
double promptCost = (promptTokens / 1_000_000.0) * pricing.inputPricePerM();
double completionCost = (completionTokens / 1_000_000.0) * pricing.outputPricePerM();
return promptCost + completionCost;
}
private record ModelPricing(double inputPricePerM, double outputPricePerM) {}
}

package com.laozhang.ai.metrics;
import lombok.Builder;
import lombok.Data;
import java.time.Duration;
@Data
@Builder
public class AiCallMetric {
private String model;
private String userId;
private String sessionId;
private int promptTokens;
private int completionTokens;
private int contextTokens;
private Duration duration;
private boolean success;
private String errorType;
}

六、AI监控服务(核心业务层)
package com.laozhang.ai.service;
import com.laozhang.ai.metrics.AiCallMetric;
import com.laozhang.ai.metrics.AiMetricsCollector;
import com.laozhang.ai.trace.AiTraceable;
import com.laozhang.ai.trace.AiTraceContext;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.metadata.ChatResponseMetadata;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
/**
* 带完整可观测性的AI对话服务
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class ObservableAiChatService {
private static final Marker AUDIT_MARKER = MarkerFactory.getMarker("AI_AUDIT");
private final ChatClient chatClient;
private final AiMetricsCollector metricsCollector;
private final AiTraceContext traceContext;
private final ConversationContextService contextService;
@AiTraceable(operation = "chat", model = "gpt-4o")
public ChatResult chat(String userMessage, String userId, String sessionId) {
Instant start = Instant.now();
// 1. 获取对话历史
List<Message> history = contextService.getHistory(sessionId);
int contextTokenEstimate = estimateTokens(history);
log.info("开始AI对话: userId={}, sessionId={}, historyMessages={}, " +
"contextTokens约={}",
userId, sessionId, history.size(), contextTokenEstimate);
// 2. 记录审计日志(脱敏:不记录完整内容,只记录摘要)
log.info(AUDIT_MARKER,
"AI请求审计: userId={}, sessionId={}, requestId={}, " +
"messageLength={}, historyCount={}",
userId, sessionId,
org.slf4j.MDC.get(AiTraceContext.REQUEST_ID),
userMessage.length(),
history.size());
try {
// 3. 执行AI调用
ChatResponse response = callAi(userMessage, history);
Duration elapsed = Duration.between(start, Instant.now());
// 4. 提取Token使用情况
ChatResponseMetadata metadata = response.getMetadata();
Usage usage = metadata.getUsage();
int promptTokens = (int) usage.getPromptTokens();
int completionTokens = (int) usage.getGenerationTokens();
// 5. 记录指标
metricsCollector.record(AiCallMetric.builder()
.model("gpt-4o")
.userId(userId)
.sessionId(sessionId)
.promptTokens(promptTokens)
.completionTokens(completionTokens)
.contextTokens(contextTokenEstimate)
.duration(elapsed)
.success(true)
.build());
// 6. 更新对话历史
String aiResponse = response.getResult().getOutput().getContent();
contextService.addToHistory(sessionId, userMessage, aiResponse);
log.info("AI对话完成: elapsed={}ms, promptTokens={}, completionTokens={}",
elapsed.toMillis(), promptTokens, completionTokens);
return ChatResult.builder()
.content(aiResponse)
.promptTokens(promptTokens)
.completionTokens(completionTokens)
.elapsedMs(elapsed.toMillis())
.build();
} catch (Exception e) {
Duration elapsed = Duration.between(start, Instant.now());
log.error("AI对话失败: userId={}, sessionId={}, elapsed={}ms, error={}",
userId, sessionId, elapsed.toMillis(), e.getMessage());
metricsCollector.record(AiCallMetric.builder()
.model("gpt-4o")
.userId(userId)
.sessionId(sessionId)
.promptTokens(0)
.completionTokens(0)
.duration(elapsed)
.success(false)
.errorType(e.getClass().getSimpleName())
.build());
throw e;
}
}
private ChatResponse callAi(String userMessage, List<Message> history) {
return chatClient.prompt()
.messages(history)
.user(userMessage)
.call()
.chatResponse();
}
/**
* Token数量粗估(1个中文字约1.5个token,英文词约1.3个token)
*/
private int estimateTokens(List<Message> messages) {
return messages.stream()
.mapToInt(m -> (int) (m.getContent().length() * 1.5))
.sum();
}
}

package com.laozhang.ai.service;
import lombok.Builder;
import lombok.Data;
@Data
@Builder
public class ChatResult {
private String content;
private int promptTokens;
private int completionTokens;
private long elapsedMs;
}

七、对话上下文服务(追踪上下文截断问题的根因)
解决李伟遭遇的上下文截断问题的正确实现:
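核心思路可以先用一段纯Java示意(不依赖Spring AI的Message类型,仅演示算法):截断时永远保留首条系统提示词,再从最新消息逆序回填,直到逼近预算上限。李伟的bug正是丢掉了第一步。

```java
import java.util.ArrayList;
import java.util.List;

/** 上下文截断示意:messages[0]为系统提示词,其余为对话消息;用字符数近似token预算 */
public class TruncateDemo {
    static List<String> truncate(List<String> messages, int budget) {
        List<String> kept = new ArrayList<>();
        kept.add(messages.get(0)); // 系统提示词永远保留
        int used = messages.get(0).length();
        // 逆序遍历对话消息,优先保留最新的
        for (int i = messages.size() - 1; i >= 1; i--) {
            String m = messages.get(i);
            if (used + m.length() > budget) break; // 超出预算即停止回填
            kept.add(1, m); // 插在系统提示词之后,维持时间顺序
            used += m.length();
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> msgs = List.of("SYS", "m1-old", "m2", "m3", "m4-new");
        System.out.println(truncate(msgs, 15)); // 最旧的m1-old被丢弃,SYS仍在首位
    }
}
```

完整实现还要把截断事件记录下来,这正是下面这个服务做的事: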
package com.laozhang.ai.service;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
/**
* 对话上下文管理服务
* 关键特性:
* 1. 带Token数量监控
* 2. 截断时保留系统提示词(解决李伟的bug)
* 3. 记录截断事件(可观测)
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class ConversationContextService {
private final RedisTemplate<String, Object> redisTemplate;
private static final String KEY_PREFIX = "ai:conv:";
private static final Duration TTL = Duration.ofHours(2);
// Token限制:为响应留500个token的余量
private static final int MAX_CONTEXT_TOKENS = 8000;
private static final int TOKENS_PER_CHAR = 2; // 保守估计
private static final String SYSTEM_PROMPT =
"你是一个专业的学习助手,帮助用户理解复杂概念。请用简洁清晰的语言回答。";
@SuppressWarnings("unchecked")
public List<Message> getHistory(String sessionId) {
String key = KEY_PREFIX + sessionId;
List<Object> stored = redisTemplate.opsForList().range(key, 0, -1);
if (stored == null || stored.isEmpty()) {
// 系统提示词始终是第一条
return List.of(new SystemMessage(SYSTEM_PROMPT));
}
// 重建历史消息列表
List<Message> messages = new ArrayList<>();
messages.add(new SystemMessage(SYSTEM_PROMPT)); // 始终保留系统提示词
for (Object item : stored) {
if (item instanceof String[] pair && pair.length == 2) {
messages.add(new UserMessage(pair[0]));
messages.add(new AssistantMessage(pair[1]));
}
}
// 检查是否需要截断
int totalTokens = estimateTotalTokens(messages);
if (totalTokens > MAX_CONTEXT_TOKENS) {
messages = truncateContext(messages, sessionId);
}
return messages;
}
public void addToHistory(String sessionId, String userMsg, String aiMsg) {
String key = KEY_PREFIX + sessionId;
redisTemplate.opsForList().rightPush(key, new String[]{userMsg, aiMsg});
redisTemplate.expire(key, TTL);
log.debug("对话历史已更新: sessionId={}, totalPairs={}",
sessionId, redisTemplate.opsForList().size(key));
}
/**
* 截断上下文时:
* 1. 始终保留系统提示词
* 2. 保留最近N轮对话
* 3. 记录截断事件(关键!)
*/
private List<Message> truncateContext(List<Message> messages, String sessionId) {
int originalSize = messages.size();
// 系统提示词(第0条)始终保留
SystemMessage systemMessage = (SystemMessage) messages.get(0);
List<Message> conversationMessages = messages.subList(1, messages.size());
// 从最新的消息开始保留,直到接近Token限制
List<Message> kept = new ArrayList<>();
kept.add(systemMessage);
// 逆序遍历,保留最新的消息
for (int i = conversationMessages.size() - 1; i >= 0; i--) {
kept.add(1, conversationMessages.get(i)); // 插入到系统提示词后面
if (estimateTotalTokens(kept) > MAX_CONTEXT_TOKENS * 0.8) {
kept.remove(1); // 移除刚加进去的消息(超限了)
break;
}
}
// 记录截断事件(这就是可观测性的价值!)
log.warn("对话上下文截断: sessionId={}, originalMessages={}, keptMessages={}, " +
"estimatedTokens={}",
sessionId,
originalSize,
kept.size(),
estimateTotalTokens(kept));
return kept;
}
private int estimateTotalTokens(List<Message> messages) {
return messages.stream()
.mapToInt(m -> m.getContent().length() * TOKENS_PER_CHAR)
.sum();
}
}

八、OpenTelemetry分布式追踪
8.1 Span操作工具类
package com.laozhang.ai.trace;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
/**
* AI调用Span工具类
* 为每次AI调用创建独立的追踪Span
*/
@Slf4j
@Component
@RequiredArgsConstructor
public class AiSpanManager {
private final Tracer tracer;
/**
* 创建AI调用Span
* 可在Jaeger/Zipkin中看到完整调用链路
*/
public <T> T traceAiCall(String operationName,
String userId,
String model,
SpanOperation<T> operation) throws Exception {
Span span = tracer.nextSpan()
.name("ai." + operationName)
.tag("ai.model", model)
.tag("ai.user_id", userId)
.tag("ai.operation", operationName)
.start();
try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
T result = operation.execute(span);
span.tag("ai.success", "true");
return result;
} catch (Exception e) {
span.tag("ai.success", "false");
span.tag("ai.error", e.getMessage());
span.error(e);
throw e;
} finally {
span.end();
}
}
@FunctionalInterface
public interface SpanOperation<T> {
T execute(Span span) throws Exception;
}
}

8.2 跨服务AI调用追踪
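跨服务追踪的前提是traceId能随HTTP头传播。Micrometer Tracing默认按W3C Trace Context标准传递traceparent头,格式为 version-traceId-spanId-flags。一个最小解析示意(仅演示格式,生产中由框架自动处理):

```java
/** W3C traceparent头解析示意,例:00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 */
public class TraceParent {
    final String traceId;
    final String spanId;
    final boolean sampled;

    TraceParent(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4) throw new IllegalArgumentException("traceparent格式非法: " + header);
        this.traceId = parts[1];              // 32位十六进制,整条链路共享
        this.spanId = parts[2];               // 16位十六进制,当前span
        this.sampled = "01".equals(parts[3]); // 采样标记,对应前文10%采样配置
    }

    public static void main(String[] args) {
        TraceParent tp = new TraceParent(
                "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println("traceId=" + tp.traceId + ", sampled=" + tp.sampled);
    }
}
```

只要下游服务同样接入Micrometer Tracing,AI调用的Span就会自动挂到同一个traceId下,在Jaeger中呈现为一条完整链路: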
package com.laozhang.ai.service;
import com.laozhang.ai.metrics.AiCallMetric;
import com.laozhang.ai.metrics.AiMetricsCollector;
import com.laozhang.ai.trace.AiSpanManager;
import io.micrometer.tracing.Span;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.time.Instant;
/**
* 带分布式追踪的AI服务
* 每次AI调用创建独立Span,在Jaeger中可视化完整链路
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class TracedAiService {
private final ChatClient chatClient;
private final AiSpanManager spanManager;
private final AiMetricsCollector metricsCollector;
public String chatWithTracing(String message, String userId, String sessionId)
throws Exception {
return spanManager.traceAiCall("chat", userId, "gpt-4o", span -> {
Instant start = Instant.now();
// 在Span中记录上下文信息
span.tag("ai.session_id", sessionId);
span.tag("ai.message_length", String.valueOf(message.length()));
ChatResponse response = chatClient.prompt()
.user(message)
.call()
.chatResponse();
Duration elapsed = Duration.between(start, Instant.now());
Usage usage = response.getMetadata().getUsage();
// 在Span中记录Token使用情况(在Jaeger中可见)
span.tag("ai.prompt_tokens", String.valueOf(usage.getPromptTokens()));
span.tag("ai.completion_tokens", String.valueOf(usage.getGenerationTokens()));
span.tag("ai.elapsed_ms", String.valueOf(elapsed.toMillis()));
// 记录Micrometer指标
metricsCollector.record(AiCallMetric.builder()
.model("gpt-4o")
.userId(userId)
.sessionId(sessionId)
.promptTokens((int) usage.getPromptTokens())
.completionTokens((int) usage.getGenerationTokens())
.duration(elapsed)
.success(true)
.build());
return response.getResult().getOutput().getContent();
});
}
}

九、日志规范:结构化日志 + 敏感信息脱敏
敏感信息不能进日志:用户的对话内容可能含个人信息、业务秘密:
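注意一个常见坑:逐字段的maskPhone只能处理"恰好是手机号"的字符串,而用户消息里往往是整段文本嵌着手机号,需要对全文做正则替换。下面是一个示意(正则为简化假设,只匹配11位大陆手机号):

```java
import java.util.regex.Pattern;

/** 在整段文本中脱敏手机号:13812341234 → 138****1234 */
public class TextScrubber {
    // 简化假设:1开头、第二位3-9的11位连续数字视为手机号
    private static final Pattern PHONE = Pattern.compile("(1[3-9]\\d)\\d{4}(\\d{4})");

    static String scrub(String text) {
        return PHONE.matcher(text).replaceAll("$1****$2");
    }

    public static void main(String[] args) {
        System.out.println(scrub("请回拨13812341234,谢谢")); // → 请回拨138****1234,谢谢
    }
}
```

在此之上,完整的脱敏工具类如下: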
package com.laozhang.ai.logging;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
/**
* AI内容脱敏工具
* 记录日志时对敏感内容进行脱敏处理
*/
@Slf4j
@Component
public class AiContentSanitizer {
private static final int MAX_LOG_CONTENT_LENGTH = 100;
/**
* 生成消息摘要(不含具体内容)
* 用于日志记录:记录"长度"和"类型",不记录"内容"
*/
public String summarize(String content) {
if (content == null || content.isEmpty()) {
return "[空消息]";
}
int length = content.length();
String type = detectContentType(content);
return String.format("[%s, %d字符]", type, length);
}
/**
* 部分脱敏:保留开头和结尾,中间替换为***
* 适用于需要一定可读性的调试场景
*/
public String partialMask(String content) {
if (content == null || content.length() <= 20) {
return "***";
}
return content.substring(0, 10) + "***" +
content.substring(content.length() - 5);
}
/**
* 手机号脱敏:138****1234
*/
public String maskPhone(String phone) {
if (phone == null || phone.length() != 11) return phone;
return phone.substring(0, 3) + "****" + phone.substring(7);
}
/**
* 邮箱脱敏:z***@example.com
*/
public String maskEmail(String email) {
if (email == null || !email.contains("@")) return email;
int atIndex = email.indexOf('@');
String local = email.substring(0, atIndex);
String domain = email.substring(atIndex);
return local.charAt(0) + "***" + domain;
}
private String detectContentType(String content) {
if (content.contains("```")) return "代码含量高";
if (content.length() > 500) return "长文本";
if (content.matches(".*[\\u4e00-\\u9fa5].*")) return "中文内容";
return "普通文本";
}
}

9.1 AI请求过滤器(HTTP层面的追踪上下文)
package com.laozhang.ai.filter;
import com.laozhang.ai.trace.AiTraceContext;
import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.extern.slf4j.Slf4j;
import org.slf4j.MDC;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import java.io.IOException;
/**
* HTTP请求追踪过滤器
* 从请求头提取userId/sessionId并设置到MDC
*/
@Slf4j
@Component
@Order(1)
public class AiTraceFilter implements Filter {
@Override
public void doFilter(ServletRequest request, ServletResponse response,
FilterChain chain) throws IOException, ServletException {
HttpServletRequest httpRequest = (HttpServletRequest) request;
HttpServletResponse httpResponse = (HttpServletResponse) response;
String userId = getHeader(httpRequest, "X-User-Id", "anonymous");
String sessionId = getHeader(httpRequest, "X-Session-Id",
AiTraceContext.generateRequestId());
String requestId = AiTraceContext.generateRequestId();
MDC.put(AiTraceContext.USER_ID, userId);
MDC.put(AiTraceContext.SESSION_ID, sessionId);
MDC.put(AiTraceContext.REQUEST_ID, requestId);
// 在响应头中返回requestId,便于客户端关联
httpResponse.setHeader("X-Request-Id", requestId);
try {
chain.doFilter(request, response);
} finally {
MDC.remove(AiTraceContext.USER_ID);
MDC.remove(AiTraceContext.SESSION_ID);
MDC.remove(AiTraceContext.REQUEST_ID);
}
}
private String getHeader(HttpServletRequest request, String name, String defaultValue) {
String value = request.getHeader(name);
return (value != null && !value.isBlank()) ? value : defaultValue;
}
}

十、Prometheus告警规则:P99延迟、错误率、Token超预算
# prometheus/alert-rules.yml
groups:
- name: spring-ai-alerts
interval: 30s
rules:
# 告警1:P99延迟超过5秒
- alert: AiHighLatency
expr: |
histogram_quantile(0.99,
rate(ai_chat_response_time_seconds_bucket[5m])
) > 5
for: 2m
labels:
severity: warning
team: ai-platform
annotations:
summary: "AI响应P99延迟过高"
description: |
AI服务P99延迟为{{ $value | humanizeDuration }},超过5秒阈值。
可能原因:模型负载高、网络问题、长Prompt。
runbook_url: "https://wiki.company.com/ai-latency-runbook"
# 告警2:错误率超过1%
- alert: AiHighErrorRate
expr: |
rate(ai_openai_requests_total{status="failure"}[5m])
/
rate(ai_openai_requests_total[5m]) > 0.01
for: 1m
labels:
severity: critical
team: ai-platform
annotations:
summary: "AI服务错误率超过1%"
description: |
当前错误率:{{ $value | humanizePercentage }}
需要立即检查:OpenAI状态页、网络连接、API密钥有效性。
# 告警3:熔断器开启
- alert: AiCircuitBreakerOpen
expr: |
resilience4j_circuitbreaker_state{
name="openai-service",
state="open"
} == 1
for: 0m
labels:
severity: critical
team: ai-platform
annotations:
summary: "OpenAI熔断器已开启"
description: |
熔断器[{{ $labels.name }}]已开启,所有请求将被降级处理。
等待30秒后自动进入半开状态测试恢复。
# 告警4:Token消耗超日预算(每日100万Token预算)
- alert: AiTokenBudgetExceeded
expr: |
increase(ai_tokens_prompt_tokens_total[24h])
+ increase(ai_tokens_completion_tokens_total[24h]) > 1000000
for: 0m
labels:
severity: warning
team: ai-platform
annotations:
summary: "AI Token消耗超过日预算"
description: |
今日Token消耗:{{ $value | humanize }},超过100万Token日预算。
请检查是否有异常高频调用。
# 告警5:日成本超预算(每日$50预算)
- alert: AiDailyCostExceeded
expr: ai_daily_cost_usd > 50
for: 0m
labels:
severity: warning
team: ai-platform
annotations:
summary: "AI今日成本超$50"
description: "当前今日估算成本:${{ $value }}"
# 告警6:上下文截断频率异常(每分钟超过10次说明对话太长)
# 注意:ai_context_truncations_total 并非内置指标,需在截断逻辑中通过MeterRegistry自行注册Counter
- alert: AiContextTruncationHigh
expr: |
rate(ai_context_truncations_total[5m]) * 60 > 10
for: 5m
labels:
severity: warning
team: ai-platform
annotations:
summary: "对话上下文截断频率异常"
description: |
每分钟截断次数:{{ $value | humanize }},可能影响对话质量。
建议:检查用户是否在进行超长对话,考虑增加上下文窗口或优化截断策略。

十一、AI行为审计:合规下的操作记录
在金融、医疗、教育等行业,AI的行为需要留存审计记录:
package com.laozhang.ai.audit;
import lombok.Builder;
import lombok.Data;
import java.time.Instant;
/**
* AI行为审计记录
* 符合GDPR、等保三级等合规要求
* 注意:不记录完整对话内容,只记录元数据
*/
@Data
@Builder
public class AiAuditRecord {
// 身份信息
private String auditId; // 审计记录唯一ID
private String userId; // 操作用户ID(不含姓名等PII)
private String tenantId; // 租户/组织ID
private String sessionId; // 会话ID
// 操作信息
private String operation; // 操作类型:chat/search/generate
private String modelUsed; // 使用的AI模型
private Instant timestamp; // 操作时间(UTC)
private String requestId; // 请求ID(关联追踪系统)
// 内容摘要(不含原文)
private int requestLength; // 请求内容长度(字符数)
private int responseLength; // 响应内容长度
private String requestType; // 请求类型分类:问答/代码/翻译/其他
private boolean containsSensitiveKeywords; // 是否触发敏感词检测
// 系统信息
private int promptTokens;
private int completionTokens;
private long latencyMs;
private boolean success;
private String errorCode; // 失败时的错误码
// 来源信息
private String clientIp; // 客户端IP(已脱敏:只保留前16位)
private String userAgent; // User-Agent前50个字符
}
package com.laozhang.ai.audit;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import java.util.UUID;
/**
* AI行为审计服务
* 异步写入,不影响主链路性能
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class AiAuditService {
private static final Marker AUDIT_MARKER = MarkerFactory.getMarker("AI_AUDIT");
/**
* 异步记录AI审计日志
* 写入到专用的ai-audit.log文件
*/
@Async("auditTaskExecutor")
public void logAudit(AiAuditRecord record) {
// 结构化审计日志(JSON格式,输出到ai-audit.log)
log.info(AUDIT_MARKER,
"auditId={}, userId={}, tenantId={}, operation={}, model={}, " +
"requestLength={}, responseLength={}, requestType={}, " +
"promptTokens={}, completionTokens={}, latencyMs={}, success={}, " +
"timestamp={}",
record.getAuditId(),
record.getUserId(),
record.getTenantId(),
record.getOperation(),
record.getModelUsed(),
record.getRequestLength(),
record.getResponseLength(),
record.getRequestType(),
record.getPromptTokens(),
record.getCompletionTokens(),
record.getLatencyMs(),
record.isSuccess(),
record.getTimestamp()
);
}
/**
* 构建审计记录
*/
public AiAuditRecord buildRecord(
String userId, String tenantId, String sessionId,
String operation, String model,
int requestLength, int responseLength,
int promptTokens, int completionTokens,
long latencyMs, boolean success, String errorCode) {
return AiAuditRecord.builder()
.auditId(UUID.randomUUID().toString())
.userId(userId)
.tenantId(tenantId)
.sessionId(sessionId)
.operation(operation)
.modelUsed(model)
.requestLength(requestLength)
.responseLength(responseLength)
.promptTokens(promptTokens)
.completionTokens(completionTokens)
.latencyMs(latencyMs)
.success(success)
.errorCode(errorCode)
.timestamp(java.time.Instant.now())
.build();
}
}
package com.laozhang.ai.config;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import java.util.concurrent.Executor;
@Slf4j
@Configuration
@EnableAsync
public class AsyncConfig {
/**
* 审计日志专用线程池
* 隔离审计日志写入,不影响AI主链路
*/
@Bean("auditTaskExecutor")
public Executor auditTaskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(2);
executor.setMaxPoolSize(4);
executor.setQueueCapacity(1000);
executor.setThreadNamePrefix("audit-");
// 队列满时丢弃并告警,绝不阻塞AI主链路
executor.setRejectedExecutionHandler(
(r, e) -> log.warn("审计日志队列已满,丢弃一条审计记录")
);
executor.initialize();
return executor;
}
}
十二、Grafana Dashboard(AI监控看板)
完整的Dashboard JSON配置(核心面板):
{
"title": "Spring AI 全链路可观测性看板",
"uid": "spring-ai-observability",
"panels": [
{
"id": 1,
"title": "AI调用总量(过去1小时)",
"type": "stat",
"gridPos": {"x": 0, "y": 0, "w": 4, "h": 4},
"targets": [{
"expr": "increase(spring_ai_chat_client_operation_seconds_count[1h])",
"legendFormat": "总请求数"
}]
},
{
"id": 2,
"title": "AI请求成功率",
"type": "gauge",
"gridPos": {"x": 4, "y": 0, "w": 4, "h": 4},
"targets": [{
"expr": "rate(ai_openai_requests_total{status='success'}[5m]) / rate(ai_openai_requests_total[5m]) * 100",
"legendFormat": "成功率%"
}],
"fieldConfig": {
"defaults": {
"unit": "percent", "min": 0, "max": 100,
"thresholds": {"steps": [
{"color": "red", "value": 0},
{"color": "yellow", "value": 95},
{"color": "green", "value": 99}
]}
}
}
},
{
"id": 3,
"title": "Token消耗趋势",
"type": "graph",
"gridPos": {"x": 0, "y": 4, "w": 12, "h": 6},
"targets": [
{
"expr": "rate(ai_tokens_prompt_tokens_total[5m]) * 60",
"legendFormat": "Prompt Tokens/min"
},
{
"expr": "rate(ai_tokens_completion_tokens_total[5m]) * 60",
"legendFormat": "Completion Tokens/min"
}
]
},
{
"id": 4,
"title": "P99响应时间(秒)",
"type": "graph",
"gridPos": {"x": 12, "y": 4, "w": 12, "h": 6},
"targets": [{
"expr": "histogram_quantile(0.99, rate(ai_chat_response_time_seconds_bucket[5m]))",
"legendFormat": "P99延迟"
},
{
"expr": "histogram_quantile(0.95, rate(ai_chat_response_time_seconds_bucket[5m]))",
"legendFormat": "P95延迟"
},
{
"expr": "histogram_quantile(0.50, rate(ai_chat_response_time_seconds_bucket[5m]))",
"legendFormat": "P50延迟"
}]
},
{
"id": 5,
"title": "今日累计成本(美元)",
"type": "stat",
"gridPos": {"x": 0, "y": 10, "w": 4, "h": 4},
"targets": [{
"expr": "ai_daily_cost_usd",
"legendFormat": "今日成本"
}],
"fieldConfig": {
"defaults": {
"unit": "currencyUSD",
"thresholds": {"steps": [
{"color": "green", "value": 0},
{"color": "yellow", "value": 30},
{"color": "red", "value": 50}
]}
}
}
},
{
"id": 6,
"title": "上下文截断次数(每分钟)",
"type": "graph",
"gridPos": {"x": 4, "y": 10, "w": 8, "h": 4},
"targets": [{
"expr": "rate(ai_context_truncations_total[1m]) * 60",
"legendFormat": "截断次数/分钟"
}]
}
],
"time": {"from": "now-3h", "to": "now"},
"refresh": "30s"
}
十三、可观测性体系的整体成效
李伟在实施了这套可观测性体系后,对之前的上下文截断问题做了复盘:
如果当时有这套体系,排查路径应该是:
- 查Grafana:ai_context_truncations_total在14:22突然飙升 → 锁定上下文截断问题(2分钟)
- 查日志:按sessionId过滤,看到 "上下文截断: keptMessages=2" 的WARNING → 确认截断(3分钟)
- 查追踪:Jaeger里找到对应traceId,看到系统提示词在截断后消失 → 找到根因(5分钟)
- 修复bug并验证 → 解决(10分钟)
总计20分钟,而不是3天。
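当年bug的根因是截断逻辑没有保留系统提示词。正确的做法可以用一段纯Java草图示意(Message类型和Token估算方式均为示意,并非Spring AI的实际API):

```java
import java.util.ArrayList;
import java.util.List;

// 简化示意:截断对话历史时,系统提示词永远不参与截断
public class ContextTruncator {

    public record Message(String role, String content) {}

    // 粗略估算:按字符数/2估算Token(中文场景的常见近似,仅作示意)
    static int estimateTokens(Message m) {
        return Math.max(1, m.content().length() / 2);
    }

    /**
     * 从最新消息向前保留,直到达到Token预算;
     * 系统提示词(system)始终放在结果最前面——这正是当年bug漏掉的一步。
     */
    public static List<Message> truncate(List<Message> history, int tokenBudget) {
        List<Message> result = new ArrayList<>();
        int used = 0;
        // 1. 先无条件保留系统提示词
        for (Message m : history) {
            if ("system".equals(m.role())) {
                result.add(m);
                used += estimateTokens(m);
            }
        }
        // 2. 从最新消息向前填充剩余预算
        List<Message> kept = new ArrayList<>();
        for (int i = history.size() - 1; i >= 0; i--) {
            Message m = history.get(i);
            if ("system".equals(m.role())) continue;
            int cost = estimateTokens(m);
            if (used + cost > tokenBudget) break;
            kept.add(0, m);
            used += cost;
        }
        result.addAll(kept);
        return result;
    }
}
```

截断发生时再配合一条带sessionId的WARNING日志和ai_context_truncations_total计数器,就形成了本文的完整闭环。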
生产环境数据:
| 指标 | 无可观测性 | 有完整可观测性 | 改善 |
|---|---|---|---|
| 平均故障定位时间(MTTD) | 4.2小时 | 18分钟 | 减少93% |
| 平均故障恢复时间(MTTR) | 2.8小时 | 35分钟 | 减少79% |
| 月均AI成本超支次数 | 3次 | 0次 | 100%消除 |
| Token使用趋势可见性 | 无 | 实时 | 新增能力 |
| 用户对话质量问题发现 | 靠投诉 | 主动检测 | 质的提升 |
十四、FAQ
Q1:Spring AI的内置Micrometer指标够用吗,还需要自定义?
内置指标覆盖了基础的调用时间和Token数,但对于业务监控还不够:你需要按用户、按功能模块、按模型版本拆分指标。建议两者结合:内置指标监控系统健康,自定义指标监控业务表现。
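一个按业务维度打标签的自定义指标示意(基于Micrometer,Spring Boot已内置;指标名ai.business.requests与标签名均为假设,可按团队规范调整):

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// 示意:在内置指标之外,按功能模块/模型版本/结果拆分的业务指标
public class AiBusinessMetrics {

    private final MeterRegistry registry;

    public AiBusinessMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    /** 每次AI调用结束后记录一次 */
    public void record(String feature, String model, boolean success) {
        Counter.builder("ai.business.requests")
                .tag("feature", feature)   // 业务功能:qa/code/translate
                .tag("model", model)       // 模型版本:gpt-4o/gpt-4o-mini
                .tag("status", success ? "success" : "failure")
                .register(registry)
                .increment();
    }
}
```

注意标签基数:不要把userId这类高基数字段直接做成指标标签,否则Prometheus的时序数会爆炸;按用户维度的分析应该走日志或审计记录,而不是指标。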
Q2:Prompt内容要不要进日志?
生产环境绝对不要。原因:1)用户输入可能含个人信息(违反GDPR);2)日志文件往往有多人访问权限;3)Prompt里可能有业务敏感信息。只记录长度、类型等元数据。调试场景可以在专用的开发环境开启 spring.ai.chat.observations.include-prompt=true。
Q3:分布式追踪的采样率设多少合适?
生产环境建议1%-10%:太高会增加存储成本和性能开销;太低可能错过关键请求。基础采样率在Spring Boot中直接配置即可:management.tracing.sampling.probability: 0.1。
需要注意:"错误请求100%采样"无法靠头部采样(head-based sampling)实现——采样决策在Span开始时就已做出,此时还不知道请求是否会失败。要保证错误链路全部保留,通常需要尾部采样(tail-based sampling),例如在OpenTelemetry Collector中配置tail_sampling处理器,按status_code保留所有出错的trace。
Q4:Redis存储对话历史,崩溃后历史会丢失吗?
会。Redis是内存数据库,即使开了AOF持久化,重启后短时间内的数据可能丢失。对于重要的对话历史,建议:Redis做缓存(TTL 2小时),MySQL/MongoDB做持久化存储。
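"缓存 + 持久化"双写与读取回源的思路可以用一段纯Java草图示意(用两个Map分别代替Redis和MySQL,只表达读写路径,不是生产实现):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// 示意:对话历史的双写与缓存回源
public class ConversationStore {

    private final Map<String, List<String>> cache = new HashMap<>();   // 代替Redis(带TTL的缓存)
    private final Map<String, List<String>> durable = new HashMap<>(); // 代替MySQL(持久层)

    /** 写路径:先写持久层,再写缓存(生产中持久层写入可异步化) */
    public void append(String sessionId, String message) {
        durable.computeIfAbsent(sessionId, k -> new ArrayList<>()).add(message);
        cache.computeIfAbsent(sessionId, k -> new ArrayList<>()).add(message);
    }

    /** 读路径:优先读缓存;缓存失效(如Redis重启)时从持久层回源并回填 */
    public List<String> history(String sessionId) {
        List<String> cached = cache.get(sessionId);
        if (cached != null) return cached;
        List<String> persisted = durable.getOrDefault(sessionId, List.of());
        cache.put(sessionId, new ArrayList<>(persisted)); // 回填缓存
        return cache.get(sessionId);
    }

    /** 模拟Redis崩溃重启:缓存清空,但持久层数据仍在 */
    public void simulateCacheLoss() {
        cache.clear();
    }
}
```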
Q5:审计日志和应用日志分开存储有什么好处?
合规要求(等保三级、金融行业标准)往往要求审计日志保存90天以上,不可篡改。分开存储可以:1)单独设置更长的保留周期;2)限制访问权限(只有合规团队能查);3)针对审计日志专门优化(如存入不可变存储)。
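分开存储在Logback里可以通过Marker过滤实现——前文AiAuditService使用的AI_AUDIT标记正好用于路由。一个logback-spring.xml片段示意(文件路径、appender名称均为假设):

```xml
<appender name="AI_AUDIT_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <!-- 只接收带AI_AUDIT标记的日志,其余全部拒绝 -->
    <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
        <evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
            <marker>AI_AUDIT</marker>
        </evaluator>
        <onMatch>ACCEPT</onMatch>
        <onMismatch>DENY</onMismatch>
    </filter>
    <file>logs/ai-audit.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>logs/ai-audit.%d{yyyy-MM-dd}.log</fileNamePattern>
        <maxHistory>90</maxHistory> <!-- 满足90天合规保留 -->
    </rollingPolicy>
    <encoder>
        <pattern>%d{ISO8601} %msg%n</pattern>
    </encoder>
</appender>
```

应用日志的appender上则配置onMismatch=ACCEPT、onMatch=DENY的反向过滤,确保审计日志不会混入普通日志文件。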
总结
李伟从那次3天排查事故后,花了3周时间建立了本文描述的完整可观测性体系。六个月后,团队从"出了问题才知道"变成了"问题还没影响用户就主动发现"。
完整可观测性的核心是四个能力:
- 日志:结构化 + MDC追踪 + 敏感脱敏,让每一行日志都可被检索、可被关联
- 指标:Token消耗、成本、P99延迟、截断频率,让系统状态一眼可见
- 链路追踪:跨服务AI调用全链路,让问题定位从"大海捞针"变成"按图索骥"
- 审计:合规记录谁问了什么类型的问题,在元数据层面而非内容层面
