AI应用的分布式追踪:用OpenTelemetry看清每次AI调用的全链路
凌晨两点的告警
2026年1月的一个深夜,李明的手机震动了。
他是某头部电商平台的Java架构师,团队刚上线了一套AI智能客服系统——用户发问,系统先检索知识库,再调用GPT-4生成回答,最后经过审核服务过滤敏感词后返回。整套链路穿越5个微服务:API网关 → 意图识别服务 → RAG检索服务 → LLM调用服务 → 内容审核服务。
告警内容很简单:P95响应时间超过8秒,SLA告警触发。
李明打开监控大盘,各服务CPU、内存正常,错误率为零。日志?每个服务都打印着"处理成功"。
他开始逐个排查。先看API网关日志,请求进来了,时间戳正常。再看意图识别服务,也正常,50ms处理完。LLM调用服务……等等,日志里只看到"开始调用LLM",没有"调用完成"的记录。
"是LLM超时?"他赶紧去查LLM服务的监控,平均响应时间2.1秒,看起来正常。
就这样,他在5个服务之间来回翻日志,耗费了整整2天,才发现真正的罪魁祸首是RAG检索服务里向量数据库的连接池耗尽——高峰期并发请求一多,连接池等待时间暴涨到6秒,但因为连接池等待不算"错误",所有服务的健康检查都是绿色的。
2天。5个服务。翻了几千行日志。
如果当时有分布式追踪,这个问题30秒就能发现。
这篇文章,我们就来彻底解决这个痛点。
一、为什么AI应用特别需要分布式追踪
1.1 传统微服务 vs AI微服务的复杂度对比
传统微服务的调用链路是可预期的:请求进来,经过几个服务,返回结果。延迟通常在毫秒到几百毫秒级别。
AI微服务的调用链路完全不同:
用户请求
└─ API网关 (5ms)
└─ 意图识别 (LLM调用, 300ms~2000ms, 不确定)
└─ RAG检索 (向量检索, 50ms~500ms)
├─ Embedding生成 (LLM调用, 100ms~800ms)
└─ 向量数据库查询 (10ms~2000ms, 取决于数据量)
└─ LLM生成回答 (1000ms~30000ms, streaming)
└─ 内容审核 (LLM调用, 200ms~1000ms)
AI应用的特殊挑战:
- 延迟跨度大:从10ms到30秒都有可能
- 成本不透明:每次LLM调用消耗多少Token,谁也不清楚
- 模型版本混乱:同一个接口可能在A/B测试不同模型
- 提示词版本追踪:修改了提示词,哪些请求用了新版本
- 非确定性:同样的输入,不同时间的延迟可能差10倍
没有分布式追踪,这些问题你根本无从下手。
1.2 分布式追踪能给AI应用带来什么
有了分布式追踪,上面这些问题都有了抓手:
- 一个TraceId串起跨服务的完整调用链,慢在哪一步、卡在哪个Span一目了然
- 每次LLM调用的Token消耗、预估成本直接记录在Span属性上
- 模型版本、提示词模板版本随Trace留档,A/B实验可追溯
- 错误请求和慢请求可以按条件检索、聚合,不再靠人肉翻日志
后面的章节会一步步把这些能力落地。
二、OpenTelemetry核心概念:用快递追踪来理解
2.1 快递包裹类比
把一次AI请求想象成寄一个快递:
Trace(追踪)= 一次完整的快递旅程。从你寄出到收件人签收,整个过程就是一个Trace。每个Trace有唯一的TraceId,就像快递单号。
Span(跨度)= 快递每一段的处理记录
- 你把包裹交给快递员(Span 1:API网关接收请求)
- 快递员送到分拣中心(Span 2:意图识别服务)
- 分拣中心转运(Span 3:RAG检索)
- 最后一公里配送(Span 4:LLM生成)
每个Span记录:开始时间、结束时间、操作名称、状态(成功/失败)、属性。
Baggage(行李)= 随货物传递的附加信息。比如"易碎品"标签——这个信息需要在整个运输链路上传递,每个环节都知道要小心处理。在AI请求里,Baggage可以携带:用户ID、请求优先级、实验组标识等。
Context Propagation(上下文传播)= 快递单号的传递机制。每次包裹转手,快递单号都会被记录。在分布式系统里,TraceId通过HTTP Header(traceparent)在服务间传递。
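traceparent头的格式由W3C Trace Context规范定义,拆开是四段:
traceparent: {version}-{trace-id}-{parent-span-id}-{trace-flags}
例如:        00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             00是版本号;中间32位十六进制是TraceId;接着16位十六进制是父SpanId;末尾01表示该Trace已被采样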
2.2 OpenTelemetry架构
OpenTelemetry(简称OTel)是CNCF的开源可观测性框架,它统一了追踪、指标、日志三个领域的数据采集标准。你只需要用OTel SDK埋点一次,数据可以发送到任何支持OTLP协议的后端(Jaeger、Zipkin、Datadog、SkyWalking等)。
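结合本文第五节将要搭建的环境,数据流向大致如下:
Spring Boot应用(Micrometer Tracing / OTel SDK埋点)
└─ OTLP导出 (gRPC 4317 / HTTP 4318)
    └─ OTel Collector(批处理、过滤、采样)
        ├─ Jaeger(Trace存储与查询UI)
        └─ Prometheus(指标)→ Grafana(大盘与告警)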
三、Spring AI + OpenTelemetry集成
3.1 完整Maven依赖
<!-- pom.xml -->
<project>
<properties>
<java.version>21</java.version>
<spring-boot.version>3.3.4</spring-boot.version>
<spring-ai.version>1.0.0</spring-ai.version>
<opentelemetry.version>1.40.0</opentelemetry.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-dependencies</artifactId>
<version>${spring-boot.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-bom</artifactId>
<version>${opentelemetry.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<!-- Spring Boot核心 -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Spring AI OpenAI集成 -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<!-- Spring Boot Actuator(暴露指标端点) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- OpenTelemetry自动插桩(关键!) -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<!-- OTel导出到Jaeger/OTLP -->
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<!-- OTel SDK自动配置 -->
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-spring-boot-starter</artifactId>
<version>2.7.0</version>
</dependency>
<!-- Spring AI Micrometer追踪集成 -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-spring-boot-autoconfigure</artifactId>
</dependency>
<!-- Feign Client(服务间调用,自动传播TraceId) -->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-openfeign</artifactId>
</dependency>
<!-- Lombok -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
</dependencies>
</project>
3.2 application.yml完整配置
# application.yml
spring:
application:
name: ai-customer-service
ai:
openai:
api-key: ${OPENAI_API_KEY}
chat:
options:
model: gpt-4o
temperature: 0.7
management:
endpoints:
web:
exposure:
include: health, info, metrics, prometheus, traces
tracing:
sampling:
probability: 1.0 # 开发环境100%采样,生产环境调低
otlp:
tracing:
endpoint: http://localhost:4318/v1/traces # Jaeger OTLP接收端点
# OpenTelemetry配置
otel:
service:
name: ai-customer-service
exporter:
otlp:
endpoint: http://localhost:4317 # gRPC端点
protocol: grpc
traces:
exporter: otlp
metrics:
exporter: otlp
logs:
exporter: otlp
# 资源属性:这些信息会附加到所有Span上
resource:
attributes:
deployment.environment: production
service.version: 2.1.0
team.name: ai-platform
logging:
pattern:
# 在日志中自动注入TraceId,方便日志和Trace关联
    console: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level [%X{traceId},%X{spanId}] %logger{36} - %msg%n"
3.3 Spring Boot主启动类配置
// AiCustomerServiceApplication.java
package com.laozhang.ai.tracing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.openfeign.EnableFeignClients;
@SpringBootApplication
@EnableFeignClients
public class AiCustomerServiceApplication {
public static void main(String[] args) {
SpringApplication.run(AiCustomerServiceApplication.class, args);
}
}
3.4 自动插桩效果验证
引入上述依赖后,以下调用会自动产生Span,无需任何代码修改:
- 所有HTTP请求(通过RestTemplate、WebClient、Feign)
- Spring MVC Controller方法
- Spring AI的ChatClient调用
- 数据库查询(JDBC、JPA、Redis)
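可以用一个最小的Controller做冒烟验证(下面的类名和路径是示意,并不属于本文的正式工程):调用一次接口,确认它返回了TraceId,日志里也带上了同一个TraceId,再去Jaeger UI搜索它。
// TraceSmokeController.java(验证自动插桩效果的最小示意代码)
package com.laozhang.ai.tracing.controller;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
@Slf4j
@RequiredArgsConstructor
public class TraceSmokeController {
    private final Tracer tracer;
    @GetMapping("/api/v1/trace-check")
    public String traceCheck() {
        // Spring MVC的自动插桩已经为这次请求创建了Server Span
        Span span = tracer.currentSpan();
        String traceId = span != null ? span.context().traceId() : "no-trace";
        log.info("自动插桩生效,当前TraceId={}", traceId);
        return traceId;
    }
}
执行 curl http://localhost:8080/api/v1/trace-check,如果返回了32位TraceId,且这条日志按3.2配置的格式也带上了同样的TraceId,说明追踪和日志关联都已生效。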
四、自定义AI Span:记录Token消耗、模型版本、提示词版本
自动插桩覆盖基础调用,但AI应用需要更细粒度的信息。
4.1 AI Span属性常量定义
// AiSpanAttributes.java
package com.laozhang.ai.tracing.constant;
/**
* AI相关的OpenTelemetry Span属性常量
* 遵循OpenTelemetry语义约定草案 (GenAI)
*/
public final class AiSpanAttributes {
private AiSpanAttributes() {}
// 模型信息
public static final String GEN_AI_SYSTEM = "gen_ai.system";
public static final String GEN_AI_REQUEST_MODEL = "gen_ai.request.model";
public static final String GEN_AI_RESPONSE_MODEL = "gen_ai.response.model";
// Token消耗
public static final String GEN_AI_USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens";
public static final String GEN_AI_USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens";
public static final String GEN_AI_USAGE_TOTAL_TOKENS = "gen_ai.usage.total_tokens";
// 成本估算(单位:微美元,避免浮点精度问题)
public static final String GEN_AI_USAGE_COST_MICRO_USD = "gen_ai.usage.cost_micro_usd";
// 请求参数
public static final String GEN_AI_REQUEST_TEMPERATURE = "gen_ai.request.temperature";
public static final String GEN_AI_REQUEST_MAX_TOKENS = "gen_ai.request.max_tokens";
// 提示词版本(自定义扩展)
public static final String PROMPT_TEMPLATE_ID = "ai.prompt.template_id";
public static final String PROMPT_TEMPLATE_VERSION = "ai.prompt.template_version";
// 业务上下文
public static final String USER_SESSION_ID = "ai.user.session_id";
public static final String REQUEST_INTENT = "ai.request.intent";
public static final String KNOWLEDGE_BASE_ID = "ai.rag.knowledge_base_id";
public static final String RETRIEVED_CHUNKS_COUNT = "ai.rag.retrieved_chunks_count";
public static final String RETRIEVAL_SCORE_MAX = "ai.rag.score_max";
// RAG相关
public static final String EMBEDDING_MODEL = "ai.embedding.model";
public static final String VECTOR_DB_TYPE = "ai.vector_db.type";
public static final String VECTOR_DB_QUERY_LATENCY_MS = "ai.vector_db.query_latency_ms";
}
4.2 AI追踪切面(AOP自动埋点)
// AiTracingAspect.java
package com.laozhang.ai.tracing.aspect;
import com.laozhang.ai.tracing.constant.AiSpanAttributes;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import io.micrometer.tracing.annotation.NewSpan;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Component;
@Aspect
@Component
@Slf4j
@RequiredArgsConstructor
public class AiTracingAspect {
private final Tracer tracer;
    /**
     * 拦截ChatModel的call方法,自动记录AI调用的详细信息。
     * ChatClient的fluent调用最终会落到ChatModel.call(Prompt),返回ChatResponse,
     * 且ChatModel是Spring Bean,可以被AOP代理。
     */
    @Around("execution(* org.springframework.ai.chat.model.ChatModel+.call(..))")
public Object traceChatClientCall(ProceedingJoinPoint pjp) throws Throwable {
Span currentSpan = tracer.currentSpan();
if (currentSpan == null) {
return pjp.proceed();
}
long startTime = System.currentTimeMillis();
try {
Object result = pjp.proceed();
if (result instanceof ChatResponse chatResponse) {
recordTokenUsage(currentSpan, chatResponse);
}
currentSpan.tag(AiSpanAttributes.GEN_AI_SYSTEM, "openai");
long latency = System.currentTimeMillis() - startTime;
currentSpan.tag("ai.call.latency_ms", String.valueOf(latency));
return result;
} catch (Exception e) {
currentSpan.error(e);
currentSpan.tag("ai.call.error", e.getMessage());
throw e;
}
}
private void recordTokenUsage(Span span, ChatResponse response) {
if (response.getMetadata() != null) {
var usage = response.getMetadata().getUsage();
if (usage != null) {
span.tag(AiSpanAttributes.GEN_AI_USAGE_INPUT_TOKENS,
String.valueOf(usage.getPromptTokens()));
span.tag(AiSpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS,
String.valueOf(usage.getGenerationTokens()));
span.tag(AiSpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS,
String.valueOf(usage.getTotalTokens()));
                // 估算成本(GPT-4o定价:input $5/1M tokens、output $15/1M tokens,
                // 换算后即每个input token 5微美元、每个output token 15微美元)
                long costMicroUsd = usage.getPromptTokens() * 5L +
                        usage.getGenerationTokens() * 15L;
span.tag(AiSpanAttributes.GEN_AI_USAGE_COST_MICRO_USD,
String.valueOf(costMicroUsd));
}
}
}
}
4.3 带完整追踪的AI服务
// AiChatService.java
package com.laozhang.ai.tracing.service;
import com.laozhang.ai.tracing.constant.AiSpanAttributes;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.stereotype.Service;
import java.util.Map;
@Service
@Slf4j
@RequiredArgsConstructor
public class AiChatService {
private final ChatClient chatClient;
private final Tracer tracer;
private final ObservationRegistry observationRegistry;
private final RagRetrievalService ragRetrievalService;
/**
* 处理用户问题:RAG检索 + LLM生成
*
* @param userQuestion 用户问题
* @param sessionId 会话ID(用于追踪上下文关联)
* @param promptTemplateId 提示词模板ID
* @return AI回答
*/
public String answerQuestion(String userQuestion, String sessionId,
String promptTemplateId) {
return Observation.createNotStarted("ai.answer_question", observationRegistry)
.lowCardinalityKeyValue("prompt.template_id", promptTemplateId)
.lowCardinalityKeyValue("session.id", sessionId)
.observe(() -> {
Span currentSpan = tracer.currentSpan();
if (currentSpan != null) {
currentSpan.tag(AiSpanAttributes.USER_SESSION_ID, sessionId);
currentSpan.tag(AiSpanAttributes.PROMPT_TEMPLATE_ID, promptTemplateId);
currentSpan.tag(AiSpanAttributes.PROMPT_TEMPLATE_VERSION, "v2.3.1");
}
// 第一步:RAG检索
var retrievedChunks = ragRetrievalService.retrieve(userQuestion);
if (currentSpan != null) {
currentSpan.tag(AiSpanAttributes.RETRIEVED_CHUNKS_COUNT,
String.valueOf(retrievedChunks.size()));
if (!retrievedChunks.isEmpty()) {
currentSpan.tag(AiSpanAttributes.RETRIEVAL_SCORE_MAX,
String.format("%.4f", retrievedChunks.get(0).getScore()));
}
}
// 第二步:构建Prompt
String context = buildContext(retrievedChunks);
String prompt = String.format("""
基于以下知识库内容回答用户问题。如果知识库中没有相关信息,请明确说明。
知识库内容:
%s
用户问题:%s
""", context, userQuestion);
// 第三步:调用LLM
log.info("[TraceId={}] 开始LLM调用,问题长度={}字",
getTraceId(), userQuestion.length());
String answer = chatClient.prompt()
.user(prompt)
.call()
.content();
log.info("[TraceId={}] LLM调用完成,回答长度={}字",
getTraceId(), answer.length());
return answer;
});
}
/**
* 获取当前TraceId(用于日志关联)
*/
private String getTraceId() {
Span span = tracer.currentSpan();
return span != null ? span.context().traceId() : "no-trace";
}
private String buildContext(java.util.List<RetrievedChunk> chunks) {
return chunks.stream()
.map(RetrievedChunk::content)
.collect(java.util.stream.Collectors.joining("\n\n---\n\n"));
}
}
4.4 自定义Span手动创建
// VectorSearchService.java
package com.laozhang.ai.tracing.service;
import com.laozhang.ai.tracing.constant.AiSpanAttributes;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
@Slf4j
@RequiredArgsConstructor
public class VectorSearchService {
private final EmbeddingModel embeddingModel;
private final Tracer tracer;
private final MilvusClient milvusClient;
/**
* 向量检索,带完整追踪信息
*/
public List<RetrievedChunk> search(String query, String knowledgeBaseId, int topK) {
// 手动创建子Span
Span vectorSearchSpan = tracer.nextSpan()
.name("vector_db.search")
.start();
try (Tracer.SpanInScope scope = tracer.withSpan(vectorSearchSpan)) {
// 标记基本信息
vectorSearchSpan.tag(AiSpanAttributes.VECTOR_DB_TYPE, "milvus");
vectorSearchSpan.tag(AiSpanAttributes.KNOWLEDGE_BASE_ID, knowledgeBaseId);
vectorSearchSpan.tag("vector_db.top_k", String.valueOf(topK));
// Step 1: 生成查询向量(子Span)
float[] queryVector = generateEmbedding(query, vectorSearchSpan);
// Step 2: 向量数据库查询
long dbStartTime = System.currentTimeMillis();
List<RetrievedChunk> results = milvusClient.search(
knowledgeBaseId, queryVector, topK
);
long dbLatency = System.currentTimeMillis() - dbStartTime;
vectorSearchSpan.tag(AiSpanAttributes.VECTOR_DB_QUERY_LATENCY_MS,
String.valueOf(dbLatency));
vectorSearchSpan.tag(AiSpanAttributes.RETRIEVED_CHUNKS_COUNT,
String.valueOf(results.size()));
// 如果查询太慢,打一个警告事件
if (dbLatency > 500) {
vectorSearchSpan.event("slow_vector_query");
log.warn("[TraceId={}] 向量查询慢!耗时{}ms,知识库={}",
vectorSearchSpan.context().traceId(), dbLatency, knowledgeBaseId);
}
return results;
} catch (Exception e) {
vectorSearchSpan.error(e);
throw e;
} finally {
vectorSearchSpan.end();
}
}
private float[] generateEmbedding(String text, Span parentSpan) {
Span embeddingSpan = tracer.nextSpan()
.name("ai.embedding.generate")
.start();
try (Tracer.SpanInScope scope = tracer.withSpan(embeddingSpan)) {
embeddingSpan.tag(AiSpanAttributes.EMBEDDING_MODEL, "text-embedding-3-small");
embeddingSpan.tag("embedding.input_chars", String.valueOf(text.length()));
long start = System.currentTimeMillis();
float[] vector = embeddingModel.embed(text);
long latency = System.currentTimeMillis() - start;
embeddingSpan.tag("embedding.latency_ms", String.valueOf(latency));
embeddingSpan.tag("embedding.dimensions", String.valueOf(vector.length));
return vector;
} catch (Exception e) {
embeddingSpan.error(e);
throw e;
} finally {
embeddingSpan.end();
}
}
}
五、Jaeger部署与Spring Boot集成
5.1 docker-compose.yml(完整可用配置)
# docker-compose.yml
version: '3.8'
services:
# Jaeger一体化部署(包含UI、Collector、Query)
jaeger:
image: jaegertracing/all-in-one:1.60
container_name: jaeger
ports:
- "16686:16686" # Jaeger UI
- "14268:14268" # HTTP Collector(Thrift格式,旧版兼容)
- "4317:4317" # OTLP gRPC接收端口
- "4318:4318" # OTLP HTTP接收端口
- "9411:9411" # Zipkin兼容端口
environment:
- COLLECTOR_OTLP_ENABLED=true
- SPAN_STORAGE_TYPE=badger # 开发环境使用内嵌存储
- BADGER_EPHEMERAL=false
- BADGER_DIRECTORY_VALUE=/badger/data
- BADGER_DIRECTORY_KEY=/badger/key
volumes:
- jaeger-data:/badger
networks:
- ai-observability
# OpenTelemetry Collector(生产环境推荐,用于数据聚合和转发)
otel-collector:
image: otel/opentelemetry-collector-contrib:0.110.0
container_name: otel-collector
command: ["--config=/etc/otel-collector-config.yaml"]
volumes:
- ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
ports:
- "4319:4317" # gRPC(应用发数据到这里)
- "4320:4318" # HTTP(应用发数据到这里)
- "8888:8888" # Collector自身指标
depends_on:
- jaeger
networks:
- ai-observability
# Prometheus(指标存储)
prometheus:
image: prom/prometheus:v2.54.1
container_name: prometheus
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus-data:/prometheus
ports:
- "9090:9090"
networks:
- ai-observability
# Grafana(可视化)
grafana:
image: grafana/grafana:11.2.0
container_name: grafana
ports:
- "3000:3000"
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning
networks:
- ai-observability
volumes:
jaeger-data:
prometheus-data:
grafana-data:
networks:
ai-observability:
    driver: bridge
5.2 OTel Collector配置
# otel-collector-config.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
# 批量发送,提高吞吐量
batch:
timeout: 1s
send_batch_size: 1024
send_batch_max_size: 2048
# 内存限制,防止OOM
memory_limiter:
check_interval: 1s
limit_mib: 512
spike_limit_mib: 128
# 添加额外资源属性
resource:
attributes:
- key: collector.version
value: "0.110.0"
action: insert
# 过滤健康检查的Span(减少噪音)
filter/drop_health_check:
traces:
span:
- 'attributes["http.url"] == "/actuator/health"'
exporters:
# 发往Jaeger
otlp/jaeger:
endpoint: jaeger:4317
tls:
insecure: true
# 指标发往Prometheus
prometheus:
endpoint: "0.0.0.0:8889"
# 调试用(开发环境开启)
debug:
verbosity: normal
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, filter/drop_health_check, batch]
exporters: [otlp/jaeger, debug]
metrics:
receivers: [otlp]
processors: [memory_limiter, batch]
      exporters: [prometheus]
5.3 生产级application.yml(含采样配置)
# application-prod.yml
management:
tracing:
sampling:
probability: 0.1 # 生产环境10%采样
otel:
exporter:
otlp:
endpoint: http://otel-collector:4317 # 指向Collector,不直接到Jaeger
protocol: grpc
# 超时配置
timeout: 10s
# 重试配置
retry:
enabled: true
max_attempts: 5
initial_backoff: 1s
max_backoff: 5s
backoff_multiplier: 1.5
# 批量Span导出器配置
bsp:
schedule_delay: 5000 # 每5秒发送一批
max_export_batch_size: 512 # 每批最多512个Span
export_timeout: 30000 # 导出超时30秒
    max_queue_size: 2048 # 队列最大2048个Span
六、跨服务追踪:TraceId传递到下游AI服务
6.1 Feign Client自动传播(零代码)
引入spring-cloud-starter-openfeign + micrometer-tracing-bridge-otel后,Feign会自动在请求头中注入traceparent,实现TraceId的跨服务传播。
// RagServiceFeignClient.java
package com.laozhang.ai.tracing.client;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
/**
* 调用RAG检索服务的Feign客户端
* TraceId会自动通过W3C traceparent header传播
*/
@FeignClient(name = "rag-service", url = "${services.rag.url}")
public interface RagServiceFeignClient {
@PostMapping("/api/v1/retrieve")
RetrievalResponse retrieve(@RequestBody RetrievalRequest request);
}
HTTP请求头示例:
POST /api/v1/retrieve HTTP/1.1
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: rojo=00f067aa0ba902b7
Content-Type: application/json
6.2 WebClient手动传播(Reactor上下文)
// AiGatewayService.java
package com.laozhang.ai.tracing.service;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import io.micrometer.tracing.propagation.Propagator;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
@Service
@RequiredArgsConstructor
public class AiGatewayService {
private final WebClient webClient;
private final Tracer tracer;
private final Propagator propagator;
/**
* 调用下游AI服务,手动传播TraceId
*/
public Mono<String> callDownstreamAiService(String payload) {
Span currentSpan = tracer.currentSpan();
return webClient.post()
.uri("/api/v1/generate")
.headers(headers -> injectTraceHeaders(headers, currentSpan))
.bodyValue(payload)
.retrieve()
.bodyToMono(String.class);
}
/**
* 将当前Span的追踪信息注入到HTTP Header
*/
private void injectTraceHeaders(HttpHeaders headers, Span span) {
if (span == null) return;
// 使用Propagator将Span context序列化到Header
propagator.inject(span.context(),
headers,
(carrier, key, value) -> carrier.set(key, value));
}
}
6.3 Baggage传递业务上下文
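下面的BaggageContextFilter负责在入口处把业务信息写入Baggage。在看代码之前先补充两点:其一,自定义Baggage字段默认不会随HTTP头向下游传播,需要在application.yml中通过management.tracing.baggage.remote-fields: user.id, experiment.group, request.priority显式声明(Spring Boot 3的Micrometer Tracing配置项);其二,Baggage只是随请求传播的上下文,不会自动变成Span属性,下游服务想在Trace里看到它,要自己读出来打tag。下面是一段示意代码,类名BaggageToSpanTagger是笔者随手起的,并非框架API,字段名与后面的Filter保持一致:
// BaggageToSpanTagger.java(下游服务读取Baggage并写入Span的示意)
package com.laozhang.ai.tracing.baggage;
import io.micrometer.tracing.Baggage;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;
@Component
@RequiredArgsConstructor
public class BaggageToSpanTagger {
    private final Tracer tracer;
    /**
     * 把上游通过Baggage传过来的user.id打到当前Span上,
     * 这样在Jaeger里就能按用户维度筛选Trace
     */
    public void tagUserIdOnCurrentSpan() {
        Span span = tracer.currentSpan();
        Baggage userId = tracer.getBaggage("user.id");
        if (span != null && userId != null && userId.get() != null) {
            span.tag("ai.user.id", userId.get());
        }
    }
}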
// BaggageContextFilter.java
package com.laozhang.ai.tracing.filter;
import io.micrometer.tracing.BaggageInScope;
import io.micrometer.tracing.Tracer;
import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import lombok.RequiredArgsConstructor;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import java.io.IOException;
/**
* 从HTTP请求头提取业务信息,放入Baggage向下游传播
*/
@Component
@Order(1)
@RequiredArgsConstructor
public class BaggageContextFilter implements Filter {
private final Tracer tracer;
@Override
public void doFilter(ServletRequest request, ServletResponse response,
FilterChain chain) throws IOException, ServletException {
HttpServletRequest httpRequest = (HttpServletRequest) request;
// 从请求头提取用户信息,放入Baggage
String userId = httpRequest.getHeader("X-User-Id");
String experimentGroup = httpRequest.getHeader("X-Experiment-Group");
String priority = httpRequest.getHeader("X-Request-Priority");
try (
BaggageInScope userIdBaggage = tracer.createBaggage("user.id", userId != null ? userId : "anonymous").makeCurrent();
BaggageInScope experimentBaggage = tracer.createBaggage("experiment.group", experimentGroup != null ? experimentGroup : "control").makeCurrent();
BaggageInScope priorityBaggage = tracer.createBaggage("request.priority", priority != null ? priority : "normal").makeCurrent()
) {
chain.doFilter(request, response);
}
}
}
七、采样策略:高流量下的追踪数据采样
7.1 采样策略对比
| 采样策略 | 适用场景 | 优点 | 缺点 |
|---|---|---|---|
| 全量采样 (1.0) | 开发/测试环境 | 数据完整 | 存储成本高 |
| 固定比例采样 | 通用生产环境 | 简单 | 可能丢失低频错误 |
| 基于速率采样 | 高流量服务 | 可控成本 | 实现复杂 |
| 尾部采样 | 精准问题追踪 | 保留异常Trace | 需要Collector支持 |
7.2 尾部采样配置(推荐生产使用)
# otel-collector-config-prod.yaml(尾部采样配置)
processors:
# 尾部采样:在Trace完成后决定是否保留
tail_sampling:
decision_wait: 10s # 等待10秒收齐所有Span再决策
num_traces: 100000 # 内存中保留10万个Trace
expected_new_traces_per_sec: 1000
policies:
# 策略1:所有错误请求必须保留
- name: errors-policy
type: status_code
status_code: {status_codes: [ERROR]}
# 策略2:慢请求保留(超过3秒)
- name: slow-traces-policy
type: latency
latency: {threshold_ms: 3000}
# 策略3:AI成本超过1000微美元的请求保留
- name: high-cost-policy
type: string_attribute
string_attribute:
key: gen_ai.usage.cost_micro_usd
values: ["[1-9][0-9]{3,}"] # 数值>=1000
# 策略4:正常请求只保留5%
- name: normal-policy
type: probabilistic
probabilistic: {sampling_percentage: 5}
# 策略5:VIP用户全量保留
- name: vip-user-policy
type: string_attribute
string_attribute:
key: ai.user.session_id
values: ["vip-.*"]
          enabled_regex_matching: true
7.3 Java端自适应采样
// AdaptiveSamplingConfig.java
package com.laozhang.ai.tracing.config;
import io.micrometer.tracing.SamplerFunction;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.concurrent.atomic.AtomicLong;
@Configuration
public class AdaptiveSamplingConfig {
    /**
     * 自定义采样决策器(头部采样:在请求开始时做决定)。
     * 注意:请求是否报错、是否超时,只有结束后才知道,
     * 所以"错误/慢请求100%保留"要依赖7.2节的尾部采样;
     * 这里只能基于入口特征做过滤(如跳过健康检查)和固定比例采样。
     */
@Bean
public SamplerFunction<String> customSampler() {
AtomicLong counter = new AtomicLong(0);
return request -> {
// 根据请求特征决定是否采样
// 注意:这里的决策在请求开始时做,不是结束后
if (request != null && request.contains("/actuator")) {
return false; // 健康检查不追踪
}
// 10%固定采样(每10个请求采1个)
return counter.incrementAndGet() % 10 == 0;
};
}
}
八、慢请求分析:用Jaeger火焰图定位AI性能瓶颈
8.1 Jaeger UI使用指南
访问 http://localhost:16686,常用操作:
搜索慢请求:
- Service:ai-customer-service
- Operation:ai.answer_question
- Min Duration:3s(只看3秒以上的慢请求)
- Tags:error=false(只看正常但慢的请求)
关键视图:
- Trace Timeline:横向甘特图,看各服务耗时分布
- Trace Statistics:同一操作的P50/P95/P99统计
- Service Dependency:服务调用拓扑图
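这些筛选也可以脚本化:Jaeger Query还暴露了UI内部使用的HTTP接口(非正式稳定API,参数名以你部署的版本为准,下面只是一个示意):
curl "http://localhost:16686/api/traces?service=ai-customer-service&operation=ai.answer_question&minDuration=3s&lookback=1h&limit=20"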
8.2 案例:发现向量数据库连接池问题
用李明的案例来说明如何用Jaeger定位问题:
通过Jaeger,可以清晰看到:
- RAG检索服务中,有一个 vector_db.search Span
- 这个Span的总耗时6.1秒
- 其中 wait_for_connection.start 到 wait_for_connection.end 两个事件之间:5秒
- 这就是连接池等待的时间!
// TracedConnectionPool.java:在获取连接时记录连接池等待事件
import com.zaxxer.hikari.HikariDataSource;
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import java.sql.Connection;
import java.sql.SQLException;
@RequiredArgsConstructor
public class TracedConnectionPool {
    private final HikariDataSource dataSource;
    private final Tracer tracer;
public Connection getConnection() throws SQLException {
Span span = tracer.currentSpan();
if (span != null) {
span.event("wait_for_connection.start");
}
long waitStart = System.currentTimeMillis();
Connection conn = dataSource.getConnection();
long waitTime = System.currentTimeMillis() - waitStart;
if (span != null) {
span.event("wait_for_connection.end");
span.tag("connection_pool.wait_ms", String.valueOf(waitTime));
span.tag("connection_pool.active",
String.valueOf(dataSource.getHikariPoolMXBean().getActiveConnections()));
span.tag("connection_pool.waiting",
String.valueOf(dataSource.getHikariPoolMXBean().getThreadsAwaitingConnection()));
}
return conn;
}
}
8.3 性能数据汇总
经过追踪系统接入后,李明团队的优化成果:
| 指标 | 优化前 | 优化后 | 提升 |
|---|---|---|---|
| 慢请求定位时间 | 2天 | 30分钟 | 96x |
| P95响应时间 | 8.2秒 | 2.1秒 | 4x |
| 向量DB连接池等待 | 5秒(P95) | 50ms(P95) | 100x |
| 每日Token成本可见性 | 0% | 100% | 完全透明 |
| MTTR(平均修复时间) | 6小时 | 45分钟 | 8x |
九、告警集成:基于Span的异常率告警
9.1 Prometheus指标采集
Span数据可以在OTel Collector中通过spanmetrics connector聚合成Prometheus指标,下面告警规则里的traces_spanmetrics_*系列指标就来自这一步(5.2的Collector示例配置没有包含它,生产环境需要自行启用):
# prometheus-alerts.yml(告警规则)
groups:
- name: ai-service-alerts
rules:
# AI服务错误率告警
- alert: AiServiceHighErrorRate
expr: |
(
rate(traces_spanmetrics_calls_total{
service_name="ai-customer-service",
status_code="STATUS_CODE_ERROR"
}[5m])
/
rate(traces_spanmetrics_calls_total{
service_name="ai-customer-service"
}[5m])
) > 0.05
for: 2m
labels:
severity: warning
team: ai-platform
annotations:
summary: "AI服务错误率超过5%"
description: "服务 {{ $labels.service_name }} 操作 {{ $labels.span_name }} 的错误率为 {{ $value | humanizePercentage }}"
# LLM调用P99延迟告警
- alert: LlmCallSlowP99
expr: |
histogram_quantile(0.99,
rate(traces_spanmetrics_duration_milliseconds_bucket{
service_name="ai-customer-service",
span_name="openai.chat"
}[5m])
) > 10000
for: 5m
labels:
severity: warning
annotations:
summary: "LLM调用P99延迟超过10秒"
# 向量数据库连接池耗尽告警
- alert: VectorDbConnectionPoolExhausted
expr: |
avg(traces_spanmetrics_connection_pool_waiting{
service_name="ai-customer-service"
}) > 10
for: 1m
labels:
severity: critical
annotations:
          summary: "向量数据库连接池等待队列超过10个请求"
9.2 告警Webhook集成(发送到飞书/钉钉)
// AlertWebhookController.java
package com.laozhang.ai.tracing.controller;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.reactive.function.client.WebClient;
import java.util.List;
import java.util.Map;
@RestController
@RequestMapping("/api/alerts")
@Slf4j
public class AlertWebhookController {
    private final WebClient feishuWebClient;
    public AlertWebhookController(WebClient.Builder webClientBuilder,
                                  @Value("${alert.feishu.webhook-url}") String feishuWebhookUrl) {
        this.feishuWebClient = webClientBuilder.baseUrl(feishuWebhookUrl).build();
    }
/**
* 接收Alertmanager的告警,转发到飞书
*/
@PostMapping("/webhook")
public ResponseEntity<Void> handleAlert(@RequestBody AlertManagerPayload payload) {
payload.alerts().forEach(alert -> {
String message = buildFeishuMessage(alert);
sendToFeishu(message);
});
return ResponseEntity.ok().build();
}
private String buildFeishuMessage(Alert alert) {
String severity = alert.labels().getOrDefault("severity", "unknown");
String emoji = "critical".equals(severity) ? "🔴" : "🟡";
return String.format("""
%s **AI服务告警** %s
**告警名称**: %s
**严重程度**: %s
**服务**: %s
**描述**: %s
**触发时间**: %s
**Jaeger链接**: http://jaeger:16686/search?service=%s
""",
emoji, emoji,
alert.labels().getOrDefault("alertname", "未知"),
severity,
alert.labels().getOrDefault("service_name", "未知"),
alert.annotations().getOrDefault("description", "无"),
alert.startsAt(),
alert.labels().getOrDefault("service_name", "")
);
}
private void sendToFeishu(String message) {
feishuWebClient.post()
.bodyValue(Map.of(
"msg_type", "text",
"content", Map.of("text", message)
))
.retrieve()
.bodyToMono(String.class)
.subscribe(
resp -> log.info("告警发送成功"),
err -> log.error("告警发送失败", err)
);
}
record AlertManagerPayload(List<Alert> alerts) {}
record Alert(
Map<String, String> labels,
Map<String, String> annotations,
String startsAt,
String status
) {}
}
十、成本控制:追踪数据的存储和保留策略
10.1 追踪数据量估算
一个中等规模AI应用(1000 QPS)的追踪数据量:
每个请求产生约10个Span
每个Span平均大小:2KB(含属性、事件)
全量采样下:
每秒:1000 * 10 * 2KB = 20MB/s
每天:20MB/s * 86400秒 ≈ 1.7TB/天
10%采样下:
每天:1.7TB * 10% = 170GB/天
保留7天:170GB * 7 = 1.19TB
成本优化策略:
| 策略 | 节省比例 | 实施复杂度 |
|---|---|---|
| 降低采样率 10% | 90% | 低 |
| 尾部采样(只保留异常) | 95% | 中 |
| 压缩存储(Jaeger+ES) | 50% | 低 |
| 分层存储(热数据7天/冷数据90天) | 60% | 中 |
| 过滤健康检查Span | 5-20% | 低 |
10.2 Jaeger存储后端选择
# docker-compose-prod.yml(生产级Jaeger配置,使用Elasticsearch)
services:
jaeger-collector:
image: jaegertracing/jaeger-collector:1.60
environment:
- SPAN_STORAGE_TYPE=elasticsearch
- ES_SERVER_URLS=http://elasticsearch:9200
- ES_INDEX_PREFIX=jaeger
- ES_TAGS_AS_FIELDS_ALL=true # 所有Tag可搜索
depends_on:
- elasticsearch
jaeger-query:
image: jaegertracing/jaeger-query:1.60
environment:
- SPAN_STORAGE_TYPE=elasticsearch
- ES_SERVER_URLS=http://elasticsearch:9200
- QUERY_MAX_CLOCK_SKEW_ADJUSTMENT=500ms
ports:
- "16686:16686"
elasticsearch:
image: elasticsearch:8.15.0
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- ES_JAVA_OPTS=-Xms2g -Xmx2g
volumes:
- es-data:/usr/share/elasticsearch/data
# Jaeger数据清理(保留策略)
jaeger-es-index-cleaner:
image: jaegertracing/jaeger-es-index-cleaner:1.60
environment:
- ROLLOVER=true
- ES_SERVER_URLS=http://elasticsearch:9200
# 每天凌晨2点清理7天前的数据
command: ["7", "http://elasticsearch:9200"]10.3 数据生命周期管理代码
// TracingDataLifecycleConfig.java
package com.laozhang.ai.tracing.config;
import co.elastic.clients.elasticsearch.ElasticsearchClient; // Elasticsearch官方Java客户端
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import lombok.extern.slf4j.Slf4j;
/**
* 追踪数据生命周期管理
* 实现自动清理过期追踪数据,控制存储成本
*/
@Configuration
@EnableScheduling
public class TracingDataLifecycleConfig {
@Component
@Slf4j
public static class TracingDataCleaner {
private final ElasticsearchClient esClient;
@Value("${tracing.retention.hot-days:7}")
private int hotRetentionDays;
@Value("${tracing.retention.cold-days:90}")
private int coldRetentionDays;
public TracingDataCleaner(ElasticsearchClient esClient) {
this.esClient = esClient;
}
/**
* 每天凌晨3点执行数据生命周期管理
*/
@Scheduled(cron = "0 0 3 * * ?")
public void manageDataLifecycle() {
log.info("开始执行追踪数据生命周期管理...");
// 1. 将热数据(7天前)移至冷存储
moveToColderStorage();
// 2. 清理冷数据(90天前)
deleteExpiredData();
log.info("追踪数据生命周期管理完成");
}
private void moveToColderStorage() {
log.info("将{}天前的追踪数据移至冷存储...", hotRetentionDays);
// 实现:更新ES索引策略,将旧索引移至低成本存储节点
}
private void deleteExpiredData() {
log.info("清理{}天前的过期追踪数据...", coldRetentionDays);
// 实现:删除超过保留期的ES索引
}
}
}
十一、完整演示:从零到一的追踪系统搭建
11.1 5分钟快速启动
# 1. 克隆示例代码
git clone https://github.com/laozhang-ai/otel-spring-ai-demo
cd otel-spring-ai-demo
# 2. 启动基础设施
docker-compose up -d
# 3. 启动Spring Boot应用
./mvnw spring-boot:run -Dspring.profiles.active=dev
# 4. 发送测试请求
curl -X POST http://localhost:8080/api/v1/chat \
-H "Content-Type: application/json" \
-H "X-User-Id: user-001" \
-d '{"question": "Java中的虚拟线程和平台线程有什么区别?"}'
# 5. 打开Jaeger UI查看追踪
open http://localhost:16686
11.2 追踪数据示例输出
{
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"spans": [
{
"spanId": "00f067aa0ba902b7",
"operationName": "POST /api/v1/chat",
"duration": 3240,
"tags": {
"http.status_code": 200,
"ai.user.session_id": "user-001"
}
},
{
"spanId": "a2fb4a1d1a96d312",
"parentSpanId": "00f067aa0ba902b7",
"operationName": "ai.answer_question",
"duration": 3180,
"tags": {
"ai.prompt.template_id": "customer-service-v2",
"ai.prompt.template_version": "v2.3.1",
"ai.rag.retrieved_chunks_count": 5,
"ai.rag.score_max": "0.8921"
}
},
{
"spanId": "b9c0e34f5d2a1e78",
"parentSpanId": "a2fb4a1d1a96d312",
"operationName": "vector_db.search",
"duration": 85,
"tags": {
"ai.vector_db.type": "milvus",
"ai.rag.knowledge_base_id": "product-faq-v3",
"ai.vector_db.query_latency_ms": 42
}
},
{
"spanId": "c1d2e3f4a5b6c7d8",
"parentSpanId": "a2fb4a1d1a96d312",
"operationName": "openai.chat",
"duration": 2840,
"tags": {
"gen_ai.system": "openai",
"gen_ai.request.model": "gpt-4o",
"gen_ai.usage.input_tokens": 1240,
"gen_ai.usage.output_tokens": 387,
"gen_ai.usage.total_tokens": 1627,
"gen_ai.usage.cost_micro_usd": 12005
}
}
]
}
十二、FAQ
Q1:Spring AI 1.0已经内置了Micrometer Observation,还需要手动接入OTel吗?
Spring AI 1.0确实内置了Observation支持,会自动创建ChatClient相关的Span。但默认的Span属性较少,不包含Token消耗、提示词版本等AI特有信息。推荐在自动插桩基础上,用本文的AiTracingAspect补充业务属性。
Q2:Jaeger和Zipkin怎么选?
都支持OTel,不需要改代码就能切换。Jaeger的UI更现代,Zipkin更轻量。如果是新项目,推荐Jaeger。如果已有Zipkin基础设施,可以通过OTel Collector同时发送到两个系统。
Q3:采样率设置多少合适?
开发环境100%,生产环境建议从10%开始,配合尾部采样(保留所有错误和慢请求)。一般来说,保留10%的正常请求 + 100%的异常请求,已经足够故障排查使用。
Q4:TraceId和日志如何关联?
在logback-spring.xml或application.yml的日志格式中加入%X{traceId},Micrometer会自动把当前Span的TraceId注入MDC。这样每条日志都有TraceId,可以从日志跳转到Jaeger查看完整链路。
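按3.2中配置的日志格式,一条带TraceId的日志大致长这样(内容是虚构的示意):
2026-01-12 02:14:03 [http-nio-8080-exec-7] INFO  [4bf92f3577b34da6a3ce929d0e0e4736,a2fb4a1d1a96d312] c.l.ai.tracing.service.AiChatService - [TraceId=4bf92f3577b34da6a3ce929d0e0e4736] 开始LLM调用,问题长度=18字
拿着方括号里的TraceId去Jaeger搜索,就能从这条日志跳到完整链路。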
Q5:向量数据库的查询如何追踪?
本文的VectorSearchService给出了完整示例,关键是手动创建vector_db.search Span,记录查询延迟、连接池等待时间。建议把连接池的active和waiting数量也记录到Span,这样一看就知道是否存在连接池瓶颈。
Q6:OTel Collector有必要吗,直接发到Jaeger不行吗?
开发环境直接发Jaeger没问题。生产环境强烈推荐加OTel Collector,原因:1) 可以同时发送到多个后端(Jaeger + Datadog同时);2) 可以在Collector层做尾部采样,减少后端存储压力;3) 应用端只需配置一个Collector地址,后端切换不影响应用代码。
Q7:如何监控OTel Collector本身的健康状态?
OTel Collector暴露自身指标到8888端口,可以用Prometheus抓取。关键指标:
- otelcol_processor_dropped_spans:丢弃的Span数量(>0说明有问题)
- otelcol_exporter_queue_size:导出队列大小(过大说明后端处理不过来)
- otelcol_receiver_refused_spans:拒绝的Span数量
总结
回到李明的故事。如果当时他们有分布式追踪:
- 告警触发后,直接在Jaeger用Min Duration过滤出耗时超过5秒的Trace
- 点开一个慢Trace,一眼看到 vector_db.search Span耗时6.1秒
- Span的 connection_pool.waiting 属性显示:等待队列15个
- 立即调整HikariCP连接池大小,问题解决
2天的排查,变成30分钟的定位+修复。
分布式追踪是AI应用可观测性的基石,它把黑盒变成白盒。在AI应用普及的今天,OpenTelemetry + Jaeger的组合已经是生产级AI系统的标配。
