Part 1868: Spring AI Monitoring Integration - Exposing AI Service Health via Actuator Endpoints
I once helped a team troubleshoot a production incident. Their AI customer-service system broke one morning, and users complained they were getting nonsensical answers. Ops checked the monitoring: CPU normal, memory normal, response times normal... everything looked fine, yet the AI was clearly wrong.
The root cause turned out to be an over-quota OpenAI API key. The model had been returning degraded answers the whole time, but the system treated that kind of partial failure as success.
The underlying problem was not the code but a monitoring blind spot: the team had no idea which AI-specific metrics they should be watching, let alone alerts set on those metrics.
This article lays out, step by step, how to build a monitoring stack for a Spring AI project: Actuator configuration, exposing AI-specific metrics, implementing health checks, and wiring up Prometheus + Grafana.
1. Which Metrics an AI Project Needs to Monitor
The usual web-app metrics (QPS, response time, error rate) still apply to an AI project, but AI adds four categories of its own:
- Token usage and estimated cost, by model and by user tier
- Model call latency and success rate, including tail latency (P99)
- API key and quota health
- Streaming connection counts and retry counts
Of these four categories, the first two are must-haves; the latter two should be added once the system reaches real scale.
2. Basic Actuator Configuration
First, get Actuator set up:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
And the corresponding application.yml:
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers
      base-path: /management  # a non-default path shrinks the exposed surface
  endpoint:
    health:
      show-details: when-authorized
      show-components: when-authorized
    metrics:
      enabled: true
    prometheus:
      enabled: true
  metrics:
    distribution:
      percentiles-histogram:
        # enable histograms on AI-related timers so P99 can be computed
        "[http.server.requests]": true
        "[ai.model.call]": true
      percentiles:
        "[ai.model.call]": 0.5, 0.75, 0.95, 0.99
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active:default}
In production, the Actuator endpoints must not all be open to the outside world. At a minimum, do two things:
- Move base-path off the default /actuator
- Put /management/** behind an IP allowlist or require the ADMIN role (a sketch of this option follows the list)
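A minimal sketch of the role-based option, using Spring Security's EndpointRequest matcher. The basic-auth choice and the open health endpoint are assumptions; adapt them to your own auth setup:
import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class ActuatorSecurityConfig {

    @Bean
    SecurityFilterChain actuatorSecurity(HttpSecurity http) throws Exception {
        // Applies only to Actuator endpoints, wherever base-path points
        http.securityMatcher(EndpointRequest.toAnyEndpoint())
            .authorizeHttpRequests(auth -> auth
                // keep the health probe reachable for load balancers
                .requestMatchers(EndpointRequest.to(HealthEndpoint.class)).permitAll()
                .anyRequest().hasRole("ADMIN"))
            .httpBasic(Customizer.withDefaults());
        return http.build();
    }
}
If the application registers its own SecurityFilterChain too, give this one higher precedence with @Order so the endpoint matcher wins.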
3. Custom AI Health Checks
Spring Boot Actuator's /health endpoint supports custom health indicators. Here is one that checks AI model availability:
@Component
@Slf4j
public class AiModelHealthIndicator implements HealthIndicator {

    private final ChatModel chatModel;
    private final Duration checkTimeout;

    // Cache the latest result so that not every /health hit triggers a real model call
    private volatile HealthCheckCache cachedResult = null;
    private static final Duration CACHE_DURATION = Duration.ofSeconds(30);

    public AiModelHealthIndicator(
            ChatModel chatModel,
            @Value("${app.health.ai-check-timeout:5000}") long timeoutMs) {
        this.chatModel = chatModel;
        this.checkTimeout = Duration.ofMillis(timeoutMs);
    }

    @Override
    public Health health() {
        // Serve from cache while it is still fresh
        if (cachedResult != null && cachedResult.isValid()) {
            return cachedResult.getHealth();
        }
        Health result = checkAiModelHealth();
        cachedResult = new HealthCheckCache(result, Instant.now());
        return result;
    }

    private Health checkAiModelHealth() {
        try {
            long start = System.currentTimeMillis();
            // Send the cheapest possible ping request
            Prompt pingPrompt = new Prompt("Reply with: ok");
            CompletableFuture<ChatResponse> future = CompletableFuture.supplyAsync(
                    () -> chatModel.call(pingPrompt)
            );
            ChatResponse response = future.get(
                    checkTimeout.toMillis(), TimeUnit.MILLISECONDS
            );
            long latency = System.currentTimeMillis() - start;
            String content = response.getResult().getOutput().getContent();
            return Health.up()
                    .withDetail("latencyMs", latency)
                    .withDetail("model", chatModel.getClass().getSimpleName())
                    .withDetail("response", content != null ? "received" : "empty")
                    .build();
        } catch (TimeoutException e) {
            log.warn("AI model health check timed out");
            return Health.down()
                    .withDetail("error", "timeout")
                    .withDetail("timeoutMs", checkTimeout.toMillis())
                    .build();
        } catch (Exception e) {
            log.error("AI model health check failed: {}", e.getMessage());
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .withDetail("errorType", e.getClass().getSimpleName())
                    .build();
        }
    }

    private static class HealthCheckCache {
        private final Health health;
        private final Instant checkedAt;

        HealthCheckCache(Health health, Instant checkedAt) {
            this.health = health;
            this.checkedAt = checkedAt;
        }

        boolean isValid() {
            return Duration.between(checkedAt, Instant.now()).compareTo(CACHE_DURATION) < 0;
        }

        Health getHealth() { return health; }
    }
}
And a health check for API key status:
@Component
public class ApiKeyHealthIndicator implements HealthIndicator {

    private final AiProviderKeyManager keyManager;

    public ApiKeyHealthIndicator(AiProviderKeyManager keyManager) {
        this.keyManager = keyManager;
    }

    @Override
    public Health health() {
        Map<String, Object> details = keyManager.getKeyHealthSummary();
        int healthyCount = (int) details.getOrDefault("healthyKeyCount", 0);
        int totalCount = (int) details.getOrDefault("totalKeyCount", 1);
        if (healthyCount == 0) {
            return Health.down()
                    .withDetail("message", "No API keys available")
                    .withDetails(details)
                    .build();
        }
        if (healthyCount < totalCount) {
            return Health.status("DEGRADED")
                    .withDetail("message",
                            String.format("%d/%d API keys available", healthyCount, totalCount))
                    .withDetails(details)
                    .build();
        }
        return Health.up()
                .withDetail("message", "All API keys available")
                .withDetails(details)
                .build();
    }
}
A sample health check response:
{
  "status": "UP",
  "components": {
    "aiModel": {
      "status": "UP",
      "details": {
        "latencyMs": 342,
        "model": "OpenAiChatModel",
        "response": "received"
      }
    },
    "apiKey": {
      "status": "DEGRADED",
      "details": {
        "message": "2/3 API keys available",
        "healthyKeyCount": 2,
        "totalKeyCount": 3
      }
    }
  }
}
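One caveat: DEGRADED is not a built-in Spring Boot status, so by default the status aggregator ignores it when computing the top-level status (which is why the example above still shows UP overall). If you want DEGRADED to participate in aggregation while still returning HTTP 200, declare the order and mapping explicitly; a sketch:
management:
  endpoint:
    health:
      status:
        # rank DEGRADED between OUT_OF_SERVICE and UP for aggregation
        order: DOWN,OUT_OF_SERVICE,DEGRADED,UP,UNKNOWN
        http-mapping:
          DEGRADED: 200  # still "alive" from a load balancer's point of view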
4. Custom AI Metrics with Micrometer
Micrometer is Spring Boot's metrics facade; it exports to Prometheus, Datadog, and many other monitoring backends.
The core token-usage metrics:
@Component
public class AiMetricsRecorder {

    private final MeterRegistry registry;
    // Backing value for the active-streaming-connections gauge
    private final AtomicLong activeStreamConnections;

    public AiMetricsRecorder(MeterRegistry registry) {
        this.registry = registry;
        this.activeStreamConnections = registry.gauge(
                "ai.stream.active.connections",
                new AtomicLong(0)
        );
    }

    /**
     * Record token usage for one model call. Micrometer caches meters by
     * name + tags, so calling builder().register() on every invocation is cheap.
     */
    public void recordTokenUsage(String model, String userId,
                                 int promptTokens, int completionTokens) {
        Tags tags = Tags.of(
                "model", model,
                "user_tier", resolveUserTier(userId)  // user tier (free/paid)
        );
        Counter.builder("ai.tokens.total")
                .description("Total tokens consumed by AI calls")
                .tags(tags)
                .register(registry)
                .increment(promptTokens + completionTokens);
        Counter.builder("ai.tokens.prompt")
                .description("Prompt (input) tokens")
                .tags(tags)
                .register(registry)
                .increment(promptTokens);
        Counter.builder("ai.tokens.completion")
                .description("Completion (output) tokens")
                .tags(tags)
                .register(registry)
                .increment(completionTokens);
    }

    /**
     * Record the latency of one model call.
     */
    public void recordModelLatency(String model, long latencyMs, boolean success) {
        Timer.builder("ai.model.call")
                .description("AI model call latency")
                .tags("model", model, "success", String.valueOf(success))
                .register(registry)
                .record(latencyMs, TimeUnit.MILLISECONDS);
    }

    /**
     * Track the number of open streaming connections.
     */
    public void incrementStreamConnections() {
        if (activeStreamConnections != null) {
            activeStreamConnections.incrementAndGet();
        }
    }

    public void decrementStreamConnections() {
        if (activeStreamConnections != null) {
            activeStreamConnections.decrementAndGet();
        }
    }

    /**
     * Record a retry event.
     */
    public void recordRetry(String model, int attemptNumber) {
        Counter.builder("ai.model.retries")
                .description("AI model call retries")
                .tags("model", model, "attempt", String.valueOf(attemptNumber))
                .register(registry)
                .increment();
    }

    private String resolveUserTier(String userId) {
        // In a real project, look the user's tier up from a cache
        return "standard";
    }
}
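For orientation, this is what calling the recorder by hand looks like. A fragment, not a full class: chatModel, userMessage, and userId are assumed to be in scope, and the model label is hardcoded for illustration:
long start = System.currentTimeMillis();
ChatResponse response = chatModel.call(new Prompt(userMessage));
// model label hardcoded here purely for the example
metricsRecorder.recordModelLatency("gpt-4o", System.currentTimeMillis() - start, true);

Usage usage = response.getMetadata().getUsage();
metricsRecorder.recordTokenUsage("gpt-4o", userId,
        usage.getPromptTokens(), usage.getGenerationTokens());
Repeating this in every service method gets old fast, which is exactly what the next section avoids.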
5. Integrating Metric Recording into an Advisor
Hooking AiMetricsRecorder into an Advisor makes metric recording transparent to business code:
@Component
@Slf4j
public class MetricsAdvisor implements CallAroundAdvisor, StreamAroundAdvisor {

    private final AiMetricsRecorder metricsRecorder;

    public MetricsAdvisor(AiMetricsRecorder metricsRecorder) {
        this.metricsRecorder = metricsRecorder;
    }

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request,
                                      CallAroundAdvisorChain chain) {
        long start = System.currentTimeMillis();
        String model = extractModelName(request);
        try {
            AdvisedResponse response = chain.nextAroundCall(request);
            long latency = System.currentTimeMillis() - start;
            metricsRecorder.recordModelLatency(model, latency, true);
            if (response.response() != null) {
                Usage usage = response.response().getMetadata().getUsage();
                if (usage != null) {
                    String userId = extractUserId();
                    metricsRecorder.recordTokenUsage(
                            model, userId,
                            usage.getPromptTokens(),
                            usage.getGenerationTokens()
                    );
                }
            }
            return response;
        } catch (Exception e) {
            long latency = System.currentTimeMillis() - start;
            metricsRecorder.recordModelLatency(model, latency, false);
            throw e;
        }
    }

    @Override
    public Flux<AdvisedResponse> aroundStream(AdvisedRequest request,
                                              StreamAroundAdvisorChain chain) {
        metricsRecorder.incrementStreamConnections();
        long start = System.currentTimeMillis();
        String model = extractModelName(request);
        return chain.nextAroundStream(request)
                .doOnComplete(() -> metricsRecorder.recordModelLatency(
                        model, System.currentTimeMillis() - start, true))
                .doOnError(e -> metricsRecorder.recordModelLatency(
                        model, System.currentTimeMillis() - start, false))
                // doFinally also fires on client cancellation,
                // so the connection gauge can never leak
                .doFinally(signal -> metricsRecorder.decrementStreamConnections());
    }

    private String extractModelName(AdvisedRequest request) {
        // In a real project, pull the model name from the request options
        return "unknown";
    }

    private String extractUserId() {
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        return auth != null ? auth.getName() : "anonymous";
    }

    @Override
    public String getName() { return "MetricsAdvisor"; }

    @Override
    public int getOrder() { return Ordered.LOWEST_PRECEDENCE - 1; }
}
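Wiring the advisor into a ChatClient is a one-liner at build time. A minimal sketch; the config class and bean method names are illustrative:
@Configuration
public class ChatClientConfig {

    @Bean
    ChatClient chatClient(ChatModel chatModel, MetricsAdvisor metricsAdvisor) {
        // Every call() and stream() through this client now passes through MetricsAdvisor
        return ChatClient.builder(chatModel)
                .defaultAdvisors(metricsAdvisor)
                .build();
    }
}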
6. A Custom Actuator Endpoint for AI Usage Statistics
Besides Micrometer metrics, you can define a custom Actuator endpoint to expose richer AI-related information:
@Component
@Endpoint(id = "ai-stats")
public class AiStatsEndpoint {

    private final AiUsageStatisticsService statisticsService;
    private final AiProviderKeyManager keyManager;

    public AiStatsEndpoint(AiUsageStatisticsService statisticsService,
                           AiProviderKeyManager keyManager) {
        this.statisticsService = statisticsService;
        this.keyManager = keyManager;
    }

    @ReadOperation
    public Map<String, Object> stats() {
        Map<String, Object> stats = new LinkedHashMap<>();
        // Today's aggregates
        AiDailyStats todayStats = statisticsService.getTodayStats();
        stats.put("today", Map.of(
                "totalRequests", todayStats.getTotalRequests(),
                "totalTokens", todayStats.getTotalTokens(),
                "estimatedCostUsd", todayStats.getEstimatedCostUsd(),
                "uniqueUsers", todayStats.getUniqueUsers(),
                "avgTokensPerRequest", todayStats.getAvgTokensPerRequest()
        ));
        // API key status
        stats.put("apiKeys", keyManager.getKeyHealthSummary());
        // Model usage distribution
        stats.put("modelUsage", statisticsService.getModelUsageDistribution());
        // Error counts
        stats.put("errors", Map.of(
                "last1h", statisticsService.getErrorCount(Duration.ofHours(1)),
                "last24h", statisticsService.getErrorCount(Duration.ofHours(24))
        ));
        return stats;
    }

    // @Selector maps this to /management/ai-stats/{userId}
    @ReadOperation
    public Map<String, Object> userStats(@Selector String userId) {
        // Usage statistics for one specific user
        AiUserStats userStats = statisticsService.getUserStats(userId);
        return Map.of(
                "userId", userId,
                "totalRequests", userStats.getTotalRequests(),
                "totalTokens", userStats.getTotalTokens(),
                "lastRequestAt", userStats.getLastRequestAt()
        );
    }
}
Hitting /management/ai-stats then returns something like:
{
  "today": {
    "totalRequests": 1250,
    "totalTokens": 520000,
    "estimatedCostUsd": 1.04,
    "uniqueUsers": 87,
    "avgTokensPerRequest": 416
  },
  "apiKeys": {
    "healthyKeyCount": 2,
    "totalKeyCount": 3,
    "degradedKeys": ["sk-abc****"]
  },
  "modelUsage": {
    "gpt-4o": 0.35,
    "gpt-3.5-turbo": 0.65
  },
  "errors": {
    "last1h": 3,
    "last24h": 12
  }
}
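One easy-to-miss detail: a custom endpoint is only served over HTTP once its id appears in the exposure list, so ai-stats has to be appended to the include from section 2:
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers,ai-stats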
7. Prometheus Metrics and Alert Rules
Once Prometheus is scraping the application (a minimal scrape sketch below), you can write alert rules like the following.
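First the scrape side; the job name, target host, and interval here are placeholders, but metrics_path has to match the custom base-path from section 2:
# prometheus.yml
scrape_configs:
  - job_name: "spring-ai-app"
    metrics_path: "/management/prometheus"  # matches the base-path configured earlier
    scrape_interval: 15s
    static_configs:
      - targets: ["app-host:8080"]
With scraping in place, the rules below catch the failure modes discussed earlier. Note the exported names: a Micrometer Timer called ai.model.call shows up in Prometheus as ai_model_call_seconds_count and ai_model_call_seconds_bucket.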
# prometheus-rules.yml
groups:
  - name: spring-ai-alerts
    rules:
      # AI model error rate too high
      - alert: AiModelHighErrorRate
        expr: |
          sum(rate(ai_model_call_seconds_count{success="false"}[5m])) /
          sum(rate(ai_model_call_seconds_count[5m])) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "AI model error rate too high"
          description: "Error rate over the last 5 minutes: {{ $value | humanizePercentage }}"
      # Token consumption anomaly (more than 5x the normal rate)
      - alert: AiTokenUsageAnomaly
        expr: |
          rate(ai_tokens_total[5m]) >
          avg_over_time(rate(ai_tokens_total[5m])[1d:5m]) * 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Abnormal AI token consumption"
      # All API keys unavailable
      - alert: AllApiKeysDown
        expr: ai_api_keys_healthy_count == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "All AI API keys are down; the service is dead in the water"
      # Model P99 latency too high
      - alert: AiModelHighLatency
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(ai_model_call_seconds_bucket[5m]))) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "AI model P99 latency above 30 seconds"
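The AllApiKeysDown rule assumes a gauge named ai.api.keys.healthy.count is registered somewhere; nothing earlier in this article creates it. A sketch, where AiProviderKeyManager#getHealthyKeyCount is a hypothetical accessor:
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class ApiKeyGaugeBinder {

    public ApiKeyGaugeBinder(MeterRegistry registry, AiProviderKeyManager keyManager) {
        // Exported to Prometheus as ai_api_keys_healthy_count
        Gauge.builder("ai.api.keys.healthy.count", keyManager,
                        km -> km.getHealthyKeyCount())
                .description("Number of currently healthy AI provider API keys")
                .register(registry);
    }
}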
8. Key Grafana Dashboard Panels
With Prometheus as the data source, these panels are worth building in Grafana (example PromQL queries follow the list):
Panel 1: AI request overview
- Requests per minute (QPS)
- Success rate
- P50/P95/P99 latency
Panel 2: Token usage tracking
- Hourly token consumption trend
- Token usage broken down by model
- Estimated daily cost
Panel 3: AI service health
- Current API key health
- Model availability (from the health check results)
- Active streaming connections
Panel 4: Error analysis
- Error rate trend
- Error type distribution (timeout / auth failure / rate limit)
- Retry counts
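Example PromQL for the headline panels; the metric names follow the Micrometer exports from the earlier sections:
# Requests per second, all models
sum(rate(ai_model_call_seconds_count[1m]))

# Success rate over 5 minutes
sum(rate(ai_model_call_seconds_count{success="true"}[5m]))
  / sum(rate(ai_model_call_seconds_count[5m]))

# P99 latency
histogram_quantile(0.99, sum by (le) (rate(ai_model_call_seconds_bucket[5m])))

# Tokens per hour, by model
sum by (model) (increase(ai_tokens_total[1h]))

# Active streaming connections
ai_stream_active_connections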
With this monitoring stack in place, the incident from the beginning of this article ("AI answers went wrong but no alert fired") cannot recur: an over-quota or unavailable API key triggers an alert immediately, and ops can step in before users ever notice.
Monitoring is not optional; it is the system's nervous system. AI projects need it even more than most, because AI failure usually doesn't look like a crash. The service quietly returns wrong content, which is harder to spot than a crash, and more dangerous.
