Service Mesh in Practice for AI Applications: Istio Traffic Management for AI Systems
Opening Story: Liu Bo at 3 a.m.
At 3 a.m. one night in December 2025, Liu Bo, an architect at a fintech company, stared at a monitoring wall lit up with red alerts.
Their AI customer-service system had just suffered an "avalanche": after the year-end promotion launched, user inquiries surged to 8x the normal volume, the upstream Q&A service began timing out, and the cascading failure took the entire AI service chain offline. It was 3 a.m., and the on-call engineers were scrambling to troubleshoot across dozens of services.
"Our AI services had no traffic protection at all," Liu Bo said in the postmortem. "Every service was fighting alone with Spring's @CircuitBreaker, but traffic control between services was a complete blank."
Three months later, Liu Bo's team had an Istio service mesh fully in production. During the next promotion, the system handled a 12x traffic peak without a single alert, and overall AI service availability reached 99.97%.
Key before/after numbers:
- Fault recovery time: down from an average of 47 minutes to 3 minutes (automatic)
- Request success rate: up from 73% at peak to 99.4%
- Ops tickets: down from an average of 38 per month to 9
This is the real value a service mesh brings to AI systems. This article walks through a complete, hands-on Istio rollout for an AI system.
TL;DR
- Istio core capabilities: traffic splitting, circuit breaking, retries, timeouts — applied transparently between AI services
- AI traffic characteristics: high latency, heterogeneous (response times vary widely across models), cost-sensitive
- Traffic splitting: canary-release new AI model versions with controlled risk
- Observability: Kiali visualizes the AI service call graph; Jaeger traces the full path of every inference
- mTLS: automatically encrypts traffic between AI services, meeting finance/healthcare compliance requirements
1. Why Do AI Systems Especially Need a Service Mesh?
1.1 What Makes AI Traffic Special
Ordinary microservices and AI services differ fundamentally in their traffic characteristics:
| Characteristic | Ordinary microservice | AI service |
|---|---|---|
| Response time | 10-200ms | 500ms-30s |
| Timeout policy | One global setting | Must differ per model |
| Retry safety | Mostly idempotent | Non-idempotent (retries may double-bill) |
| Failure mode | Fails fast | Degrades slowly |
| Cost per call | Negligible | Significant per call |
| Version rollout | Ship to 100% quickly | Needs A/B testing to validate quality |
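The per-model timeout row in the table above maps directly onto Istio routing. A minimal, illustrative sketch (the `x-ai-model` header, the subset names, and the timeout values are assumptions for this example, not taken from the system described here):

```yaml
# Hypothetical sketch: give each model class its own timeout
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: model-router        # illustrative name
spec:
  hosts:
    - llm-gateway-service
  http:
    - match:
        - headers:
            x-ai-model:     # illustrative routing header
              exact: "gpt-4"
      route:
        - destination:
            host: llm-gateway-service
            subset: slow-model   # illustrative subset
      timeout: 60s               # slow model: generous budget
    - route:
        - destination:
            host: llm-gateway-service
            subset: fast-model   # illustrative subset
      timeout: 10s               # fast model: fail fast
```

Later sections build a fuller, production-shaped version of this idea.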
1.2 Pain Points of an AI System Without a Service Mesh
User → Gateway → AI orchestrator → Q&A service → LLM service
                      ↓                 ↓              ↓
                each implements    each implements  each implements
                its own circuit    its own retries  its own timeouts
                breaker            (inconsistent    (no central
                (scattered code)   config)          monitoring)
Pain points:
- Circuit-breaking logic is scattered across each service's code, so incidents are hard to trace
- No unified view of the call chain, so nobody knows which hop the latency comes from
- AI version upgrades are high-risk: without traffic splitting, every rollout is an all-or-nothing cutover
- Service-to-service traffic is plaintext, which fails finance/healthcare compliance requirements
1.3 Istio's Core Components
Istio architecture
├── Data plane: Envoy proxy (auto-injected into every Pod)
│   ├── Intercepts all inbound and outbound traffic
│   ├── Enforces traffic rules (circuit breaking / retries / timeouts)
│   └── Reports telemetry (metrics / logs / traces)
└── Control plane: istiod
    ├── Pilot: pushes routing rules to the Envoys
    ├── Citadel: manages certificates for mTLS
    └── Galley: validates configuration
(Pilot, Citadel, and Galley were once standalone services; in current releases their functions are consolidated into the single istiod binary.)
2. Installing Istio and Integrating It with an AI System
2.1 Installing Istio
# Download istioctl
curl -L https://istio.io/downloadIstio | sh -
export PATH=$PWD/istio-1.22.0/bin:$PATH
# Install (the demo profile suits evaluation with observability add-ons)
istioctl install --set profile=demo -y
# Install the observability add-ons
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/addons/kiali.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/addons/grafana.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/addons/jaeger.yaml
# Enable automatic sidecar injection for the AI services namespace
kubectl label namespace ai-services istio-injection=enabled
# Verify the installation
istioctl verify-install
kubectl get pods -n istio-system
2.2 Deploying the AI Service on Kubernetes
# ai-chat-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: ai-chat-service
namespace: ai-services
labels:
app: ai-chat-service
version: v1
spec:
replicas: 3
selector:
matchLabels:
app: ai-chat-service
version: v1
template:
metadata:
labels:
app: ai-chat-service
version: v1
annotations:
        # Istio injects the Envoy sidecar automatically
        sidecar.istio.io/inject: "true"
        # Resource limits for the Envoy sidecar
        sidecar.istio.io/proxyCPU: "100m"
        sidecar.istio.io/proxyMemory: "128Mi"
spec:
containers:
- name: ai-chat-service
image: company/ai-chat-service:1.0.0
ports:
- containerPort: 8080
env:
- name: SPRING_AI_OPENAI_BASE_URL
valueFrom:
secretKeyRef:
name: ai-secrets
key: openai-base-url
- name: SPRING_AI_OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: ai-secrets
key: openai-api-key
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "500m"
---
# Deploy v2 (the new model)
apiVersion: apps/v1
kind: Deployment
metadata:
name: ai-chat-service-v2
namespace: ai-services
labels:
app: ai-chat-service
version: v2
spec:
replicas: 1
selector:
matchLabels:
app: ai-chat-service
version: v2
template:
metadata:
labels:
app: ai-chat-service
version: v2
spec:
containers:
- name: ai-chat-service
        image: company/ai-chat-service:2.0.0 # version that uses the new model
---
apiVersion: v1
kind: Service
metadata:
name: ai-chat-service
namespace: ai-services
spec:
selector:
    app: ai-chat-service # no version label, so the Service covers all versions
ports:
- port: 8080
      targetPort: 8080
3. Traffic Splitting: Canary Releases for AI Models
3.1 DestinationRule: Defining Traffic Targets
# destination-rule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: ai-chat-service
namespace: ai-services
spec:
host: ai-chat-service
  # Global traffic policy
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
http2MaxRequests: 100
maxRequestsPerConnection: 10
    # Outlier-detection (circuit-breaker) settings for the AI service
    outlierDetection:
      # Eject a host after 5 consecutive gateway/5xx errors
      consecutiveGatewayErrors: 5
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s    # keep an ejected host out for at least 60s
      maxEjectionPercent: 50   # eject at most 50% of instances
      minHealthPercent: 30     # keep at least 30% of instances healthy
  # Version subsets
subsets:
- name: v1
labels:
version: v1
trafficPolicy:
connectionPool:
http:
          # v1 uses GPT-3.5: fast responses can drain a larger pending queue
          http1MaxPendingRequests: 80
- name: v2
labels:
version: v2
trafficPolicy:
connectionPool:
http:
          # v2 uses GPT-4: responses are slow, so cap the pending queue lower to shed load early
          http1MaxPendingRequests: 30
3.2 VirtualService: Implementing a Canary Release
# virtual-service-canary.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: ai-chat-service
namespace: ai-services
spec:
hosts:
- ai-chat-service
http:
  # Rule 1: VIP users get the new version first (header-based routing)
- match:
- headers:
x-user-tier:
exact: "vip"
route:
- destination:
host: ai-chat-service
subset: v2
weight: 100
timeout: 30s
retries:
attempts: 2
perTryTimeout: 15s
retryOn: "gateway-error,connect-failure,retriable-4xx"
  # Rule 2: test traffic goes entirely to v2
- match:
- headers:
x-testing:
exact: "true"
route:
- destination:
host: ai-chat-service
subset: v2
weight: 100
  # Rule 3: default traffic, 90% to v1 and 10% to v2
- route:
- destination:
host: ai-chat-service
subset: v1
weight: 90
- destination:
host: ai-chat-service
subset: v2
weight: 10
    # Timeout for AI calls (LLM inference takes long)
    timeout: 60s
    # Retry policy (caution: AI requests are usually non-idempotent)
    retries:
      attempts: 2
      perTryTimeout: 25s
      # Retry only on network and connection errors, not on 5xx (avoids double-billing)
      retryOn: "gateway-error,connect-failure"
    # Fault injection (for chaos testing)
    # fault:
    #   delay:
    #     percentage:
    #       value: 5
    #     fixedDelay: 5s
3.3 Java: Adjusting the Canary Weight Dynamically
// CanaryDeploymentController.java
@RestController
@RequestMapping("/api/deployment")
@Slf4j
public class CanaryDeploymentController {
    private final IstioClient istioClient; // Istio Java client (e.g. the Fabric8 istio-client)
    // Gradually increase the share of traffic routed to v2
@PostMapping("/canary/advance")
public ResponseEntity<String> advanceCanary(
@RequestParam String serviceName,
@RequestParam int targetWeight,
@RequestParam(defaultValue = "5") int stepSize) {
        // Fetch the current VirtualService
VirtualService vs = istioClient.virtualServices()
.inNamespace("ai-services")
.withName(serviceName)
.get();
        // Find v2's current weight on the default route
HTTPRoute defaultRoute = vs.getSpec().getHttp().stream()
.filter(r -> r.getMatch() == null || r.getMatch().isEmpty())
.findFirst()
.orElseThrow();
int currentV2Weight = defaultRoute.getRoute().stream()
.filter(r -> "v2".equals(r.getDestination().getSubset()))
.mapToInt(HTTPRouteDestination::getWeight)
.findFirst()
.orElse(0);
        // Compute the new weights
        int newV2Weight = Math.min(currentV2Weight + stepSize, targetWeight);
        int newV1Weight = 100 - newV2Weight;
        // Update the VirtualService
        updateVirtualServiceWeights(vs, newV1Weight, newV2Weight);
        log.info("Canary progress: v1={}%, v2={}%", newV1Weight, newV2Weight);
        return ResponseEntity.ok(
            String.format("Updated: v1=%d%%, v2=%d%%", newV1Weight, newV2Weight));
}
private void updateVirtualServiceWeights(
VirtualService vs, int v1Weight, int v2Weight) {
vs.getSpec().getHttp().stream()
.filter(r -> r.getMatch() == null || r.getMatch().isEmpty())
.findFirst()
.ifPresent(route -> {
route.getRoute().forEach(dest -> {
if ("v1".equals(dest.getDestination().getSubset())) {
dest.setWeight(v1Weight);
} else if ("v2".equals(dest.getDestination().getSubset())) {
dest.setWeight(v2Weight);
}
});
});
istioClient.virtualServices()
.inNamespace("ai-services")
.withName(vs.getMetadata().getName())
.replace(vs);
}
    // Automated canary: advance based on the observed error rate
    @Scheduled(fixedRate = 300000) // check every 5 minutes
public void autoCanaryAdvance() {
        // Query v2's error rate
        double v2ErrorRate = queryV2ErrorRate();
        if (v2ErrorRate > 0.05) {
            // Error rate above 5%: roll back automatically
            log.error("v2 error rate too high ({}%), rolling back!", v2ErrorRate * 100);
            rollbackToV1();
        } else if (v2ErrorRate < 0.01) {
            // Error rate below 1%: advance automatically
            advanceCanary("ai-chat-service", 100, 10);
}
}
    private double queryV2ErrorRate() {
        // Query the Prometheus metric:
        // istio_requests_total{destination_version="v2",response_code!~"2.."}
        // (real implementation omitted)
        return 0.005;
    }
}
4. Circuit Breaking: Self-Protection for AI Services
4.1 Circuit-Breaker Settings Specific to AI
Circuit breaking for AI services faces one special challenge: an LLM timeout is not an "error", it is normal behavior. The configuration must distinguish:
- Real errors: connection failures, 5xx responses
- Normal slow requests: LLM inference still in progress (must not trip the breaker)
# circuit-breaker-for-ai.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: llm-service-circuit-breaker
namespace: ai-services
spec:
host: llm-gateway-service
trafficPolicy:
connectionPool:
tcp:
maxConnections: 50
      connectTimeout: 5s # 5-second TCP connect timeout
http:
http1MaxPendingRequests: 20
http2MaxRequests: 50
maxRetries: 2
        # No idleTimeout, so long-running LLM requests are not cut off
    outlierDetection:
      # Trip only on connection and gateway errors;
      # do not trip on timeouts (slow LLM inference is normal)
      consecutiveGatewayErrors: 10   # 10 consecutive gateway errors
      consecutive5xxErrors: 10       # 10 consecutive 5xx responses
      interval: 60s                  # 60s detection window
      baseEjectionTime: 120s         # isolate for 120s after tripping
      maxEjectionPercent: 30         # eject at most 30% of instances
      # Success-rate-based ejection (check your Istio release for support)
      # successRate:
      #   minHealthPercent: 70
4.2 Circuit-Breaker Awareness on the Java Side
// AIChatServiceClient.java
@Service
@Slf4j
public class AIChatServiceClient {
private final WebClient webClient;
private final MeterRegistry meterRegistry;
    // Resilience4j circuit breaker (complements the Istio-level breaker)
    private final CircuitBreaker circuitBreaker;
public AIChatServiceClient(WebClient.Builder builder,
MeterRegistry meterRegistry) {
this.webClient = builder
.baseUrl("http://ai-chat-service:8080")
.build();
this.meterRegistry = meterRegistry;
        // Java-layer circuit breaker (complements the Istio layer)
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .waitDurationInOpenState(Duration.ofSeconds(30))
            .slidingWindowSize(10)
            .minimumNumberOfCalls(5)
            // AI calls above 60 seconds count as slow calls (not failures)
            .slowCallDurationThreshold(Duration.ofSeconds(60))
            .slowCallRateThreshold(80)
            .build();
        // Use one shared registry so the breaker and its Micrometer metrics stay in sync
        // (each call to CircuitBreakerRegistry.ofDefaults() creates a new, unrelated registry)
        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
        this.circuitBreaker = registry.circuitBreaker("ai-chat-service");
        TaggedCircuitBreakerMetrics.ofCircuitBreakerRegistry(registry)
            .bindTo(meterRegistry);
}
    public Mono<ChatResponse> chat(ChatRequest request) {
        return Mono.fromCallable(
                () -> circuitBreaker.executeCallable(() -> doChat(request))
            )
            // doChat() blocks, so run it off the reactor event loop
            .subscribeOn(Schedulers.boundedElastic())
            .onErrorResume(CallNotPermittedException.class, e -> {
                // The breaker is open: return the fallback immediately
                log.warn("AI service circuit open, returning fallback response");
                meterRegistry.counter("ai.circuit.fallback").increment();
                return Mono.just(buildFallbackResponse());
            })
            .onErrorResume(Exception.class, e -> {
                log.error("AI service call failed: {}", e.getMessage());
                return Mono.just(buildFallbackResponse());
            });
    }
}
private ChatResponse doChat(ChatRequest request) {
return webClient.post()
.uri("/api/chat")
.bodyValue(request)
.retrieve()
.bodyToMono(ChatResponse.class)
.timeout(Duration.ofSeconds(90))
.block();
}
private ChatResponse buildFallbackResponse() {
return ChatResponse.builder()
            .message("The AI service is temporarily unavailable, please try again later. For urgent help, contact human support at 400-xxx-xxxx")
.source("fallback")
.build();
}
}
5. Observability: Seeing Inside Your AI Services
5.1 Istio Telemetry Configuration
# telemetry.yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
name: ai-services-telemetry
namespace: ai-services
spec:
  # Custom access log with AI-specific fields
accessLogging:
- providers:
- name: envoy
filter:
        # Log only slow requests and errors
expression: "response.duration > 5000 || response.code >= 400"
  # Trace sampling rate (keep it modest to limit overhead in AI systems)
  tracing:
    - providers:
        - name: jaeger
      randomSamplingPercentage: 10.0 # sample 10% of requests
      customTags:
        # Inject AI request metadata into the trace spans
        ai.model:
          header:
            name: x-ai-model
            defaultValue: "unknown"
        ai.request.type:
          header:
            name: x-ai-request-type
            defaultValue: "chat"
5.2 Integrating Spring AI with Jaeger Tracing
// AiTracingConfig.java
@Configuration
public class AiTracingConfig {
@Bean
public ObservationRegistry observationRegistry(MeterRegistry meterRegistry) {
ObservationRegistry registry = ObservationRegistry.create();
        // Bridge observations into Micrometer metrics
registry.observationConfig()
.observationHandler(new DefaultMeterObservationHandler(meterRegistry));
return registry;
}
    // Spring AI records traces through the ObservationRegistry automatically;
    // only the Spring Boot configuration below is required
}
# application.yaml
spring:
ai:
openai:
api-key: ${OPENAI_API_KEY}
base-url: ${OPENAI_BASE_URL}
# Distributed tracing: Spring Boot 3 / Micrometer Tracing with the OTLP exporter
# (the older Spring Cloud Sleuth properties are deprecated and should not be mixed in)
management:
  otlp:
    tracing:
      endpoint: http://jaeger-collector:4317
  tracing:
    sampling:
      probability: 0.1 # 10% sampling
endpoints:
web:
exposure:
include: health,prometheus,metrics
  prometheus:
    metrics:
      export:
        enabled: true
  metrics:
    tags:
      application: ai-chat-service
      version: ${app.version:1.0.0}
5.3 A Dedicated Grafana Dashboard for AI Services
// Custom metrics (registered with Prometheus, displayed in Grafana)
@Component
@Slf4j
public class AiMetricsCollector {
private final MeterRegistry meterRegistry;
    // Token consumption per AI request
    private final DistributionSummary tokenConsumption;
    // Response-time distribution per model
    private final Timer modelResponseTime;
    // Per-model success/error counters
    private final Map<String, Counter> modelSuccessCounters = new ConcurrentHashMap<>();
    private final Map<String, Counter> modelErrorCounters = new ConcurrentHashMap<>();
public AiMetricsCollector(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.tokenConsumption = DistributionSummary
.builder("ai.token.consumption")
            .description("Distribution of token consumption per AI request")
.baseUnit("tokens")
.publishPercentiles(0.5, 0.75, 0.95, 0.99)
.register(meterRegistry);
this.modelResponseTime = Timer
.builder("ai.model.response.time")
            .description("AI model response time")
.publishPercentiles(0.5, 0.75, 0.95, 0.99)
.register(meterRegistry);
}
public void recordRequest(String modelName, long durationMs,
int tokens, boolean success) {
        // Record token consumption
        tokenConsumption.record(tokens);
        // Record response time, tagged with the model name
        meterRegistry.timer("ai.model.response.time",
            "model", modelName,
            "success", String.valueOf(success)
        ).record(durationMs, TimeUnit.MILLISECONDS);
        // Count successes and failures
if (success) {
meterRegistry.counter("ai.model.requests",
"model", modelName,
"status", "success"
).increment();
} else {
meterRegistry.counter("ai.model.requests",
"model", modelName,
"status", "error"
).increment();
}
}
    // P99 latency per model, in milliseconds (e.g. for an HPA custom metric)
    public double getModelP99Latency(String modelName) {
        Timer timer = meterRegistry.find("ai.model.response.time")
            .tag("model", modelName)
            .timer();
        if (timer == null) {
            return 0.0;
        }
        // Timer exposes percentiles via a snapshot, not a percentile() accessor
        return Arrays.stream(timer.takeSnapshot().percentileValues())
            .filter(p -> p.percentile() == 0.99)
            .mapToDouble(p -> p.value(TimeUnit.MILLISECONDS))
            .findFirst()
            .orElse(0.0);
    }
}
6. mTLS: Secure Communication Between AI Services
6.1 Why Do AI Services Especially Need mTLS?
The data an AI system handles is often highly sensitive:
- Users' medical consultations
- Financial investment advice
- Legal questions
Plaintext service-to-service traffic means an attacker inside the network can read or tamper with AI inference results.
6.2 Enabling Strict mTLS
# peer-authentication.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: ai-services-mtls
namespace: ai-services
spec:
  # Enforce mTLS for the entire namespace
  mtls:
    mode: STRICT # reject all non-mTLS connections

# authorization-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: ai-chat-service-authz
namespace: ai-services
spec:
selector:
matchLabels:
app: ai-chat-service
  # Allow only specific services to call the AI chat service
rules:
- from:
- source:
          # allow only callers from the same namespace
          namespaces: ["ai-services"]
          # and only these service accounts
          principals:
            - "cluster.local/ns/ai-services/sa/api-gateway"
            - "cluster.local/ns/ai-services/sa/ai-orchestrator"
to:
- operation:
methods: ["POST"]
paths: ["/api/chat", "/api/stream-chat"]
when:
- key: source.namespace
        values: ["ai-services"]
6.3 Verifying That mTLS Is Active
# Check whether traffic to the service is using mTLS
# (istioctl authn tls-check was removed in recent releases; istioctl x describe is the closest replacement)
istioctl x describe svc ai-chat-service -n ai-services
# Inspect the certificate presented between services
kubectl exec -n ai-services \
$(kubectl get pod -n ai-services -l app=ai-orchestrator -o jsonpath='{.items[0].metadata.name}') \
-c istio-proxy -- \
openssl s_client -connect ai-chat-service:8080 2>&1 | grep "subject\|issuer"
# You should see certificates issued by Istio's CA, e.g.
# subject=O = ai-services, CN = ai-chat-service
# issuer=O = cluster.local, CN = ai-services/ai-chat-service
7. Traffic Management in Practice: A Complete AI Version-Upgrade Flow
7.1 A Complete Canary-Release SOP
#!/bin/bash
# ai-canary-release.sh
SERVICE_NAME="ai-chat-service"
NEW_VERSION="v2"
TARGET_WEIGHT=100
STEP_SIZE=10
CHECK_INTERVAL=300 # 5 minutes
echo "Starting canary release: $SERVICE_NAME $NEW_VERSION"
# Initialize: send 5% of the traffic to v2
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: $SERVICE_NAME
namespace: ai-services
spec:
hosts:
- $SERVICE_NAME
http:
- route:
- destination:
host: $SERVICE_NAME
subset: v1
weight: 95
- destination:
host: $SERVICE_NAME
subset: v2
weight: 5
timeout: 60s
EOF
echo "Initialized, v2 traffic: 5%"
# Advance step by step
for weight in $(seq $STEP_SIZE $STEP_SIZE $TARGET_WEIGHT); do
  echo "Waiting $CHECK_INTERVAL seconds before the next check..."
  sleep $CHECK_INTERVAL
  # Query v2's error rate via the Prometheus API
  ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query?query=\
rate(istio_requests_total\
{destination_app=\"$SERVICE_NAME\",destination_version=\"v2\",\
response_code!~\"2..\"}[5m])\
/\
rate(istio_requests_total\
{destination_app=\"$SERVICE_NAME\",destination_version=\"v2\"}[5m])" \
  | jq -r '.data.result[0].value[1]' 2>/dev/null || echo "0")
  echo "Current v2 error rate: $ERROR_RATE"
  if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
    echo "Error rate above 5%, rolling back!"
    kubectl patch virtualservice $SERVICE_NAME -n ai-services \
      --type='json' \
      -p='[{"op":"replace","path":"/spec/http/0/route/0/weight","value":100},{"op":"replace","path":"/spec/http/0/route/1/weight","value":0}]'
    echo "Rollback complete"
    exit 1
  fi
  # Update the weights
  v1_weight=$((100 - weight))
  echo "Advancing: v1=$v1_weight%, v2=$weight%"
  # Update the VirtualService... (kubectl patch command omitted)
done
echo "Canary release complete! v2 now receives 100% of the traffic"
# Scale v1 down (per the manifests above, the v1 Deployment is named ai-chat-service, without a -v1 suffix)
kubectl scale deployment $SERVICE_NAME --replicas=0 -n ai-services
8. FAQ
Q1: How much does Istio affect AI service performance?
A: The Envoy sidecar typically adds 1-3ms of latency, negligible for AI services whose own latency runs 500ms-30s. Resource overhead is roughly 100m CPU + 128MB of memory per Pod, a minor share of an AI system's total footprint.
Q2: Should AI requests be retried?
A: Configure retries carefully:
- Connection-level errors (connect-failure, gateway-error): safe to retry
- 5xx errors: case by case; retry only genuine transient server errors
- Timeouts: do not retry automatically, because the LLM may still be processing the request
- Streaming requests (SSE / streamed output): usually cannot be retried
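These rules can be encoded per route in a VirtualService. A hedged sketch (the `/api/stream-chat` path reuses the endpoint shown earlier in this article; the exact values are illustrative):

```yaml
# Illustrative sketch: retries only where they are safe
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ai-chat-retry-policy   # illustrative name
  namespace: ai-services
spec:
  hosts:
    - ai-chat-service
  http:
    # Streaming route: disable retries entirely
    - match:
        - uri:
            prefix: /api/stream-chat
      route:
        - destination:
            host: ai-chat-service
      retries:
        attempts: 0   # 0 disables retries for this route
    # Regular route: retry only connection-level failures
    - route:
        - destination:
            host: ai-chat-service
      retries:
        attempts: 2
        perTryTimeout: 25s
        retryOn: "connect-failure,gateway-error"
```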
Q3: How do I handle AI services that speak both HTTP and gRPC?
A: Istio supports gRPC natively. In the VirtualService, match gRPC traffic by its content-type header instead of by URI:
match:
- headers:
    content-type:
      exact: "application/grpc"
Then configure separate routing rules for the HTTP and the gRPC traffic.
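Put together, one VirtualService can carry both protocols. A sketch under the assumption that gRPC and HTTP are served by subsets of the same host (the subset names are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ai-mixed-protocol   # illustrative name
  namespace: ai-services
spec:
  hosts:
    - ai-chat-service
  http:
    # gRPC calls, identified by content-type
    - match:
        - headers:
            content-type:
              exact: "application/grpc"
      route:
        - destination:
            host: ai-chat-service
            subset: grpc    # illustrative subset
      timeout: 60s
    # Everything else is treated as plain HTTP
    - route:
        - destination:
            host: ai-chat-service
            subset: http    # illustrative subset
      timeout: 60s
```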
Q4: How are Istio's mTLS certificates rotated?
A: Istio rotates workload certificates automatically (every 24 hours by default), and the self-signed root CA is valid for 10 years by default. Rotation is fully transparent to workloads, with no Pod restarts. Use istioctl proxy-config secret to inspect the certificates currently loaded by a sidecar.
Q5: How does a service mesh relate to an API gateway?
A: They solve problems at different layers:
- API gateway: north-south traffic (from outside into the cluster), handling authentication / rate limiting / routing
- Service mesh: east-west traffic (between services inside the cluster), handling reliability / security / observability
An AI system needs both: typically the API gateway handles external authentication while Istio governs the internal services.
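For completeness, the north-south entry point can itself be an Istio ingress gateway rather than a separate product. A minimal sketch (the host name and the TLS secret name are illustrative assumptions):

```yaml
# Illustrative north-south entry via Istio's ingress gateway
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: ai-ingress          # illustrative name
  namespace: ai-services
spec:
  selector:
    istio: ingressgateway   # the built-in ingress gateway Pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: ai-tls-cert   # illustrative TLS secret name
      hosts:
        - "ai.example.com"            # illustrative host
```

A VirtualService bound to this Gateway would then route external traffic into the mesh, where the east-west policies from the earlier sections take over.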
9. Conclusion
An Istio service mesh gives AI systems capabilities that previously required substantial custom code:
| Capability | Traditional approach | Istio approach |
|---|---|---|
| Circuit breaking | Each service wires up its own Resilience4j | One shared DestinationRule |
| Canary releases | Hand-maintained load-balancer rules | VirtualService weight config |
| Distributed tracing | Manual trace-ID propagation | Injected automatically by Envoy |
| Service-to-service encryption | Manual certificate and TLS management | mTLS enabled with one switch |
| Topology visualization | None; troubleshooting relies on intuition | Live topology in Kiali |
For AI systems, a service mesh is not a nice-to-have; it is the infrastructure that makes operating at scale possible. Once you run more than about five AI services, Istio belongs in your stack.
