Spring AI in Production: The Complete Guide to Docker + K8s Containerization
2026/4/30
Who this is for: developers with 1-5 years of Java experience who want to move toward AI engineering. Reading time: about 20 minutes. What you will learn:
- Docker containerization best practices for Spring AI applications
- The complete workflow for deploying and operating AI services on K8s
- How to handle the problems specific to AI services: autoscaling, secrets management, health checks
The "works on my machine, dies in production" story
Xiao Zhang, an engineer building an AI customer-service bot, spent a month putting together a Spring AI backend. Everything worked locally: streaming output was smooth, RAG recall was accurate, and responses came back within 2 seconds.
Then he packaged the application as a jar and deployed it straight onto a production server.
Things broke on day one:
- Requests piled up at peak hours and the service OOMed and restarted
- The API key was hard-coded in a config file, and a colleague accidentally committed it to Git
- All conversation history was lost every time the service restarted
- The K8s probes kept returning 503: the app was marked unhealthy while it was still starting up
When he messaged me he sounded desperate: "Lao Zhang, the ops team keeps saying my app is broken, but it works fine on my machine."
The gap between "works locally" and "works in production" is not just the environment; it is an entire engineering mindset.
In this article I'll walk through containerization and deployment best practices for AI applications and lay out the pitfalls one by one.
How Deploying AI Services Differs from Ordinary Services
Every failure in the story above maps to something AI services do differently from a typical CRUD service: responses are slow and streamed over long-lived connections, memory footprints are larger, startup is slower because model clients and connection pools need to warm up, and a leaked API key costs real money. The configurations below address these differences one by one.
Dockerfile Best Practices
# Dockerfile - production build for a Spring AI application
# Multi-stage build keeps the final image small
# =========================================
# Stage 1: build
# =========================================
FROM eclipse-temurin:17-jdk-alpine AS builder
WORKDIR /app
# Copy pom.xml first to exploit Docker layer caching
# (dependencies are only re-downloaded when pom.xml changes)
COPY pom.xml .
COPY .mvn .mvn
COPY mvnw .
# Download dependencies (this layer gets cached)
RUN ./mvnw dependency:go-offline -q
# Then copy the sources
COPY src src
# Build, skipping tests (they already ran in CI)
RUN ./mvnw package -DskipTests -q
# Extract the Spring Boot layered jar (optimizes image layer caching)
RUN java -Djarmode=layertools -jar target/*.jar extract
# =========================================
# Stage 2: runtime
# =========================================
FROM eclipse-temurin:17-jre-alpine AS runtime
# Security hardening: run the application as a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /app
# Copy the extracted layers from the build stage
# (speeds up image pushes for subsequent updates)
COPY --from=builder /app/dependencies/ ./
COPY --from=builder /app/spring-boot-loader/ ./
COPY --from=builder /app/snapshot-dependencies/ ./
COPY --from=builder /app/application/ ./
# JVM tuning (settings that matter for AI services)
ENV JAVA_OPTS="\
    -server \
    -XX:+UseG1GC \
    -XX:MaxGCPauseMillis=200 \
    -XX:+UseContainerSupport \
    -XX:MaxRAMPercentage=75.0 \
    -XX:InitialRAMPercentage=50.0 \
    -Djava.security.egd=file:/dev/./urandom \
    -Dspring.backgroundpreinitializer.ignore=true"
# Health check (backup for the K8s probes)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD wget -q -O- http://localhost:8080/actuator/health || exit 1
EXPOSE 8080
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS org.springframework.boot.loader.launch.JarLauncher"]

application.yml Production Configuration
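Everything sensitive in the config below arrives through environment variables (OPENAI_API_KEY, DB_PASSWORD, and so on). When one is missing, Spring leaves the placeholder unresolved and the failure surfaces much later in a confusing form, so it pays to fail fast at startup. A minimal sketch of that idea (the EnvCheck helper is hypothetical, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: fail fast when required env vars are absent. */
public class EnvCheck {

    /** Returns the required keys that are missing or blank in the given map. */
    public static List<String> missing(Map<String, String> env, String... required) {
        List<String> absent = new ArrayList<>();
        for (String key : required) {
            String value = env.get(key);
            if (value == null || value.isBlank()) {
                absent.add(key);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // In the real app, call this early (e.g. from a CommandLineRunner)
        // with System.getenv() and throw if anything is missing.
        Map<String, String> sample = Map.of("OPENAI_API_KEY", "sk-demo");
        System.out.println(missing(sample, "OPENAI_API_KEY", "DB_URL"));
    }
}
```

Throwing an IllegalStateException when the returned list is non-empty aborts startup with a message naming exactly which variables the deployment forgot to inject.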
# application-prod.yml
spring:
  ai:
    openai:
      # Never hard-code this! Read it from an environment variable
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.7
      # Timeouts: keep slow LLM responses from piling up threads
      connect-timeout: 10s
      read-timeout: 120s  # long generations need more time
  # Database connection pool
  datasource:
    url: ${DB_URL}
    username: ${DB_USER}
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
  data:
    redis:
      host: ${REDIS_HOST}
      port: 6379
      timeout: 5000ms
      lettuce:
        pool:
          max-active: 20
          max-idle: 10

# Server settings
server:
  port: 8080
  tomcat:
    # AI services hold many concurrent long-lived connections; raise the thread count
    threads:
      max: 200
      min-spare: 20
    # Connection timeout (SSE streaming responses need longer)
    connection-timeout: 180000
    # Request size limit (document uploads need a larger value)
    max-http-form-post-size: 50MB

# Actuator
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus,info
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true  # expose the K8s liveness/readiness probe groups
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

# Logging (structured logs for easier aggregation)
logging:
  pattern:
    console: '{"time":"%d{yyyy-MM-dd HH:mm:ss.SSS}","level":"%level","trace":"%X{traceId}","service":"ai-service","msg":"%message"}%n'
  level:
    root: INFO
    com.laozhang.ai: DEBUG
    org.springframework.ai: INFO

Kubernetes Deployment Configuration
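The deployment manifest below relies on a preStop hook and terminationGracePeriodSeconds for clean shutdowns, but those only buy time; the application itself must also drain in-flight requests. Spring Boot 2.3+ supports this natively, so a fragment like the following belongs in the production profile (the 30s value is an assumption to tune, and should stay below the 60s grace period in the manifest):

```yaml
# application-prod.yml (graceful shutdown)
server:
  shutdown: graceful   # stop accepting new requests, finish in-flight ones
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # upper bound on the drain window
```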
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-ai-service
  namespace: ai-apps
  labels:
    app: spring-ai-service
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spring-ai-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # zero-downtime rolling updates
  template:
    metadata:
      labels:
        app: spring-ai-service
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/actuator/prometheus"
        prometheus.io/port: "8080"
    spec:
      # AI requests can run long; allow enough time for graceful shutdown
      terminationGracePeriodSeconds: 60
      containers:
        - name: spring-ai-service
          image: registry.company.com/spring-ai-service:1.0.0
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          # Sensitive settings come from K8s Secrets
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: openai-api-key
            - name: DB_URL
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: url
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: password
            - name: REDIS_HOST
              value: "redis-service"
            - name: SPRING_PROFILES_ACTIVE
              value: "prod"
          # Resource limits (tune these from real load-test results)
          resources:
            requests:
              memory: "1Gi"  # AI services need more memory than typical services
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          # Liveness probe: is the JVM alive?
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 90  # wait for Spring Boot to fully start
            periodSeconds: 30
            failureThreshold: 3
            timeoutSeconds: 10
          # Readiness probe: can the pod accept traffic?
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 60  # AI connection pools need time to warm up
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 10
          # Graceful shutdown: stop taking new requests, let in-flight ones finish
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]  # wait for the load balancer to deregister the pod
          # Mount non-sensitive app config
          volumeMounts:
            - name: app-config
              mountPath: /app/config
              readOnly: true
      volumes:
        - name: app-config
          configMap:
            name: spring-ai-config
---
# K8s Service
apiVersion: v1
kind: Service
metadata:
  name: spring-ai-service
  namespace: ai-apps
spec:
  selector:
    app: spring-ai-service
  ports:
    - port: 80
      targetPort: 8080
      name: http
  type: ClusterIP
---
# Horizontal Pod Autoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-ai-service-hpa
  namespace: ai-apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-ai-service
  # Replica range
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Scale up when CPU utilization exceeds 70%
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Scale up when memory utilization exceeds 80%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60  # avoid scaling thrash
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # scale down more conservatively
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120

Secrets Management
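One thing to be clear about before creating Secrets: a K8s Secret is only base64-encoded, not encrypted, so anyone who can read the Secret object can recover the key. That is a large part of why the Vault approach below is recommended. You can see how thin the protection is from the shell (the key here is a fake placeholder):

```shell
# Secret values are plain base64, trivially reversible
echo -n 'sk-demo-key' | base64
# -> c2stZGVtby1rZXk=
echo -n 'c2stZGVtby1rZXk=' | base64 -d
# -> sk-demo-key
```

At minimum, enable encryption at rest on etcd and restrict Secret reads with RBAC; better, move the keys into an external manager such as Vault.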
# Create the K8s Secret (in real production, do this via CI/CD tooling;
# never commit secrets to Git)
# kubectl create secret generic ai-secrets \
#   --from-literal=openai-api-key=sk-xxx \
#   --namespace=ai-apps

# Or use an external secrets manager (recommended: HashiCorp Vault).
# The vault-agent-injector automatically injects Vault secrets into the Pod.

# k8s/vault-injection.yaml (the Vault approach)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-ai-service
spec:
  template:
    metadata:
      annotations:
        # Vault Agent injects the secrets automatically
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "spring-ai-service"
        vault.hashicorp.com/agent-inject-secret-ai: "secret/data/ai-service"
        vault.hashicorp.com/agent-inject-template-ai: |
          {{- with secret "secret/data/ai-service" -}}
          export OPENAI_API_KEY="{{ .Data.data.openai_api_key }}"
          {{- end }}

Implementing AI-Specific Health Checks
package com.laozhang.ai.health;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

/**
 * Health indicator for the AI service.
 * Executed when the K8s readiness probe calls /actuator/health.
 *
 * Warning: this check makes a real LLM call, which costs money!
 * Prefer a lightweight check in practice: verify connectivity
 * without actually invoking the model.
 */
@Component("aiService")
@Slf4j
@RequiredArgsConstructor
public class AIServiceHealthIndicator implements HealthIndicator {

    private final ChatModel chatModel;

    // Time of the last check (so we don't hammer the LLM)
    private volatile long lastCheckTime = 0;
    private volatile boolean lastStatus = false;
    private static final long CACHE_TTL_MS = 30_000; // reuse the last result for 30s

    @Override
    public Health health() {
        long now = System.currentTimeMillis();
        // Reuse the previous result if we checked within the last 30 seconds
        if (now - lastCheckTime < CACHE_TTL_MS) {
            return lastStatus
                ? Health.up().withDetail("source", "cached").build()
                : Health.down().withDetail("source", "cached").build();
        }
        try {
            // Send a minimal test request to verify LLM connectivity.
            // Keep the prompt tiny (and cap max tokens via ChatOptions
            // if you like) to minimize cost.
            long start = System.currentTimeMillis();
            chatModel.call("ping");
            long latency = System.currentTimeMillis() - start;
            lastCheckTime = now;
            lastStatus = true;
            return Health.up()
                .withDetail("latency_ms", latency)
                .withDetail("model", "connected")
                .build();
        } catch (Exception e) {
            log.warn("AI service health check failed: {}", e.getMessage());
            lastCheckTime = now;
            lastStatus = false;
            return Health.down()
                .withDetail("error", e.getMessage())
                .build();
        }
    }
}

CI/CD Pipeline Configuration
# .github/workflows/deploy.yml
name: Deploy Spring AI Service

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK 17
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
          cache: maven

      - name: Run Tests
        run: ./mvnw test

      # Assumes registry credentials were configured in an earlier login step
      - name: Build Docker Image
        run: |
          docker build -t registry.company.com/spring-ai-service:${{ github.sha }} .
          docker push registry.company.com/spring-ai-service:${{ github.sha }}

      # Assumes cluster credentials (kubeconfig) are available to the runner
      - name: Deploy to K8s
        run: |
          # Roll out the new image
          kubectl set image deployment/spring-ai-service \
            spring-ai-service=registry.company.com/spring-ai-service:${{ github.sha }} \
            --namespace=ai-apps
          # Wait for the rolling update to finish
          kubectl rollout status deployment/spring-ai-service \
            --namespace=ai-apps \
            --timeout=300s

      - name: Verify Deployment
        run: |
          # Check that the new version is healthy (run in the service's namespace
          # so the spring-ai-service DNS name resolves)
          kubectl run health-check --rm -i --restart=Never \
            --namespace=ai-apps \
            --image=curlimages/curl -- \
            curl -f http://spring-ai-service/actuator/health

Production Go-Live Checklist
| Check | Priority | Notes |
|---|---|---|
| API keys managed via K8s Secrets | Required | Never hard-code them |
| Liveness/readiness probes configured | Required | initialDelay must be long enough |
| Resource requests/limits set | Required | Prevent OOM from affecting other services |
| Graceful shutdown configured | Required | Avoid cutting off in-flight requests |
| HPA autoscaling | Recommended | Absorbs traffic spikes |
| Prometheus metrics collection | Recommended | Observability |
| Structured log output | Recommended | Easier log aggregation and analysis |
| Non-sensitive config via ConfigMap | Recommended | Supports hot updates |
