Spring AI in Production: The Complete Guide to Docker + K8s Containerization
2026/4/30
Who this is for: developers with 1-5 years of Java experience who want to move toward AI engineering. Reading time: about 20 minutes. What you will learn:
- Docker containerization best practices for Spring AI applications
- The complete workflow for deploying and operating AI services on K8s
- How to handle the problems specific to AI services: autoscaling, secrets management, health checks
The "works on my machine, dies in production" story
Xiao Zhang, an engineer building an AI customer-service bot, spent a month putting together a Spring AI backend. Everything worked locally: streaming output was smooth, RAG recall was accurate, and responses came back within 2 seconds.
Then he packaged the application as a jar and deployed it straight onto a production server.
Things broke on day one:
- Requests piled up at peak hours and the service OOMed and restarted
- The API key was hard-coded in a config file, and a colleague accidentally committed it to Git
- All conversation history was lost every time the service restarted
- The K8s probes kept returning 503: the app was marked unhealthy while it was still starting up
When he messaged me he sounded desperate: "Lao Zhang, the ops team keeps saying my app is broken, but it works fine on my machine."
The gap between "works locally" and "works in production" is not just the environment; it is an entire engineering mindset.
In this article I'll walk through containerization and deployment best practices for AI applications and lay out the pitfalls one by one.
How Deploying AI Services Differs from Ordinary Services
Every failure in the story above maps to something AI services do differently from a typical CRUD service: responses are slow and streamed over long-lived connections, memory footprints are larger, startup is slower because model clients and connection pools need to warm up, and a leaked API key costs real money. The configurations below address these differences one by one.
Dockerfile Best Practices
# Dockerfile - production build for a Spring AI application
# Multi-stage build keeps the final image small
# =========================================
# Stage 1: build
# =========================================
FROM eclipse-temurin:17-jdk-alpine AS builder
WORKDIR /app
# Copy pom.xml first to exploit Docker layer caching
# (dependencies are only re-downloaded when pom.xml changes)
COPY pom.xml .
COPY .mvn .mvn
COPY mvnw .
# Download dependencies (this layer gets cached)
RUN ./mvnw dependency:go-offline -q
# Then copy the sources
COPY src src
# Build, skipping tests (they already ran in CI)
RUN ./mvnw package -DskipTests -q
# Extract the Spring Boot layered jar (optimizes image layer caching)
RUN java -Djarmode=layertools -jar target/*.jar extract
# =========================================
# Stage 2: runtime
# =========================================
FROM eclipse-temurin:17-jre-alpine AS runtime
# Security hardening: run the application as a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /app
# Copy the extracted layers from the build stage
# (speeds up image pushes for subsequent updates)
COPY --from=builder /app/dependencies/ ./
COPY --from=builder /app/spring-boot-loader/ ./
COPY --from=builder /app/snapshot-dependencies/ ./
COPY --from=builder /app/application/ ./
# JVM tuning (settings that matter for AI services)
ENV JAVA_OPTS="\
    -server \
    -XX:+UseG1GC \
    -XX:MaxGCPauseMillis=200 \
    -XX:+UseContainerSupport \
    -XX:MaxRAMPercentage=75.0 \
    -XX:InitialRAMPercentage=50.0 \
    -Djava.security.egd=file:/dev/./urandom \
    -Dspring.backgroundpreinitializer.ignore=true"
# Health check (backup for the K8s probes)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD wget -q -O- http://localhost:8080/actuator/health || exit 1
EXPOSE 8080
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS org.springframework.boot.loader.launch.JarLauncher"]

application.yml Production Configuration
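Everything sensitive in the config below arrives through environment variables (OPENAI_API_KEY, DB_PASSWORD, and so on). When one is missing, Spring leaves the placeholder unresolved and the failure surfaces much later in a confusing form, so it pays to fail fast at startup. A minimal sketch of that idea (the EnvCheck helper is hypothetical, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: fail fast when required env vars are absent. */
public class EnvCheck {

    /** Returns the required keys that are missing or blank in the given map. */
    public static List<String> missing(Map<String, String> env, String... required) {
        List<String> absent = new ArrayList<>();
        for (String key : required) {
            String value = env.get(key);
            if (value == null || value.isBlank()) {
                absent.add(key);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // In the real app, call this early (e.g. from a CommandLineRunner)
        // with System.getenv() and throw if anything is missing.
        Map<String, String> sample = Map.of("OPENAI_API_KEY", "sk-demo");
        System.out.println(missing(sample, "OPENAI_API_KEY", "DB_URL"));
    }
}
```

Throwing an IllegalStateException when the returned list is non-empty aborts startup with a message naming exactly which variables the deployment forgot to inject.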
# application-prod.yml
spring:
  ai:
    openai:
      # Never hard-code this! Read it from an environment variable
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.7
      # Timeouts: keep slow LLM responses from piling up threads
      connect-timeout: 10s
      read-timeout: 120s  # long generations need more time
  # Database connection pool
  datasource:
    url: ${DB_URL}
    username: ${DB_USER}
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
  data:
    redis:
      host: ${REDIS_HOST}
      port: 6379
      timeout: 5000ms
      lettuce:
        pool:
          max-active: 20
          max-idle: 10

# Server settings
server:
  port: 8080
  tomcat:
    # AI services hold many concurrent long-lived connections; raise the thread count
    threads:
      max: 200
      min-spare: 20
    # Connection timeout (SSE streaming responses need longer)
    connection-timeout: 180000
    # Request size limit (document uploads need a larger value)
    max-http-form-post-size: 50MB

# Actuator
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus,info
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true  # expose the K8s liveness/readiness probe groups
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

# Logging (structured logs for easier aggregation)
logging:
  pattern:
    console: '{"time":"%d{yyyy-MM-dd HH:mm:ss.SSS}","level":"%level","trace":"%X{traceId}","service":"ai-service","msg":"%message"}%n'
  level:
    root: INFO
    com.laozhang.ai: DEBUG
    org.springframework.ai: INFO

Kubernetes Deployment Configuration
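The deployment manifest below relies on a preStop hook and terminationGracePeriodSeconds for clean shutdowns, but those only buy time; the application itself must also drain in-flight requests. Spring Boot 2.3+ supports this natively, so a fragment like the following belongs in the production profile (the 30s value is an assumption to tune, and should stay below the 60s grace period in the manifest):

```yaml
# application-prod.yml (graceful shutdown)
server:
  shutdown: graceful   # stop accepting new requests, finish in-flight ones
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # upper bound on the drain window
```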
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-ai-service
  namespace: ai-apps
  labels:
    app: spring-ai-service
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spring-ai-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # zero-downtime rolling updates
  template:
    metadata:
      labels:
        app: spring-ai-service
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/actuator/prometheus"
        prometheus.io/port: "8080"
    spec:
      # AI requests can run long; allow enough time for graceful shutdown
      terminationGracePeriodSeconds: 60
      containers:
        - name: spring-ai-service
          image: registry.company.com/spring-ai-service:1.0.0
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          # Sensitive settings come from K8s Secrets
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: openai-api-key
            - name: DB_URL
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: url
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: password
            - name: REDIS_HOST
              value: "redis-service"
            - name: SPRING_PROFILES_ACTIVE
              value: "prod"
          # Resource limits (tune these from real load-test results)
          resources:
            requests:
              memory: "1Gi"  # AI services need more memory than typical services
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          # Liveness probe: is the JVM alive?
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 90  # wait for Spring Boot to fully start
            periodSeconds: 30
            failureThreshold: 3
            timeoutSeconds: 10
          # Readiness probe: can the pod accept traffic?
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 60  # AI connection pools need time to warm up
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 10
          # Graceful shutdown: stop taking new requests, let in-flight ones finish
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]  # wait for the load balancer to deregister the pod
          # Mount non-sensitive app config
          volumeMounts:
            - name: app-config
              mountPath: /app/config
              readOnly: true
      volumes:
        - name: app-config
          configMap:
            name: spring-ai-config
---
# K8s Service
apiVersion: v1
kind: Service
metadata:
  name: spring-ai-service
  namespace: ai-apps
spec:
  selector:
    app: spring-ai-service
  ports:
    - port: 80
      targetPort: 8080
      name: http
  type: ClusterIP
---
# Horizontal Pod Autoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-ai-service-hpa
  namespace: ai-apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-ai-service
  # Replica range
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Scale up when CPU utilization exceeds 70%
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Scale up when memory utilization exceeds 80%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60  # avoid scaling thrash
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # scale down more conservatively
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120

Secrets Management
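One thing to be clear about before creating Secrets: a K8s Secret is only base64-encoded, not encrypted, so anyone who can read the Secret object can recover the key. That is a large part of why the Vault approach below is recommended. You can see how thin the protection is from the shell (the key here is a fake placeholder):

```shell
# Secret values are plain base64, trivially reversible
echo -n 'sk-demo-key' | base64
# -> c2stZGVtby1rZXk=
echo -n 'c2stZGVtby1rZXk=' | base64 -d
# -> sk-demo-key
```

At minimum, enable encryption at rest on etcd and restrict Secret reads with RBAC; better, move the keys into an external manager such as Vault.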
# Create the K8s Secret (in real production, do this via CI/CD tooling;
# never commit secrets to Git)
# kubectl create secret generic ai-secrets \
#   --from-literal=openai-api-key=sk-xxx \
#   --namespace=ai-apps

# Or use an external secrets manager (recommended: HashiCorp Vault).
# The vault-agent-injector automatically injects Vault secrets into the Pod.

# k8s/vault-injection.yaml (the Vault approach)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-ai-service
spec:
  template:
    metadata:
      annotations:
        # Vault Agent injects the secrets automatically
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "spring-ai-service"
        vault.hashicorp.com/agent-inject-secret-ai: "secret/data/ai-service"
        vault.hashicorp.com/agent-inject-template-ai: |
          {{- with secret "secret/data/ai-service" -}}
          export OPENAI_API_KEY="{{ .Data.data.openai_api_key }}"
          {{- end }}

Implementing AI-Specific Health Checks
package com.laozhang.ai.health;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

/**
 * Health indicator for the AI service.
 * Executed when the K8s readiness probe calls /actuator/health.
 *
 * Warning: this check makes a real LLM call, which costs money!
 * Prefer a lightweight check in practice: verify connectivity
 * without actually invoking the model.
 */
@Component("aiService")
@Slf4j
@RequiredArgsConstructor
public class AIServiceHealthIndicator implements HealthIndicator {

    private final ChatModel chatModel;

    // Time of the last check (so we don't hammer the LLM)
    private volatile long lastCheckTime = 0;
    private volatile boolean lastStatus = false;
    private static final long CACHE_TTL_MS = 30_000; // reuse the last result for 30s

    @Override
    public Health health() {
        long now = System.currentTimeMillis();
        // Reuse the previous result if we checked within the last 30 seconds
        if (now - lastCheckTime < CACHE_TTL_MS) {
            return lastStatus
                ? Health.up().withDetail("source", "cached").build()
                : Health.down().withDetail("source", "cached").build();
        }
        try {
            // Send a minimal test request to verify LLM connectivity.
            // Keep the prompt tiny (and cap max tokens via ChatOptions
            // if you like) to minimize cost.
            long start = System.currentTimeMillis();
            chatModel.call("ping");
            long latency = System.currentTimeMillis() - start;
            lastCheckTime = now;
            lastStatus = true;
            return Health.up()
                .withDetail("latency_ms", latency)
                .withDetail("model", "connected")
                .build();
        } catch (Exception e) {
            log.warn("AI service health check failed: {}", e.getMessage());
            lastCheckTime = now;
            lastStatus = false;
            return Health.down()
                .withDetail("error", e.getMessage())
                .build();
        }
    }
}

CI/CD Pipeline Configuration
# .github/workflows/deploy.yml
name: Deploy Spring AI Service

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK 17
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
          cache: maven

      - name: Run Tests
        run: ./mvnw test

      # Assumes registry credentials were configured in an earlier login step
      - name: Build Docker Image
        run: |
          docker build -t registry.company.com/spring-ai-service:${{ github.sha }} .
          docker push registry.company.com/spring-ai-service:${{ github.sha }}

      # Assumes cluster credentials (kubeconfig) are available to the runner
      - name: Deploy to K8s
        run: |
          # Roll out the new image
          kubectl set image deployment/spring-ai-service \
            spring-ai-service=registry.company.com/spring-ai-service:${{ github.sha }} \
            --namespace=ai-apps
          # Wait for the rolling update to finish
          kubectl rollout status deployment/spring-ai-service \
            --namespace=ai-apps \
            --timeout=300s

      - name: Verify Deployment
        run: |
          # Check that the new version is healthy (run in the service's namespace
          # so the spring-ai-service DNS name resolves)
          kubectl run health-check --rm -i --restart=Never \
            --namespace=ai-apps \
            --image=curlimages/curl -- \
            curl -f http://spring-ai-service/actuator/health

Production Go-Live Checklist
| Check | Priority | Notes |
|---|---|---|
| API keys managed via K8s Secrets | Required | Never hard-code them |
| Liveness/readiness probes configured | Required | initialDelay must be long enough |
| Resource requests/limits set | Required | Prevent OOM from affecting other services |
| Graceful shutdown configured | Required | Avoid cutting off in-flight requests |
| HPA autoscaling | Recommended | Absorbs traffic spikes |
| Prometheus metrics collection | Recommended | Observability |
| Structured log output | Recommended | Easier log aggregation and analysis |
| Non-sensitive config via ConfigMap | Recommended | Supports hot updates |
