AI微服务架构设计:如何在Spring Cloud体系中集成AI能力
一次错误的"快捷方式",让整个系统陷入泥潭
2025年9月,某电商公司的后端负责人赵建明接到了一个紧急需求:在现有的订单服务里加入AI智能客服功能,能够根据订单数据自动回复用户咨询。
工期只有两周,赵建明的第一反应是"快点上":直接在order-service里加几个类,引入Spring AI依赖,调用OpenAI的API,完工。
两周后,功能确实上线了。但三个月后,问题像雪球一样滚过来:
依赖污染:order-service的pom.xml从原来的35个依赖膨胀到62个,spring-ai-openai和业务依赖产生了3处版本冲突,每次升级都是噩梦。
服务膨胀:AI对话的上下文存储、向量检索库、Prompt模板管理全都堆在order-service里,一个原本100MB的JAR包变成了340MB。
耦合混乱:product-service、user-service也想用AI能力,于是复制了一遍代码。现在有三个地方维护着几乎相同的AI调用逻辑,三套配置,三套Bug。
资源竞争:AI调用是CPU密集型+IO等待的混合型操作,高并发时直接影响了订单核心业务的响应时间,P99从200ms飙升到了1.2s。
赵建明在内网技术分享会上讲完这段经历,最后总结了一句话:"我用两周走了捷径,但接下来花了两个月还债。"
这篇文章,是他和团队重新架构后的完整方案。
为什么AI能力需要独立成微服务?
在回答"怎么做"之前,先把"为什么"说透。
维度一:资源特征不同
| 服务类型 | CPU特征 | 内存特征 | IO特征 | 扩展策略 |
|---|---|---|---|---|
| 订单/支付 | 低CPU,大量数据库操作 | 中等 | 数据库密集 | 按QPS水平扩展 |
| AI推理服务 | 高CPU/GPU | 大(模型+向量库) | LLM网络IO | 按GPU资源扩展 |
把这两类服务混在一起,无法针对性优化:要么为AI预留的资源被业务服务"挤占",要么业务服务被AI的高内存拖累。
维度二:变更频率不同
订单核心逻辑变更频率:每月1-2次(严格管控)。
AI功能变更频率:每周3-5次(Prompt调优、模型切换、新功能试验)。
把高频变更的AI代码和低频变更的核心业务绑在一起,每次AI调整都要走完整的订单服务发布流程,风险大,效率低。
维度三:故障隔离
AI服务偶尔会因为OpenAI限流、模型响应超时、向量库连接异常等原因出现故障。
如果AI能力独立为微服务,AI服务宕机 → 降级为人工处理,订单核心流程不受影响。
反之,AI代码混在order-service里,AI调用超时60秒 → 订单接口P99超时 → 影响GMV。
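故障隔离的核心思路,可以用一段极简的Java草图来说明(假设性示例,非文中正式实现):给AI调用套上独立线程池和超时,超时或异常统一返回降级文案,订单主流程永远不被AI拖住。

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// 假设性示意:AI调用被隔离在独立线程池中,带超时与降级
public class AiFallbackDemo {
    // 守护线程池:专供AI调用使用,不与业务线程争抢,也不阻塞JVM退出
    private static final ExecutorService AI_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    /** aiCall模拟对AI服务的调用;超时、异常统一走降级,主流程不受影响 */
    public static String callWithFallback(Supplier<String> aiCall,
                                          long timeoutMs,
                                          String fallback) {
        try {
            return CompletableFuture.supplyAsync(aiCall, AI_POOL)
                    .get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) { // TimeoutException / ExecutionException 等统一降级
            return fallback;
        }
    }
}
```

这正是后文Feign FallbackFactory与Gateway熔断器在框架层面替我们做的事情。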
AI微服务拆分原则:哪些功能该独立?
不是所有AI相关代码都要放进AI微服务,拆分有原则。
推荐放入AI微服务的功能
AI能力层(独立微服务):
├── 大模型调用抽象(多模型路由、限流、重试)
├── 对话管理(会话状态、上下文窗口)
├── RAG检索增强(向量存储、文档解析)
├── Prompt模板管理(版本控制、A/B测试)
├── 嵌入向量生成
├── AI工具函数注册中心
└── Token消耗统计与计费
推荐保留在业务服务的功能
业务服务保留:
├── 业务规则判断(是否触发AI)
├── 数据权限过滤(不能把敏感数据传给AI)
├── AI结果的业务后处理(根据AI输出更新业务状态)
└── 简单的AI调用封装(只是透传,无复杂逻辑)
拆分决策树
AI功能需要独立微服务吗?
1. 是否被3个以上业务服务复用?
是 → 必须独立
否 → 继续下一条
2. 资源消耗是否显著高于所在业务服务?
是 → 必须独立
否 → 继续下一条
3. 变更频率是否远高于所在业务服务?
是 → 建议独立
否 → 可以暂时保留在业务服务中
系统整体架构
完整项目实现
项目模块结构
ai-microservice-platform/
├── pom.xml(父POM)
├── ai-gateway-service/ # AI统一网关
├── ai-chat-service/ # 对话服务
├── ai-rag-service/ # RAG知识库服务
├── common/ # 公共模块
│ ├── ai-common-api/ # Feign接口定义
│ └── ai-common-core/ # 公共DTO/工具类
└── business-demo/ # 业务服务示例
└── order-service-demo/
父POM(pom.xml)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.laozhang</groupId>
<artifactId>ai-microservice-platform</artifactId>
<version>1.0.0</version>
<packaging>pom</packaging>
<name>AI微服务平台</name>
<modules>
<module>ai-gateway-service</module>
<module>ai-chat-service</module>
<module>ai-rag-service</module>
<module>common/ai-common-api</module>
<module>common/ai-common-core</module>
<module>business-demo/order-service-demo</module>
</modules>
<properties>
<java.version>17</java.version>
<maven.compiler.source>17</maven.compiler.source>
<maven.compiler.target>17</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<spring-boot.version>3.3.4</spring-boot.version>
<spring-cloud.version>2023.0.3</spring-cloud.version>
<spring-cloud-alibaba.version>2023.0.1.0</spring-cloud-alibaba.version>
<spring-ai.version>1.0.0</spring-ai.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-dependencies</artifactId>
<version>${spring-boot.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>${spring-cloud.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-alibaba-dependencies</artifactId>
<version>${spring-cloud-alibaba.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<!-- 内部模块 -->
<dependency>
<groupId>com.laozhang</groupId>
<artifactId>ai-common-api</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.laozhang</groupId>
<artifactId>ai-common-core</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</dependencyManagement>
</project>
common/ai-common-api:Feign接口定义
// common/ai-common-api/src/main/java/com/laozhang/ai/api/AiChatApi.java
package com.laozhang.ai.api;
import com.laozhang.ai.dto.AiChatRequest;
import com.laozhang.ai.dto.AiChatResponse;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
/**
* AI对话Feign接口
* 业务服务通过此接口调用AI能力,无需关心底层实现
*/
@FeignClient(
name = "ai-chat-service",
fallbackFactory = AiChatApiFallbackFactory.class
)
public interface AiChatApi {
@PostMapping("/ai/chat")
AiChatResponse chat(@RequestBody AiChatRequest request);
@PostMapping("/ai/chat/summary")
AiChatResponse summarize(@RequestBody AiChatRequest request);
}
// common/ai-common-api/src/main/java/com/laozhang/ai/api/AiChatApiFallbackFactory.java
package com.laozhang.ai.api;
import com.laozhang.ai.dto.AiChatResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.openfeign.FallbackFactory;
import org.springframework.stereotype.Component;
/**
* AI服务降级工厂
* AI服务不可用时,返回友好的降级结果,不影响业务主流程
*/
@Slf4j
@Component
public class AiChatApiFallbackFactory implements FallbackFactory<AiChatApi> {
@Override
public AiChatApi create(Throwable cause) {
return new AiChatApi() {
@Override
public AiChatResponse chat(com.laozhang.ai.dto.AiChatRequest request) {
log.warn("AI对话服务降级,原因: {}", cause.getMessage());
return AiChatResponse.builder()
.success(false)
.content("AI服务暂时不可用,请稍后重试或联系人工客服")
.degraded(true)
.build();
}
@Override
public AiChatResponse summarize(com.laozhang.ai.dto.AiChatRequest request) {
log.warn("AI摘要服务降级,原因: {}", cause.getMessage());
return AiChatResponse.builder()
.success(false)
.content("暂时无法生成摘要")
.degraded(true)
.build();
}
};
}
}
common/ai-common-core:公共DTO
// AiChatRequest.java
package com.laozhang.ai.dto;
import lombok.Builder;
import lombok.Data;
import java.util.Map;
@Data
@Builder
public class AiChatRequest {
/** 调用方业务服务标识 */
private String clientId;
/** 用户ID(用于限流和统计)*/
private String userId;
/** 会话ID */
private String sessionId;
/** 用户消息 */
private String message;
/** 使用的模型(不填则用默认)*/
private String model;
/** 业务上下文(Key-Value,会注入Prompt)*/
private Map<String, String> bizContext;
/** Prompt模板ID */
private String promptTemplateId;
}
// AiChatResponse.java
package com.laozhang.ai.dto;
import lombok.Builder;
import lombok.Data;
@Data
@Builder
public class AiChatResponse {
private boolean success;
private String content;
private String sessionId;
private String model;
private int inputTokens;
private int outputTokens;
/** 是否走了降级逻辑 */
private boolean degraded;
private String errorMessage;
}
ai-gateway-service:AI统一入口
pom.xml(ai-gateway-service)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.laozhang</groupId>
<artifactId>ai-microservice-platform</artifactId>
<version>1.0.0</version>
</parent>
<artifactId>ai-gateway-service</artifactId>
<name>AI统一网关服务</name>
<dependencies>
<!-- Spring Cloud Gateway(注意:不能用spring-boot-starter-web)-->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>
<!-- Nacos服务发现 -->
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
</dependency>
<!-- Nacos配置中心 -->
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-starter-alibaba-nacos-config</artifactId>
</dependency>
<!-- LoadBalancer(替代Ribbon)-->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-loadbalancer</artifactId>
</dependency>
<!-- Redis Reactive(RequestRateLimiter基于Redis令牌桶;Gateway是WebFlux环境,需用reactive版)-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>
<!-- Sentinel限流 -->
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
</dependency>
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-alibaba-sentinel-gateway</artifactId>
</dependency>
<!-- Actuator -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
</dependencies>
</project>
application.yml(ai-gateway-service)
server:
port: 8090
spring:
application:
name: ai-gateway-service
cloud:
nacos:
discovery:
server-addr: nacos:8848
namespace: ai-platform
config:
server-addr: nacos:8848
namespace: ai-platform
file-extension: yaml
gateway:
discovery:
locator:
enabled: true # 开启服务发现路由
lower-case-service-id: true
routes:
# 对话服务路由
- id: ai-chat-service
uri: lb://ai-chat-service
predicates:
- Path=/ai/chat/**
filters:
- name: RequestRateLimiter
args:
redis-rate-limiter.replenishRate: 100 # 每秒100个请求
redis-rate-limiter.burstCapacity: 200 # 突发最多200
redis-rate-limiter.requestedTokens: 1
key-resolver: "#{@userKeyResolver}"
- name: CircuitBreaker
args:
name: ai-chat-cb
fallbackUri: forward:/fallback/chat
- AddRequestHeader=X-AI-Gateway-Version, 1.0.0
- AddResponseHeader=X-Processed-By, ai-gateway
# RAG服务路由
- id: ai-rag-service
uri: lb://ai-rag-service
predicates:
- Path=/ai/rag/**
filters:
- name: RequestRateLimiter
args:
redis-rate-limiter.replenishRate: 50
redis-rate-limiter.burstCapacity: 100
redis-rate-limiter.requestedTokens: 1
key-resolver: "#{@userKeyResolver}"
# Sentinel限流配置
sentinel:
transport:
dashboard: sentinel-dashboard:8080
eager: true
# Redis(限流器使用)
data:
redis:
host: redis
port: 6379
management:
endpoints:
web:
exposure:
include: health,gateway,metrics
GatewayConfig.java(网关过滤器)
package com.laozhang.gateway.config;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.Ordered;
import org.springframework.core.annotation.Order;
import org.springframework.http.HttpStatus;
import org.springframework.http.server.reactive.ServerHttpRequest;
import org.springframework.http.server.reactive.ServerHttpResponse;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;
import java.util.Arrays;
import java.util.List;
@Slf4j
@Configuration
public class GatewayConfig {
/**
* 全局日志过滤器:记录AI请求链路信息
*/
@Bean
@Order(Ordered.HIGHEST_PRECEDENCE)
public GlobalFilter aiLoggingFilter() {
return (exchange, chain) -> {
ServerHttpRequest request = exchange.getRequest();
String path = request.getPath().value();
if (path.startsWith("/ai/")) {
String clientId = request.getHeaders().getFirst("X-Client-Id");
String userId = request.getHeaders().getFirst("X-User-Id");
long startTime = System.currentTimeMillis();
log.info("[AI网关] 请求进入 path={}, clientId={}, userId={}",
path, clientId, userId);
return chain.filter(exchange).doFinally(signalType -> {
long elapsed = System.currentTimeMillis() - startTime;
int statusCode = exchange.getResponse().getStatusCode() != null
? exchange.getResponse().getStatusCode().value() : 0;
log.info("[AI网关] 请求完成 path={}, status={}, elapsed={}ms",
path, statusCode, elapsed);
});
}
return chain.filter(exchange);
};
}
/**
* 认证过滤器:校验内部服务调用的合法性
*/
@Bean
@Order(1)
public GlobalFilter authFilter() {
List<String> allowedClientIds = Arrays.asList(
"order-service", "user-service", "product-service"
);
return (exchange, chain) -> {
ServerHttpRequest request = exchange.getRequest();
String path = request.getPath().value();
if (!path.startsWith("/ai/")) {
return chain.filter(exchange);
}
String clientId = request.getHeaders().getFirst("X-Client-Id");
String clientSecret = request.getHeaders().getFirst("X-Client-Secret");
// 内部服务间调用校验(生产环境应使用JWT或mTLS)
if (clientId == null || !allowedClientIds.contains(clientId)) {
log.warn("[AI网关] 非法调用来源: clientId={}", clientId);
ServerHttpResponse response = exchange.getResponse();
response.setStatusCode(HttpStatus.UNAUTHORIZED);
return response.setComplete();
}
// 在请求中注入校验信息,下游服务可信任
ServerHttpRequest mutatedRequest = request.mutate()
.header("X-AI-Authenticated", "true")
.header("X-AI-Client-Id", clientId)
.build();
return chain.filter(exchange.mutate().request(mutatedRequest).build());
};
}
/**
* 限流Key解析器:按用户ID限流
*/
@Bean
public org.springframework.cloud.gateway.filter.ratelimit.KeyResolver userKeyResolver() {
return exchange -> Mono.just(
exchange.getRequest().getHeaders().getFirst("X-User-Id") != null
? exchange.getRequest().getHeaders().getFirst("X-User-Id")
: "anonymous"
);
}
}
ai-chat-service:核心对话服务
pom.xml(ai-chat-service)
<?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.laozhang</groupId>
<artifactId>ai-microservice-platform</artifactId>
<version>1.0.0</version>
</parent>
<artifactId>ai-chat-service</artifactId>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<!-- Nacos -->
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
</dependency>
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-starter-alibaba-nacos-config</artifactId>
</dependency>
<!-- Spring AI(1.0.0 GA起OpenAI starter更名为spring-ai-starter-model-openai)-->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<!-- Redis -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<!-- 公共模块(控制器实现了ai-common-api中的AiChatApi接口,两个模块都要引入)-->
<dependency>
<groupId>com.laozhang</groupId>
<artifactId>ai-common-api</artifactId>
</dependency>
<dependency>
<groupId>com.laozhang</groupId>
<artifactId>ai-common-core</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
</dependencies>
</project>
application.yml(ai-chat-service,Nacos配置)
server:
port: 8091
spring:
application:
name: ai-chat-service
cloud:
nacos:
discovery:
server-addr: nacos:8848
namespace: ai-platform
metadata:
version: 1.0.0
ai-capabilities: chat,summary,stream
config:
server-addr: nacos:8848
namespace: ai-platform
shared-configs:
- data-id: ai-common.yaml # 公共AI配置(模型参数)
group: AI_GROUP
refresh: true # 支持热更新
ai:
openai:
base-url: ${AI_PROVIDER_URL:http://vllm-server:8000/v1}
api-key: ${AI_API_KEY:not-needed}
chat:
options:
model: ${AI_MODEL:deepseek-r1}
temperature: ${AI_TEMPERATURE:0.7}
max-tokens: ${AI_MAX_TOKENS:2048}
data:
redis:
host: redis
port: 6379
# 健康检查
management:
endpoints:
web:
exposure:
include: health,info,prometheus
endpoint:
health:
show-details: always
AiChatController.java(实现Feign接口)
package com.laozhang.chat.controller;
import com.laozhang.ai.api.AiChatApi;
import com.laozhang.ai.dto.AiChatRequest;
import com.laozhang.ai.dto.AiChatResponse;
import com.laozhang.chat.service.ChatOrchestrationService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;
/**
* AI对话控制器
* 实现公共API定义的Feign接口
*/
@Slf4j
@RestController
@RequestMapping("/ai")
@RequiredArgsConstructor
public class AiChatController implements AiChatApi {
private final ChatOrchestrationService orchestrationService;
@Override
@PostMapping("/chat")
public AiChatResponse chat(@RequestBody AiChatRequest request) {
log.info("AI chat request: clientId={}, userId={}, sessionId={}",
request.getClientId(), request.getUserId(), request.getSessionId());
return orchestrationService.chat(request);
}
@Override
@PostMapping("/chat/summary")
public AiChatResponse summarize(@RequestBody AiChatRequest request) {
return orchestrationService.summarize(request);
}
/**
* 流式接口(Feign不支持SSE,单独提供)
*/
@PostMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> chatStream(@RequestBody AiChatRequest request) {
return orchestrationService.chatStream(request);
}
}
ChatOrchestrationService.java
package com.laozhang.chat.service;
import com.laozhang.ai.dto.AiChatRequest;
import com.laozhang.ai.dto.AiChatResponse;
import com.laozhang.chat.prompt.PromptTemplateService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import java.util.*;
import java.util.concurrent.TimeUnit;
/**
* 对话编排服务
* 负责:上下文管理、Prompt构建、模型调用、结果后处理
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class ChatOrchestrationService {
private final ChatClient vllmChatClient;
private final PromptTemplateService promptTemplateService;
private final RedisTemplate<String, Object> redisTemplate;
private static final String SESSION_KEY_PREFIX = "ai:session:";
private static final int MAX_HISTORY_TURNS = 10;
private static final long SESSION_TTL_SECONDS = 3600;
public AiChatResponse chat(AiChatRequest request) {
long start = System.currentTimeMillis();
// 1. 获取或创建会话ID
String sessionId = Optional.ofNullable(request.getSessionId())
.filter(s -> !s.isBlank())
.orElse(UUID.randomUUID().toString());
// 2. 构建Prompt(注入业务上下文)
String enrichedMessage = promptTemplateService.buildMessage(
request.getMessage(),
request.getBizContext(),
request.getPromptTemplateId()
);
// 3. 加载历史
List<Message> history = loadHistory(sessionId);
history.add(new UserMessage(enrichedMessage));
// 4. 调用模型
String responseContent;
try {
responseContent = vllmChatClient.prompt()
.messages(history)
.call()
.content();
} catch (Exception e) {
log.error("模型调用失败: {}", e.getMessage(), e);
return AiChatResponse.builder()
.success(false)
.sessionId(sessionId)
.errorMessage("AI服务调用失败: " + e.getMessage())
.build();
}
// 5. 保存历史
history.add(new AssistantMessage(responseContent));
saveHistory(sessionId, history);
return AiChatResponse.builder()
.success(true)
.sessionId(sessionId)
.content(responseContent)
.model(request.getModel() != null ? request.getModel() : "default")
.build();
}
public AiChatResponse summarize(AiChatRequest request) {
String summaryPrompt = String.format(
"请对以下内容生成简洁的摘要(不超过200字):\n\n%s",
request.getMessage()
);
request.setMessage(summaryPrompt);
request.setSessionId(null); // 摘要任务不需要历史上下文
return chat(request);
}
public Flux<String> chatStream(AiChatRequest request) {
String sessionId = Optional.ofNullable(request.getSessionId())
.filter(s -> !s.isBlank())
.orElse(UUID.randomUUID().toString());
List<Message> history = loadHistory(sessionId);
history.add(new UserMessage(request.getMessage()));
StringBuilder fullResponse = new StringBuilder();
return vllmChatClient.prompt()
.messages(history)
.stream()
.content()
.doOnNext(fullResponse::append)
.doOnComplete(() -> {
history.add(new AssistantMessage(fullResponse.toString()));
saveHistory(sessionId, history);
});
}
@SuppressWarnings("unchecked")
private List<Message> loadHistory(String sessionId) {
String key = SESSION_KEY_PREFIX + sessionId;
List<Object> raw = redisTemplate.opsForList().range(key, 0, -1);
if (raw == null || raw.isEmpty()) return new ArrayList<>();
List<Message> messages = new ArrayList<>();
for (Object item : raw) {
if (item instanceof String s) {
String[] parts = s.split("\\|", 2);
if (parts.length == 2) {
messages.add("user".equals(parts[0])
? new UserMessage(parts[1])
: new AssistantMessage(parts[1]));
}
}
}
int max = MAX_HISTORY_TURNS * 2;
return messages.size() > max
? new ArrayList<>(messages.subList(messages.size() - max, messages.size()))
: messages;
}
private void saveHistory(String sessionId, List<Message> messages) {
String key = SESSION_KEY_PREFIX + sessionId;
redisTemplate.delete(key);
List<String> serialized = messages.stream()
.map(m -> (m instanceof UserMessage ? "user" : "assistant") + "|" + m.getText()) // Spring AI 1.0中getContent()已更名为getText()
.toList();
if (!serialized.isEmpty()) {
redisTemplate.opsForList().rightPushAll(key, serialized.toArray());
redisTemplate.expire(key, SESSION_TTL_SECONDS, TimeUnit.SECONDS);
}
}
}
Nacos:服务注册与动态配置
AI模型参数动态配置(Nacos Config)
package com.laozhang.chat.config;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.context.annotation.Configuration;
/**
* AI参数动态配置
* 使用@RefreshScope支持Nacos配置热更新
* 修改Nacos中的配置后,无需重启服务即可生效
*/
@Slf4j
@Data
@Configuration
@RefreshScope
@ConfigurationProperties(prefix = "ai.model")
public class AiModelDynamicConfig {
/** 当前模型名称(可动态切换)*/
private String name = "deepseek-r1";
/** 温度参数(0-2,越大越有创意)*/
private float temperature = 0.7f;
/** 最大输出token数 */
private int maxTokens = 2048;
/** 最大上下文token数 */
private int maxContextTokens = 8192;
/** 是否开启流式输出 */
private boolean streamEnabled = true;
/** 系统Prompt模板ID */
private String defaultSystemPromptId = "default_v3";
// Spring会在配置更新后重新注入,添加日志便于观察
public void setName(String name) {
if (!name.equals(this.name)) {
log.info("AI模型切换: {} -> {}", this.name, name);
}
this.name = name;
}
}
Nacos中对应的配置文件(ai-common.yaml,AI_GROUP):
# Nacos Config: ai-common.yaml
ai:
model:
name: deepseek-r1
temperature: 0.7
max-tokens: 2048
max-context-tokens: 8192
stream-enabled: true
default-system-prompt-id: default_v3
链路追踪:Micrometer Tracing/Zipkin在AI服务中的应用
注:Spring Boot 3.x已不再支持Spring Cloud Sleuth,链路追踪由Micrometer Tracing承接;引入micrometer-tracing-bridge-brave后,下面代码中的Brave API(brave.Tracer)依然可用。
package com.laozhang.chat.service;
import brave.Span;
import brave.Tracer;
import com.laozhang.ai.dto.AiChatRequest;
import com.laozhang.ai.dto.AiChatResponse;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
/**
* 带链路追踪的AI服务包装
* 确保AI调用的每个阶段都有完整的Trace信息
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class TracedChatService {
private final ChatOrchestrationService orchestrationService;
private final Tracer tracer;
public AiChatResponse chatWithTrace(AiChatRequest request) {
// 创建AI调用的Span
Span span = tracer.nextSpan()
.name("ai.chat")
.tag("ai.client_id", request.getClientId())
.tag("ai.model", request.getModel() != null ? request.getModel() : "default")
.tag("ai.session_id", request.getSessionId() != null ? request.getSessionId() : "new")
.start();
try (Tracer.SpanInScope scope = tracer.withSpanInScope(span)) {
long start = System.currentTimeMillis();
AiChatResponse response = orchestrationService.chat(request);
long elapsed = System.currentTimeMillis() - start;
span.tag("ai.latency_ms", String.valueOf(elapsed));
span.tag("ai.success", String.valueOf(response.isSuccess()));
span.tag("ai.output_tokens", String.valueOf(response.getOutputTokens()));
return response;
} catch (Exception e) {
span.tag("error", e.getMessage());
span.error(e);
throw e;
} finally {
span.finish();
}
}
}
业务服务调用AI微服务
order-service中使用Feign调用AI能力
package com.laozhang.order.service;
import com.laozhang.ai.api.AiChatApi;
import com.laozhang.ai.dto.AiChatRequest;
import com.laozhang.ai.dto.AiChatResponse;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.util.HashMap;
import java.util.Map;
/**
* 订单服务中的AI功能
* 通过Feign调用独立的AI微服务,业务逻辑和AI逻辑完全解耦
*/
@Slf4j
@Service
@RequiredArgsConstructor
public class OrderAiService {
private final AiChatApi aiChatApi;
/**
* AI自动回复客服咨询
*/
public String handleCustomerQuery(String orderId, String userId, String query) {
// 构建业务上下文(会注入到AI的Prompt中)
Map<String, String> bizContext = new HashMap<>();
bizContext.put("order_id", orderId);
bizContext.put("user_name", "尊敬的用户");
bizContext.put("platform", "老张商城");
AiChatRequest request = AiChatRequest.builder()
.clientId("order-service") // 服务标识
.userId(userId) // 用于限流统计
.sessionId("order_" + orderId) // 订单级别的会话
.message(query)
.bizContext(bizContext)
.promptTemplateId("customer_service_v2") // 使用预定义Prompt
.build();
AiChatResponse response = aiChatApi.chat(request);
if (response.isDegraded()) {
// AI服务降级,转人工处理
log.warn("AI服务降级,订单 {} 的咨询转人工处理", orderId);
return "您的问题已转接人工客服,预计5分钟内回复。";
}
return response.getContent();
}
/**
* AI生成订单摘要(用于客服快速了解订单情况)
*/
public String generateOrderSummary(String orderDetail) {
AiChatRequest request = AiChatRequest.builder()
.clientId("order-service")
.message(orderDetail)
.build();
AiChatResponse response = aiChatApi.summarize(request);
return response.isSuccess() ? response.getContent() : "摘要生成失败";
}
}
Docker镜像优化
AI微服务的Dockerfile(多阶段构建)
# Dockerfile.ai-chat-service
# 阶段1:编译
FROM maven:3.9.6-eclipse-temurin-17-alpine AS builder
WORKDIR /build
# 先复制所有模块的pom.xml(父POM是聚合构建,reactor需要全部模块POM),利用Docker层缓存加速构建
COPY pom.xml .
COPY ai-gateway-service/pom.xml ai-gateway-service/
COPY ai-chat-service/pom.xml ai-chat-service/
COPY ai-rag-service/pom.xml ai-rag-service/
COPY common/ai-common-api/pom.xml common/ai-common-api/
COPY common/ai-common-core/pom.xml common/ai-common-core/
COPY business-demo/order-service-demo/pom.xml business-demo/order-service-demo/
# 预下载依赖(利用缓存层)
RUN mvn dependency:go-offline -B -pl ai-chat-service \
-am --no-transfer-progress
# 复制源码并构建
COPY . .
RUN mvn clean package -pl ai-chat-service -am \
-DskipTests --no-transfer-progress
# 阶段2:运行(精简镜像)
FROM eclipse-temurin:17-jre-alpine
# 安全:非root用户运行
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /app
# 从builder阶段复制JAR
COPY --from=builder /build/ai-chat-service/target/*.jar app.jar
# JVM调优参数
ENV JAVA_OPTS="-Xms512m -Xmx1g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-XX:+UseContainerSupport \
-Dfile.encoding=UTF-8 \
-Dspring.profiles.active=prod"
EXPOSE 8091
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD wget -qO- http://localhost:8091/actuator/health || exit 1
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]
性能对比数据
改造前后对比
| 指标 | 改造前(AI混入order-service) | 改造后(AI独立微服务) |
|---|---|---|
| order-service JAR大小 | 340MB | 95MB |
| order-service启动时间 | 42s | 18s |
| order-service P99延迟(高峰) | 1.2s | 210ms |
| AI服务故障影响范围 | 订单核心功能受损 | 仅AI功能降级 |
| AI配置变更成本 | 重新发布order-service | Nacos配置热更新,无需重启 |
| 新业务服务接入AI成本 | 复制代码+配置 | 引入Feign API,3行代码 |
FAQ
Q1:AI微服务和业务微服务之间如何保证调用安全?
推荐三层安全:①网络层:AI微服务只在内网暴露,不对外直接访问;②服务层:通过X-Client-Id + X-Client-Secret做服务间认证(如文章中Gateway所示),生产环境推荐升级为JWT或mTLS;③业务层:AI服务根据clientId做权限控制,order-service不能访问user-service的AI会话。
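服务层认证的一个低成本升级方向,是把"明文Secret比对"换成HMAC签名校验,Secret本身不出现在请求头里。下面是一个假设性示意(类名、方法均为示例,正式方案仍推荐JWT或mTLS):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// 假设性示意:调用方用共享Secret对clientId做HmacSHA256签名放入请求头,网关侧校验
public class InternalAuthDemo {
    /** 生成签名(调用方侧) */
    public static String sign(String clientId, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(clientId.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** 校验签名(网关侧);MessageDigest.isEqual是常量时间比较,可防时序攻击 */
    public static boolean verify(String clientId, String secret, String signature) {
        byte[] expected = sign(clientId, secret).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, signature.getBytes(StandardCharsets.UTF_8));
    }
}
```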
Q2:Nacos配置热更新AI参数,会不会影响正在进行的对话?
@RefreshScope的刷新是异步的,当前正在处理的请求使用旧配置,新配置对刷新后的新请求生效。如果你的业务不能容忍这种不一致性(比如同一会话中前后使用了不同模型),建议在会话创建时"锁定"模型版本(存入Redis),整个会话使用固定配置。
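"会话创建时锁定模型版本"的思路可以浓缩为一个假设性示意(文中建议存Redis,这里用内存Map代替以便演示):会话首次请求时记录当时的全局模型,之后同会话的请求无视配置热更新。

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// 假设性示意:会话级模型"锁定"——同一会话始终使用创建时的模型
public class SessionModelPin {
    private final Map<String, String> pinned = new ConcurrentHashMap<>();

    /** 返回该会话应使用的模型:已锁定则复用,未锁定则记录当前全局模型 */
    public String resolveModel(String sessionId, String currentGlobalModel) {
        return pinned.computeIfAbsent(sessionId, k -> currentGlobalModel);
    }
}
```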
Q3:AI微服务挂了,业务服务会崩吗?
不会,这正是独立部署的价值。Feign接口配置了FallbackFactory,AI服务不可用时返回预定义的降级内容。同时,Gateway层的CircuitBreaker会在AI服务连续失败后触发熔断,不再把请求打到宕机的实例,避免雪崩。业务服务无感知,继续正常运行。
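熔断器"连续失败N次后打开、不再放行请求"的原理,可以用一个最小状态机示意(假设性示例,仅说明原理,生产请直接用Gateway集成的Resilience4j):

```java
// 假设性示意:连续失败达到阈值后打开熔断,后续请求直接被拒绝
public class MiniCircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;
    private boolean open = false;

    public MiniCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** 熔断打开时不放行请求,避免把流量打到已宕机的实例上造成雪崩 */
    public synchronized boolean allowRequest() { return !open; }

    public synchronized void onSuccess() { consecutiveFailures = 0; }

    public synchronized void onFailure() {
        if (++consecutiveFailures >= failureThreshold) open = true;
    }
}
```

真实的熔断器还有半开(half-open)状态用于探测恢复,这里为突出主干而省略。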
Q4:如何统计各业务服务的AI Token消耗?
在AI Gateway的全局Filter中,记录每个请求的clientId和响应头中的token数(如果模型返回),写入Redis计数器或消息队列。AI Chat Service层也可以实现拦截器,在每次成功调用后发送统计事件。建议用AOP在Service层统一埋点,减少侵入性。
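埋点后的计数逻辑本身很简单,下面是一个假设性示意(生产中把内存Map换成Redis INCRBY或消息队列异步汇总即可):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// 假设性示意:按clientId累计Token消耗,LongAdder在高并发写入下优于AtomicLong
public class TokenUsageCounter {
    private final Map<String, LongAdder> usage = new ConcurrentHashMap<>();

    /** 每次AI调用成功后记录输入/输出Token */
    public void record(String clientId, int inputTokens, int outputTokens) {
        usage.computeIfAbsent(clientId, k -> new LongAdder())
             .add((long) inputTokens + outputTokens);
    }

    /** 查询某业务服务的累计消耗(用于计费/配额) */
    public long totalOf(String clientId) {
        LongAdder adder = usage.get(clientId);
        return adder == null ? 0 : adder.sum();
    }
}
```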
Q5:服务版本兼容:升级AI微服务时,业务服务需要跟着升级吗?
这正是接口定义放在ai-common-api公共模块的价值。遵循向后兼容原则:只新增字段,不删改已有字段;如果有破坏性变更,用版本路由(/v1/ai/chat → /v2/ai/chat)并行支持旧版本,给业务服务平滑迁移的时间窗口。
Q6:AI微服务的数量怎么确定?是不是越细越好?
不是。微服务划分的核心原则是"高内聚低耦合",而不是"越细越好"。过度拆分带来的网络调用、运维复杂度、数据一致性问题可能得不偿失。建议起步阶段:一个ai-core-service包含对话+RAG,等规模增长后再根据实际瓶颈进一步拆分。文章中的多服务结构是中大型团队的目标架构,小团队从单服务开始即可。
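前文的拆分决策树可以写成一个纯函数,团队做架构评审时直接套用(假设性示意,类名与枚举均为示例):

```java
// 假设性示意:把"拆分决策树"的三条规则编码为纯函数
public class AiSplitDecision {
    public enum Decision { MUST_SPLIT, SHOULD_SPLIT, KEEP_IN_SERVICE }

    public static Decision decide(int reusedByServices,
                                  boolean resourceHeavy,
                                  boolean changesMuchFaster) {
        if (reusedByServices >= 3) return Decision.MUST_SPLIT;  // 规则1:被3个以上业务服务复用
        if (resourceHeavy)         return Decision.MUST_SPLIT;  // 规则2:资源消耗显著高于宿主服务
        if (changesMuchFaster)     return Decision.SHOULD_SPLIT; // 规则3:变更频率远高于宿主服务
        return Decision.KEEP_IN_SERVICE;
    }
}
```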
