Article #1866: Spring AI Extension Point Design: How to Extend the Framework Gracefully Without Modifying Its Source
There is one skill that separates developers who merely use a framework from those who truly understand it: knowing how to extend the framework instead of modifying it.
I have seen plenty of code like this: to log AI requests, a log statement gets added to every method of the business Service; to rate-limit, a pile of hand-written if-else lands in the Controller layer; to count tokens, a counter is manually incremented after every call.
All of these work, but they are invasive: business code and technical concerns are tangled together, and one requirement change can ripple through ten places.
Spring AI's design actually provides many extension points. Used well, they let you inject all kinds of cross-cutting concerns without touching any business code. Today I will walk through these extension points systematically, from shallow to deep, and explain each one's applicable scenarios and implementation approach.
1. The Full Picture of Spring AI Extension Points
Let's build the big picture first. This article covers five kinds of extension points:
- the Advisor mechanism, for intercepting the AI invocation chain;
- wrapping the ChatModel interface, for lower-level concerns such as fallback;
- message transformation, for rewriting requests before they reach the model;
- a custom VectorStore implementation;
- AutoConfiguration, for packaging all of the above into a reusable starter.
These five kinds of extension points address needs at different levels. The closer an extension point sits to the business layer (the Advisor), the easier it is to use; the closer it sits to the bottom (wrapping ChatModel), the more flexibility it offers.
2. Extension Point Type 1: The Advisor Mechanism
Advisor is the core extension mechanism introduced in Spring AI 1.x. Its design philosophy resembles Spring AOP, but it is purpose-built for the AI invocation chain.
The interface definitions:
// Intercepts synchronous calls
public interface CallAroundAdvisor extends Advisor {

    AdvisedResponse aroundCall(AdvisedRequest advisedRequest,
            CallAroundAdvisorChain chain);

    String getName();

    int getOrder();
}

// Intercepts streaming calls
public interface StreamAroundAdvisor extends Advisor {

    Flux<AdvisedResponse> aroundStream(AdvisedRequest advisedRequest,
            StreamAroundAdvisorChain chain);

    String getName();

    int getOrder();
}
The Advisor invocation chain: advisors are sorted by getOrder() (smallest value outermost), the request flows inward through each advisor until it reaches the model, and the response travels back out through the same chain in reverse. Any advisor may short-circuit the chain by returning a response without calling chain.nextAroundCall().
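To make the chain mechanics concrete, here is a dependency-free sketch of the same pattern. All types below are simplified stand-ins I made up for illustration, not the real Spring AI classes: advisors are sorted by their order value, each one wraps the rest of the chain, and any of them can short-circuit.

```java
import java.util.Comparator;
import java.util.List;

public class MiniAdvisorChain {

    // Simplified stand-ins for AdvisedRequest/AdvisedResponse: plain strings.
    interface MiniAdvisor {
        String aroundCall(String request, Chain chain);
        int getOrder();
    }

    interface Chain {
        String next(String request);
    }

    // Walks the ordered advisor list; advisor i's "chain" continues at i + 1,
    // and the innermost call hits the (fake) model.
    static String execute(List<MiniAdvisor> advisors, String request) {
        List<MiniAdvisor> sorted = advisors.stream()
                .sorted(Comparator.comparingInt(MiniAdvisor::getOrder))
                .toList();
        return callAt(sorted, 0, request);
    }

    private static String callAt(List<MiniAdvisor> sorted, int i, String request) {
        if (i == sorted.size()) {
            return "model(" + request + ")"; // innermost: the actual model call
        }
        return sorted.get(i).aroundCall(request, req -> callAt(sorted, i + 1, req));
    }

    public static void main(String[] args) {
        MiniAdvisor safety = new MiniAdvisor() { // order 0: may short-circuit
            public String aroundCall(String req, Chain chain) {
                if (req.contains("forbidden")) return "refused";
                return chain.next(req);
            }
            public int getOrder() { return 0; }
        };
        MiniAdvisor enrich = new MiniAdvisor() { // order 20: rewrites the request
            public String aroundCall(String req, Chain chain) {
                return chain.next(req + "+ctx");
            }
            public int getOrder() { return 20; }
        };
        System.out.println(execute(List.of(enrich, safety), "hi"));        // model(hi+ctx)
        System.out.println(execute(List.of(enrich, safety), "forbidden")); // refused
    }
}
```

Note that the declaration order of the list does not matter; only getOrder() decides the nesting.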
Hands-on: a content safety Advisor
This scenario is very practical: before a request reaches the model, filter out sensitive words or disallowed topics:
@Component
@Slf4j
public class ContentSafetyAdvisor implements CallAroundAdvisor, StreamAroundAdvisor {

    private final ContentSafetyChecker safetyChecker;

    public ContentSafetyAdvisor(ContentSafetyChecker safetyChecker) {
        this.safetyChecker = safetyChecker;
    }

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request,
            CallAroundAdvisorChain chain) {
        // Check the user input
        String userInput = extractUserInput(request);
        ContentCheckResult checkResult = safetyChecker.check(userInput);
        if (!checkResult.isSafe()) {
            log.warn("Content safety check failed: category={}", checkResult.getViolationCategory());
            // Return a canned safety response directly, without calling the AI
            return buildSafetyRefusalResponse(request, checkResult);
        }
        // Safe: continue down the chain
        AdvisedResponse response = chain.nextAroundCall(request);
        // The output could be checked here as well
        return response;
    }

    @Override
    public Flux<AdvisedResponse> aroundStream(AdvisedRequest request,
            StreamAroundAdvisorChain chain) {
        String userInput = extractUserInput(request);
        ContentCheckResult checkResult = safetyChecker.check(userInput);
        if (!checkResult.isSafe()) {
            return Flux.just(buildSafetyRefusalResponse(request, checkResult));
        }
        return chain.nextAroundStream(request);
    }

    private AdvisedResponse buildSafetyRefusalResponse(AdvisedRequest request,
            ContentCheckResult result) {
        String refusalMessage = switch (result.getViolationCategory()) {
            case SENSITIVE_POLITICS -> "That topic is outside the scope of my service";
            case ADULT_CONTENT -> "Please keep the conversation appropriate";
            case PERSONAL_ATTACK -> "Please be civil and avoid abusive language";
            default -> "Your request contains inappropriate content and cannot be processed";
        };
        // Build a synthetic response that never goes through the actual model
        AssistantMessage message = new AssistantMessage(refusalMessage);
        Generation generation = new Generation(message);
        ChatResponse chatResponse = new ChatResponse(List.of(generation));
        return new AdvisedResponse(chatResponse, request.adviseContext());
    }

    private String extractUserInput(AdvisedRequest request) {
        return request.userText() != null ? request.userText() : "";
    }

    @Override
    public String getName() { return "ContentSafetyAdvisor"; }

    @Override
    public int getOrder() { return 0; } // Highest priority: runs first
}
Hands-on: a response-cache Advisor
For identical questions, skip the repeated model call and return the cached result directly:
@Component
@Slf4j
public class ResponseCacheAdvisor implements CallAroundAdvisor {

    private final Cache<String, String> responseCache;

    public ResponseCacheAdvisor() {
        this.responseCache = CacheBuilder.newBuilder()
                .maximumSize(1000)
                .expireAfterWrite(1, TimeUnit.HOURS)
                .build();
    }

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request,
            CallAroundAdvisorChain chain) {
        String cacheKey = buildCacheKey(request);
        String cachedContent = responseCache.getIfPresent(cacheKey);
        if (cachedContent != null) {
            log.debug("Cache hit, key={}", cacheKey.substring(0, 20));
            return buildCachedResponse(request, cachedContent);
        }
        AdvisedResponse response = chain.nextAroundCall(request);
        // Cache the result
        if (response.response() != null) {
            String content = response.response().getResult().getOutput().getContent();
            if (content != null) {
                responseCache.put(cacheKey, content);
            }
        }
        return response;
    }

    private String buildCacheKey(AdvisedRequest request) {
        // Hash of system prompt + user input serves as the cache key
        String raw = request.systemText() + "||" + request.userText();
        return DigestUtils.sha256Hex(raw);
    }

    private AdvisedResponse buildCachedResponse(AdvisedRequest request, String content) {
        AssistantMessage message = new AssistantMessage(content);
        Generation generation = new Generation(message);
        ChatResponse chatResponse = new ChatResponse(List.of(generation));
        return new AdvisedResponse(chatResponse, request.adviseContext());
    }

    @Override
    public String getName() { return "ResponseCacheAdvisor"; }

    @Override
    public int getOrder() { return 5; }
}
3. Extension Point Type 2: Wrapping ChatModel
Sometimes an Advisor is not enough and you need to work at a lower level, for example to implement multi-model load balancing or fallback logic. In that case you can wrap the ChatModel interface.
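Fallback gets a full implementation below; load balancing follows the same wrapping idea. As a hedged illustration (the Model interface here is a stand-in I invented for the sketch, not the real ChatModel type), a minimal round-robin router could look like this:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinRouter {

    // Stand-in for the ChatModel interface; a real wrapper would implement
    // org.springframework.ai.chat.model.ChatModel instead.
    interface Model {
        String call(String prompt);
    }

    private final List<Model> delegates;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinRouter(List<Model> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    // Each call goes to the next delegate in turn; floorMod keeps the index
    // non-negative even after the int counter overflows.
    public String call(String prompt) {
        int idx = Math.floorMod(counter.getAndIncrement(), delegates.size());
        return delegates.get(idx).call(prompt);
    }

    public static void main(String[] args) {
        RoundRobinRouter router = new RoundRobinRouter(List.of(
                p -> "A:" + p,
                p -> "B:" + p));
        System.out.println(router.call("hi")); // A:hi
        System.out.println(router.call("hi")); // B:hi
        System.out.println(router.call("hi")); // A:hi
    }
}
```

Because the wrapper implements the same interface as its delegates, it can be dropped in anywhere a single model was used before.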
A multi-model fallback implementation:
@Slf4j
public class FallbackChatModel implements ChatModel {

    private final ChatModel primary;
    private final ChatModel fallback;
    private final int maxFailuresBeforeFallback;
    private final AtomicInteger consecutiveFailures = new AtomicInteger(0);
    private volatile boolean usingFallback = false;

    public FallbackChatModel(ChatModel primary, ChatModel fallback,
            int maxFailures) {
        this.primary = primary;
        this.fallback = fallback;
        this.maxFailuresBeforeFallback = maxFailures;
    }

    @Override
    public ChatResponse call(Prompt prompt) {
        if (usingFallback) {
            return callWithFallback(prompt);
        }
        try {
            ChatResponse response = primary.call(prompt);
            consecutiveFailures.set(0); // Success: reset the failure counter
            return response;
        } catch (Exception e) {
            int failures = consecutiveFailures.incrementAndGet();
            log.warn("Primary model call failed ({}/{}): {}", failures,
                    maxFailuresBeforeFallback, e.getMessage());
            if (failures >= maxFailuresBeforeFallback) {
                log.error("Primary model failed {} times in a row, switching to fallback", failures);
                usingFallback = true;
            }
            return callWithFallback(prompt);
        }
    }

    @Override
    public Flux<ChatResponse> stream(Prompt prompt) {
        if (usingFallback) {
            return fallback.stream(prompt);
        }
        return primary.stream(prompt)
                .onErrorResume(e -> {
                    log.warn("Primary model streaming call failed, switching to fallback: {}", e.getMessage());
                    return fallback.stream(prompt);
                });
    }

    private ChatResponse callWithFallback(Prompt prompt) {
        try {
            return fallback.call(prompt);
        } catch (Exception e) {
            log.error("Fallback model failed as well: {}", e.getMessage());
            throw new AiServiceUnavailableException("No AI model is currently available", e);
        }
    }

    @Override
    public ChatOptions getDefaultOptions() {
        return usingFallback ? fallback.getDefaultOptions() : primary.getDefaultOptions();
    }
}

// Used from a configuration class
@Bean
public ChatModel fallbackChatModel(
        @Qualifier("openAiChatModel") ChatModel openAiModel,
        @Qualifier("dashscopeChatModel") ChatModel dashscopeModel) {
    return new FallbackChatModel(openAiModel, dashscopeModel, 3);
}
4. Extension Point Type 3: A Custom MessageConverter
If your business scenario needs special handling of message content (say, automatic translation or data masking), you can transform messages before they reach the ChatModel:
@Component
public class MessageEnrichmentPostProcessor {

    private final UserContextHolder userContextHolder;

    public MessageEnrichmentPostProcessor(UserContextHolder userContextHolder) {
        this.userContextHolder = userContextHolder;
    }

    /**
     * Injects user context information into the system message.
     * Applied through an Advisor that rewrites the AdvisedRequest.
     */
    public AdvisedRequest enrichWithUserContext(AdvisedRequest request) {
        UserContext ctx = userContextHolder.getCurrentUser();
        if (ctx == null) return request;
        String contextInfo = String.format(
                "\n\n[User info: name=%s, role=%s, preferred language=%s]",
                ctx.getName(), ctx.getRole(), ctx.getPreferredLanguage()
        );
        // Append the user context to the end of the system text
        String enrichedSystem = (request.systemText() != null
                ? request.systemText() : "") + contextInfo;
        return AdvisedRequest.from(request)
                .withSystemText(enrichedSystem)
                .build();
    }
}
Wiring it into an Advisor:
@Component
public class UserContextEnrichmentAdvisor implements CallAroundAdvisor {

    private final MessageEnrichmentPostProcessor enricher;

    // Constructor injection (the original snippet omitted this; without it
    // the final field would never be initialized)
    public UserContextEnrichmentAdvisor(MessageEnrichmentPostProcessor enricher) {
        this.enricher = enricher;
    }

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request,
            CallAroundAdvisorChain chain) {
        AdvisedRequest enriched = enricher.enrichWithUserContext(request);
        return chain.nextAroundCall(enriched);
    }

    @Override
    public String getName() { return "UserContextEnrichmentAdvisor"; }

    @Override
    public int getOrder() { return 20; }
}
5. Extension Point Type 4: A Custom VectorStore
Spring AI ships with PgVector, Redis, Chroma, and other vector store implementations, but sometimes your company runs its own vector database and you need to implement the interface yourself.
A custom VectorStore implementation:
@Component
@Slf4j
public class MilvusVectorStore implements VectorStore {

    private final MilvusServiceClient milvusClient;
    private final EmbeddingModel embeddingModel;
    private final String collectionName;

    public MilvusVectorStore(MilvusServiceClient milvusClient,
            EmbeddingModel embeddingModel,
            @Value("${app.milvus.collection}") String collectionName) {
        this.milvusClient = milvusClient;
        this.embeddingModel = embeddingModel;
        this.collectionName = collectionName;
    }

    @Override
    public void add(List<Document> documents) {
        if (documents.isEmpty()) return;
        // Generate embeddings in batch
        List<float[]> embeddings = documents.stream()
                .map(doc -> embeddingModel.embed(doc.getContent()))
                .collect(Collectors.toList());
        // Build the Milvus insert request
        InsertParam insertParam = buildInsertParam(documents, embeddings);
        milvusClient.insert(insertParam);
        log.info("Inserted {} vectors into collection {}", documents.size(), collectionName);
    }

    @Override
    public Optional<Boolean> delete(List<String> idList) {
        String expr = "id in [" + String.join(",",
                idList.stream().map(id -> "\"" + id + "\"").collect(Collectors.toList()))
                + "]";
        DeleteParam deleteParam = DeleteParam.newBuilder()
                .withCollectionName(collectionName)
                .withExpr(expr)
                .build();
        milvusClient.delete(deleteParam);
        return Optional.of(true);
    }

    @Override
    public List<Document> similaritySearch(SearchRequest request) {
        // Generate the query embedding
        float[] queryEmbedding = embeddingModel.embed(request.getQuery());
        // Build the Milvus search request
        SearchParam searchParam = SearchParam.newBuilder()
                .withCollectionName(collectionName)
                .withVectors(List.of(floatArrayToList(queryEmbedding)))
                .withTopK(request.getTopK())
                .withVectorFieldName("embedding")
                .withOutFields(List.of("id", "content", "metadata"))
                .withParams("{\"nprobe\": 10}")
                .build();
        SearchResults results = milvusClient.search(searchParam);
        return convertToDocuments(results);
    }

    private List<Float> floatArrayToList(float[] array) {
        List<Float> list = new ArrayList<>(array.length);
        for (float f : array) list.add(f);
        return list;
    }

    // Other helper methods omitted...
}
6. Packaging Extensions into a Shared Library with AutoConfiguration
We have now written several Advisors. Inside a company, these can be packaged into a shared "AI commons" library that every project can use:
// ai-commons-starter/src/main/java/.../AiCommonsAutoConfiguration.java
@AutoConfiguration(after = {
        OpenAiChatAutoConfiguration.class,
        DashscopeChatAutoConfiguration.class
})
@ConditionalOnClass(ChatClient.class)
@EnableConfigurationProperties(AiCommonsProperties.class)
public class AiCommonsAutoConfiguration {

    @Bean
    @ConditionalOnMissingBean
    @ConditionalOnProperty(prefix = "ai.commons.safety",
            name = "enabled", havingValue = "true",
            matchIfMissing = true)
    public ContentSafetyAdvisor contentSafetyAdvisor(
            AiCommonsProperties properties) {
        return new ContentSafetyAdvisor(
                new DefaultContentSafetyChecker(properties.getSafetyConfig())
        );
    }

    @Bean
    @ConditionalOnMissingBean
    @ConditionalOnProperty(prefix = "ai.commons.cache",
            name = "enabled", havingValue = "true")
    public ResponseCacheAdvisor responseCacheAdvisor(
            AiCommonsProperties properties) {
        // Assumes a ResponseCacheAdvisor variant whose constructor takes the
        // TTL and max size, unlike the no-arg version shown earlier
        return new ResponseCacheAdvisor(
                properties.getCacheTtlMinutes(),
                properties.getCacheMaxSize()
        );
    }

    @Bean
    @ConditionalOnMissingBean
    public TokenUsageAdvisor tokenUsageAdvisor(MeterRegistry meterRegistry) {
        return new TokenUsageAdvisor(meterRegistry);
    }
}
Register it in META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports:
com.yourcompany.ai.commons.AiCommonsAutoConfiguration
With that in place, any project on the team that pulls in this starter automatically gets content safety filtering, response caching, token usage metrics, and so on, without writing a line of configuration code.
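On the consumer side, enabling or disabling these features is then just configuration. A sketch of the relevant application.yml, assuming the property names implied by the @ConditionalOnProperty annotations above; the TTL and size keys are guesses derived from the AiCommonsProperties getters, so treat them as illustrative:

```yaml
# application.yml in a consuming project
ai:
  commons:
    safety:
      enabled: true       # default is on (matchIfMissing = true); set false to opt out
    cache:
      enabled: true       # caching is opt-in: no matchIfMissing, so off by default
      # hypothetical bindings for properties.getCacheTtlMinutes() / getCacheMaxSize():
      cache-ttl-minutes: 60
      cache-max-size: 1000
```
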
7. Controlling the Order of Extension Points
When several Advisors coexist, execution order matters.
The smaller the order value, the earlier the Advisor runs (the further out in the chain); the larger the value, the later it runs (the closer to the model).
Content safety filtering sits at the outermost layer because, the moment unsafe content is detected, it short-circuits and returns; none of the downstream Advisors or the model call ever execute.
Token usage tracking sits at the innermost layer (next to the model) because it only cares about calls that actually reach the model and has nothing to do further out.
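A tiny self-contained sketch of these order semantics (simplified types, not Spring AI's): sorting by order determines the nesting, so the request path visits advisors in ascending order and the response path unwinds in reverse.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AdvisorOrderDemo {

    // A named advisor with an order value, like getName()/getOrder().
    record Step(String name, int order) {}

    // Returns the enter/exit trace produced by running the sorted chain
    // around a model call.
    static List<String> trace(List<Step> steps) {
        List<Step> sorted = new ArrayList<>(steps);
        sorted.sort(Comparator.comparingInt(Step::order));
        List<String> trace = new ArrayList<>();
        for (Step s : sorted) trace.add("enter " + s.name());   // request path, outermost first
        trace.add("model call");
        for (int i = sorted.size() - 1; i >= 0; i--) {
            trace.add("exit " + sorted.get(i).name());          // response path, innermost first
        }
        return trace;
    }

    public static void main(String[] args) {
        // Declared out of order on purpose; only the order value decides nesting.
        List<String> t = trace(List.of(
                new Step("tokenUsage", 100),
                new Step("contentSafety", 0),
                new Step("cache", 5)));
        System.out.println(t);
        // [enter contentSafety, enter cache, enter tokenUsage, model call,
        //  exit tokenUsage, exit cache, exit contentSafety]
    }
}
```
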
8. Pitfalls
Pitfall 1: don't throw raw RuntimeExceptions from an Advisor
If an Advisor throws while validating a request, the whole chain aborts and the user sees a 500 error. Handle exceptions inside the Advisor instead: either return a custom response, or wrap the exception in a business exception before rethrowing.
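One defensive shape for this, sketched with simplified stand-in types rather than the real Spring AI signatures: wrap the risky check in try/catch and decide explicitly whether to fail open or fail closed.

```java
public class DefensiveAdvisorSketch {

    interface Chain { String next(String request); }

    // If the safety check itself blows up, degrade gracefully instead of
    // letting the exception propagate as an HTTP 500.
    static String aroundCall(String request, Chain chain) {
        boolean safe;
        try {
            safe = externalSafetyCheck(request);
        } catch (RuntimeException e) {
            // Policy decision: fail open (let the request through) but make
            // the failure visible; failing closed is equally valid.
            System.err.println("safety check failed, failing open: " + e.getMessage());
            safe = true;
        }
        return safe ? chain.next(request) : "refused";
    }

    // Hypothetical flaky external check, for demonstration only.
    static boolean externalSafetyCheck(String request) {
        if (request.contains("boom")) throw new IllegalStateException("checker down");
        return !request.contains("forbidden");
    }

    public static void main(String[] args) {
        System.out.println(aroundCall("hello", r -> "model(" + r + ")"));     // model(hello)
        System.out.println(aroundCall("forbidden", r -> "model(" + r + ")")); // refused
        System.out.println(aroundCall("boom", r -> "model(" + r + ")"));      // model(boom)
    }
}
```
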
Pitfall 2: is AdvisedRequest.from(request).build() a deep copy or a shallow one?
AdvisedRequest's from() method copies only some of the fields; if you mutate the contents of adviseContext(), you may affect other Advisors. Be careful about thread safety whenever you touch the context.
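The safe pattern when a shared context map must be changed is copy-on-write: never mutate the map you received, copy it first. A plain-Java sketch of the difference:

```java
import java.util.HashMap;
import java.util.Map;

public class ContextCopyDemo {

    // Wrong: mutates the map that other advisors in the chain may also be reading.
    static Map<String, Object> mutateInPlace(Map<String, Object> ctx) {
        ctx.put("traceId", "abc");
        return ctx;
    }

    // Right: copy, modify the copy, pass the copy down the chain.
    static Map<String, Object> copyOnWrite(Map<String, Object> ctx) {
        Map<String, Object> copy = new HashMap<>(ctx);
        copy.put("traceId", "abc");
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> shared = new HashMap<>();
        Map<String, Object> derived = copyOnWrite(shared);
        System.out.println(shared.containsKey("traceId"));  // false: original untouched
        System.out.println(derived.containsKey("traceId")); // true
    }
}
```
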
Pitfall 3: no blocking work inside a stream Advisor
StreamAroundAdvisor's aroundStream method executes inside the reactive pipeline. Blocking I/O there (a database query, say) ties up a Reactor worker thread and degrades throughput. Move blocking work onto the elastic scheduler with Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()).
Good extension design is not about writing more code; it is about writing less, more cohesive code. Pull the cross-cutting concerns out of the business logic and into Advisors, and the business code stays clean while the technical concerns evolve independently. That is what framework extension points are for.
