Post #1631: Inside Spring AI's Plugin System, or How to Write a Custom AutoConfiguration for a Private Model

Lao Zhang · 2026/4/30 · about 8 min read


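A few days ago a reader asked in my Zhishi Xingqiu (Knowledge Planet) community: their company runs an internally deployed LLM service whose API looks nothing like OpenAI's, and they wanted to plug it into Spring AI but had no idea where to start. I replied with something along these lines: Spring AI's plugin system is far more open than you'd think, and once you understand how AutoConfiguration gets loaded, wiring in a private model is just a matter of writing a few beans.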

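Later I realized the question isn't that simple. Most material online stops at "use an existing starter"; articles that really explain how to build a custom AutoConfiguration from scratch are almost nonexistent. So today I'll walk through the whole thing end to end.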

First, figure out where Spring AI's extension points are

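Many people are a bit lost the first time they read the Spring AI source: there are quite a few layers of abstraction, from ChatModel to ChatClient to the various Advisors, and at first glance it's hard to know where to cut in.

In practice there are three core interfaces:

ChatModel          → abstraction for calling a model directly
EmbeddingModel     → abstraction for vectorization (embeddings)
ImageModel         → abstraction for image generation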

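If you want to plug in a private chat model, implementing the ChatModel interface is enough. Stripped down to the two methods that matter here (the real interface also has a few convenience overloads), the definition is short: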

public interface ChatModel extends Model<Prompt, ChatResponse> {
    ChatResponse call(Prompt prompt);
    
    default Flux<ChatResponse> stream(Prompt prompt) {
        throw new UnsupportedOperationException("Streaming not supported");
    }
}

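The Prompt wraps the list of messages plus the Options; the ChatResponse carries what the model returned. Understanding the structure of these two objects is the foundation for everything that follows.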

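A real scenario: plugging in an internal private Qwen deployment

Last year our company (I'd rather not name it) built a private deployment based on Qwen. The API is home-grown and differs from Alibaba Cloud's DashScope; it looks roughly like this: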

POST /api/v1/chat
{
  "model": "qwen-private-v2",
  "messages": [...],
  "temperature": 0.7,
  "max_tokens": 2048,
  "stream": false
}

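The response format is customized too, with a few internal fields such as cost_tokens and trace_id.

The goal now is to plug this into Spring AI so the business layer can call everything uniformly through ChatClient without caring which model sits underneath.

Step 1: Define the Options class

Every model in Spring AI has its own Options class that carries model-specific parameters. We need to define one ourselves: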

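import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.List;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.ai.chat.prompt.ChatOptions;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class PrivateQwenChatOptions implements ChatOptions {

    // Note: without @Builder.Default, Lombok's builder ignores these initializers,
    // so builder-created instances leave unset fields null. mergeOptions() in the
    // chat model relies on that "null means not set" behavior; only the no-args
    // constructor applies the defaults below.
    @JsonProperty("model")
    private String model = "qwen-private-v2";

    @JsonProperty("temperature")
    private Double temperature = 0.7;

    @JsonProperty("max_tokens")
    private Integer maxTokens = 2048;

    @JsonProperty("top_p")
    private Double topP;

    // Parameters specific to our internal service
    @JsonProperty("business_unit")
    private String businessUnit;

    @JsonProperty("priority")
    private Integer priority = 1;

    // Assumes Spring AI 1.0, where ChatOptions declares getModel(), getMaxTokens(),
    // getTemperature() and getTopP() (the last two as Double; early milestones used
    // Float). Lombok's @Data generates those four from the fields above, so only
    // the parameters this service does not support are written out by hand.

    @Override
    public Integer getTopK() {
        return null; // our Qwen service does not take top_k
    }

    @Override
    public Double getFrequencyPenalty() {
        return null; // not supported by the internal API
    }

    @Override
    public Double getPresencePenalty() {
        return null; // not supported by the internal API
    }

    @Override
    public List<String> getStopSequences() {
        return null; // not supported by the internal API
    }

    @Override
    public ChatOptions copy() {
        // Every field is copied; all of them are immutable types
        return PrivateQwenChatOptions.builder()
                .model(this.model)
                .temperature(this.temperature)
                .maxTokens(this.maxTokens)
                .topP(this.topP)
                .businessUnit(this.businessUnit)
                .priority(this.priority)
                .build();
    }
}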

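Here's a pitfall I've hit: copy() must be implemented, and it must return a genuinely independent object. Spring AI calls this method while processing a request; if you return this, concurrent requests end up mutating the same instance and parameters leak from one call into another, which is painful to track down. The fields above are all immutable types (String, Double, Integer), so copying them one by one is enough; a mutable field such as a List would need its own defensive copy.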
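
To make the pitfall concrete, here is a minimal sketch in plain Java with no Spring AI dependency. DemoOptions and CopyPitfallDemo are hypothetical names invented for this illustration; the sketch only mimics the "copy the defaults, then apply the per-request override" pattern and is not how Spring AI itself is implemented.

```java
// Hypothetical stand-in for an options class; not part of Spring AI.
final class DemoOptions {
    String model;
    String businessUnit;

    DemoOptions(String model, String businessUnit) {
        this.model = model;
        this.businessUnit = businessUnit;
    }

    // Broken variant: hands out the shared instance itself.
    DemoOptions copyReturningThis() {
        return this;
    }

    // Correct variant: a fresh object per call. Both fields are immutable
    // Strings, so a field-by-field copy is sufficient here.
    DemoOptions copy() {
        return new DemoOptions(model, businessUnit);
    }
}

final class CopyPitfallDemo {
    // Applies a per-request override on top of whatever object it was handed.
    static DemoOptions merge(DemoOptions base, String requestBusinessUnit) {
        base.businessUnit = requestBusinessUnit;
        return base;
    }

    public static void main(String[] args) {
        DemoOptions sharedDefaults = new DemoOptions("qwen-private-v2", null);

        // Tenant A's request mutates the shared defaults when copy returns this.
        merge(sharedDefaults.copyReturningThis(), "tenant-a");
        System.out.println("defaults after broken merge: " + sharedDefaults.businessUnit);

        // With a real copy, the defaults stay untouched.
        DemoOptions cleanDefaults = new DemoOptions("qwen-private-v2", null);
        DemoOptions merged = merge(cleanDefaults.copy(), "tenant-a");
        System.out.println("merged: " + merged.businessUnit
                + ", defaults: " + cleanDefaults.businessUnit);
    }
}
```

In the broken variant the next request, from any tenant, starts from defaults that already carry tenant-a, which is exactly the kind of cross-request pollution described above.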

Step 2: Implement ChatModel

@Slf4j
public class PrivateQwenChatModel implements ChatModel {
    
    private static final String DEFAULT_MODEL = "qwen-private-v2";
    
    private final PrivateQwenChatOptions defaultOptions;
    private final RestClient restClient;
    private final WebClient webClient;
    private final RetryTemplate retryTemplate;
    
    public PrivateQwenChatModel(PrivateQwenProperties properties, 
                                 RestClient.Builder restClientBuilder) {
        this.defaultOptions = PrivateQwenChatOptions.builder()
                .model(properties.getDefaultModel())
                .temperature(properties.getDefaultTemperature())
                .maxTokens(properties.getMaxTokens())
                .build();
        
        this.restClient = restClientBuilder
                .baseUrl(properties.getBaseUrl())
                .defaultHeader("Authorization", "Bearer " + properties.getApiKey())
                .defaultHeader("Content-Type", "application/json")
                .defaultHeader("X-Internal-Service", "spring-ai-client")
                .build();
        
        // RestClient is blocking and does not expose its base URL, so streaming
        // gets its own WebClient built once from the same properties.
        this.webClient = WebClient.builder()
                .baseUrl(properties.getBaseUrl())
                .defaultHeader("Authorization", "Bearer " + properties.getApiKey())
                .defaultHeader("Content-Type", "application/json")
                .defaultHeader("X-Internal-Service", "spring-ai-client")
                .build();
        
        // Up to 3 attempts, backing off 1s then 2s (capped at 5s), only for I/O failures
        this.retryTemplate = RetryTemplate.builder()
                .maxAttempts(3)
                .exponentialBackoff(1000, 2, 5000)
                .retryOn(ResourceAccessException.class)
                .build();
    }
    
    @Override
    public ChatResponse call(Prompt prompt) {
        PrivateQwenRequest request = buildRequest(prompt);
        
        log.debug("Calling private Qwen model, message count: {}", prompt.getInstructions().size());
        
        return retryTemplate.execute(context -> {
            try {
                PrivateQwenResponse response = restClient.post()
                        .uri("/api/v1/chat")
                        .body(request)
                        .retrieve()
                        .body(PrivateQwenResponse.class);
                
                return convertResponse(response, prompt);
            } catch (HttpClientErrorException e) {
                // 4xx responses are not retried; log the body and fail fast
                log.error("Private Qwen call failed, status: {}, body: {}", 
                         e.getStatusCode(), e.getResponseBodyAsString());
                throw new RuntimeException("Model call failed: " + e.getMessage(), e);
            }
        });
    }
    
    @Override
    public Flux<ChatResponse> stream(Prompt prompt) {
        // Streaming call
        PrivateQwenRequest request = buildRequest(prompt);
        request.setStream(true);
        
        return webClient.post()
                .uri("/api/v1/chat/stream")
                .bodyValue(request)
                .retrieve()
                .bodyToFlux(String.class)
                // Strip the SSE prefix first, so the end marker is caught with or without it
                .map(line -> line.startsWith("data:") ? line.substring(5).trim() : line.trim())
                .filter(data -> !data.isEmpty() && !data.equals("[DONE]"))
                // parseStreamChunk (not shown in this post) turns one JSON chunk into a ChatResponse
                .map(this::parseStreamChunk)
                .doOnError(e -> log.error("Streaming call failed", e));
    }
    
    private PrivateQwenRequest buildRequest(Prompt prompt) {
        // Merge the default Options with the request-level Options
        PrivateQwenChatOptions options = (PrivateQwenChatOptions) 
                mergeOptions(defaultOptions, prompt.getOptions());
        
        List<PrivateQwenMessage> messages = prompt.getInstructions().stream()
                .map(this::convertMessage)
                .collect(Collectors.toList());
        
        return PrivateQwenRequest.builder()
                .model(options.getModel())
                .messages(messages)
                .temperature(options.getTemperature())
                .maxTokens(options.getMaxTokens())
                .topP(options.getTopP())
                .businessUnit(options.getBusinessUnit())
                .build();
    }
    
    private PrivateQwenMessage convertMessage(Message message) {
        String role = switch (message.getMessageType()) {
            case SYSTEM -> "system";
            case USER -> "user";
            case ASSISTANT -> "assistant";
            default -> "user";
        };
        
        return PrivateQwenMessage.builder()
                .role(role)
                .content(message.getText()) // getText() in Spring AI 1.0; earlier milestones called it getContent()
                .build();
    }
    
    private ChatResponse convertResponse(PrivateQwenResponse response, Prompt prompt) {
        if (response == null || response.getChoices() == null || response.getChoices().isEmpty()) {
            throw new RuntimeException("Model returned an empty response");
        }
        
        PrivateQwenChoice choice = response.getChoices().get(0);
        AssistantMessage assistantMessage = new AssistantMessage(
                choice.getMessage().getContent()
        );
        
        ChatGenerationMetadata metadata = ChatGenerationMetadata.builder()
                .finishReason(choice.getFinishReason())
                .build();
        
        Generation generation = new Generation(assistantMessage, metadata);
        
        // Put the internal trace id and friends into the metadata for later tracing
        Map<String, Object> responseMetadata = new HashMap<>();
        responseMetadata.put("trace_id", response.getTraceId());
        responseMetadata.put("cost_tokens", response.getCostTokens());
        responseMetadata.put("model", response.getModel());
        
        return new ChatResponse(
                List.of(generation),
                new PrivateQwenChatResponseMetadata(responseMetadata)
        );
    }
    
    private ChatOptions mergeOptions(PrivateQwenChatOptions defaultOpts, 
                                      ChatOptions requestOpts) {
        if (requestOpts == null) return defaultOpts.copy();
        
        if (requestOpts instanceof PrivateQwenChatOptions reqOpts) {
            PrivateQwenChatOptions merged = (PrivateQwenChatOptions) defaultOpts.copy();
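            // Request-level values win over defaults; null means "not set on this request"
            if (reqOpts.getModel() != null) merged.setModel(reqOpts.getModel());
            if (reqOpts.getTemperature() != null) merged.setTemperature(reqOpts.getTemperature());
            if (reqOpts.getMaxTokens() != null) merged.setMaxTokens(reqOpts.getMaxTokens());
            // topP and priority are merged too; otherwise request-level values would be silently dropped
            if (reqOpts.getTopP() != null) merged.setTopP(reqOpts.getTopP());
            if (reqOpts.getBusinessUnit() != null) merged.setBusinessUnit(reqOpts.getBusinessUnit());
            if (reqOpts.getPriority() != null) merged.setPriority(reqOpts.getPriority());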
            return merged;
        }
        
        return defaultOpts.copy();
    }
}
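
The stream() method leans on a line-handling step that is easy to get subtly wrong, so here is a small sketch of just that step in plain Java. It assumes the service emits SSE-style lines of the form "data: {...}" and ends with a "[DONE]" marker, which is what the filter in stream() implies; SseLines and payloads are hypothetical names, and the JSON shape in main is made up for the demo.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spring AI: extracts the JSON payloads from
// SSE-style lines, mirroring the map/filter steps a streaming client needs.
final class SseLines {
    private static final String PREFIX = "data:";
    private static final String DONE = "[DONE]";

    static List<String> payloads(List<String> rawLines) {
        return rawLines.stream()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                // Strip the "data:" prefix; the space after the colon is optional.
                .map(line -> line.startsWith(PREFIX)
                        ? line.substring(PREFIX.length()).strip()
                        : line)
                // Drop the end-of-stream marker whether or not it carried a prefix.
                .filter(data -> !data.isEmpty() && !data.equals(DONE))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "data: {\"delta\":\"Hel\"}",
                "",
                "data:{\"delta\":\"lo\"}",
                "data: [DONE]");
        payloads(lines).forEach(System.out::println);
    }
}
```

Stripping the prefix before comparing against the marker matters because, depending on how the HTTP client decodes the event stream, the lines may arrive with or without the "data:" prefix.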

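Step 3: Write the AutoConfiguration, the real centerpiece

Many tutorials stop at the previous step, but to turn this into a genuinely reusable plugin the AutoConfiguration has to be done properly.

Start with the Properties class: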

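@Data
@Validated // makes Spring Boot enforce the @NotBlank constraints at startup (needs a Bean Validation implementation on the classpath)
@ConfigurationProperties(prefix = "spring.ai.private-qwen")
public class PrivateQwenProperties {
    
    /**
     * Whether the private Qwen integration is enabled
     */
    private boolean enabled = true;
    
    /**
     * Base URL of the API
     */
    @NotBlank
    private String baseUrl;
    
    /**
     * API key
     */
    @NotBlank
    private String apiKey;
    
    /**
     * Default model
     */
    private String defaultModel = "qwen-private-v2";
    
    /**
     * Default temperature
     */
    private Double defaultTemperature = 0.7;
    
    /**
     * Maximum number of tokens
     */
    private Integer maxTokens = 2048;
    
    /**
     * Connect timeout (milliseconds)
     */
    private int connectTimeout = 5000;
    
    /**
     * Read timeout (milliseconds)
     */
    private int readTimeout = 60000;
}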

Next comes the AutoConfiguration class itself; a few details deserve attention:

@AutoConfiguration
@ConditionalOnClass(ChatModel.class)
@EnableConfigurationProperties(PrivateQwenProperties.class)
@ConditionalOnProperty(
    prefix = "spring.ai.private-qwen", 
    name = "enabled", 
    havingValue = "true", 
    matchIfMissing = true
)
@Import(PrivateQwenConnectionConfiguration.class)
public class PrivateQwenAutoConfiguration {
    
    @Bean
    @ConditionalOnMissingBean(PrivateQwenChatModel.class)
    public PrivateQwenChatModel privateQwenChatModel(
            PrivateQwenProperties properties,
            RestClient.Builder restClientBuilder,
            ObjectProvider<HttpClient> httpClientProvider) {
        
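        // Custom HTTP client configuration
        HttpClient httpClient = httpClientProvider.getIfAvailable(() -> 
            HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(properties.getConnectTimeout()))
                .build()
        );
        
        // The JDK HttpClient has no client-wide read timeout, so readTimeout is applied on the request factory
        JdkClientHttpRequestFactory requestFactory = new JdkClientHttpRequestFactory(httpClient);
        requestFactory.setReadTimeout(Duration.ofMillis(properties.getReadTimeout()));
        
        RestClient.Builder customBuilder = restClientBuilder.clone()
                .requestFactory(requestFactory);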
        
        return new PrivateQwenChatModel(properties, customBuilder);
    }
    
    @Bean
    @ConditionalOnMissingBean
    @ConditionalOnBean(PrivateQwenChatModel.class)
    public ChatClient.Builder privateQwenChatClientBuilder(
            PrivateQwenChatModel chatModel,
            List<ChatClientCustomizer> customizers) {
        
        ChatClient.Builder builder = ChatClient.builder(chatModel);
        customizers.forEach(c -> c.customize(builder));
        return builder;
    }
}

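@ConditionalOnMissingBean matters here: if users define their own PrivateQwenChatModel bean, the auto-configuration backs off and lets their configuration win. That's basic etiquette for a Spring Boot AutoConfiguration.

There's also a connection configuration class, split out on its own for easier reuse: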

@Configuration(proxyBeanMethods = false)
@ConditionalOnClass(RestClient.class)
class PrivateQwenConnectionConfiguration {
    
    @Bean
    @ConditionalOnMissingBean
    @Scope("prototype")  // note: prototype scope, so every injection point gets a fresh builder
    public RestClient.Builder privateQwenRestClientBuilder(
            PrivateQwenProperties properties) {
        
        return RestClient.builder()
                .baseUrl(properties.getBaseUrl())
                .defaultHeaders(headers -> {
                    headers.setBearerAuth(properties.getApiKey());
                    headers.setContentType(MediaType.APPLICATION_JSON);
                    headers.set("X-Client-Version", "spring-ai-1.0");
                });
    }
}

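Step 4: Register the AutoConfiguration

Lots of people forget this step, then stare blankly at an auto-configuration that never kicks in.

Under src/main/resources/META-INF/spring/, create a file named org.springframework.boot.autoconfigure.AutoConfiguration.imports: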

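com.yourcompany.ai.qwen.autoconfigure.PrivateQwenAutoConfiguration

Note: the fully qualified name has to match the class's real package (here the autoconfigure package from the directory layout at the end of this post). Spring Boot 2.x registered auto-configurations through spring.factories; the .imports file above arrived in 2.7, and Spring Boot 3.0 stopped reading auto-configurations from spring.factories. Spring AI itself targets Spring Boot 3.x, so the old format only matters if you backport the idea to a 2.x project, where spring.factories needs:

org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
  com.yourcompany.ai.qwen.autoconfigure.PrivateQwenAutoConfiguration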
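
When the auto-configuration refuses to kick in, the first thing I check is whether the built jar really carries the .imports file and what it lists. The sketch below is a plain-Java debugging aid, not a Spring Boot API; ImportsFileCheck is a hypothetical name. It assumes the documented file format: one fully qualified class name per line, with # starting a comment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical debugging helper: lists the entries of every
// AutoConfiguration.imports file visible on the classpath.
final class ImportsFileCheck {
    static final String LOCATION =
            "META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports";

    // One fully qualified class name per line; '#' starts a comment.
    static List<String> parse(String content) {
        List<String> entries = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).strip();
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // Reads every copy of the file, since each starter jar ships its own.
    static List<String> readFromClasspath() throws IOException {
        List<String> all = new ArrayList<>();
        Enumeration<URL> urls =
                Thread.currentThread().getContextClassLoader().getResources(LOCATION);
        while (urls.hasMoreElements()) {
            try (InputStream in = urls.nextElement().openStream()) {
                all.addAll(parse(new String(in.readAllBytes(), StandardCharsets.UTF_8)));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Run with the starter jar on the classpath. If your class is not in
        // the output, the file is missing, misnamed, or in the wrong directory.
        readFromClasspath().forEach(System.out::println);
    }
}
```

If the class is listed but the beans still don't appear, starting the application with --debug prints Spring Boot's condition evaluation report, which shows which condition rejected the auto-configuration.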

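Step 5: Control the auto-configuration order

If your project has both OpenAI and the private Qwen, you can control the order in which the auto-configurations are processed. Ordering only decides when conditions such as @ConditionalOnMissingBean are evaluated; when both models are active, injection points still need a @Qualifier or @Primary to choose between the two ChatModel beans:

// Use one of the two forms, not both.
// The OpenAI auto-configuration class name depends on the Spring AI version; check the starter you actually use.
@AutoConfiguration(after = OpenAiAutoConfiguration.class)
// or
@AutoConfigureAfter(OpenAiAutoConfiguration.class)
public class PrivateQwenAutoConfiguration {
    // ...
}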

[Figure: overall architecture flow diagram]

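An easily overlooked problem: switching between models

Sometimes the business needs to switch models by scenario, for example a small model for simple questions to save cost and a large model for complex reasoning. As far as I know Spring AI doesn't ship a ready-made selector for this, so we write a small one ourselves: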

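@Component
public class BusinessAwareChatModelSelector {
    
    private final Map<String, ChatModel> modelRegistry;
    
    public BusinessAwareChatModelSelector(
            @Qualifier("privateQwenChatModel") ChatModel qwenModel,
            @Qualifier("openAiChatModel") ChatModel openAiModel) {
        this.modelRegistry = Map.of(
                "qwen", qwenModel,
                "openai", openAiModel
        );
    }
    
    public ChatModel selectForTask(String taskType, int complexityScore) {
        // Simple tasks go to the internal model to keep cost down
        if (complexityScore < 50 || "simple_qa".equals(taskType)) {
            return modelRegistry.get("qwen");
        }
        // Complex reasoning goes to OpenAI
        return modelRegistry.get("openai");
    }
    
    // Direct lookup by registry key, used by the annotation-driven routing
    public ChatModel selectByName(String modelName) {
        ChatModel model = modelRegistry.get(modelName);
        if (model == null) {
            throw new IllegalArgumentException("Unknown model: " + modelName);
        }
        return model;
    }
}

Combined with Spring AOP this gets more elegant, for example declaring the model to use through an annotation:

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface UseAIModel {
    String value() default "qwen";          // key in the selector's registry
    int complexityThreshold() default 50;   // not read by the aspect; for callers that score complexity themselves
}

@Aspect
@Component
public class AIModelRoutingAspect {
    
    private final BusinessAwareChatModelSelector selector;
    private final ThreadLocal<ChatModel> currentModel = new ThreadLocal<>();
    
    // Constructor injection; the final field cannot be left unassigned
    public AIModelRoutingAspect(BusinessAwareChatModelSelector selector) {
        this.selector = selector;
    }
    
    // Intercepts methods annotated with @UseAIModel and binds the chosen model before they run
    @Around("@annotation(useAIModel)")
    public Object routeModel(ProceedingJoinPoint pjp, UseAIModel useAIModel) throws Throwable {
        // value() names the model, so look it up by registry key rather than treating it as a task type
        ChatModel selected = selector.selectByName(useAIModel.value());
        currentModel.set(selected);
        try {
            return pjp.proceed();
        } finally {
            // Always clear the binding; pooled threads would otherwise keep a stale model
            currentModel.remove();
        }
    }
    
    public ChatModel getCurrentModel() {
        return currentModel.get();
    }
}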
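
One caveat with the ThreadLocal approach: the binding is only visible on the thread that entered the annotated method. The plain-Java sketch below shows the scoping rules; ModelContext is a hypothetical name and plain strings stand in for the ChatModel. It also restores the previous binding on exit, which keeps nested annotated calls from clearing the outer model.

```java
// Plain-Java illustration of thread-bound routing state; not Spring code.
final class ModelContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Runs the task with the given model name bound to the current thread,
    // restoring the previous binding afterwards so nested calls behave.
    static void runWith(String modelName, Runnable task) {
        String previous = CURRENT.get();
        CURRENT.set(modelName);
        try {
            task.run();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    static String current() {
        return CURRENT.get();
    }

    public static void main(String[] args) {
        runWith("qwen", () -> {
            System.out.println("same thread: " + current());
            // A different thread does not inherit the binding.
            Thread worker = new Thread(
                    () -> System.out.println("worker thread: " + current()));
            worker.start();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("after the call: " + current());
    }
}
```

In practice this means code that hops threads inside the annotated method, such as @Async calls or the reactive stream() path, has to carry the selected model explicitly instead of reading it from the aspect.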

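Pitfall summary

Looking back on this work, a few pitfalls deserve their own mention:

Pitfall 1: RestClient.Builder may be shared

The RestClient.Builder that Spring Boot auto-configures is a prototype bean, so each injection point normally gets a fresh instance. That guarantee is gone as soon as someone defines their own singleton builder; if your AutoConfiguration then mutates the injected builder directly, the changes leak into every other piece of code that uses it. The safe habit is to call restClientBuilder.clone() and modify the clone.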

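Pitfall 2: Thread safety of Options

A ChatOptions object may be shared across concurrent calls, so if your Options implementation has mutable state, copy() must copy every field and deep-copy any mutable ones. I once left the businessUnit field out of the copy, and one tenant's parameter ended up in another tenant's request; when that bug surfaced in production I nearly lost my cool.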

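Pitfall 3: Backpressure on streaming responses

When a Flux carries the streaming response and the downstream consumer can't keep up, you have a backpressure problem. Reactor has no MissingBackpressureException (that type belongs to RxJava); a source that cannot slow down fails with an overflow IllegalStateException instead. Bounding the buffer makes the behavior explicit. Note that the two-argument onBackpressureBuffer(maxSize, onOverflow) still terminates the stream with an error after invoking the callback, so to drop elements and keep going you need to pass a BufferOverflowStrategy. Dropped chunks mean gaps in the generated text, so choose the strategy deliberately:

.onBackpressureBuffer(1000, dropped -> log.warn("Dropped due to backpressure: {}", dropped), BufferOverflowStrategy.DROP_LATEST)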
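
What a bounded buffer with a drop callback does is easy to see without Reactor. The sketch below is a single-threaded toy in plain Java; BoundedBuffer is a hypothetical class, not Reactor's implementation. It imitates the behavior that Reactor's BufferOverflowStrategy.DROP_LATEST describes: once the buffer is full, the newly arriving element is reported and discarded while the buffered ones survive.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical illustration of "bounded buffer + drop callback".
// Not Reactor code and not thread-safe.
final class BoundedBuffer<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final int capacity;
    private final Consumer<T> onDrop;

    BoundedBuffer(int capacity, Consumer<T> onDrop) {
        this.capacity = capacity;
        this.onDrop = onDrop;
    }

    // Producer side: when the buffer is full, the NEW element is reported
    // to the callback and discarded; buffered elements are kept.
    boolean offer(T item) {
        if (queue.size() >= capacity) {
            onDrop.accept(item);
            return false;
        }
        queue.addLast(item);
        return true;
    }

    // Consumer side: returns null when nothing is buffered.
    T poll() {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        BoundedBuffer<String> buffer =
                new BoundedBuffer<>(2, dropped -> System.out.println("dropped: " + dropped));
        buffer.offer("chunk-1");
        buffer.offer("chunk-2");
        buffer.offer("chunk-3"); // buffer is full, so this one is reported and discarded
        System.out.println(buffer.poll() + ", " + buffer.poll() + ", " + buffer.poll());
    }
}
```

For an LLM token stream, losing a chunk usually corrupts the answer, so a generous buffer plus an alert in the drop callback tends to be a better fit than dropping silently.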

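Pitfall 4: Order of condition annotations on an AutoConfiguration

@ConditionalOnClass should be checked before @ConditionalOnProperty, because if the class isn't even there the remaining conditions are pointless. Spring Boot takes care of this no matter how you write the annotations: each condition implementation carries its own order (the class condition runs with the highest precedence, bean conditions run last), so declaration order doesn't change evaluation. Putting @ConditionalOnClass first is still a good convention for human readers.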

Complete directory layout

A well-organized private-model plugin should look like this:

private-qwen-spring-ai-starter/
├── src/main/java/com/yourcompany/ai/qwen/
│   ├── autoconfigure/
│   │   ├── PrivateQwenAutoConfiguration.java
│   │   ├── PrivateQwenConnectionConfiguration.java
│   │   └── PrivateQwenProperties.java
│   ├── model/
│   │   ├── PrivateQwenChatModel.java
│   │   └── PrivateQwenChatOptions.java
│   ├── api/
│   │   ├── PrivateQwenRequest.java
│   │   └── PrivateQwenResponse.java
│   ├── metadata/
│   │   └── PrivateQwenChatResponseMetadata.java
│   └── selector/
│       └── BusinessAwareChatModelSelector.java
└── src/main/resources/
    └── META-INF/spring/
        └── org.springframework.boot.autoconfigure.AutoConfiguration.imports

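Package this as a jar and drop it into any Spring AI project; add a few spring.ai.private-qwen.* lines to the configuration file and the model is wired in. That's what "works out of the box" really means.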
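
For reference, the consumer-side configuration could look like the fragment below. The keys follow Spring Boot's relaxed binding of the PrivateQwenProperties fields shown earlier; the host name and the PRIVATE_QWEN_API_KEY environment variable are placeholders I made up for the example.

```properties
# application.properties in the consuming project
spring.ai.private-qwen.enabled=true
spring.ai.private-qwen.base-url=https://llm.internal.example.com
# Keep the key out of version control; resolve it from the environment
spring.ai.private-qwen.api-key=${PRIVATE_QWEN_API_KEY}
spring.ai.private-qwen.default-model=qwen-private-v2
spring.ai.private-qwen.default-temperature=0.7
spring.ai.private-qwen.max-tokens=2048
# Timeouts are in milliseconds
spring.ai.private-qwen.connect-timeout=5000
spring.ai.private-qwen.read-timeout=60000
```

Only base-url and api-key are mandatory; everything else falls back to the defaults declared in the Properties class.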

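While writing this I also browsed Spring AI's official issue list and found that custom model integration is among the most frequently asked categories. I hope this post helps others with the same need. Next time we'll take on the bigger topic of privately deployed LLMs: the full path from model selection to production.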