Article 1658: Multi-Cloud AI Deployment Architecture: Technology Choices and Migration Paths for Avoiding Vendor Lock-In

Lao Zhang, 2026/4/30, about a 10-minute read

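When the company first started its AI transformation, everything was piled onto AWS: SageMaker for training, Lambda for inference, OpenSearch for vector search, and Bedrock for calling foundation models. The architecture diagram looked good, and it really did save us a lot of self-hosting work.

Then one day, product raised a requirement: support on-premises deployment for financial customers. AWS Bedrock was out, SageMaker was out, and the whole architecture had to be redesigned.

Only then did we realize how deeply we were locked into one cloud vendor's proprietary services. The lock-in was not a deliberate design choice; it had grown naturally out of "it's convenient, so let's use it."

This article starts from that lesson and looks at how to keep AI systems portable and free of vendor lock-in, and how to plan a migration path if you are already locked in.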

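Forms of Vendor Lock-In

First, recognize that not every vendor dependency needs to be avoided. Lock-in comes in different degrees:

Light lock-in (acceptable): using a cloud vendor's general-purpose compute services (EC2/ECS/EKS, VMs, K8s). These differ in the details, but migration cost is low.

Moderate lock-in (worth watching): using a cloud vendor's managed middleware (RDS, Redis, message queues). Comparable open-source alternatives exist and can be self-hosted; migration has a cost, but a manageable one.

Heavy lock-in (avoid where possible): using a cloud vendor's proprietary AI services (AWS Bedrock, Google Vertex AI, Azure OpenAI Service). Their APIs are not compatible with one another, so migrating means substantial code changes. If you also use a vendor's proprietary training framework and model format, migrating may mean retraining the model as well.

The places where AI systems most easily fall into heavy lock-in:

The model-call API layer: LLM API formats are not unified. OpenAI has its own format, Claude has its own, and in China Baidu's Wenxin (ERNIE) has its own.
Vector database choice: Pinecone is a fully managed cloud service with no open-source version, so using Pinecone means being tied to it.
ML platform: AWS SageMaker and Google Vertex AI submit training jobs in completely different ways.
Object storage paths: many services speak an S3-compatible protocol, but if s3:// paths are hard-coded in the code, switching to Alibaba Cloud OSS means changes in many places.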
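The hard-coded path problem in the last item can be reduced by keeping only logical keys in business code and resolving the scheme and bucket from configuration. The following is a minimal sketch under that assumption; the class name `StorageLocationResolver` and the example bucket name are hypothetical and do not come from any SDK.

```java
import java.util.Objects;

// Hypothetical helper: business code passes logical keys such as
// "models/v3/weights.bin"; the scheme and bucket come from configuration.
final class StorageLocationResolver {
    private final String scheme;  // e.g. "s3" or "oss", read from config
    private final String bucket;  // bucket name, read from config

    StorageLocationResolver(String scheme, String bucket) {
        this.scheme = Objects.requireNonNull(scheme, "scheme");
        this.bucket = Objects.requireNonNull(bucket, "bucket");
    }

    // Builds the full URI for a logical key, stripping any leading slashes.
    String resolve(String logicalKey) {
        String key = logicalKey.replaceAll("^/+", "");
        return scheme + "://" + bucket + "/" + key;
    }
}
```

With this shape, moving from S3 to OSS changes the two configuration values passed to the constructor, not the call sites that build paths.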

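Decoupling Strategies at the Architecture Level

Abstracting the LLM Call Layer

The single most important design decision: do not call a specific LLM client directly from business code; go through an abstraction layer instead.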

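import java.util.List;
import java.util.concurrent.CompletableFuture;

import lombok.Builder;
import lombok.Data;
import reactor.core.publisher.Flux;

// Unified abstraction for LLM calls
public interface LLMProvider {

    /**
     * Start a chat completion
     */
    CompletableFuture<LLMResponse> chat(LLMRequest request);

    /**
     * Streaming chat
     */
    Flux<LLMStreamChunk> chatStream(LLMRequest request);

    /**
     * Generate an embedding vector
     */
    CompletableFuture<float[]> embed(String text);

    /**
     * Provider name (also used as the routing key)
     */
    String getProviderName();

    /**
     * Whether the provider is available (used for health checks)
     */
    boolean isAvailable();
}

// Unified request object (decoupled from any specific provider)
@Data
@Builder
public class LLMRequest {
    private List<Message> messages;
    private String model;
    private Integer maxTokens;
    private Double temperature;
    private Double topP;
    private Boolean stream;
    private String systemPrompt;
    private ResponseFormat responseFormat;
    // Optional: provider requested by the caller; read by LLMProviderRouter
    private String preferredProvider;
}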

Each vendor then gets its own implementation of this interface:

// OpenAI implementation
// (chatStream and embed are omitted here for brevity)
@Component("openai-provider")
@RequiredArgsConstructor  // Lombok: constructor injection for the final field
public class OpenAIProvider implements LLMProvider {

    private final OpenAiService openAiService;

    @Override
    public CompletableFuture<LLMResponse> chat(LLMRequest request) {
        ChatCompletionRequest openAiRequest = ChatCompletionRequest.builder()
            .model(request.getModel() != null ? request.getModel() : "gpt-4o")
            .messages(convertMessages(request.getMessages()))
            .maxTokens(request.getMaxTokens())
            .temperature(request.getTemperature())
            .build();

        // Note: supplyAsync without an executor runs this blocking call on the
        // common ForkJoinPool; a dedicated executor is usually a better fit.
        return CompletableFuture.supplyAsync(() -> {
            ChatCompletionResult result = openAiService.createChatCompletion(openAiRequest);
            return LLMResponse.builder()
                .content(result.getChoices().get(0).getMessage().getContent())
                .inputTokens(result.getUsage().getPromptTokens())
                .outputTokens(result.getUsage().getCompletionTokens())
                .provider("openai")
                .model(result.getModel())
                .build();
        });
    }

    @Override
    public String getProviderName() { return "openai"; }

    @Override
    public boolean isAvailable() {
        try {
            // Send the cheapest possible request to verify availability
            openAiService.listModels();
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}

// Claude implementation
// (chatStream, embed and isAvailable are omitted here for brevity)
@Component("claude-provider")
@RequiredArgsConstructor  // Lombok: constructor injection for the final field
public class ClaudeProvider implements LLMProvider {

    private final AnthropicClient anthropicClient;

    @Override
    public CompletableFuture<LLMResponse> chat(LLMRequest request) {
        // Convert the generic request format into Claude's API format
        MessageCreateParams params = MessageCreateParams.builder()
            .model(request.getModel() != null ? request.getModel() : "claude-3-5-sonnet-20241022")
            .maxTokens(request.getMaxTokens() != null ? request.getMaxTokens() : 4096)
            .messages(convertMessages(request.getMessages()))
            .build();

        return CompletableFuture.supplyAsync(() -> {
            Message message = anthropicClient.messages().create(params);
            // Take the first text block of the response
            String content = message.content().stream()
                .filter(block -> block instanceof ContentBlock.Text)
                .map(block -> ((ContentBlock.Text) block).text())
                .findFirst()
                .orElse("");

            return LLMResponse.builder()
                .content(content)
                .inputTokens(message.usage().inputTokens())
                .outputTokens(message.usage().outputTokens())
                .provider("anthropic")
                .model(message.model().toString())
                .build();
        });
    }

    // The router keys providers by this name
    @Override
    public String getProviderName() { return "anthropic"; }
}

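// Privately deployed model implementation (through an OpenAI-compatible API)
// (the LLMProvider methods are omitted here; they mirror OpenAIProvider)
@Component("private-model-provider")
public class PrivateModelProvider implements LLMProvider {

    private final String baseUrl;
    private final OpenAiService localService;

    public PrivateModelProvider(@Value("${private.model.base-url}") String baseUrl) {
        this.baseUrl = baseUrl;
        // Many open-source inference frameworks (vLLM, Ollama) expose an
        // OpenAI-compatible API.
        // Schematic construction: how a custom base URL is set depends on the
        // OpenAI client library and version in use.
        OpenAiService service = new OpenAiService(
            OpenAiApi.builder()
                .baseUrl(baseUrl)
                .apiKey("not-needed")  // a private deployment needs no real key
                .build()
        );
        this.localService = service;
    }
}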

In the business service layer, an LLMProviderRouter decides which provider to use:

@Service
@Slf4j
public class LLMProviderRouter {

    private final Map<String, LLMProvider> providers;
    private final LLMRoutingConfig routingConfig;

    public LLMProviderRouter(List<LLMProvider> providerList,
                              LLMRoutingConfig routingConfig) {
        // Index all provider beans by the name they report
        this.providers = providerList.stream()
            .collect(Collectors.toMap(LLMProvider::getProviderName,
                                      Function.identity()));
        this.routingConfig = routingConfig;
    }

    public LLMProvider selectProvider(LLMRequest request) {
        // 1. Prefer the provider named in the request
        //    (assumes LLMRequest carries a preferredProvider field)
        if (request.getPreferredProvider() != null) {
            LLMProvider specified = providers.get(request.getPreferredProvider());
            if (specified != null && specified.isAvailable()) {
                return specified;
            }
        }

        // 2. Otherwise follow the routing config (which can be based on
        //    model type, cost, or priority)
        String primaryProvider = routingConfig.getPrimaryProvider();
        LLMProvider primary = providers.get(primaryProvider);
        if (primary != null && primary.isAvailable()) {
            return primary;
        }

        // 3. Fail over automatically to any available fallback provider
        log.warn("Primary provider {} is unavailable, trying a fallback provider", primaryProvider);
        return providers.values().stream()
            .filter(LLMProvider::isAvailable)
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("No LLM provider is available"));
    }

    public CompletableFuture<LLMResponse> chat(LLMRequest request) {
        LLMProvider provider = selectProvider(request);
        log.debug("Using provider: {}, model: {}", provider.getProviderName(),
            request.getModel());
        return provider.chat(request);
    }
}

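Once this abstraction layer is in place, switching from OpenAI to Claude, or from the cloud to a private deployment, only requires changing the routing configuration; business code does not change.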
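One thing to watch in the router above: selectProvider calls isAvailable() on every request, and OpenAIProvider.isAvailable() issues a remote listModels call each time. One possible way to keep that cost down is to cache the probe result for a short time. The sketch below is an assumption layered on the article's design, not part of it; the class name `CachedHealthCheck` and the TTL parameter are hypothetical.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical wrapper: caches the result of an expensive availability probe
// for ttlMillis so a router does not hit the provider on every request.
final class CachedHealthCheck {
    private final BooleanSupplier probe;  // e.g. provider::isAvailable
    private final long ttlMillis;         // how long a probe result stays valid
    private final LongSupplier clock;     // injectable clock, handy in tests
    private boolean hasValue = false;
    private boolean lastResult = false;
    private long lastCheckedAt = 0L;

    CachedHealthCheck(BooleanSupplier probe, long ttlMillis, LongSupplier clock) {
        this.probe = probe;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached result, re-running the probe once the TTL has expired.
    synchronized boolean isAvailable() {
        long now = clock.getAsLong();
        if (!hasValue || now - lastCheckedAt >= ttlMillis) {
            boolean result;
            try {
                result = probe.getAsBoolean();
            } catch (RuntimeException e) {
                result = false;  // a failing probe counts as unavailable
            }
            lastResult = result;
            lastCheckedAt = now;
            hasValue = true;
        }
        return lastResult;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. The trade-off is that a provider that goes down is still reported as available until the TTL expires, so the TTL should stay short relative to the failover time you can tolerate.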

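Designing the Vector Database for Portability

The choice of vector database also carries a high lock-in risk, mainly because:

Pinecone is a pure SaaS service with no open-source version
Different vector databases have different SDKs and query syntax
Index creation and configuration differ widely

The answer is again an abstraction layer: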

public interface VectorStore {

    /**
     * Store vectors (insert or update)
     */
    void upsert(String collectionName, List<VectorDocument> documents);

    /**
     * Similarity search
     */
    List<VectorSearchResult> search(String collectionName, float[] queryVector,
                                    int topK, Map<String, Object> filter);

    /**
     * Delete by ID
     */
    void delete(String collectionName, List<String> ids);

    /**
     * Create a collection
     */
    void createCollection(String collectionName, int dimension,
                          MetricType metricType);
}

@Data
@Builder
public class VectorDocument {
    private String id;
    private float[] vector;
    private Map<String, Object> metadata;
    private String content;
}

@Data
@Builder
public class VectorSearchResult {
    private String id;
    private float score;
    private Map<String, Object> metadata;
    private String content;
}

Milvus implementation:

@Component
@ConditionalOnProperty(name = "vector-store.type", havingValue = "milvus")
public class MilvusVectorStore implements VectorStore {

    private final MilvusServiceClient milvusClient;

    @Override
    public List<VectorSearchResult> search(String collectionName, float[] queryVector,
                                            int topK, Map<String, Object> filter) {
        // Keep one builder so the filter is added to the other search
        // parameters instead of replacing them.
        SearchParam.Builder builder = SearchParam.newBuilder()
            .withCollectionName(collectionName)
            .withVectors(List.of(toFloatList(queryVector)))
            .withTopK(topK)
            .withVectorFieldName("embedding")
            .withParams("{\"nprobe\": 16}")
            .withOutFields(List.of("content", "metadata"));

        if (filter != null && !filter.isEmpty()) {
            builder.withExpr(buildFilterExpression(filter));
        }

        R<SearchResults> response = milvusClient.search(builder.build());
        return convertSearchResults(response.getData());
    }

    // The Milvus v2 Java SDK takes float vectors as List<List<Float>>
    private static List<Float> toFloatList(float[] vector) {
        List<Float> boxed = new ArrayList<>(vector.length);
        for (float v : vector) {
            boxed.add(v);
        }
        return boxed;
    }
}

// Weaviate implementation (open-source alternative)
@Component
@ConditionalOnProperty(name = "vector-store.type", havingValue = "weaviate")
public class WeaviateVectorStore implements VectorStore {

    private final WeaviateClient weaviateClient;

    @Override
    public List<VectorSearchResult> search(String collectionName, float[] queryVector,
                                            int topK, Map<String, Object> filter) {
        // Note: the filter argument is not applied in this snippet
        Result<GraphQLResponse> result = weaviateClient.graphQL().get()
            .withClassName(collectionName)
            .withNearVector(NearVectorArgument.builder()
                .vector(toFloatArray(queryVector))
                .build())
            .withLimit(topK)
            .withFields(Field.builder().name("content").build(),
                        Field.builder().name("_additional")
                            .withFields(Field.builder().name("certainty").build())
                            .build())
            .run();

        return convertWeaviateResults(result.getResult());
    }
}

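With @ConditionalOnProperty, the vector-store.type value in the configuration file decides which implementation is used. Switching vector databases takes a one-line configuration change.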
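A side benefit of putting vector search behind an interface is that unit tests can run against an in-memory implementation instead of a real database. The sketch below is a deliberately simplified stand-in, not an implementation of the VectorStore interface above: it drops collections, metadata and filters, and the class name `InMemoryVectorIndex` is hypothetical. It ranks by brute-force cosine similarity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory index for tests: brute-force cosine similarity.
final class InMemoryVectorIndex {
    private final Map<String, float[]> vectors = new LinkedHashMap<>();

    // Inserts or overwrites the vector stored under the given id.
    void upsert(String id, float[] vector) {
        vectors.put(id, vector.clone());
    }

    void delete(String id) {
        vectors.remove(id);
    }

    // Returns up to topK ids ordered by descending cosine similarity.
    List<String> search(float[] query, int topK) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            scored.add(Map.entry(e.getKey(), cosine(query, e.getValue())));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(topK, scored.size()); i++) {
            ids.add(scored.get(i).getKey());
        }
        return ids;
    }

    // Cosine similarity; defined as 0 when either vector has zero length.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A linear scan like this is only suitable for small test fixtures; it says nothing about how Milvus or Weaviate index or score vectors.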

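A Unified Access Layer for Object Storage

Model files, training data, and inference logs all live in object storage. Although the vendors' object storage services all offer an S3-compatible mode, SDK usage, authentication, and special features (lifecycle policies, accelerated domains) still differ.

I recommend Apache Commons VFS, or wrapping a simple interface of your own: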

public interface ObjectStorageClient {

    /**
     * Upload a file
     */
    void upload(String bucketName, String key, InputStream content, long contentLength);

    /**
     * Download a file
     */
    InputStream download(String bucketName, String key);

    /**
     * List files under a prefix
     */
    List<StorageObject> list(String bucketName, String prefix);

    /**
     * Generate a presigned URL
     */
    String generatePresignedUrl(String bucketName, String key, Duration expiry);

    /**
     * Delete a file
     */
    void delete(String bucketName, String key);
}

// AWS S3 implementation
@Component
@ConditionalOnProperty(name = "storage.provider", havingValue = "aws")
public class S3StorageClient implements ObjectStorageClient {

    private final S3Client s3Client;

    @Override
    public void upload(String bucketName, String key, InputStream content,
                       long contentLength) {
        s3Client.putObject(PutObjectRequest.builder()
            .bucket(bucketName)
            .key(key)
            .build(),
            RequestBody.fromInputStream(content, contentLength));
    }
}

// Alibaba Cloud OSS implementation
@Component
@ConditionalOnProperty(name = "storage.provider", havingValue = "aliyun")
public class OSSStorageClient implements ObjectStorageClient {

    private final OSS ossClient;

    @Override
    public void upload(String bucketName, String key, InputStream content,
                       long contentLength) {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(contentLength);
        ossClient.putObject(bucketName, key, content, metadata);
    }
}

// MinIO implementation (on-premises deployment)
@Component
@ConditionalOnProperty(name = "storage.provider", havingValue = "minio")
public class MinIOStorageClient implements ObjectStorageClient {
    // MinIO also speaks the S3 API, so the AWS S3 SDK can be used directly
    private final S3Client s3Client;
    // Configure the client's endpoint to point at MinIO
}

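Planning the Migration Path: What to Do If You Are Already Deeply Locked In

If the current system is already deeply locked into one cloud vendor, the goal is not to migrate everything at once. Use the decoupled architecture for newly developed parts and replace the old implementations gradually.

The Four Phases of Migration

Phase 1: Introduce the abstraction layer (without touching existing code). Design the interfaces and the abstraction layer first, but for now implement only the adapter for the current cloud vendor. This step costs little and lays the groundwork for later migration.

Phase 2: Build new features on the decoupled architecture. All newly developed features call through the abstraction layer and do not depend directly on the cloud vendor's SDK.

Phase 3: Wrap existing code in adapters. Take the existing code that calls the cloud vendor's SDK directly and wrap it in adapter classes that implement the unified interface. This step does not change actual behavior; it only unifies the call path.

Phase 4: Migrate the actual implementations step by step. When you need to switch cloud vendors (or go multi-cloud), add a new adapter implementation and switch through configuration. Keep the old adapter around for a while and delete it only after the new one has been verified.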
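Phase 3 is the step people most often picture as a rewrite, so a small illustration may help. The sketch below assumes a hypothetical `LegacyVendorTextClient` standing in for existing code that talks to a vendor SDK, and a one-method `TextGenerator` interface standing in for the unified abstraction; neither name comes from a real library.

```java
// Hypothetical stand-in for existing code that calls a vendor SDK directly.
final class LegacyVendorTextClient {
    String invokeModel(String modelId, String prompt) {
        // A real client would call the vendor API here.
        return "[" + modelId + "] " + prompt;
    }
}

// Simplified unified interface (the abstraction layer reduced to one method).
interface TextGenerator {
    String generate(String prompt);
}

// Adapter: behavior is unchanged; only the call path now goes through the
// unified interface, so a different implementation can be swapped in later.
final class LegacyVendorAdapter implements TextGenerator {
    private final LegacyVendorTextClient legacy;
    private final String modelId;

    LegacyVendorAdapter(LegacyVendorTextClient legacy, String modelId) {
        this.legacy = legacy;
        this.modelId = modelId;
    }

    @Override
    public String generate(String prompt) {
        return legacy.invokeModel(modelId, prompt);
    }
}
```

Callers that used to hold a LegacyVendorTextClient now depend only on TextGenerator, which is what makes phase 4 a configuration change rather than a code change.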

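Gradual Rollout: Migrating Step by Step with Traffic Splitting

For critical LLM calls, do not cut everything over at once. Use traffic splitting to validate the new provider gradually: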

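@Service
public class GraduatedMigrationRouter {

    private final LLMProvider oldProvider;    // provider currently in use
    private final LLMProvider newProvider;    // provider being migrated to
    // Share of traffic (0-100) sent to the new provider; starts at 0
    private final AtomicInteger migrationPercentage = new AtomicInteger(0);

    // With several LLMProvider beans, pick the two with @Qualifier
    public GraduatedMigrationRouter(LLMProvider oldProvider, LLMProvider newProvider) {
        this.oldProvider = oldProvider;
        this.newProvider = newProvider;
    }

    public CompletableFuture<LLMResponse> chat(LLMRequest request) {
        int percentage = migrationPercentage.get();
        boolean useNewProvider = ThreadLocalRandom.current().nextInt(100) < percentage;

        LLMProvider selectedProvider = useNewProvider ? newProvider : oldProvider;
        String label = useNewProvider ? "new_provider" : "old_provider";
        long startNanos = System.nanoTime();

        return selectedProvider.chat(request)
            .whenComplete((result, error) -> {
                // Record comparison metrics: latency and error rate per provider.
                // Success means the future completed without an exception.
                long latencyMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
                recordMetric(label, error == null, latencyMs);
            });
    }

    // The migration percentage can be adjusted dynamically from outside
    public void setMigrationPercentage(int percentage) {
        if (percentage < 0 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be between 0 and 100");
        }
        migrationPercentage.set(percentage);
    }

    // recordMetric(...) is left to whatever metrics library is in use
}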

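Compare the error rates and latency of the old and new providers, and raise the migration percentage step by step once the new provider is confirmed to behave normally. This carries far less risk than an all-at-once cutover.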
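The ramp-up decision itself can be written down as a rule rather than left to judgment each time. The following is a minimal sketch of one such rule, assuming error rates are measured over the same time window for both providers; the class name `MigrationGate`, the tolerance, and the step size are hypothetical choices, not values from the article.

```java
// Hypothetical ramp rule: advance the migration percentage by a fixed step
// while the new provider's error rate stays within a tolerance of the old
// provider's, and roll back to 0 as soon as it does not.
final class MigrationGate {
    private MigrationGate() {}

    static int nextPercentage(int current, double oldErrorRate, double newErrorRate,
                              double tolerance, int step) {
        if (newErrorRate > oldErrorRate + tolerance) {
            return 0;  // new provider is noticeably worse: send all traffic back
        }
        return Math.min(100, current + step);  // healthy: ramp up, capped at 100
    }
}
```

For example, with a tolerance of 0.005 and a step of 20, a new-provider error rate of 1.2% against an old rate of 1.0% moves the split from 10 to 30, while a new-provider rate of 5% resets it to 0. A real rule would likely also compare latency and require a minimum sample size before acting.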

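The Cost Reality

Having covered the technical side of decoupling, one reality needs saying: decoupling has a cost.

An abstraction layer adds code complexity, every new provider needs an adapter, and tests have to cover multiple providers. If your business is certain never to change clouds and never to need on-premises deployment, over-decoupling is a waste.

My recommendations:

High-priority decoupling: the LLM call layer (providers may be switched or failed over at any time) and the vector database (choices between open-source and managed products change often)

Medium-priority decoupling: object storage (almost everything has an S3-compatible API, so the real migration cost is low; a unified interface mainly keeps things tidy)

Low-priority decoupling: the training platform (switching training platforms is rare, and the cost lies more in data migration than in code)

There is no need to be fully vendor-neutral at every layer. Put the effort into the most critical points, where lock-in happens most easily.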

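Summary

The core of avoiding vendor lock-in in AI systems is introducing abstraction layers at the key integration points. The three most important are the LLM call abstraction, the vector database abstraction, and the object storage abstraction.

If you are already deeply locked in, do not try to migrate everything at once. A four-phase incremental migration (introduce the abstraction, decouple new features, wrap old code, replace implementations gradually) is the steadier path.

Multi-cloud architecture is not the goal; portability is. Some companies do need multi-cloud (regulatory requirements, disaster recovery), but what most companies really need is simply not to be at the mercy of a single vendor: the ability to switch at any time, not to actually run on several clouds at once.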