Article 1790: Isolation Design for Multi-Tenant AI Services (Keeping Customers' Data and Behavior from Interfering with Each Other)

Lao Zhang · 2026/4/30 · About 11 minutes

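Who should read this: engineers who are turning, or preparing to turn, an AI service into a SaaS product | Reading time: about 20 minutes | Key takeaway: understand the core isolation problems of multi-tenant AI services and learn isolation approaches along three dimensions: the vector store, model invocation, and cost control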

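Last year I helped a B2B SaaS company integrate AI features. Their product is a knowledge management platform for enterprise customers: each enterprise uploads its own internal documents and uses AI for intelligent Q&A.

The requirements looked simple, but the implementation hit plenty of pitfalls.

The biggest headache was not AI quality. It was data isolation.

Company A's documents must not be searchable by Company B's users. Company A's conversation history must not leak to Company B. If Company A burns through a large amount of embedding quota, Company B's response times must not suffer. And if Company B asks to have all of its data deleted, the one-click deletion has to actually do the job.

Behind every one of these "must nots" is a dedicated piece of engineering design. Today I will walk through these designs systematically.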

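First, get clear on the levels of multi-tenant isolation

Before designing for multi-tenancy, get the levels of isolation straight. Each level comes with a different cost and fits different scenarios.

Physical isolation: each tenant runs its own vector database instance and its own model service. Isolation is the most thorough, but the cost is very high, so it only suits high-end paying customers.

Logical isolation: the infrastructure is shared, and a tenant identifier (tenantId) filters data at the query level. This is what most small and mid-sized SaaS products choose.

The rest of this article focuses on logical isolation, which offers the best value for money.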

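Multi-tenant isolation in the vector store

This is the most central and most error-prone part of multi-tenant design for AI services.

Option 1: isolate with a Collection/Namespace

Most vector databases support multiple collections (Milvus, Qdrant, and Chroma call them Collections; Pinecone calls them Namespaces). The simplest isolation approach is one collection per tenant: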

@Service
public class MultiTenantVectorStoreService {
    
    private final QdrantClient qdrantClient;
    
    private String getCollectionName(String tenantId) {
        // One collection per tenant
        return "knowledge_" + tenantId;
    }
    
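    public void ensureCollectionExists(String tenantId) {
        String collectionName = getCollectionName(tenantId);
        try {
            // Check existence explicitly; treating every exception as "collection missing"
            // would hide network and authentication failures.
            // collectionExistsAsync is assumed to be available in the Qdrant Java client in use.
            if (qdrantClient.collectionExistsAsync(collectionName).get()) {
                return;
            }
            qdrantClient.createCollectionAsync(
                collectionName,
                VectorParams.newBuilder()
                    .setSize(1536)  // OpenAI embedding dimension
                    .setDistance(Distance.Cosine)
                    .build()
            ).get();
            log.info("Created vector collection for tenant {}: {}", tenantId, collectionName);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while preparing collection " + collectionName, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Failed to prepare collection " + collectionName, e);
        }
    }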
    
    public void addDocuments(String tenantId, List<Document> documents) {
        ensureCollectionExists(tenantId);
        String collectionName = getCollectionName(tenantId);
        
        List<PointStruct> points = documents.stream()
            .map(doc -> PointStruct.newBuilder()
                .setId(PointId.newBuilder().setUuid(doc.getId()).build())
                .setVectors(Vectors.newBuilder()
                    .setVector(Vector.newBuilder()
                        .addAllData(toFloatList(doc.getEmbedding()))
                        .build())
                    .build())
                .putAllPayload(toPayload(doc))
                .build())
            .collect(Collectors.toList());
        
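        // The Qdrant client returns a Guava ListenableFuture, which has no join();
        // Futures.getUnchecked blocks until the result is ready
        Futures.getUnchecked(qdrantClient.upsertAsync(collectionName, points));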
    }
    
    public List<Document> search(String tenantId, float[] queryEmbedding, int topK) {
        String collectionName = getCollectionName(tenantId);
        
        List<ScoredPoint> results = Futures.getUnchecked(qdrantClient.searchAsync(
            SearchPoints.newBuilder()
                .setCollectionName(collectionName)
                .addAllVector(toFloatList(queryEmbedding))
                .setLimit(topK)
                .setWithPayload(WithPayloadSelector.newBuilder()
                    .setEnable(true).build())
                .build()
        ));
        
        return results.stream()
            .map(this::toDocument)
            .collect(Collectors.toList());
    }
    
    public void deleteTenantData(String tenantId) {
        // Drop the whole collection: simple and clean
        String collectionName = getCollectionName(tenantId);
        Futures.getUnchecked(qdrantClient.deleteCollectionAsync(collectionName));
        log.info("Deleted vector collection for tenant {}: {}", tenantId, collectionName);
    }
}

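The advantages of this approach are simple implementation, thorough isolation, and very convenient deletion of a tenant's data (just drop the collection).

The drawback is that with many tenants (say, several thousand), the number of collections balloons, which puts pressure on the vector database's metadata management.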
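
Because the collection name is built by concatenating the tenantId, it may be worth validating the ID before it ever reaches the vector database. Below is a minimal sketch. The class name `TenantCollectionNames` and the allowed-character rule are illustrative assumptions, not part of the original service:

```java
import java.util.regex.Pattern;

// Hypothetical helper: derives a per-tenant collection name and rejects suspicious tenant IDs.
final class TenantCollectionNames {

    // Assumed rule: 1 to 64 characters, limited to letters, digits, '-' and '_'.
    private static final Pattern SAFE_TENANT_ID = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    private TenantCollectionNames() {
    }

    static String forTenant(String tenantId) {
        if (tenantId == null || !SAFE_TENANT_ID.matcher(tenantId).matches()) {
            throw new IllegalArgumentException("Invalid tenant id");
        }
        // Same "knowledge_" prefix as getCollectionName in the service above
        return "knowledge_" + tenantId;
    }
}
```

With a check like this, `TenantCollectionNames.forTenant("acme-01")` yields `knowledge_acme-01`, while an ID such as `a/../b` is rejected before any collection is created or dropped.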

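Option 2: shared Collection + tenant ID filtering

When there are many tenants but each has little data, a better fit is a shared collection: store the tenantId in each point's payload and add a filter condition at query time: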

@Service
public class SharedCollectionVectorStore {
    
    private static final String COLLECTION_NAME = "shared_knowledge";
    private final QdrantClient qdrantClient;
    
    public void addDocuments(String tenantId, List<Document> documents) {
        List<PointStruct> points = documents.stream()
            .map(doc -> {
                Map<String, Value> payload = new HashMap<>(toPayload(doc));
                // Store the tenantId in the payload so it can be used as a filter condition
                payload.put("tenant_id", 
                    Value.newBuilder().setStringValue(tenantId).build());
                
                return PointStruct.newBuilder()
                    .setId(PointId.newBuilder().setUuid(doc.getId()).build())
                    .setVectors(buildVectors(doc.getEmbedding()))
                    .putAllPayload(payload)
                    .build();
            })
            .collect(Collectors.toList());
        
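        // The Qdrant client returns a Guava ListenableFuture, which has no join();
        // Futures.getUnchecked blocks until the result is ready
        Futures.getUnchecked(qdrantClient.upsertAsync(COLLECTION_NAME, points));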
    }
    
    public List<Document> search(String tenantId, float[] queryEmbedding, int topK) {
        // Critical: the tenant filter is mandatory
        Filter tenantFilter = Filter.newBuilder()
            .addMust(Condition.newBuilder()
                .setField(FieldCondition.newBuilder()
                    .setKey("tenant_id")
                    .setMatch(Match.newBuilder()
                        .setValue(MatchValue.newBuilder()
                            .setStringValue(tenantId)
                            .build())
                        .build())
                    .build())
                .build())
            .build();
        
        List<ScoredPoint> results = Futures.getUnchecked(qdrantClient.searchAsync(
            SearchPoints.newBuilder()
                .setCollectionName(COLLECTION_NAME)
                .addAllVector(toFloatList(queryEmbedding))
                .setFilter(tenantFilter)
                .setLimit(topK)
                .setWithPayload(WithPayloadSelector.newBuilder()
                    .setEnable(true).build())
                .build()
        ));
        
        return results.stream()
            .map(this::toDocument)
            .collect(Collectors.toList());
    }
    
    public void deleteTenantData(String tenantId) {
        // Bulk-delete every vector point that belongs to this tenant, selected by tenantId
        Filter tenantFilter = Filter.newBuilder()
            .addMust(Condition.newBuilder()
                .setField(FieldCondition.newBuilder()
                    .setKey("tenant_id")
                    .setMatch(Match.newBuilder()
                        .setValue(MatchValue.newBuilder()
                            .setStringValue(tenantId).build())
                        .build())
                    .build())
                .build())
            .build();
        
        // deleteAsync accepts the filter directly; Futures.getUnchecked waits for completion
        Futures.getUnchecked(qdrantClient.deleteAsync(COLLECTION_NAME, tenantFilter));
    }
}

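This approach comes with one very important security consideration: the tenant filter must never be optional.

I have seen an implementation of this approach that applied the tenantId filter in the Service layer. Someone then called the lower-level method directly and forgot to add it, and data leaked straight to other tenants.

The recommended practice is to make tenantId a mandatory parameter in the method signature, or to use AOP interception so that tenant isolation is enforced automatically at the method-call level: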

@Aspect
@Component
public class TenantFilterAspect {
    
    @Autowired
    private TenantContext tenantContext;
    
    @Around("@annotation(TenantIsolated)")
    public Object enforceTenantIsolation(ProceedingJoinPoint pjp) throws Throwable {
        String currentTenantId = tenantContext.getCurrentTenantId();
        if (currentTenantId == null) {
            throw new SecurityException("Tenant context is not set; refusing to execute");
        }
        return pjp.proceed();
    }
}

// Usage
@TenantIsolated
public List<Document> search(float[] queryEmbedding, int topK) {
    // The tenantId comes from TenantContext, so callers do not pass it in
    String tenantId = tenantContext.getCurrentTenantId();
    return doSearch(tenantId, queryEmbedding, topK);
}
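
The aspect relies on a TenantContext whose implementation is not shown here. One common way to back it is a ThreadLocal that is bound for the duration of a request and always restored afterwards. The sketch below is only an assumption about how such a context could look; the class name `ThreadLocalTenantContext` and the `runAs` method are hypothetical:

```java
import java.util.function.Supplier;

// Hypothetical TenantContext backed by a ThreadLocal; the article does not show its implementation.
final class ThreadLocalTenantContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Returns the tenant bound to the current thread, or null if none is bound.
    String getCurrentTenantId() {
        return CURRENT.get();
    }

    // Runs the task with the given tenant bound, then restores the previous value
    // so a pooled thread never carries one tenant's identity into the next request.
    <T> T runAs(String tenantId, Supplier<T> task) {
        if (tenantId == null || tenantId.isBlank()) {
            throw new IllegalArgumentException("tenantId must not be blank");
        }
        String previous = CURRENT.get();
        CURRENT.set(tenantId);
        try {
            return task.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }
}
```

In a web application, a filter or interceptor would typically call something like `runAs` after authenticating the request. Note that ThreadLocal values do not automatically follow work handed off to other threads, so asynchronous code needs to pass the tenant along explicitly.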

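Isolated storage for conversation history

Beyond the knowledge base, conversation history needs isolation too.

The simplest and most common mistake I have seen: using sessionId as the key for conversation history, when the sessionId is generated by the client and never validated against a tenantId.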

// Anti-pattern: keyed by sessionId only, with no tenant binding
public ChatHistory getHistory(String sessionId) {
    return redis.get("history:" + sessionId);  // anyone who obtains the sessionId can read this conversation
}

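// Correct approach: the sessionId must be bound to the tenantId.
// The tenantId should come from the authenticated context, never from client input.
public ChatHistory getHistory(String tenantId, String sessionId) {
    String key = "history:" + tenantId + ":" + sessionId;
    ChatHistory history = redis.get(key);
    if (history != null && !history.getTenantId().equals(tenantId)) {
        // Guard against key-guessing attacks
        throw new SecurityException("Session does not belong to the current tenant");
    }
    return history;
}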
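
One detail of the `history:{tenantId}:{sessionId}` layout deserves care: it uses `:` as a delimiter, and the sessionId is client-generated. Tenant `a` with session `b:c` and tenant `a:b` with session `c` would both map to `history:a:b:c`. A minimal sketch of a key builder that rejects such input follows; the class name `HistoryKeys` and the exact list of reserved characters are assumptions for illustration:

```java
// Hypothetical key builder for the "history:{tenantId}:{sessionId}" layout used above.
final class HistoryKeys {

    private HistoryKeys() {
    }

    static String build(String tenantId, String sessionId) {
        return "history:" + requireSafe("tenantId", tenantId)
            + ":" + requireSafe("sessionId", sessionId);
    }

    // Rejects the ':' delimiter and Redis glob characters so that
    // one key component can never spill into another or widen a pattern match.
    private static String requireSafe(String name, String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == ':' || c == '*' || c == '?' || c == '[' || c == ']') {
                throw new IllegalArgumentException(name + " contains a reserved character");
            }
        }
        return value;
    }
}
```

The glob characters matter because bulk deletion by pattern (`history:{tenantId}:*`) would otherwise match other tenants' keys if a tenantId such as `*` slipped through.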

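In addition, conversation history should have a TTL (expiration time). This is both storage management and a data compliance requirement: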

@Service
public class ConversationHistoryService {
    
    private final RedisTemplate<String, String> redisTemplate;
    
    // Keep for 7 days by default; enterprise customers can configure this
    private static final Duration DEFAULT_TTL = Duration.ofDays(7);
    
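    public void saveMessage(String tenantId, String sessionId, ChatMessage message) {
        String key = buildKey(tenantId, sessionId);
        String value;
        try {
            value = objectMapper.writeValueAsString(message);
        } catch (JsonProcessingException e) {
            // writeValueAsString throws a checked exception, so it has to be handled here
            throw new IllegalStateException("Failed to serialize chat message", e);
        }
        
        redisTemplate.opsForList().rightPush(key, value);
        // Refresh the TTL on every write so an active session does not expire early
        redisTemplate.expire(key, getTtlForTenant(tenantId));
    }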
    
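    public List<ChatMessage> getHistory(String tenantId, String sessionId, int maxMessages) {
        String key = buildKey(tenantId, sessionId);
        // Fetch only the most recent N messages so a long history does not overflow the prompt
        List<String> rawMessages = redisTemplate.opsForList()
            .range(key, -maxMessages, -1);
        if (rawMessages == null) {
            return List.of();
        }
        
        List<ChatMessage> messages = new ArrayList<>(rawMessages.size());
        for (String raw : rawMessages) {
            try {
                // readValue throws a checked exception, which a stream lambda cannot propagate
                messages.add(objectMapper.readValue(raw, ChatMessage.class));
            } catch (JsonProcessingException e) {
                throw new IllegalStateException("Corrupted chat message under key " + key, e);
            }
        }
        return messages;
    }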
    
    public void clearSession(String tenantId, String sessionId) {
        redisTemplate.delete(buildKey(tenantId, sessionId));
    }
    
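    public void clearAllTenantData(String tenantId) {
        // GDPR right to erasure: remove all conversation history for this tenant.
        // Note: KEYS walks the whole keyspace and blocks Redis while it runs;
        // on a large instance an incremental SCAN is the safer choice.
        Set<String> keys = redisTemplate.keys("history:" + tenantId + ":*");
        int deleted = 0;
        if (keys != null && !keys.isEmpty()) {
            redisTemplate.delete(keys);
            deleted = keys.size();
        }
        log.info("Cleared all conversation history for tenant {}, {} keys in total", tenantId, deleted);
    }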
    
    private String buildKey(String tenantId, String sessionId) {
        return "history:" + tenantId + ":" + sessionId;
    }
    
    private Duration getTtlForTenant(String tenantId) {
        // Read the TTL from tenant config, so retention can differ by plan
        TenantConfig config = tenantConfigService.getConfig(tenantId);
        return config != null ? config.getHistoryTtl() : DEFAULT_TTL;
    }
}

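Compute resource isolation: rate limiting and quota control

A common problem in multi-tenant systems: one large customer (or a compromised account) hammers the API, exhausts the shared LLM quota, and disrupts normal usage for other tenants.

This calls for rate limiting and quota control at the application layer.

Token bucket rate limiting (caps requests per minute):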

@Component
public class TenantRateLimiter {
    
    // One token bucket per tenant, lazily initialized
    private final ConcurrentHashMap<String, RateLimiter> limiters = new ConcurrentHashMap<>();
    
    private RateLimiter getLimiter(String tenantId) {
        return limiters.computeIfAbsent(tenantId, id -> {
            TenantConfig config = tenantConfigService.getConfig(id);
            // Read the rate limit from tenant config: 10 requests/minute on the free plan, 100 requests/minute on paid plans
            double ratePerSecond = config.getRequestsPerMinute() / 60.0;
            return RateLimiter.create(ratePerSecond);
        });
    }
    
    public boolean tryAcquire(String tenantId) {
        return getLimiter(tenantId).tryAcquire(100, TimeUnit.MILLISECONDS);
    }
}

// Intercept uniformly at the AI call entry point
@Service
public class AiChatGateway {
    
    @Autowired
    private TenantRateLimiter rateLimiter;
    
    public ChatResponse chat(String tenantId, String query) {
        if (!rateLimiter.tryAcquire(tenantId)) {
            throw new RateLimitExceededException("Request rate limit exceeded, please try again later");
        }
        return aiChatService.processQuery(tenantId, query);
    }
}
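
To make the token bucket idea concrete, here is a small standalone sketch of the mechanism itself: the bucket refills at a fixed rate up to a capacity, and each request spends one token. It is not how Guava's RateLimiter is implemented (that library differs in its details, for example in how it handles waiting); the class name `SimpleTokenBucket` and the injectable clock are assumptions made so the behavior is easy to test:

```java
import java.util.function.LongSupplier;

// Minimal token bucket for illustration; a production service would more likely use a library.
final class SimpleTokenBucket {

    private final double capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an idle tenant can burst up to capacity
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Takes one token if available; never blocks.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        // Add the tokens earned since the last call, capped at the bucket capacity
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In real use the clock would be `System::nanoTime`. A free-plan tenant at 10 requests per minute could be modeled as `new SimpleTokenBucket(10, 10 / 60.0, System::nanoTime)`, where the capacity of 10 is an assumed burst allowance.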

Token quota control (caps token usage per month):

@Service
public class TokenQuotaService {
    
    private final RedisTemplate<String, String> redisTemplate;
    
    private String getQuotaKey(String tenantId) {
        // Quota is tracked per month, so the key includes the year and month
        YearMonth current = YearMonth.now();
        return "quota:" + tenantId + ":" + current.toString();
    }
    
    public boolean checkAndDeduct(String tenantId, int estimatedTokens) {
        TenantConfig config = tenantConfigService.getConfig(tenantId);
        long monthlyLimit = config.getMonthlyTokenLimit();
        
        String key = getQuotaKey(tenantId);
        
        // A Lua script keeps the check-and-deduct atomic
        String luaScript = 
            "local current = tonumber(redis.call('get', KEYS[1]) or '0')\n" +
            "local limit = tonumber(ARGV[1])\n" +
            "local add = tonumber(ARGV[2])\n" +
            "if current + add > limit then\n" +
            "  return 0\n" +
            "else\n" +
            "  redis.call('incrby', KEYS[1], add)\n" +
            "  redis.call('expire', KEYS[1], 2678400)\n" +  // expires after 31 days
            "  return 1\n" +
            "end";
        
        Long result = redisTemplate.execute(
            new DefaultRedisScript<>(luaScript, Long.class),
            Collections.singletonList(key),
            String.valueOf(monthlyLimit),
            String.valueOf(estimatedTokens)
        );
        
        return result != null && result == 1L;
    }
    
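    public void recordActualUsage(String tenantId, int estimatedTokens, int actualTokens) {
        // After the LLM call completes, reconcile the estimate reserved by checkAndDeduct
        // with the actual usage. Only the difference is applied: adding actualTokens on top
        // of the estimate would count the same call twice.
        String key = getQuotaKey(tenantId);
        redisTemplate.opsForValue().increment(key, (long) actualTokens - estimatedTokens);
    }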
    
    public long getUsage(String tenantId) {
        String value = (String) redisTemplate.opsForValue().get(getQuotaKey(tenantId));
        return value != null ? Long.parseLong(value) : 0L;
    }
}
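
The arithmetic behind "reserve an estimate, then settle with the actual usage" is easy to get wrong, so here is a small in-memory sketch of it. It stands in for the Redis counter purely for illustration; the class name `InMemoryTokenQuota` and its methods are hypothetical and not part of the service above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the Redis counter, only to illustrate the reserve-then-settle arithmetic.
final class InMemoryTokenQuota {

    private final long monthlyLimit;
    private final ConcurrentHashMap<String, AtomicLong> used = new ConcurrentHashMap<>();

    InMemoryTokenQuota(long monthlyLimit) {
        this.monthlyLimit = monthlyLimit;
    }

    // Reserves the estimate if it still fits under the limit (same check as the Lua script).
    boolean tryReserve(String tenantId, long estimatedTokens) {
        AtomicLong counter = used.computeIfAbsent(tenantId, id -> new AtomicLong());
        while (true) {
            long current = counter.get();
            if (current + estimatedTokens > monthlyLimit) {
                return false;
            }
            // Retry if another thread changed the counter between the read and the write
            if (counter.compareAndSet(current, current + estimatedTokens)) {
                return true;
            }
        }
    }

    // Replaces the estimate with the real usage by applying only the difference.
    void settle(String tenantId, long estimatedTokens, long actualTokens) {
        used.computeIfAbsent(tenantId, id -> new AtomicLong())
            .addAndGet(actualTokens - estimatedTokens);
    }

    long usage(String tenantId) {
        AtomicLong counter = used.get(tenantId);
        return counter == null ? 0L : counter.get();
    }
}
```

For example, with a limit of 1000, reserving 800 and then settling with an actual usage of 650 leaves the counter at 650, so a later request estimated at 350 still fits while one estimated at 400 does not.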

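Tenant-level prompt personalization

One very valuable feature of a multi-tenant system: each tenant can have its own AI behavior configuration.

For example, healthcare customers want the AI to be more conservative and avoid definitive advice on diagnostic questions; legal customers want the AI to always attach a disclaimer; and tech company customers want answers that are more technical and more direct.

This is where a tenant-level system prompt earns its keep: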

@Service
public class TenantAwarePromptService {
    
    private final TenantConfigService tenantConfigService;
    
    public String buildSystemPrompt(String tenantId) {
        TenantConfig config = tenantConfigService.getConfig(tenantId);
        
        StringBuilder systemPrompt = new StringBuilder();
        
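        // Base system prompt (shared across the platform)
        systemPrompt.append("You are a professional AI assistant that answers questions based on the knowledge base content provided by the user. ");
        systemPrompt.append("Answer only from the provided context. If the context contains no relevant information, tell the user so explicitly.\n\n");
        
        // Append the tenant's custom system prompt
        if (config.getCustomSystemPrompt() != null && !config.getCustomSystemPrompt().isBlank()) {
            systemPrompt.append("Additional requirements:\n");
            systemPrompt.append(config.getCustomSystemPrompt());
            systemPrompt.append("\n\n");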
        }
        
        // Append industry-specific compliance requirements
        if (config.getIndustry() != null) {
            String complianceInstruction = getComplianceInstruction(config.getIndustry());
            if (complianceInstruction != null) {
                systemPrompt.append(complianceInstruction);
            }
        }
        
        return systemPrompt.toString();
    }
    
    private String getComplianceInstruction(Industry industry) {
        return switch (industry) {
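            case MEDICAL -> "Important: for questions involving medical diagnosis or medication advice, state clearly that this is not medical advice and recommend that the user consult a qualified doctor.";
            case LEGAL -> "Important: all law-related answers are for reference only and do not constitute legal advice. For specific legal matters, consult a licensed lawyer.";
            case FINANCIAL -> "Important: investment-related information is for reference only and does not constitute investment advice. Markets carry risk; invest with caution.";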
            default -> null;
        };
    }
    
    public ChatResponse chat(String tenantId, String query, List<Document> context) {
        String systemPrompt = buildSystemPrompt(tenantId);
        
        String userPrompt = buildUserPrompt(query, context);
        
        return chatClient.prompt()
            .system(systemPrompt)
            .user(userPrompt)
            .call()
            .chatResponse();
    }
}

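Data deletion and GDPR compliance

When a tenant closes its account, or submits a "delete all my data" request, a multi-tenant system must be able to delete everything thoroughly.

Technically, this has to cover every storage medium: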

@Service
public class TenantDataDeletionService {
    
    private final MultiTenantVectorStoreService vectorStoreService;
    private final ConversationHistoryService historyService;
    private final TenantDocumentRepository documentRepository;
    private final TenantConfigService configService;
    
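    // @Transactional is left off here: a database transaction cannot roll back deletions in the
    // vector store or Redis, so each step records its own result and failed steps can be retried.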
    public DeletionReport deleteTenantData(String tenantId) {
        DeletionReport report = new DeletionReport(tenantId);
        
        // 1. Delete vector data
        try {
            vectorStoreService.deleteTenantData(tenantId);
            report.markSuccess("vector_store");
        } catch (Exception e) {
            report.markFailed("vector_store", e.getMessage());
            log.error("Failed to delete vector data for tenant {}", tenantId, e);
        }
        
        // 2. Delete conversation history
        try {
            historyService.clearAllTenantData(tenantId);
            report.markSuccess("conversation_history");
        } catch (Exception e) {
            report.markFailed("conversation_history", e.getMessage());
        }
        
        // 3. Delete document metadata (relational database)
        try {
            int deleted = documentRepository.deleteByTenantId(tenantId);
            report.markSuccess("document_metadata", "deleted " + deleted + " records");
        } catch (Exception e) {
            report.markFailed("document_metadata", e.getMessage());
        }
        
        // 4. Delete the tenant config (done last, so earlier failures can still be retried)
        if (report.isAllSuccess()) {
            try {
                configService.deleteTenantConfig(tenantId);
                report.markSuccess("tenant_config");
            } catch (Exception e) {
                report.markFailed("tenant_config", e.getMessage());
            }
        }
        
        // Record the deletion in the audit log (for compliance audits)
        auditLogger.log(AuditEvent.builder()
            .eventType("TENANT_DATA_DELETION")
            .tenantId(tenantId)
            .report(report)
            .operator(SecurityContext.getCurrentUser())
            .build());
        
        return report;
    }
}
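
The deletion service leans on a DeletionReport whose definition is not shown. A minimal sketch of what such a report might look like follows; the class name `DeletionReportSketch`, its fields, and the `failedSteps` accessor are assumptions for illustration only:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical shape of a per-tenant deletion report; the article does not show the real class.
final class DeletionReportSketch {

    private final String tenantId;
    // step name -> detail, kept in execution order so the audit log reads naturally
    private final Map<String, String> succeeded = new LinkedHashMap<>();
    private final Map<String, String> failed = new LinkedHashMap<>();

    DeletionReportSketch(String tenantId) {
        this.tenantId = tenantId;
    }

    String tenantId() {
        return tenantId;
    }

    void markSuccess(String step) {
        markSuccess(step, "");
    }

    void markSuccess(String step, String detail) {
        failed.remove(step); // a successful retry clears the earlier failure
        succeeded.put(step, detail);
    }

    void markFailed(String step, String reason) {
        succeeded.remove(step);
        // Exception messages can be null, so fall back to a placeholder
        failed.put(step, reason == null ? "unknown error" : reason);
    }

    // True while no recorded step is in a failed state.
    boolean isAllSuccess() {
        return failed.isEmpty();
    }

    Set<String> failedSteps() {
        return Collections.unmodifiableSet(failed.keySet());
    }
}
```

Keeping the failed steps addressable by name makes it straightforward to retry only what went wrong and to delete the tenant config once every step has succeeded.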

Monitoring: seeing each tenant's resource usage clearly

Monitoring in a multi-tenant system must be able to aggregate by tenant.

@Component
public class TenantMetricsCollector {
    
    private final MeterRegistry meterRegistry;
    
    public void recordLlmCall(String tenantId, String model, 
                               int tokens, long latencyMs, boolean success) {
        // Count LLM calls per tenant
        Counter.builder("ai.llm.calls")
            .tag("tenant_id", tenantId)
            .tag("model", model)
            .tag("status", success ? "success" : "error")
            .register(meterRegistry)
            .increment();
        
        // Record token consumption per tenant
        Counter.builder("ai.llm.tokens")
            .tag("tenant_id", tenantId)
            .tag("model", model)
            .register(meterRegistry)
            .increment(tokens);
        
        // Record the latency distribution per tenant
        Timer.builder("ai.llm.latency")
            .tag("tenant_id", tenantId)
            .tag("model", model)
            .register(meterRegistry)
            .record(latencyMs, TimeUnit.MILLISECONDS);
    }
}

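Group by tenant_id in Grafana and you get side-by-side charts of each tenant's token usage, request volume, and error rate. This is very useful for billing and for tuning rate-limit rules.

Designing a multi-tenant AI service is, at its core, about finding the balance between the economics of sharing and secure isolation.

Full physical isolation is the safest but expensive; fully shared logical isolation is the cheapest, but isolation is harder to get right and the risk is higher.

In real-world engineering, I lean toward the strictest logical isolation at the data layer (every query carries a tenant filter, and deletion must be physical and complete), soft throttling at the compute layer (rate limiting and quota control), and flexible configuration at the behavior layer (tenant-level prompts).

This combination covers 90% of multi-tenant needs. The remaining 10%, the physical-isolation needs of high-end customers, is handled through dedicated deployments.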