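Billing for a Multi-Tenant AI Platform: Implementing Precise Usage-Based Billing
2026/10/8 · about 13 min read · Tags: Billing System, Multi-Tenancy, Token Billing, Spring AI, Java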
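Opening Story: Shen Hao's "Leaky Bucket"
In December 2025, Shen Hao, founder and CTO of an AI SaaS platform, came across a troubling number during his year-end review:
- AI service fees collected from customers: ¥3.8 million
- AI API fees the company actually paid: ¥4.2 million
The company sold "AI services" as its core offering, yet it was losing money on them!
Shen Hao started digging and found five "leaks":
- Billing too coarse: customers paid a flat plan fee instead of paying for actual usage
- Inaccurate token metering: token counts were estimated, and actual consumption ran 28% above the estimates
- Free-tier abuse: some customers used multiple accounts to get around the 100K-token monthly free limit
- Nested calls not billed: tokens consumed by AI Agent tool calls were never billed
- Streaming usage lost: tokens from SSE streaming responses were never captured
After these five leaks were fixed, the next quarter showed:
- Revenue up 31% (accurate billing plus overage charges)
- Costs unchanged
- Customer satisfaction actually improved (low-usage customers got smaller bills)
That is the power of precise billing. This article walks through building a complete AI token billing system.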
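TL;DR
- Precise token metering: intercept Spring AI responses and extract the tokens actually consumed
- Multi-dimensional pricing: price separately by token type (input/output) × model × time of day
- Real-time billing: Redis holds live consumption, MySQL holds billing history
- Overage alerts: notify tenants proactively as they approach their quota, and throttle automatically once it is exceeded
- Abuse prevention: detect quota sharing across accounts and protect the free tier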
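The per-call price implied by the multi-dimensional pricing bullet can be written as a single formula (volume tiers are applied later, to the monthly total). Here T_in and T_out are token counts, P(m) is the price per million tokens for model m, d_plan and d_time are discount multipliers, and h is the hour of the call in Beijing time; the worked numbers use the illustrative gpt-4o-mini prices from section 3.1.

```latex
C = \left( \frac{T_{\mathrm{in}}}{10^{6}} P_{\mathrm{in}}(m) + \frac{T_{\mathrm{out}}}{10^{6}} P_{\mathrm{out}}(m) \right) d_{\mathrm{plan}} \, d_{\mathrm{time}},
\qquad
d_{\mathrm{time}} = \begin{cases} 0.8 & 0 \le h < 6 \\ 1 & \text{otherwise} \end{cases}

% Worked example: 2000 input + 500 output tokens on gpt-4o-mini (6 and 18 yuan per million), d_plan = 1, h = 2
C = \left( \frac{2000}{10^{6}} \cdot 6 + \frac{500}{10^{6}} \cdot 18 \right) \cdot 1 \cdot 0.8 = (0.012 + 0.009) \cdot 0.8 = 0.0168 \text{ yuan}
```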
1. Billing System Architecture Design
1.1 Core Model of Multi-Tenant Billing
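```
Billing hierarchy:
Enterprise (Tenant)
├── Plan: Free/Pro/Enterprise
├── BillingCycle: monthly
└── ApiKey: each key tracked independently
Billing dimensions:
├── Input tokens (Prompt Tokens)
├── Output tokens (Completion Tokens)
├── Embedding calls
└── Tool call tokens
Pricing strategy:
├── Per-model pricing (GPT-4o > GPT-4o-mini)
├── Time-of-day discount (20% off, 0:00-6:00)
├── Volume tiers (lower unit price above 1M tokens)
└── Enterprise contract pricing (discount for annual prepayment)
```

1.2 Database Schema Design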
```sql
-- Core billing tables

-- Tenant plan configuration
CREATE TABLE tenant_plans (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    tenant_id VARCHAR(64) NOT NULL,
    plan_type ENUM('FREE', 'PRO', 'ENTERPRISE') NOT NULL,
    monthly_token_quota BIGINT NOT NULL COMMENT 'Hard monthly token quota',
    free_tokens_per_month BIGINT NOT NULL DEFAULT 0 COMMENT 'Free tokens per month, deducted before billing',
    monthly_cost_limit DECIMAL(10,2) COMMENT 'Monthly spending cap (yuan)',
    price_per_million_input DECIMAL(8,4) NOT NULL COMMENT 'Input token price per million',
    price_per_million_output DECIMAL(8,4) NOT NULL COMMENT 'Output token price per million',
    discount_rate DECIMAL(4,2) DEFAULT 1.00 COMMENT 'Discount multiplier, e.g. 0.90',
    effective_from DATE NOT NULL,
    effective_to DATE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Token usage records (line items)
CREATE TABLE token_usage_records (
    id BIGINT NOT NULL AUTO_INCREMENT,
    tenant_id VARCHAR(64) NOT NULL,
    api_key_id VARCHAR(64) NOT NULL,
    model_id VARCHAR(64) NOT NULL COMMENT 'Model used',
    request_id VARCHAR(128) NOT NULL COMMENT 'Unique request ID',
    client_ip VARCHAR(45) COMMENT 'Caller IP (IPv4/IPv6), used for abuse detection',
    input_tokens INT NOT NULL,
    output_tokens INT NOT NULL,
    total_tokens INT NOT NULL,
    call_type ENUM('CHAT', 'EMBEDDING', 'TOOL_CALL', 'IMAGE') NOT NULL,
    cost_amount DECIMAL(10,6) NOT NULL COMMENT 'Cost of this call (yuan)',
    -- DATETIME rather than TIMESTAMP: MySQL rejects YEAR() on a TIMESTAMP partitioning column
    request_time DATETIME(3) NOT NULL,
    response_time DATETIME(3),
    -- On a partitioned table, every primary/unique key must include the partitioning column
    PRIMARY KEY (id, request_time),
    UNIQUE KEY uk_request (request_id, request_time),
    INDEX idx_tenant_time (tenant_id, request_time),
    INDEX idx_api_key (api_key_id),
    INDEX idx_billing (tenant_id, request_time, cost_amount)
) PARTITION BY RANGE (YEAR(request_time)) (
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION p2026 VALUES LESS THAN (2027),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Monthly bill summary
CREATE TABLE monthly_bills (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    tenant_id VARCHAR(64) NOT NULL,
    billing_year SMALLINT NOT NULL,
    billing_month TINYINT NOT NULL,
    total_input_tokens BIGINT DEFAULT 0,
    total_output_tokens BIGINT DEFAULT 0,
    total_cost DECIMAL(12,2) DEFAULT 0,
    free_quota_used BIGINT DEFAULT 0,
    billable_cost DECIMAL(12,2) DEFAULT 0,
    status ENUM('OPEN', 'LOCKED', 'PAID', 'OVERDUE') DEFAULT 'OPEN',
    invoice_generated_at TIMESTAMP,
    paid_at TIMESTAMP,
    UNIQUE KEY uk_tenant_month (tenant_id, billing_year, billing_month)
);
```

2. Spring AI Token Interception and Metering
2.1 Capturing Token Usage Precisely with an Advisor
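The core rule of the advisor can be isolated from Spring AI: bill the counts the provider reports, and fall back to an estimate only when they are missing. A minimal, framework-free sketch follows; `ProviderUsage`, `MeteredTokens` and `UsageExtractor` are hypothetical types rather than Spring AI's `Usage`, and the `estimated` flag is an added assumption so that estimated records can be audited separately.

```java
import java.util.Objects;

// Hypothetical stand-in for a provider's usage payload; null means "not reported"
record ProviderUsage(Long promptTokens, Long completionTokens) {}

// Token counts that will actually be billed, plus whether they were estimated
record MeteredTokens(int input, int output, boolean estimated) {
    int total() {
        return input + output;
    }
}

final class UsageExtractor {
    private UsageExtractor() {}

    // Prefer the provider's counts; fall back only when usage is missing or all-zero
    static MeteredTokens resolve(ProviderUsage reported, int estimatedInput, int estimatedOutput) {
        if (reported != null) {
            long in = Objects.requireNonNullElse(reported.promptTokens(), 0L);
            long out = Objects.requireNonNullElse(reported.completionTokens(), 0L);
            if (in + out > 0) {
                // toIntExact throws on overflow instead of silently truncating
                return new MeteredTokens(Math.toIntExact(in), Math.toIntExact(out), false);
            }
        }
        return new MeteredTokens(estimatedInput, estimatedOutput, true);
    }
}
```

Treating an all-zero usage like a missing one keeps a provider that omits usage from being billed as zero tokens.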
```java
// TokenUsageTrackingAdvisor.java
// Written against the Spring AI 1.0 milestone advisor API (CallAroundAdvisor / AdvisedRequest / AdvisedResponse)
@Component
@Slf4j
@RequiredArgsConstructor
public class TokenUsageTrackingAdvisor implements CallAroundAdvisor {

    private final TokenUsageService tokenUsageService;
    private final TenantContextHolder tenantContext;

    @Override
    public String getName() {
        return "TokenUsageTrackingAdvisor";
    }

    @Override
    public int getOrder() {
        // Lowest precedence = innermost advisor, closest to the model call:
        // it sees the model's response before other advisors transform it
        return Ordered.LOWEST_PRECEDENCE;
    }

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest advisedRequest,
                                      CallAroundAdvisorChain chain) {
        // Read the tenant context on the request thread, before any async hand-off
        String tenantId = tenantContext.getTenantId();
        String apiKeyId = tenantContext.getApiKeyId();
        String requestId = UUID.randomUUID().toString();
        long startTime = System.currentTimeMillis();

        AdvisedResponse response = chain.nextAroundCall(advisedRequest);

        // Extract the real token usage from the response
        extractAndRecordUsage(response, tenantId, apiKeyId, requestId,
                advisedRequest, startTime);
        return response;
    }

    private void extractAndRecordUsage(
            AdvisedResponse response,
            String tenantId, String apiKeyId, String requestId,
            AdvisedRequest request, long startTime) {

        if (response == null || response.response() == null) return;
        ChatResponse chatResponse = response.response();

        // Real token usage as reported by the model API
        Usage usage = chatResponse.getMetadata().getUsage();
        // Missing usage can surface as null or as an all-zero usage object
        if (usage == null || usage.getTotalTokens() == 0) {
            log.warn("Token usage unavailable, falling back to an estimate [requestId={}]", requestId);
            usage = estimateUsage(request); // estimation helper not shown
        }

        // The getters return boxed numbers; convert without silent truncation
        int inputTokens = toInt(usage.getPromptTokens());
        int outputTokens = toInt(usage.getGenerationTokens());

        // Model information
        String modelId = extractModelId(chatResponse, request);

        // Record asynchronously so billing never blocks the response
        CompletableFuture.runAsync(() -> {
            try {
                tokenUsageService.record(TokenUsageRecord.builder()
                        .tenantId(tenantId)
                        .apiKeyId(apiKeyId)
                        .modelId(modelId)
                        .requestId(requestId)
                        .inputTokens(inputTokens)
                        .outputTokens(outputTokens)
                        .totalTokens(inputTokens + outputTokens)
                        .callType(CallType.CHAT)
                        .requestTime(Instant.ofEpochMilli(startTime))
                        .responseTime(Instant.now())
                        .build());
            } catch (Exception e) {
                log.error("Failed to record token usage [requestId={}]", requestId, e);
            }
        });
    }

    private static int toInt(Number value) {
        return value == null ? 0 : Math.toIntExact(value.longValue());
    }

    private String extractModelId(ChatResponse response, AdvisedRequest request) {
        // Prefer the model ID reported in the response metadata
        String model = response.getMetadata().getModel();
        if (model != null) return model;
        // Otherwise fall back to the request options
        ChatOptions options = request.chatOptions();
        if (options instanceof OpenAiChatOptions openAiOptions) {
            return openAiOptions.getModel();
        }
        return "unknown";
    }
}
```

2.2 Token Metering for Streaming Responses
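Metering streaming (SSE) responses is harder: the output arrives as many small chunks, and the usage figures, when the provider reports them at all, usually come only in the final chunk.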
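The rules that matter for streaming can be sketched without Reactor: keep the streamed text, remember the usage if a chunk carries it, and bill exactly once when the stream ends for any reason. `Chunk` and `StreamMeter` are hypothetical; in a Reactor pipeline, `finish` would be driven from `doFinally`, which runs on completion, error and cancellation alike. The estimate reuses the article's rough heuristic (about one token per CJK character, about four other characters per token), computed once over the whole text rather than once per chunk.

```java
// Hypothetical stream chunk: a text delta plus optional usage (typically only on the last chunk)
record Chunk(String text, Integer promptTokens, Integer completionTokens) {
    boolean hasUsage() {
        return promptTokens != null && completionTokens != null
                && promptTokens + completionTokens > 0;
    }
}

final class StreamMeter {
    private final StringBuilder output = new StringBuilder();
    private Chunk usageChunk;   // chunk that carried usage, if any
    private boolean recorded;   // guards against billing the same stream twice

    void onChunk(Chunk c) {
        if (c.text() != null) output.append(c.text());
        if (c.hasUsage()) usageChunk = c;
    }

    // Call once the stream terminates; returns the tokens to bill the first time, 0 afterwards
    int finish(int estimatedInputTokens) {
        if (recorded) return 0;
        recorded = true;
        if (usageChunk != null) {
            return usageChunk.promptTokens() + usageChunk.completionTokens();
        }
        return estimatedInputTokens + estimateTokens(output.toString());
    }

    // Rough heuristic: ~1 token per CJK character, ~4 other characters per token (rounded up)
    static int estimateTokens(String text) {
        int cjk = (int) text.chars().filter(ch -> ch >= 0x4E00 && ch <= 0x9FFF).count();
        int other = text.length() - cjk;
        return cjk + (other + 3) / 4;
    }
}
```

Rounding up once over the full text avoids the per-chunk "+1" that inflates estimates on streams with many tiny chunks.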
```java
// StreamingTokenTrackingAdvisor.java
@Component
@Slf4j
@RequiredArgsConstructor
public class StreamingTokenTrackingAdvisor implements StreamAroundAdvisor {

    private final TokenUsageService tokenUsageService;
    private final TenantContextHolder tenantContext;

    @Override
    public String getName() {
        return "StreamingTokenTrackingAdvisor";
    }

    @Override
    public int getOrder() {
        return Ordered.LOWEST_PRECEDENCE;
    }

    @Override
    public Flux<AdvisedResponse> aroundStream(AdvisedRequest advisedRequest,
                                              StreamAroundAdvisorChain chain) {
        String requestId = UUID.randomUUID().toString();
        // Capture the tenant now: Reactor callbacks may run on other threads
        String tenantId = tenantContext.getTenantId();
        StringBuilder outputText = new StringBuilder();
        AtomicReference<Usage> reportedUsage = new AtomicReference<>();
        AtomicBoolean recorded = new AtomicBoolean(false);

        return chain.nextAroundStream(advisedRequest)
                .doOnNext(response -> {
                    ChatResponse chunk = response.response();
                    if (chunk == null) return;
                    // OpenAI sends usage only in the final chunk, and only when the request
                    // sets stream_options.include_usage = true
                    Usage usage = chunk.getMetadata().getUsage();
                    if (usage != null && usage.getTotalTokens() > 0) {
                        reportedUsage.set(usage);
                    }
                    // Keep the text so an estimate is possible if usage never arrives
                    if (chunk.getResult() != null && chunk.getResult().getOutput() != null) {
                        String content = chunk.getResult().getOutput().getContent();
                        if (content != null) outputText.append(content);
                    }
                })
                // doFinally fires on complete, error and cancel (e.g. client disconnect);
                // the flag ensures each request is recorded exactly once
                .doFinally(signal -> {
                    if (!recorded.compareAndSet(false, true)) return;
                    Usage usage = reportedUsage.get();
                    if (usage != null) {
                        recordStreamingUsage(tenantId, requestId, usage, advisedRequest);
                    } else {
                        recordEstimatedStreamingUsage(tenantId, requestId,
                                advisedRequest, estimateTokenCount(outputText.toString()));
                    }
                });
    }

    // Rough token estimate without a tokenizer library
    private int estimateTokenCount(String text) {
        if (text == null || text.isEmpty()) return 0;
        // Heuristic: ~1 token per CJK character, ~4 other characters per token
        int chineseChars = (int) text.chars()
                .filter(c -> c >= 0x4e00 && c <= 0x9fff).count();
        int otherChars = text.length() - chineseChars;
        return chineseChars + (otherChars + 3) / 4;
    }

    // recordStreamingUsage / recordEstimatedStreamingUsage delegate to tokenUsageService (not shown)
}
```

3. Billing Engine: Calculating Bills Precisely
3.1 Price Calculator
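Before the full calculator, here is the per-call arithmetic on its own: token counts times per-million prices, times the plan and night-time multipliers, divided by one million once at the end. `CallCost` is a hypothetical helper; it assumes Beijing time (`Asia/Shanghai`) for the 00:00 to 06:00 discount window and BigDecimal prices in yuan per million tokens.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Instant;
import java.time.ZoneId;

final class CallCost {
    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);
    private static final BigDecimal NIGHT_RATE = new BigDecimal("0.8");
    private static final ZoneId BILLING_ZONE = ZoneId.of("Asia/Shanghai");

    private CallCost() {}

    // Prices are yuan per million tokens; planDiscount is a multiplier such as 0.9
    static BigDecimal of(long inputTokens, long outputTokens,
                         BigDecimal inputPerMillion, BigDecimal outputPerMillion,
                         BigDecimal planDiscount, Instant requestTime) {
        int hour = requestTime.atZone(BILLING_ZONE).getHour();
        BigDecimal timeDiscount = hour < 6 ? NIGHT_RATE : BigDecimal.ONE;
        // Multiply first and divide once at the end, so no precision is lost early
        BigDecimal raw = BigDecimal.valueOf(inputTokens).multiply(inputPerMillion)
                .add(BigDecimal.valueOf(outputTokens).multiply(outputPerMillion));
        return raw.multiply(planDiscount).multiply(timeDiscount)
                .divide(MILLION, 6, RoundingMode.HALF_UP);
    }
}
```

For 2,000 input and 500 output tokens on gpt-4o-mini (¥6/¥18 per million) at 02:00 Beijing time this returns 0.016800 yuan; the same call at noon costs 0.021000.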
```java
// BillingCalculator.java
@Service
@Slf4j
public class BillingCalculator {

    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);
    private static final ZoneId BILLING_ZONE = ZoneId.of("Asia/Shanghai");

    // Per-model prices (yuan per million tokens); illustrative values, check current vendor price lists
    private static final Map<String, ModelPricing> MODEL_PRICING = Map.of(
        "gpt-4o",            ModelPricing.of(150.0, 450.0), // ¥150/M input, ¥450/M output
        "gpt-4o-mini",       ModelPricing.of(6.0, 18.0),    // ¥6/M input, ¥18/M output
        "claude-3-5-sonnet", ModelPricing.of(22.0, 110.0),
        "qwen-turbo",        ModelPricing.of(2.0, 6.0),
        "local-model",       ModelPricing.of(0.01, 0.01)    // local model: electricity only
    );

    @Autowired
    private TenantPlanRepository planRepository;

    /**
     * Calculates the cost of a single call.
     */
    public BillingAmount calculate(TokenUsageRecord usage) {
        // Model price; unknown models fall back to gpt-4o-mini pricing and are logged,
        // since under-pricing an expensive model is itself a billing leak
        ModelPricing pricing = MODEL_PRICING.get(usage.getModelId());
        if (pricing == null) {
            log.warn("No price configured for model [{}], using gpt-4o-mini pricing", usage.getModelId());
            pricing = MODEL_PRICING.get("gpt-4o-mini");
        }

        // Tenant plan discount
        TenantPlan plan = planRepository.findActivePlan(usage.getTenantId());
        double discountRate = plan != null ? plan.getDiscountRate() : 1.0;

        // Time-of-day discount: 20% off from 00:00 to 06:00 Beijing time
        LocalTime callTime = LocalTime.ofInstant(usage.getRequestTime(), BILLING_ZONE);
        double timeDiscount = callTime.getHour() < 6 ? 0.8 : 1.0;
        BigDecimal multiplier = BigDecimal.valueOf(discountRate)
                .multiply(BigDecimal.valueOf(timeDiscount));

        // Multiply first, divide by one million last, to keep full precision
        BigDecimal inputCost = BigDecimal.valueOf(usage.getInputTokens())
                .multiply(BigDecimal.valueOf(pricing.getInputPricePerMillion()))
                .multiply(multiplier)
                .divide(MILLION, 10, RoundingMode.HALF_UP);
        BigDecimal outputCost = BigDecimal.valueOf(usage.getOutputTokens())
                .multiply(BigDecimal.valueOf(pricing.getOutputPricePerMillion()))
                .multiply(multiplier)
                .divide(MILLION, 10, RoundingMode.HALF_UP);

        BigDecimal totalCost = inputCost.add(outputCost)
                .setScale(6, RoundingMode.HALF_UP);

        return BillingAmount.builder()
                .inputCost(inputCost)
                .outputCost(outputCost)
                .totalCost(totalCost)
                .discountRate(discountRate)
                .timeDiscount(timeDiscount)
                .build();
    }

    /**
     * Tiered pricing: the more tokens used, the lower the unit price.
     */
    public BigDecimal calculateTieredCost(long monthlyTokens, String tenantId) {
        TenantPlan plan = planRepository.findActivePlan(tenantId);
        // Free allowance per month (distinct from the hard monthly_token_quota)
        long freeQuota = plan != null ? plan.getFreeTokensPerMonth() : 0L;
        if (monthlyTokens <= freeQuota) {
            return BigDecimal.ZERO;
        }
        long billableTokens = monthlyTokens - freeQuota;

        // Accumulate tokens × (yuan per million), then divide by one million at the end
        BigDecimal cost = BigDecimal.ZERO;
        // Tier 1: first 1M tokens at full price, ¥6/M
        long tier1Tokens = Math.min(billableTokens, 1_000_000);
        cost = cost.add(BigDecimal.valueOf(tier1Tokens).multiply(new BigDecimal("6.0")));
        // Tier 2: 1M to 5M tokens at 90%, ¥5.4/M
        if (billableTokens > 1_000_000) {
            long tier2Tokens = Math.min(billableTokens - 1_000_000, 4_000_000);
            cost = cost.add(BigDecimal.valueOf(tier2Tokens).multiply(new BigDecimal("5.4")));
        }
        // Tier 3: above 5M tokens at 70%, ¥4.2/M
        if (billableTokens > 5_000_000) {
            long tier3Tokens = billableTokens - 5_000_000;
            cost = cost.add(BigDecimal.valueOf(tier3Tokens).multiply(new BigDecimal("4.2")));
        }
        return cost.divide(MILLION, 2, RoundingMode.HALF_UP);
    }
}
```

3.2 Real-Time Quota Checks
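The Lua script in QuotaGuardService exists because reading the counter, comparing it with the quota and then calling INCRBY as separate commands would let two concurrent requests both pass the check. Inside a single JVM the same all-or-nothing step can be written as a compare-and-set loop. The hypothetical `InMemoryQuota` only illustrates the idea; several application instances still need the shared Redis script.

```java
import java.util.concurrent.atomic.AtomicLong;

// In-memory analogue of the Lua script: check and deduct as one atomic step (single JVM only)
final class InMemoryQuota {
    private final long quota;
    private final AtomicLong used = new AtomicLong();

    InMemoryQuota(long quota) {
        this.quota = quota;
    }

    // Returns the new usage, or -1 if the request would exceed the quota (nothing is deducted)
    long tryDeduct(long requested) {
        while (true) {
            long current = used.get();
            long next = current + requested;
            if (next > quota) return -1;
            // compareAndSet fails if another thread changed `used` since get(); then retry
            if (used.compareAndSet(current, next)) return next;
        }
    }

    long used() {
        return used.get();
    }
}
```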
```java
// QuotaGuardService.java
@Service
@Slf4j
@RequiredArgsConstructor
public class QuotaGuardService {

    private static final ZoneId BILLING_ZONE = ZoneId.of("Asia/Shanghai");

    // Lua script: check and deduct atomically; returns the new usage, or -1 when over quota
    private static final RedisScript<Long> CHECK_AND_DEDUCT = RedisScript.of("""
            local current = tonumber(redis.call('GET', KEYS[1]) or 0)
            local quota = tonumber(ARGV[1])
            local requested = tonumber(ARGV[2])
            if current + requested > quota then
                return -1 -- over quota, nothing deducted
            end
            local new_value = redis.call('INCRBY', KEYS[1], requested)
            redis.call('EXPIRE', KEYS[1], ARGV[3])
            return new_value
            """, Long.class);

    // StringRedisTemplate passes script arguments and values as plain strings; a
    // RedisTemplate<String, Object> with a JSON/JDK serializer would make tonumber() fail
    private final StringRedisTemplate redisTemplate;
    private final TenantPlanRepository planRepository;
    private final AlertService alertService;

    // Pre-request quota check (rejects requests that would exceed the quota)
    public QuotaCheckResult checkAndDeduct(
            String tenantId, String apiKeyId, int estimatedTokens) {

        TenantPlan plan = planRepository.findActivePlan(tenantId);
        if (plan == null) {
            return QuotaCheckResult.denied("Tenant has no active plan");
        }
        long quota = plan.getMonthlyTokenQuota();

        String redisKey = String.format("quota:%s:%s", tenantId, getCurrentBillingPeriod());
        Long result = redisTemplate.execute(
                CHECK_AND_DEDUCT,
                List.of(redisKey),
                String.valueOf(quota),
                String.valueOf(estimatedTokens),
                String.valueOf(calculateSecondsToEndOfMonth()));

        if (result == null || result == -1) {
            log.warn("Tenant [{}] exceeded its quota, request rejected", tenantId);
            return QuotaCheckResult.denied("Monthly token quota exhausted, please upgrade your plan");
        }

        // Warn only when this request crosses a threshold, not on every request after it
        long before = result - estimatedTokens;
        for (int pct : new int[] {90, 95}) {
            long threshold = quota * pct / 100;
            if (before < threshold && result >= threshold) {
                alertService.sendQuotaWarning(tenantId, pct, quota, result);
            }
        }
        return QuotaCheckResult.allowed(result, quota - result);
    }

    // Monthly quota of the tenant's active plan (0 if there is none)
    public long getQuota(String tenantId) {
        TenantPlan plan = planRepository.findActivePlan(tenantId);
        return plan != null ? plan.getMonthlyTokenQuota() : 0L;
    }

    // Current usage counted against the quota
    public long getCurrentUsage(String tenantId) {
        String redisKey = String.format("quota:%s:%s", tenantId, getCurrentBillingPeriod());
        String value = redisTemplate.opsForValue().get(redisKey);
        return value != null ? Long.parseLong(value) : 0L;
    }

    private String getCurrentBillingPeriod() {
        return YearMonth.now(BILLING_ZONE).toString(); // e.g. "2026-10"
    }

    // Seconds until the first instant of next month in the billing time zone (at least 1)
    private long calculateSecondsToEndOfMonth() {
        ZonedDateTime now = ZonedDateTime.now(BILLING_ZONE);
        ZonedDateTime nextMonth = now.toLocalDate()
                .with(TemporalAdjusters.firstDayOfNextMonth())
                .atStartOfDay(BILLING_ZONE);
        return Math.max(1, Duration.between(now, nextMonth).toSeconds());
    }
}
```

4. Token Usage Tracking Service
4.1 Real-Time Consumption Tracking
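TokenUsageService keeps a running cost per tenant with INCRBYFLOAT, which accumulates in floating point, so the Redis figure can drift slightly from the exact DECIMAL sum in MySQL. One common alternative, swapped in here rather than taken from the original design, is to store money as an integer number of micro-yuan (10^-6 yuan, matching the six decimal places of `cost_amount`) and update it with INCRBY. `MicroYuan` is a hypothetical helper showing the conversion and exact accumulation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

// Exact money counters: micro-yuan (1e-6 yuan) held in a long, suitable for Redis INCRBY
final class MicroYuan {
    private MicroYuan() {}

    static long toMicros(BigDecimal yuan) {
        // cost_amount is DECIMAL(10,6), so six decimal places convert without loss
        return yuan.movePointRight(6).setScale(0, RoundingMode.HALF_UP).longValueExact();
    }

    static BigDecimal fromMicros(long micros) {
        return BigDecimal.valueOf(micros, 6);
    }

    // Accumulates per-call costs the way an INCRBY counter would, failing loudly on overflow
    static BigDecimal sum(List<BigDecimal> costs) {
        long total = 0;
        for (BigDecimal c : costs) total = Math.addExact(total, toMicros(c));
        return fromMicros(total);
    }
}
```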
```java
// TokenUsageService.java
@Service
@Slf4j
@RequiredArgsConstructor
public class TokenUsageService {

    private static final ZoneId BILLING_ZONE = ZoneId.of("Asia/Shanghai");
    private static final Duration COUNTER_TTL = Duration.ofDays(35);

    // INCRBYFLOAT plus EXPIRE in a single round trip
    private static final RedisScript<String> INCR_FLOAT_WITH_TTL = RedisScript.of("""
            local current = redis.call('INCRBYFLOAT', KEYS[1], ARGV[1])
            redis.call('EXPIRE', KEYS[1], ARGV[2])
            return current
            """, String.class);

    private final TokenUsageRecordRepository recordRepository;
    private final MonthlyBillRepository billRepository;
    private final BillingCalculator calculator;
    // Plain-string template: the counters hold raw numbers written by HINCRBY/INCRBYFLOAT
    private final StringRedisTemplate redisTemplate;

    // Records one token-usage event. The detail row and the monthly summary are written in
    // one transaction. @Async on a method called through `this` bypasses Spring's proxy, so
    // the summary is updated synchronously; the advisors already call record() off the request thread.
    @Transactional
    public void record(TokenUsageRecord usage) {
        // Price the call
        BillingAmount billing = calculator.calculate(usage);
        usage.setCostAmount(billing.getTotalCost());

        // Bill by the month in which the request was made, in the billing time zone
        YearMonth period = YearMonth.from(usage.getRequestTime().atZone(BILLING_ZONE));

        // 1. Write the detail row
        recordRepository.save(usage);
        // 2. Update the monthly bill summary
        billRepository.upsertMonthlyBill(
                usage.getTenantId(), period.getYear(), period.getMonthValue(),
                usage.getInputTokens(), usage.getOutputTokens(), billing.getTotalCost());
        // 3. Update the Redis real-time counters (not rolled back with the transaction;
        //    they are a cache that can be rebuilt from MySQL)
        updateRealtimeCounters(usage, billing, period.toString());
    }

    private void updateRealtimeCounters(TokenUsageRecord usage, BillingAmount billing, String period) {
        // Token counters
        String tokenKey = "usage:tokens:" + usage.getTenantId() + ":" + period;
        redisTemplate.opsForHash().increment(tokenKey, "input", usage.getInputTokens());
        redisTemplate.opsForHash().increment(tokenKey, "output", usage.getOutputTokens());
        redisTemplate.expire(tokenKey, COUNTER_TTL);

        // Cost counter: INCRBYFLOAT is floating point, so use it for display, not invoicing
        redisTemplate.execute(INCR_FLOAT_WITH_TTL,
                List.of("usage:cost:" + usage.getTenantId() + ":" + period),
                billing.getTotalCost().toPlainString(),
                String.valueOf(COUNTER_TTL.toSeconds()));
    }

    // Real-time usage for the current month
    public UsageSummary getRealtimeUsage(String tenantId) {
        String period = YearMonth.now(BILLING_ZONE).toString();
        Map<Object, Object> tokenData = redisTemplate.opsForHash()
                .entries("usage:tokens:" + tenantId + ":" + period);
        String costData = redisTemplate.opsForValue()
                .get("usage:cost:" + tenantId + ":" + period);

        long inputTokens = Long.parseLong(String.valueOf(tokenData.getOrDefault("input", "0")));
        long outputTokens = Long.parseLong(String.valueOf(tokenData.getOrDefault("output", "0")));
        BigDecimal cost = costData != null ? new BigDecimal(costData) : BigDecimal.ZERO;

        return UsageSummary.builder()
                .tenantId(tenantId)
                .period(period)
                .inputTokens(inputTokens)
                .outputTokens(outputTokens)
                .totalTokens(inputTokens + outputTokens)
                .totalCost(cost)
                .build();
    }
}
```

5. Bill Generation and Invoicing
5.1 Monthly Bill Generation
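Monthly billing is two steps: subtract the free allowance, then price the remainder tier by tier. This sketch uses the tier bounds and per-million prices from `calculateTieredCost` in section 3.1 (¥6, ¥5.4 and ¥4.2 per million tokens). `MonthlyCharge` and `TieredBilling` are hypothetical, and the free allowance is passed in separately from the hard monthly quota: if the quota enforced by the quota guard also served as the free allowance, nearly every bill would come out as zero.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical result: how a month's usage splits into free and billable parts
record MonthlyCharge(long freeTokensUsed, long billableTokens, BigDecimal cost) {}

final class TieredBilling {
    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);
    // Upper bound of each tier (in billable tokens) and its price in yuan per million tokens
    private static final long[] TIER_CEILINGS = {1_000_000L, 5_000_000L, Long.MAX_VALUE};
    private static final BigDecimal[] TIER_PRICES = {
            new BigDecimal("6.0"), new BigDecimal("5.4"), new BigDecimal("4.2")};

    private TieredBilling() {}

    static MonthlyCharge charge(long totalTokens, long freeTokensPerMonth) {
        long free = Math.min(totalTokens, freeTokensPerMonth);
        long billable = totalTokens - free;
        BigDecimal cost = BigDecimal.ZERO;
        long lower = 0;
        // Walk the tiers until every billable token has been priced
        for (int i = 0; i < TIER_CEILINGS.length && billable > lower; i++) {
            long inTier = Math.min(billable, TIER_CEILINGS[i]) - lower;
            cost = cost.add(BigDecimal.valueOf(inTier).multiply(TIER_PRICES[i]));
            lower = TIER_CEILINGS[i];
        }
        return new MonthlyCharge(free, billable, cost.divide(MILLION, 2, RoundingMode.HALF_UP));
    }
}
```

For 7.1M tokens with a 100K free allowance, 7M tokens are billable: ¥6 + ¥21.6 + ¥8.4 = ¥36.00.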
```java
// MonthlyBillGenerator.java
@Service
@Slf4j
@RequiredArgsConstructor
public class MonthlyBillGenerator {

    private static final ZoneId BILLING_ZONE = ZoneId.of("Asia/Shanghai");

    private final TokenUsageRecordRepository recordRepository;
    private final MonthlyBillRepository billRepository;
    private final TenantPlanRepository planRepository;
    private final BillingCalculator calculator;
    private final PdfGeneratorService pdfGenerator;
    private final EmailService emailService;

    // Generate last month's bills at 01:00 on the 1st of every month, Beijing time
    @Scheduled(cron = "0 0 1 1 * *", zone = "Asia/Shanghai")
    public void generatePreviousMonthBills() {
        YearMonth previousMonth = YearMonth.now(BILLING_ZONE).minusMonths(1);
        List<String> allTenantIds = billRepository.findActiveTenants();
        log.info("Generating bills for {}: {} tenants", previousMonth, allTenantIds.size());

        for (String tenantId : allTenantIds) {
            try {
                generateMonthlyBill(tenantId, previousMonth);
            } catch (Exception e) {
                // One failing tenant must not block the rest
                log.error("Bill generation failed [tenant={}, month={}]", tenantId, previousMonth, e);
            }
        }
    }

    public MonthlyBill generateMonthlyBill(String tenantId, YearMonth month) {
        // 1. Aggregate all of the month's usage records (MySQL is the source of truth)
        BillingSummary summary = recordRepository.summarizeByTenantAndMonth(
                tenantId, month.getYear(), month.getMonthValue());

        // 2. Load the plan in effect that month
        TenantPlan plan = planRepository.findPlanForMonth(tenantId, month);

        // 3. Billable amount after the free allowance. Use the free allowance, not the hard
        //    monthly quota: the quota guard already caps usage at the quota, so subtracting
        //    the quota would leave (almost) nothing to bill.
        long freeQuota = plan.getFreeTokensPerMonth();
        BigDecimal billableCost = calculator.calculateTieredCost(
                summary.getTotalTokens(), tenantId);

        // 4. Lock the bill. upsertMonthlyBill already created an OPEN row for this month, so
        //    update it rather than inserting a duplicate that uk_tenant_month would reject.
        //    findByTenantAndMonth is an assumed repository method; setters assume a Lombok @Data entity.
        MonthlyBill bill = billRepository
                .findByTenantAndMonth(tenantId, month.getYear(), month.getMonthValue())
                .orElseGet(() -> MonthlyBill.builder()
                        .tenantId(tenantId)
                        .billingYear(month.getYear())
                        .billingMonth(month.getMonthValue())
                        .build());
        bill.setTotalInputTokens(summary.getInputTokens());
        bill.setTotalOutputTokens(summary.getOutputTokens());
        bill.setTotalCost(summary.getTotalCost());
        bill.setFreeQuotaUsed(Math.min(summary.getTotalTokens(), freeQuota));
        bill.setBillableCost(billableCost);
        bill.setStatus(BillStatus.LOCKED);
        bill = billRepository.save(bill);

        // 5. Render the PDF bill
        byte[] pdfContent = pdfGenerator.generateBillPdf(bill, summary);

        // 6. Email the bill to the tenant
        emailService.sendMonthlyBill(tenantId, bill, pdfContent);

        log.info("Bill generated [tenant={}, month={}, billable={} yuan]",
                tenantId, month, billableCost);
        return bill;
    }
}
```

5.2 Usage Analytics API
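`getUsageDetails` accepts inclusive calendar dates, while `request_time` is a point in time. Depending on how `getDetailsByTenant` (not shown) builds its query, a filter such as `request_time <= endDate` compares against midnight at the start of `endDate` and silently drops that whole day. A safer pattern, sketched with a hypothetical `TimeRange` record, turns the dates into a half-open interval in the billing time zone for a `request_time >= ? AND request_time < ?` filter.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Half-open [from, to) interval for "request_time >= ? AND request_time < ?"
record TimeRange(Instant from, Instant to) {

    static TimeRange ofInclusiveDates(LocalDate start, LocalDate endInclusive, ZoneId zone) {
        if (endInclusive.isBefore(start)) {
            throw new IllegalArgumentException("endDate is before startDate");
        }
        // The end of the last day is the start of the following day
        return new TimeRange(start.atStartOfDay(zone).toInstant(),
                endInclusive.plusDays(1).atStartOfDay(zone).toInstant());
    }

    boolean contains(Instant t) {
        return !t.isBefore(from) && t.isBefore(to);
    }
}
```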
```java
// UsageAnalyticsController.java
@RestController
@RequestMapping("/api/billing")
@Slf4j
@RequiredArgsConstructor
public class UsageAnalyticsController {

    private static final int MAX_PAGE_SIZE = 100;

    private final TokenUsageService usageService;
    private final MonthlyBillRepository billRepository;
    private final QuotaGuardService quotaService;

    // Real-time usage
    @GetMapping("/usage/realtime")
    public ResponseEntity<UsageSummary> getRealtimeUsage(
            @AuthenticationPrincipal TenantPrincipal principal) {
        UsageSummary summary = usageService.getRealtimeUsage(principal.getTenantId());
        long quota = quotaService.getQuota(principal.getTenantId());
        summary.setQuota(quota);
        // Guard against a zero quota (no active plan) and never report negative remaining tokens
        summary.setUsageRate(quota > 0 ? (double) summary.getTotalTokens() / quota : 0.0);
        summary.setRemainingTokens(Math.max(0, quota - summary.getTotalTokens()));
        return ResponseEntity.ok(summary);
    }

    // Usage details (paginated, filterable by date range and model)
    @GetMapping("/usage/details")
    public ResponseEntity<Page<TokenUsageRecord>> getUsageDetails(
            @AuthenticationPrincipal TenantPrincipal principal,
            @RequestParam(required = false) @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate startDate,
            @RequestParam(required = false) @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate endDate,
            @RequestParam(required = false) String model,
            @RequestParam(defaultValue = "0") int page,
            @RequestParam(defaultValue = "20") int size) {

        // Spring binds ISO dates (yyyy-MM-dd); defaults: 1st of the current month to today
        LocalDate start = startDate != null ? startDate : LocalDate.now().withDayOfMonth(1);
        LocalDate end = endDate != null ? endDate : LocalDate.now();

        // Cap the page size so a single request cannot pull an unbounded result set
        Pageable pageable = PageRequest.of(Math.max(page, 0),
                Math.min(Math.max(size, 1), MAX_PAGE_SIZE),
                Sort.by("requestTime").descending());
        // getDetailsByTenant / getBreakdown are TokenUsageService query methods not shown above
        Page<TokenUsageRecord> records = usageService.getDetailsByTenant(
                principal.getTenantId(), start, end, model, pageable);
        return ResponseEntity.ok(records);
    }

    // Usage breakdown by model / by API key
    @GetMapping("/usage/breakdown")
    public ResponseEntity<UsageBreakdown> getUsageBreakdown(
            @AuthenticationPrincipal TenantPrincipal principal,
            @RequestParam(defaultValue = "CURRENT_MONTH") String period) {
        UsageBreakdown breakdown = usageService.getBreakdown(
                principal.getTenantId(), period);
        return ResponseEntity.ok(breakdown);
    }

    // Billing history
    @GetMapping("/bills")
    public ResponseEntity<List<MonthlyBillSummary>> getBills(
            @AuthenticationPrincipal TenantPrincipal principal) {
        List<MonthlyBillSummary> bills = billRepository
                .findByTenantId(principal.getTenantId())
                .stream()
                .map(MonthlyBillSummary::from)
                .toList();
        return ResponseEntity.ok(bills);
    }
}
```

6. Abuse Prevention
6.1 Detecting Quota Sharing Across Accounts
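The detection rule boils down to counting distinct tenants per client IP within a time window, the same aggregation as `GROUP BY client_ip HAVING COUNT(DISTINCT tenant_id) > ?` in SQL. A hypothetical in-memory version follows; `UsageRow` and `SharedIpDetector` are illustrative names, and the sketch assumes the client IP is recorded with each usage row.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical minimal usage row: who called, from where, and when
record UsageRow(String tenantId, String clientIp, Instant at) {}

final class SharedIpDetector {
    private SharedIpDetector() {}

    // IPs used by more than maxTenantsPerIp distinct tenants within [now - window, now]
    static Set<String> suspiciousIps(List<UsageRow> rows, Instant now,
                                     Duration window, int maxTenantsPerIp) {
        Instant since = now.minus(window);
        Map<String, Set<String>> tenantsByIp = new HashMap<>();
        for (UsageRow r : rows) {
            if (r.at().isBefore(since) || r.at().isAfter(now)) continue;
            tenantsByIp.computeIfAbsent(r.clientIp(), ip -> new HashSet<>()).add(r.tenantId());
        }
        Set<String> result = new TreeSet<>();
        tenantsByIp.forEach((ip, tenants) -> {
            if (tenants.size() > maxTenantsPerIp) result.add(ip);
        });
        return result;
    }
}
```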
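```java
// AbuseDetectionService.java
@Service
@Slf4j
@RequiredArgsConstructor
public class AbuseDetectionService {

    // More distinct tenants than this from one IP within an hour counts as suspicious
    private static final int MAX_TENANTS_PER_IP = 5;

    private final RedisTemplate<String, Object> redisTemplate;
    private final TokenUsageRecordRepository recordRepository;
    private final AlertService alertService;

    // Detect many accounts used from the same IP (requires the client IP on each usage record)
    @Scheduled(fixedRate = 300_000) // every 5 minutes
    public void detectMultiAccountAbuse() {
        // IPs used by more than MAX_TENANTS_PER_IP distinct tenants in the last hour
        List<IpUsageStats> suspiciousIps = recordRepository
                .findHighUsageIpsInLastHour(MAX_TENANTS_PER_IP);

        for (IpUsageStats stats : suspiciousIps) {
            if (stats.getDistinctTenants() > MAX_TENANTS_PER_IP) {
                log.warn("Suspicious multi-account activity: IP={}, distinctTenants={}, totalTokens={}",
                        stats.getIp(), stats.getDistinctTenants(), stats.getTotalTokens());
                // Flag every account seen on this IP as at risk
                riskFlagAllTenantsForIp(stats.getIp());
            }
        }
    }

    // Detect anomalous usage patterns (e.g. a sudden late-night spike)
    public boolean isAnomalousUsage(String tenantId, long currentUsage) {
        // Average usage for the same hour of day over the past 7 days
        long avgUsage = recordRepository.getAverageHourlyUsage(tenantId, 7);

        // Flag usage above 5x the average, ignoring small absolute volumes
        if (currentUsage > avgUsage * 5 && currentUsage > 100_000) {
            log.warn("Anomalous usage detected [tenant={}]: current={}, average={}",
                    tenantId, currentUsage, avgUsage);
            return true;
        }
        return false;
    }

    // Flag the IP for 24 hours. Throttling only takes effect if the quota check reads risk:ip:*;
    // shared NAT or corporate egress IPs can cause false positives, hence the manual review.
    private void riskFlagAllTenantsForIp(String ip) {
        String key = "risk:ip:" + ip;
        redisTemplate.opsForValue().set(key, "flagged", Duration.ofHours(24));
        // Notify a human reviewer
        alertService.sendAbuseAlert("Multi-account abuse from one IP",
                "IP: " + ip + " used multiple accounts in the past hour");
    }
}
```

7. FAQ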
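Q1: How should Embedding calls be billed?
A: Embeddings consume only input tokens (there is no output), at a lower unit price:
```java
// Embedding pricing (yuan per million tokens; illustrative values)
Map<String, ModelPricing> EMBEDDING_PRICING = Map.of(
    "text-embedding-3-small", ModelPricing.inputOnly(0.15), // ¥0.15/M
    "text-embedding-3-large", ModelPricing.inputOnly(0.65)  // ¥0.65/M
);
```
Tag these records with CallType.EMBEDDING so the billing logic can tell them apart. Note that Spring AI advisors such as TokenUsageTrackingAdvisor wrap ChatClient calls, not EmbeddingModel calls, so embedding usage has to be recorded wherever the EmbeddingModel is invoked.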
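A sketch of routing pricing by `CallType`, using hypothetical `Pricing` and `CallTypePricer` types (the article's `ModelPricing` and `BillingCalculator` may be shaped differently) and the illustrative prices from this article. Embedding prices are input-only, so any reported output tokens contribute nothing.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

enum CallType { CHAT, EMBEDDING, TOOL_CALL, IMAGE }

// Yuan per million tokens; embedding models have no output price
record Pricing(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {
    static Pricing inputOnly(BigDecimal inputPerMillion) {
        return new Pricing(inputPerMillion, BigDecimal.ZERO);
    }
}

final class CallTypePricer {
    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);
    // Illustrative prices taken from this article, not current vendor prices
    private static final Map<String, Pricing> CHAT_PRICES = Map.of(
            "gpt-4o-mini", new Pricing(new BigDecimal("6"), new BigDecimal("18")));
    private static final Map<String, Pricing> EMBEDDING_PRICES = Map.of(
            "text-embedding-3-small", Pricing.inputOnly(new BigDecimal("0.15")),
            "text-embedding-3-large", Pricing.inputOnly(new BigDecimal("0.65")));

    private CallTypePricer() {}

    static BigDecimal cost(CallType type, String model, long inputTokens, long outputTokens) {
        Pricing p = switch (type) {
            case CHAT, TOOL_CALL -> CHAT_PRICES.get(model);
            case EMBEDDING -> EMBEDDING_PRICES.get(model);
            case IMAGE -> throw new UnsupportedOperationException("IMAGE pricing is out of scope here");
        };
        if (p == null) throw new IllegalArgumentException("No price for model " + model);
        return BigDecimal.valueOf(inputTokens).multiply(p.inputPerMillion())
                .add(BigDecimal.valueOf(outputTokens).multiply(p.outputPerMillion()))
                .divide(MILLION, 6, RoundingMode.HALF_UP);
    }
}
```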
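Q2: Will billing data be lost if Redis fails?
A: Redis is only a real-time cache; MySQL holds the authoritative billing data:
- The token_usage_records table is the single source of truth for bills
- Redis counters serve real-time quota checks and can be rebuilt from MySQL
- Enable Redis persistence (AOF mode) to reduce the risk of losing counters
- Month-end bills are aggregated from MySQL and do not depend on Redis data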
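Rebuilding the counters amounts to re-aggregating the detail rows for the period, the equivalent of `SELECT tenant_id, SUM(input_tokens), SUM(output_tokens) ... GROUP BY tenant_id`, and then overwriting the Redis values (for example with HSET) rather than incrementing them, so a rebuild can be rerun safely. `UsageRecordRow`, `TokenTotals` and `CounterRebuilder` are hypothetical types for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical projection of token_usage_records rows for one billing period
record UsageRecordRow(String tenantId, long inputTokens, long outputTokens) {}

// Per-tenant totals to be written back (overwritten) into usage:tokens:{tenant}:{period}
record TokenTotals(long input, long output) {}

final class CounterRebuilder {
    private CounterRebuilder() {}

    static Map<String, TokenTotals> rebuild(List<UsageRecordRow> rows) {
        Map<String, TokenTotals> totals = new TreeMap<>();
        for (UsageRecordRow r : rows) {
            // Sum into the tenant's existing totals, or start new ones
            totals.merge(r.tenantId(), new TokenTotals(r.inputTokens(), r.outputTokens()),
                    (a, b) -> new TokenTotals(a.input() + b.input(), a.output() + b.output()));
        }
        return totals;
    }
}
```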
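Q3: How do you handle discrepancies between your token counts and the AI API's billing?
A:
- Always use the usage field in the API response (the model vendor's authoritative count)
- If the response has no usage field (some self-hosted models), count tokens with that model's own tokenizer; tiktoken only reproduces the encodings used by OpenAI models
- Keep the raw request/response logs as evidence for billing disputes
- Regularly reconcile your own token totals against the API provider's invoices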
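A minimal reconciliation check compares your metered total with the provider's invoice for the same period and flags gaps beyond a tolerance. `Reconciliation` is a hypothetical record; the provider total would come from the vendor's billing export, and the tolerance is an assumption to tune.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Compares internally metered tokens with the provider's invoice for the same period
record Reconciliation(long internalTokens, long providerTokens,
                      BigDecimal gapPercent, boolean withinTolerance) {

    static Reconciliation of(long internalTokens, long providerTokens, BigDecimal tolerancePercent) {
        if (providerTokens <= 0) throw new IllegalArgumentException("provider total must be positive");
        // gap = (provider - internal) / provider * 100; positive means we under-metered
        BigDecimal gap = BigDecimal.valueOf(providerTokens - internalTokens)
                .multiply(BigDecimal.valueOf(100))
                .divide(BigDecimal.valueOf(providerTokens), 2, RoundingMode.HALF_UP);
        return new Reconciliation(internalTokens, providerTokens, gap,
                gap.abs().compareTo(tolerancePercent) <= 0);
    }
}
```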
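Q4: How do you keep the free tier from being abused at scale?
A: Layered defenses:
- Email plus phone verification, to raise the cost of signing up
- One free account per device fingerprint
- One free account per payment method (credit card / Alipay)
- Reset the free allowance daily instead of monthly, so there is less value in hoarding it
- IP blacklist: more than N free accounts on one IP automatically triggers a review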
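A daily reset falls out naturally if the counter key contains the local date and expires at the next midnight, so no cleanup job is needed. `DailyFreeQuota` and the `free:` key prefix are hypothetical, and Beijing time is assumed as in the night-time discount; the counter itself would be updated with the same atomic check-and-increment as the quota guard in section 3.2.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

final class DailyFreeQuota {
    private static final ZoneId BILLING_ZONE = ZoneId.of("Asia/Shanghai");

    private DailyFreeQuota() {}

    // The date in the key makes the counter reset at local midnight, e.g. free:t1:2026-10-08
    static String key(String tenantId, Instant now) {
        LocalDate day = now.atZone(BILLING_ZONE).toLocalDate();
        return "free:" + tenantId + ":" + day;
    }

    // Seconds until the next local midnight: a natural TTL for the daily key
    static long ttlSeconds(Instant now) {
        Instant nextMidnight = now.atZone(BILLING_ZONE).toLocalDate().plusDays(1)
                .atStartOfDay(BILLING_ZONE).toInstant();
        return nextMidnight.getEpochSecond() - now.getEpochSecond();
    }
}
```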
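Q5: How should billing data be kept audit-compliant?
A:
- Never physically delete rows from token_usage_records (archive them logically instead)
- Retain billing data for at least 7 years, or longer where local financial regulations require it
- Sensitive operations (editing bills, adjusting quotas) require two-person approval plus an operation log
- Periodically export billing data to write-once storage (e.g. S3 or OSS with an object-lock/WORM retention policy)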
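8. Summary
Shen Hao's story shows that for an AI SaaS company, the billing system is not optional: it is core infrastructure of the business model.
The value of precise billing:
| Cost of imprecise billing | Benefit of precise billing |
|---|---|
| Inverted margins (revenue < cost) | Clear profit margins |
| High-usage customers are subsidized | Fair pricing that captures heavy users' value |
| Abuse goes undetected | Legitimate users are protected |
| No basis for optimizing pricing | Data-driven product decisions |
Start today: get token metering right, and make your AI product's business model actually work.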
